
February 5, 2025

Present

Andrew Naylor, Andy Cheng, Armstrong Foundjem, Azza Ahmad, Ben Hawks, Briana Cervantes, Elizabeth Campolongo, Geoffrey Fox, Gregg Barrett, Gregor von Laszewski, Gyuri Papay, Hilmar Lapp, Hussain Ather, Karen Bennett, Marco Colombo, Marisa Ahmad, Matt Sinclair, Nhan Tran, Philip Harris, Piotr Luszczek, Victor Lu, Wes Brewer

Tentative Agenda

(Fast ML) scientific benchmarks and challenges

  • Special presentation by Nhan V. Tran of Fermilab on "(Fast ML) scientific benchmarks and challenges" and MLCommons benchmarks
  • The talk is the first item in the meeting recording: MLC Science WG - 2025/02/05 07:54 PST - Recording
  • It starts with FastML inference benchmarks for HEP, covering different interesting regions in latency-data-rate space (slides: 230322 mlcommons science.pdf; paper: [2207.07958] FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning). There are implementations of the fastml-science benchmark models in both standard Keras (float) and QKeras (quantized) form, as discussed in the FastML presentation of March 22, 2023; a sketch of the two styles follows this list
  • Browse the State-of-the-Art in Machine Learning | Papers With Code (12,423 benchmarks, 5,358 tasks, 154,766 papers with code) is impressive
  • On a smaller scale, the particle physics HEP ML Living Review, with a good taxonomy, is another example
  • Codabench (280 public competitions, 16,456 users, 98,071 submissions) is a benchmark site with a focus on relatively small problems. It has a leaderboard, a catalog of problems, and an attractive exemplar user interface
  • We need to classify benchmarks along two dimensions: science domain and technique; each technique would have similarities even when applied across domains (see slide 12)
  • This taxonomy would have a tree structure
  • Nhan described DOE studies on transformational science enabled by AI and the DOE project Frontiers in Artificial Intelligence for Science, Security and Technology (FASST) | Department of Energy
  • Geoffrey noted that this talk implicitly observed that benchmarks can serve as measures of (compute or science) performance, but they also have educational value, either to illustrate basic AI/system ideas or to serve as examples on which to base new AI models
  • This is discussed in Bench24-December4-6-2024 for time series, with slides 36-38 covering the domains where time series appear and the characteristics of time series and their models
  • The value of patterns for a given technique type, such as time series, makes this clear: there are thousands of scientific fields, and an example in one field can be adapted for many others
  • Educational and performance goals both need a taxonomy of application types so we can build performance/educational benchmarks that illustrate all the major types
  • There are separate taxonomies of applications and models, and one should encourage different models to explore each application class
  • We want benchmarks to test both taxonomies
  • We can't wait for DOE and NSF; we need to move faster
  • Synergy with our white papers noted
  • Matt Sinclair noted: "Very interesting talk, Nhan! Minor question: how would this list/website differ from the MLCommons Science GitHub? At some level, isn't that already a list of 'good' Science benchmarks?"
  • The Science collection is deliberately small so far; these new plans motivate expanding it
  • Gregg Barrett asked where we can find/put the list in order to suggest possible candidates: "I think if the DOE and NSF could agree on us as a single source, that would already be useful and would perhaps give critical mass to it being a de facto list."
  • Victor Lu noted that it would be great to start from what Nhan just presented and the benchmark carpentry paper effort, and analyze what makes benchmarks effective for scientific computing
  • MLCommons is a great place to do this collaboration; there is a similar collaboration on AI safety: https://b2.dsri.org/pdf/B2-technical-outline.pdf
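
As a concrete illustration of the float/quantized pairing mentioned above, here is a minimal sketch of a standard Keras dense model alongside a QKeras counterpart. The layer sizes and bit widths are illustrative assumptions, not taken from the fastml-science repository.

```python
# Minimal sketch: a float Keras model and a QKeras quantized counterpart.
# Layer sizes and bit widths are illustrative, not from fastml-science.
import tensorflow as tf
from qkeras import QActivation, QDense, quantized_bits, quantized_relu

# Standard Keras (float) baseline: a small dense classifier.
float_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(16,)),
    tf.keras.layers.Dense(5, activation="softmax"),
])

# QKeras (quantized) counterpart: weights and activations constrained to a
# few bits, as used for low-latency (e.g. FPGA) inference.
quant_model = tf.keras.Sequential([
    QDense(64, input_shape=(16,),
           kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0, alpha=1)),
    QActivation(quantized_relu(6)),
    QDense(5,
           kernel_quantizer=quantized_bits(6, 0, alpha=1),
           bias_quantizer=quantized_bits(6, 0, alpha=1)),
    tf.keras.layers.Activation("softmax"),
])
```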

NSF HDR ML Challenge

HPC and Science Working Groups Merger

  • This was briefly discussed; Marisa will contact MLCommons support and work with Gregor on the Science + HPC working group GitHub repositories.
  • We discussed timing (Wednesday, 11 am Eastern is not possible for everybody) and frequency: once a week or once a fortnight?
  • Geoffrey asked for members to send him input on this.
  • Victor noted the MLCommons Discord server (MLCommons channel at https://discord.com/), and Nhan joined it.

White Papers

  • Gregor worked on the paper; all but two issues, which require Christine's input, are fixed. He offered to help in an online meeting with Christine; however, she may be out of town until the 13th of February.
  • Gregor has not yet done much written work on the other paper.

Any Other Business

  • Gregor attended an OLCF meeting on Frontier power and energy consumption measurements, which was excellent. It was not a tutorial, but rather an overview of which tools are available on Frontier to get started. It also went into the details of some limitations of the AMD software for these measurements, which AMD is working to fix, so this will be a moving target.
  • https://vimeo.com/1051746597
  • https://www.olcf.ornl.gov/wp-content/uploads/energy_consumption_2025.pdf
  • Although most of the tools are known, I learned more about ROCm-SMI and the ROCm-SMI library and their planned improvements by AMD
  • This is important because it shows which tools we need to list in the paper; e.g., every tool should be listed with its own paragraph in the existing tools section. It also reminded me that the existence of https://github.com/ROCm/omnitrace is probably not well known; at least I have not looked at it for a long time. We also have other tools that I have not used, such as rocprof. The nice thing is that the presentation reminded me how to enable profiling from Slurm and where to find the log files; a small power-query sketch follows this list.
  • So I highly recommend this presentation as a starting point.
  • There are more Frontier presentations at https://docs.olcf.ornl.gov/training/training_archive.html
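
As a starting point for the power measurements discussed above, here is a minimal sketch that queries per-GPU power draw through the rocm-smi command-line tool. It assumes rocm-smi is on PATH (as on Frontier nodes) and supports the --showpower and --json flags; JSON field names vary across ROCm releases, so the parsing simply prints whatever power-related fields appear.

```python
# Minimal sketch: read per-GPU power draw via the rocm-smi CLI.
# Assumes rocm-smi is on PATH and supports --showpower and --json;
# JSON field names differ between ROCm releases, so we match loosely.
import json
import subprocess

def gpu_power_readings():
    """Print any power-related fields reported by rocm-smi for each GPU."""
    out = subprocess.run(
        ["rocm-smi", "--showpower", "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    data = json.loads(out)
    for card, fields in data.items():
        if not isinstance(fields, dict):
            continue  # skip non-GPU entries some versions emit
        for key, value in fields.items():
            if "Power" in key:
                print(f"{card}: {key} = {value}")

if __name__ == "__main__":
    gpu_power_readings()
```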