May 28, 2025
Present
Armstrong Foundjem, Christine Kirkpatrick, Gary Mazzaferro, Geoffrey Fox, Gregor von Laszewski, Howard Pritchard, Javier Toledo, Julian Samaroo, Juri Papay, Lee Sharma, Marco Colombo, Matt Sinclair, Nhan Tran, Philip Harris, Piotr Luszczek, Satoshi Iwata, Shirley Moore, Tom Gibbs, Victor Lu
Tentative Agenda
- Introduction of any new members
- Quick report from the Asia-US meeting on May 20 at 9 pm
- Continuing discussion of New Benchmarks and the catalog of Science benchmarks based on https://docs.google.com/spreadsheets/d/1Ysk32dqkgdGfDW0rFaCpc8o1Cp6uhtJqbDFAIlhfb9o/edit?usp=sharing
- White Papers
- The Benchmark carpentry white paper https://www.overleaf.com/9828764221czxzxxcxmcrr#1f1c84
- Report from Victor Lu
- Any Other Business
Google Meet Notes
MLC Science WG - 2025/05/28 07:54 PDT - Notes by Gemini
Summary: We discussed AI note-taking concerns, the agenda (including the survey of AI benchmarks and the carpentry paper), and the progress of the carpentry paper, with a focus on scientific applications and benchmarks. Participants debated the definition of a scientific benchmark and explored creating a companion repository that provides a dynamic view of benchmarks; potential collaboration with Nvidia was mentioned. The purpose and scope of the benchmark carpentry paper were discussed, leading to a suggestion to split it into multiple papers, and Gregor von Laszewski encouraged contributions to the existing paper and its to-do list.
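As a minimal sketch of the companion-repository idea mentioned in the summary, the snippet below generates a simple dynamic view of the benchmark catalog from a CSV export of the Google Sheet linked in the agenda. The file name and the column names "Name" and "Domain" are illustrative assumptions, not the catalog's actual schema.

```python
# Hypothetical sketch: summarize the science-benchmark catalog by domain.
# Assumes the catalog spreadsheet has been exported as "science_benchmarks.csv"
# with (assumed) columns "Name" and "Domain"; adjust to the real schema.
import csv
from collections import defaultdict

def summarize_catalog(path: str = "science_benchmarks.csv") -> None:
    """Group catalog entries by domain and print a simple summary."""
    by_domain: dict[str, list[str]] = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            by_domain[row.get("Domain", "unknown")].append(row.get("Name", "?"))
    for domain, names in sorted(by_domain.items()):
        print(f"{domain}: {len(names)} benchmarks")
        for name in names:
            print(f"  - {name}")

if __name__ == "__main__":
    summarize_catalog()
```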
New Members
- Johannes Blaschke, HPC Workflow Performance Specialist (Computer Systems Engineer 4) at the National Energy Research Scientific Computing Center (NERSC), is a research professional with a Ph.D. in theoretical physics and eight years of experience in higher education and at national laboratories. He has worked on a broad range of projects in applied mathematics and high-performance computing, is passionate about numerical methods, statistical physics, and inspiring good software development, and is a member of the Julia community.
- Julian Samaroo https://www.linkedin.com/in/julian-samaroo-9082587a/ is a research software engineer at MIT's JuliaLab who believes that code should be able to scale up from a laptop to a supercomputer, and back down to a smartphone, across CPUs, GPUs, and whatever accelerator comes next. Too little code written today meets these goals, and it is his desire to change that.
- Tianhao Li from Duke University, https://www.linkedin.com/in/tianhaoli0x01/, has a strong research background in Trustworthy AI. After reviewing the mlcommons-benchmark-carpentry white paper, he asked if he could lead a subsection in Section III discussing the limitations of existing AI benchmarks, for example data contamination (https://arxiv.org/abs/2406.04244). Permission was granted.
Carpentry Paper Discussion
- Nhan introduced a structure for the introduction that Christine liked, and Matt volunteered to help
- Scope/Challenge: Towards Democratizing ML Benchmarks *for Science*
- Unique challenges and opportunities for Science:
- Unique workloads
- Stakeholders (program managers, researchers, etc.)
- Grand Challenges?
- Carpentry:
- elements of a benchmark (datasets, code, metrics, constraints); see the sketch after this outline
- existing templates and software conventions
- Existing benchmarks and their status w.r.t. our definitions of scientific ML benchmark
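To make the "elements of a benchmark" item above concrete, here is a minimal sketch of how those four elements might be captured in a machine-readable record. The field names and the example values are illustrative assumptions, not a template the WG has adopted.

```python
# Illustrative sketch only: one way to record the elements of a benchmark
# (datasets, code, metrics, constraints). Field names are assumptions.
from dataclasses import dataclass, field

@dataclass
class BenchmarkSpec:
    name: str
    datasets: list[str]          # e.g. URLs or DOIs of the reference datasets
    code: str                    # repository holding the reference implementation
    metrics: list[str]           # e.g. accuracy, time-to-solution, energy
    constraints: list[str] = field(default_factory=list)  # e.g. closed-division rules

# Example instance (values for illustration, based on the MLPerf HPC CosmoFlow benchmark):
cosmoflow = BenchmarkSpec(
    name="CosmoFlow",
    datasets=["https://portal.nersc.gov/project/m3363/"],
    code="https://github.com/mlcommons/hpc",
    metrics=["mean absolute error", "time-to-train"],
    constraints=["fixed hyperparameters in closed division"],
)
```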
- The probable breakup of the current long paper into two papers was discussed
- Tom Gibbs noted that simulation benchmarks were fixed, while AI benchmarks were changing rapidly.
- Use cases included designing hardware and preparing NSF or DOE computer-allocation requests
- Gary Mazzaferro noted that he divides benchmarks into two classifications
- Technical Benchmarks - determine a product's or service's capabilities
- Competitive Benchmarks - compare how well (or poorly) a system meets a set of expectations
- Armstrong noted this paper from the Power WG: MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from µWatts to MWatts for Sustainable AI
- Philip Harris noted that there is a lot of concern from scientists that Nvidia/company benchmarks are not the ideal ones for science; we frequently find that commercial benchmarks do not capture everyone's demands
- Christine Kirkpatrick noted that, for the benchmarking-carpentry aspect of our work, it would be very interesting to hear how those groups mentor/train people around using benchmarks, or whether they just assume people already have all the skills
- She added that what Phil said also belongs in our introduction
- Gregor discussed an Oak Ridge / AMD tutorial, and Lee Sharma wondered whether it was related to A Community Roadmap for Scientific Workflows Research and Development (which had Oak Ridge involvement) but did not think it matched the description
- Gregor von Laszewski will add to the paper that both benchmarks and hardware change rapidly
- Christine Kirkpatrick noted that she has a FAIR-benchmarks section that she can add once we know which paper it belongs in and where
- Nhan agreed
Meeting with David Kanter on May 21 after the WG Meeting
- Geoffrey and Juri met with David after the Science WG meeting. David is the head of MLPerf.
- David discussed NREL using MLPerf benchmarks
- He was interested in discussion of application structure and how MLPerf benchmarks covered that structure.