July 23, 2025
Present
Armstrong Foundjem, Ben Hawks, Geoffrey Fox, Gregg Barrett, Howard Pritchard, Jeyan Thiyagalingam, Jong Choi, Juri Papay, Kirsten Morehouse, Marco Colombo, Matt Sinclair, Murali Emani, Nhan Tran, Philip Harris, Piotr Luszczek, Satoshi Iwata, Shirley Moore, Victor Lu
Apologies
Gregor von Laszewski (information below added prior to meeting)
Presentation
- Benjamin Hawks presented a wonderful two-part talk on the Fermilab work and Gregor’s work
- Recording: MLC Science WG - 2025/07/23 07:59 PDT
- Presentation (Overview/Updates): Benchmarking Project Updates
- Presentation (Benchmark Platform Discussion): Scientific Benchmark Hosting Platform Evaluation
Google Meet Notes
- MLC Science WG - 2025/07/23 07:59 PDT - Notes by Gemini
- Ben Hawks provided updates on the benchmarking collection project with Nhan Tran, detailing the V0.1 benchmark definition and a proposed rating system for benchmarks. Victor Lu, Jeyan Thiyagalingam, Philip Harris, and Matt Sinclair discussed generalizing scientific tasks and different approaches to benchmark definitions. Ben Hawks demonstrated a prototype website for the benchmark catalog and evaluated backend platforms, concluding that Hugging Face, possibly combined with other platforms, seems to be a good starting point.
Update on Paper Benchmark Carpentry (Gregor)
- Definition of benchmark in paper
- New Abstract included
- HPC section assigned to Piotr
Update on benchmark collection (Gregor)
RESULT:
- “make” now creates the PDF and Markdown (md) output and puts it in a publicly accessible directory on GitHub (a sketch of the md step follows after this list)
- Results are shown at
- https://mlcommons-science.github.io/benchmark/
- Modifications to yaml files can be done at
- https://github.com/mlcommons-science/benchmark/tree/main/source
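A minimal sketch of what the Markdown generation step behind “make” might look like, assuming the entries live in source/benchmarks.yaml (per the repository link above); the field names (name, url) and the output path docs/index.md are hypothetical, not necessarily those of the actual generator.

```python
from pathlib import Path

import yaml  # PyYAML

# Load the benchmark entries; assumes a flat list of dictionaries.
entries = yaml.safe_load(Path("source/benchmarks.yaml").read_text())

lines = ["# Benchmark Catalog", ""]
for entry in entries:
    # One section per benchmark, with its URL if present.
    lines.append(f"## {entry.get('name', 'unnamed')}")
    if entry.get("url"):
        lines.append(f"- URL: <{entry['url']}>")
    lines.append("")

# The generated page is what the GitHub Pages site serves.
Path("docs/index.md").write_text("\n".join(lines))
```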
COMMENT: Due to the state of the program, Gregor was forced to halt writing the paper and rewrite the program. See: https://github.com/mlcommons-science/benchmark/graphs/contributors
- It is unclear why the functioning code that Gregor delivered two weeks ago, and that was subsequently modified by others, broke. The changes may have been based on incorrect ChatGPT search results and supposed improvements. As this is the second time this has happened, Gregor asked others to stop modifying the code. The previous code required a complete rewrite because its bugs were too numerous and complex.
- Gregor rewrote the program entirely (a substantial effort)
- After the rewrite, md generator improvements were assigned to students.
- Gregor added radar charts (a sketch of such a chart follows below)
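A minimal sketch of how such a radar chart could be drawn with matplotlib; the rating axes and scores below are hypothetical placeholders, not the project’s actual rating dimensions.

```python
import math

import matplotlib.pyplot as plt

# Hypothetical rating axes and 0-5 scores for one benchmark entry.
labels = ["Software", "Specification", "Dataset", "Metrics", "Documentation"]
scores = [4, 5, 3, 4, 2]

# Angles for each axis; repeat the first point to close the polygon.
angles = [n / len(labels) * 2 * math.pi for n in range(len(labels))]
angles += angles[:1]
values = scores + scores[:1]

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
ax.plot(angles, values, linewidth=1)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(labels)
ax.set_ylim(0, 5)
ax.set_title("Benchmark rating (hypothetical values)")
fig.savefig("radar.png", bbox_inches="tight")
```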
- YAML files
- Gregor removed many temporary YAML files, as it was unclear which of them were valid. As originally agreed upon, two files remain:
- benchmarks.yaml
- benchmarks-addon.yaml
- Gregor improved many YAML entries
- Students were assigned to improve the YAML entries (in progress), such as:
- integration and check-in URLs (in progress)
- adding ratings (in progress)
- applying a syntax checker based on Gregor’s definition (in progress; a sketch follows after this list)
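A minimal sketch of such a syntax checker, assuming a flat list of entries in source/benchmarks.yaml; the required-field list below is a hypothetical placeholder, not Gregor’s actual definition.

```python
from pathlib import Path

import yaml  # PyYAML

# Hypothetical required fields; the real definition may differ.
REQUIRED = ["name", "url", "domain", "ratings"]

entries = yaml.safe_load(Path("source/benchmarks.yaml").read_text())

errors = []
for i, entry in enumerate(entries):
    for field in REQUIRED:
        if field not in entry:
            errors.append(f"entry {i}: missing required field '{field}'")

for error in errors:
    print(error)
raise SystemExit(1 if errors else 0)
```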
- As the YAML files still contained many errors that the students had not detected, Gregor wrote checkers for the YAML files (a sketch follows after this list), including:
- unicode detection
- a valid-URL detector
- several other checks
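A minimal sketch of what the unicode and URL checks might look like; the function names and the YAML layout are assumptions carried over from the sketches above.

```python
from pathlib import Path
from urllib.parse import urlparse

import yaml  # PyYAML

def find_non_ascii(text: str) -> list[str]:
    # Flag non-ASCII characters that sneak in via copy/paste
    # (smart quotes, en dashes, non-breaking spaces, ...).
    return [f"non-ASCII {c!r} at offset {i}" for i, c in enumerate(text) if ord(c) > 127]

def is_valid_url(url: str) -> bool:
    # Structural check only; a liveness check would also issue an HTTP request.
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

entries = yaml.safe_load(Path("source/benchmarks.yaml").read_text())
for i, entry in enumerate(entries):
    for issue in find_non_ascii(str(entry.get("name", ""))):
        print(f"entry {i}: {issue}")
    url = entry.get("url", "")
    if url and not is_valid_url(url):
        print(f"entry {i}: invalid url {url!r}")
```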
- Gregor wrote a BibTeX fetcher based on URLs that makes it easy to integrate the result into the YAML files (a sketch follows below)
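A minimal sketch of a URL-based BibTeX fetcher: doi.org serves BibTeX via HTTP content negotiation, so for DOI URLs a single GET request suffices. This is an assumption about the approach, not necessarily how Gregor’s fetcher works, and the DOI in the usage line is hypothetical.

```python
import requests

def fetch_bibtex(doi_url: str) -> str:
    # Ask doi.org for a BibTeX rendering of the record; the returned
    # string can then be pasted or merged into the YAML entry.
    response = requests.get(
        doi_url, headers={"Accept": "application/x-bibtex"}, timeout=30
    )
    response.raise_for_status()
    return response.text

# Usage (hypothetical DOI):
# print(fetch_bibtex("https://doi.org/10.1000/example"))
```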