
June 3, 2025 (9:05 pm ET for Asia-USA)

Present

Annajiat Alim Rasel, Douglas Fils, Geoffrey Fox, Gregor von Laszewski, Nhan Tran, Satoshi Iwata, Victor Lu

New Member

Google Meet Notes

  • MLC Science WG - 2025/06/04 01:50 BST - Notes by Gemini
  • Summary: Geoffrey Fox, Nhan Tran, Gregor von Laszewski, Victor Lu, and Annajiat Alim Rasel discussed MLCommons benchmarks, the challenges of scientific computing benchmarks, and varying definitions of what constitutes a benchmark. The group explored the structure of benchmark repositories, the potential of OpenML as a backend, and the creation of a taxonomy for scientific ML benchmarks. Nhan Tran proposed a white paper on this work, and Geoffrey Fox suggested that David Kanter and Yuri lead an effort to consolidate benchmark definitions based on input from the participants.

Discussion

  • The connection between the Carpentry paper, a potential breakoff paper on benchmarks, and the current benchmark list was discussed
  • The importance of discussing definitions of benchmarks was made clear
  • Nhan brought up the web resource structures used by OpenML and Hugging Face; the meeting agreed that these are instructive but confusing. OpenML would not be a landing page, but some of its features could be useful as a backend (see the OpenML sketch after this list).
  • Lessons from the Intersect360 webinar were discussed (an HPC system is different from an AI supercomputer; see arXiv:2504.16026, "Trends in AI Supercomputers"), along with Google Scholar and arXiv.
  • Kanter’s interest in understanding the structure of MLPerf benchmarks was noted, so that one could explain how they span procurement needs.
  • Science AI benchmarks are a subset of the full AI benchmark set
  • Nhan discussed the Cookiecutter Data Science template, https://github.com/drivendataorg/cookiecutter-data-science (see the Cookiecutter sketch after this list).
  • Douglas Fils wondered whether there has been any discussion of using FAIR digital objects such as RO-Crates for data connected with the metadata and other supporting documents.
  • Victor Lu noted that a relevant benchmark mirrors real-world use cases, data distributions, or performance constraints, so that benchmark results can meaningfully predict system behavior in practice
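
OpenML sketch: as background for the OpenML backend idea, the following is a minimal sketch, assuming the openml Python package, of how OpenML's API can list datasets and load one with its metadata. The dataset ID (61, the classic iris set) and the printed columns are illustrative placeholders, not choices made by the group.

```python
# Minimal sketch (assumes `pip install openml`): using OpenML's Python API
# as a possible backend for browsing and loading benchmark datasets.
import openml

# List available datasets as a pandas DataFrame of metadata.
datasets = openml.datasets.list_datasets(output_format="dataframe")
print(datasets[["did", "name", "NumberOfInstances"]].head())

# Fetch a single dataset by its OpenML ID (61 = iris, used only as a placeholder).
dataset = openml.datasets.get_dataset(61)
print(dataset.name)

# Load the data itself, split into features and the dataset's default target.
X, y, categorical_indicator, attribute_names = dataset.get_data(
    dataset_format="dataframe", target=dataset.default_target_attribute
)
print(X.shape, len(y))
```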
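
Cookiecutter sketch: a hedged illustration of the Cookiecutter item above, showing only the generic mechanism by which the cookiecutter tool instantiates a project template such as the linked one. The output directory is a hypothetical example, and recent releases of the drivendataorg template also ship their own ccds command-line wrapper, which its documentation recommends; nothing here reflects a workflow agreed in the meeting.

```python
# Sketch of the generic Cookiecutter mechanism (assumes `pip install cookiecutter`).
# The shell equivalent is:
#   cookiecutter https://github.com/drivendataorg/cookiecutter-data-science
from cookiecutter.main import cookiecutter

# Instantiate the template; by default this prompts interactively for the
# project name and other fields defined by the template.
# "generated-projects" is a hypothetical destination directory.
cookiecutter(
    "https://github.com/drivendataorg/cookiecutter-data-science",
    output_dir="generated-projects",
)
```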