
June 3, 2025 (9:05 pm ET for Asia-USA)

Present

Annajiat Alim Rasel, Douglas Fils, Geoffrey Fox, Gregor von Laszewski, Nhan Tran, Satoshi Iwata, Victor Lu

New Member

Google Meet Notes

  • MLC Science WG - 2025/06/04 01:50 BST - Notes by Gemini
  • Summary: Geoffrey Fox, Nhan Tran, Gregor von Laszewski, Victor Lu, and Annajiat Alim Rasel discussed MLCommons benchmarks, the challenges of scientific computing benchmarks, and varying definitions of what constitutes a benchmark. The group explored the structure of benchmark repositories, the potential of OpenML as a backend, and the creation of a taxonomy for scientific ML benchmarks. Nhan Tran proposed a white paper on this work, and Geoffrey Fox suggested that David Kanter and Yuri lead an effort to consolidate benchmark definitions based on input from the participants.

Discussion

  • The connection between the Carpentry paper, a potential breakoff paper on benchmarks, and the current benchmark list was discussed
  • The importance of discussing definitions of benchmarks was made clear
  • Nhan brought up the web resource structures used by OpenML and Hugging Face; the meeting agreed that these are instructive but confusing. OpenML would not be a landing page, but some of its features could be useful as a backend (see the OpenML sketch after this list).
  • Lessons from the Intersect360 webinar were discussed (an HPC system is different from an AI supercomputer; see arXiv:2504.16026, "Trends in AI Supercomputers"), along with Google Scholar and arXiv.
  • Kanter’s interest in understanding the structure of MLPerf benchmarks was noted, so that one could explain how they span procurement needs.
  • Science AI benchmarks are a subset of the full AI benchmark set
  • Nhan discussed the Cookiecutter Data Science template, https://github.com/drivendataorg/cookiecutter-data-science (see the Cookiecutter sketch after this list).
  • Douglas Fils wondered whether there has been any discussion of using FAIR digital objects such as RO-Crates for data connected with the metadata and other supporting documents.
  • Victor Lu noted that a relevant benchmark mirrors real-world use cases, data distributions, or performance constraints, so that benchmark results can meaningfully predict system behavior in practice
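
OpenML sketch: as background for the OpenML backend idea, the following is a minimal sketch, assuming the openml Python package, of how OpenML's API can list datasets and load one with its metadata. The dataset ID (61, the classic iris set) and the printed columns are illustrative placeholders, not choices made by the group.

```python
# Minimal sketch (assumes `pip install openml`): using OpenML's Python API
# as a possible backend for browsing and loading benchmark datasets.
import openml

# List available datasets as a pandas DataFrame of metadata.
datasets = openml.datasets.list_datasets(output_format="dataframe")
print(datasets[["did", "name", "NumberOfInstances"]].head())

# Fetch a single dataset by its OpenML ID (61 = iris, used only as a placeholder).
dataset = openml.datasets.get_dataset(61)
print(dataset.name)

# Load the data itself, split into features and the dataset's default target.
X, y, categorical_indicator, attribute_names = dataset.get_data(
    dataset_format="dataframe", target=dataset.default_target_attribute
)
print(X.shape, len(y))
```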
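
Cookiecutter sketch: a hedged illustration of the Cookiecutter item above, showing only the generic mechanism by which the cookiecutter tool instantiates a project template such as the linked one. The output directory is a hypothetical example, and recent releases of the drivendataorg template also ship their own ccds command-line wrapper, which its documentation recommends; nothing here reflects a workflow agreed in the meeting.

```python
# Sketch of the generic Cookiecutter mechanism (assumes `pip install cookiecutter`).
# The shell equivalent is:
#   cookiecutter https://github.com/drivendataorg/cookiecutter-data-science
from cookiecutter.main import cookiecutter

# Instantiate the template; by default this prompts interactively for the
# project name and other fields defined by the template.
# "generated-projects" is a hypothetical destination directory.
cookiecutter(
    "https://github.com/drivendataorg/cookiecutter-data-science",
    output_dir="generated-projects",
)
```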