June 25, 2025

Present

Amin Maleki, Armstrong Foundjem, Gary Mazzaferro, Geoffrey Fox, Gregg Barrett, Gregor von Laszewski, Javier Toledo, Jeyan Thiyagalingam, Juri Papay, Kirsten Morehouse, Khojasteh Mirza, Marco Colombo, Philip Haris, Piotr Luszczek, Satoshi Iwata, Victor Lu, Wenhui Zhang

Tentative Agenda

Any New Members Introduction
Presentation by Kirsten Morehouse, Harvard
Continuing discussion of New Benchmarks and the catalog of Science benchmarks based on https://docs.google.com/spreadsheets/d/1Ysk32dqkgdGfDW0rFaCpc8o1Cp6uhtJqbDFAIlhfb9o/edit?usp=sharing
White Papers
The Benchmark carpentry white paper https://www.overleaf.com/9828764221czxzxxcxmcrr#1f1c84
New white paper on Science Benchmarks
Any Other Business

Google Meet Notes

MLC Science WG - 2025/06/25 07:59 PDT - Notes by Gemini
Summary: Geoffrey Fox welcomed new member Khojasteh Mirza. Kirsten Morehouse presented her work on bias probing in LLMs, introducing the Eco Levels framework and discussing lessons from the social sciences; this led to a discussion with Juri Papay and Jeyan Thiyagalingam about collaboration with LLM developers and bias measurement. Gregor von Laszewski presented progress on the benchmark repository, which is built from YAML files, and discussed benchmark specification and data sets. This prompted a discussion with Geoffrey Fox and Victor Lu about reconciling the benchmark lists, database AI benchmarks, and the Julia community's benchmarking approach.

New Members

Khojasteh Mirza https://www.linkedin.com/in/khojasteh-mirza/ is a Data Scientist building biologically inspired artificial neural networks. Currently, he is working as an Associate Researcher in the rapidly evolving field of Machine Learning and Computational Neuroscience at Mount Sinai in NYC. He is passionate about applying AI and data science to solve real-world problems, especially in the healthcare domain. He has a Master's degree in Electrical and Computer Engineering from Cornell University.

Presentation by Kirsten Morehouse

Kirsten’s excellent presentation Morehouse_MLCommons_062425.pptx was followed by questions from Juri and Jeyan. See Gemini notes MLC Science WG - 2025/06/25 07:59 PDT - Notes by Gemini for more details.
Her paper is Position: Rethinking LLM Bias Probing Using Lessons from the Social Sciences
Gregg noted that he found that the issue is often a disagreement on the definition of bias.

Benchmark Collections

Gregor summarized the results of his work with the Fermilab interns who are working with Nhan Tran:
https://github.com/mlcommons-science/benchmark/tree/main
https://github.com/mlcommons-science/benchmark/blob/main/content/tex/benchmarks.pdf
https://github.com/mlcommons-science/benchmark/blob/main/content/md/index.md
Gregg Barrett noted that Gregor had done a nice job on the GitHub.
Gregor noted that some improvements can still be made, but in principle we now have a framework for adding further benchmarks and features. The program is flexible enough to rename, add, or delete entries. It also includes BibTeX references, which is important for inclusion in the papers we write. The PDF serves as a template for a fairly sophisticated LaTeX table that goes beyond what we typically use, e.g. a long table and a switch to landscape mode when rendering the PDF.
The Benchmark Carpentry paper is at https://www.overleaf.com/project/67585323797c7e764c254a84 or https://www.overleaf.com/9828764221czxzxxcxmcrr#1f1c84 but was not updated this week
Kirsten Morehouse and Khojasteh Mirza joined the team
Gary noted that standard data sets allow comparison of benchmarks, i.e. benchmarking the benchmarks
The benchmark lists are at https://docs.google.com/spreadsheets/d/1Haagoyc1cqmPca96fR9dHSsDsuO0j6bOCgXSVfTr0IM/edit?gid=0&authuser=2#gid=0 or, in a different version, at https://docs.google.com/spreadsheets/d/1Ysk32dqkgdGfDW0rFaCpc8o1Cp6uhtJqbDFAIlhfb9o/edit?usp=sharing
Gregor will reconcile these versions
Victor suggested asking for a presentation on Julia benchmarks; Geoffrey will ask Alan Edelman.