June 11, 2025

Present

Armstrong Foundjem, Frank Liu, Geoffrey Fox, Gregor von Laszewski, Howard Pritchard, Iulia Ibanescu, Juri Papay, Kirsten Morehouse, Marco Colombo, Satoshi Iwata, Wenhui Zhang, Sebastian Lobentanzer, Shirley Moore, Vaibhavi Singh, Victor Lu

Tentative Agenda

Any New Members Introduction
Continuing discussion of New Benchmarks and the catalog of Science benchmarks based on https://docs.google.com/spreadsheets/d/1Ysk32dqkgdGfDW0rFaCpc8o1Cp6uhtJqbDFAIlhfb9o/edit?usp=sharing
White Papers
The Benchmark carpentry white paper https://www.overleaf.com/9828764221czxzxxcxmcrr#1f1c84
New white paper on Science Benchmarks
Any Other Business

Google Meet Notes

MLC Science WG - 2025/06/11 07:59 PDT - Notes by Gemini
Summary: Juri Papay, Geoffrey Fox, Victor Lu, Gregor von Laszewski, Armstrong Foundjem, Sebastian Lobentanzer, Kirsten Morehouse, and Frank Liu discussed funding losses for benchmarking, the landscape of existing hardware-focused benchmarks versus the need for system-level and scientific benchmarks, and the relevance of MLPerf. New members Sebastian Lobentanzer, Kirsten Morehouse, and Frank Liu introduced themselves and their interests, with Kirsten Morehouse agreeing to give a presentation on bias research. Participants also covered progress on white papers defining benchmarks, the importance of surrogate modeling and digital twins in scientific benchmarking, and the potential for a wide spectrum of benchmarking initiatives.

New Members

Sebastian Lobentanzer, PI of “Accessible Biomedical AI Research” at Helmholtz Center Munich, Germany (Computational Health Center), Lab website: SLOLAB—Accessible Biomedical AI Research, Personal website (incl social links): Sebastian Lobentanzer
Kirsten Morehouse, psychology Ph.D. candidate at Harvard University with a secondary in data science. She has a position paper preprint (forthcoming at ICML): [2503.00093] Rethinking LLM Bias Probing Using Lessons from the Social Sciences
She will give a talk on social biases, mainly in LLMs, at the next meeting.
Frank Liu received his PhD in Computer Engineering from the Fulton Schools of Engineering at Arizona State University in Spring 2024 (FrankLiu).
Vaibhavi Singh, incoming NYU Courant MS CS graduate student. Socials shared at Geoffrey's request: Twitter: Vaibhavi Singh (@__Vaibhavi) / X; LinkedIn: Vaibhavi Singh - Stealth Startup | LinkedIn
She is interested in multimodal and effective benchmarking and previously worked at Adobe on computer vision projects.

Discussion

Gregor updated us on the carpentry paper and on his work with Nhan’s student interns, who are cataloging benchmarks.
There are different definitions of dynamic and science benchmarks.
Victor prepared a document for discussing context-aware scientific benchmarks.
Sebastian noted the white paper on evaluation of AI systems in the wild (dynamic benchmarks): [2504.16778] Evaluation Framework for AI Systems in "the Wild"