June 11, 2025
Present
Armstrong Foundjem, Frank Liu, Geoffrey Fox, Gregor von Laszewski, Howard Pritchard, Iulia Ibanescu, Juri Papay, Kirsten Morehouse, Marco Colombo, Satoshi Iwata, Wenhui Zhang, Sebastian Lobentanzer, Shirley Moore, Vaibhavi Singh, Victor Lu
Tentative Agenda
- Introduction of any new members
- Continuing discussion of New Benchmarks and the catalog of Science benchmarks based on https://docs.google.com/spreadsheets/d/1Ysk32dqkgdGfDW0rFaCpc8o1Cp6uhtJqbDFAIlhfb9o/edit?usp=sharing
- White Papers
- The Benchmark carpentry white paper https://www.overleaf.com/9828764221czxzxxcxmcrr#1f1c84
- New white paper on Science Benchmarks
- Any Other Business
Google Meet Notes
- MLC Science WG - 2025/06/11 07:59 PDT - Notes by Gemini
- Summary: Juri Papay, Geoffrey Fox, Victor Lu, Gregor von Laszewski, Armstrong Foundjem, Sebastian Lobentanzer, Kirsten Morehouse, and Frank Liu discussed funding losses for benchmarking, the landscape of existing hardware-focused benchmarks versus the need for system-level and scientific benchmarks, and the relevance of MLPerf. New members Sebastian Lobentanzer, Kirsten Morehouse, and Frank Liu introduced themselves and their interests, and Kirsten Morehouse agreed to give a presentation on bias research. Participants also covered progress on white papers defining benchmarks, the importance of surrogate modeling and digital twins in scientific benchmarking, and the potential for a wide spectrum of benchmarking initiatives.
New Members
- Sebastian Lobentanzer, PI of “Accessible Biomedical AI Research” at Helmholtz Center Munich, Germany (Computational Health Center), Lab website: SLOLAB—Accessible Biomedical AI Research, Personal website (incl social links): Sebastian Lobentanzer
- Kirsten Morehouse, psychology Ph.D. candidate at Harvard University with a secondary in data science. She has a position paper preprint (forthcoming at ICML): [2503.00093] Rethinking LLM Bias Probing Using Lessons from the Social Sciences
- She will give a talk on social biases, mainly in LLMs, at the next meeting.
- Frank Liu received his PhD in Computer Engineering from the Fulton Schools of Engineering at Arizona State University in Spring 2024 (FrankLiu).
- Vaibhavi Singh, incoming NYU Courant MS CS grad. Socials, shared at Geoffrey’s request: Twitter: Vaibhavi Singh (@__Vaibhavi) / X; LinkedIn: Vaibhavi Singh - Stealth Startup | LinkedIn
- She is interested in multimodal and effective benchmarking. She previously worked at Adobe on computer vision projects.
Discussion
- Gregor updated us on the carpentry paper and on his work with Nhan’s student interns to catalog benchmarks.
- There are different definitions of dynamic and science benchmarks.
- Victor prepared a document to discuss context-aware scientific benchmarks.
- Sebastian noted the white paper on evaluation of AI systems in the wild (dynamic benchmarks): [2504.16778] Evaluation Framework for AI Systems in "the Wild"