April 2, 2025

Present

Azza Ahmad, Datta Nimmaturi, Fred Sala, Gary Mazzaferro, Gregg Barrett, Geoffrey Fox, Gregor von Laszewski, Howard Pritchard, Javier Toledo, Jihao Shi, Juri Papay, Matt Sinclair, Murali Emani, Nhan Tran, Philip Harris, Satoshi Iwata, Shantenu Jha, Shirley Moore, Tom Gibbs, Victor Lu

Apologies

Christine Kirkpatrick

Tentative Agenda

Google Meet Notes

  • Full notes are at Copy of MLC Science WG - 2025/04/02 07:56 PDT - Notes by Gemini.
  • A summary of the notes:
      • Time zone confusion related to daylight saving time changes.
      • A humorous discussion about purchasing Jack Daniels whiskey due to a potential trade war.
      • Introductions of new members, Fred Sala and Datta Nimmaturi.
      • A presentation by Fred Sala on TPBench, a benchmark for AI performance in theoretical physics, including discussions of reasoning models, data contamination, benchmark design, results, and future directions.
      • Questions and discussion related to TPBench, including dataset creation, community contributions, training infrastructure, and scaling laws.
      • Meeting time and scheduling for future meetings.
      • Plans for future presentations, including one from the Aurora GPT team.
  • The main focus of the meeting was Fred Sala's presentation on TPBench and the related discussion of AI benchmarking in theoretical physics.

New Members

Presentation on TPBench

New Meeting Time

  • After the Doodle poll, we suggested meeting every other Tuesday at 9 pm, starting April 22. We passed this proposal to David Kanter for approval.

Scientific benchmarks and challenges - follow up discussion

  • Need to follow up with MLCommons on how to fix the table. Verify with Harshat which columns we identified as not properly defined in the benchmark.

Catalog of Science benchmarks

White Papers

  • Gregor has an opportunity to create a special collection with Frontiers in High Performance Computing. Should we proceed with this, and what should the topic be?
  • Gregor asked for clarification on which documents are needed.
  • He assumes a call for papers that can be published.
  • Possible topic: High Performance Computing and Machine Learning Benchmarks …
  • Paper 1: The Benchmark Carpentry white paper https://www.overleaf.com/9828764221czxzxxcxmcrr#1f1c84
  • Meeting with Armstrong: Focus on Energy
  • Meeting with Victor: More concrete information was requested; for example, he would like to focus on complexity theory. The discussion noted that most HPC benchmarks started from or include complexity theory, so it is important that he articulates not only what is being done but also provides concrete examples, so we understand how this differs from regular activities.
  • Matt Sinclair: Extend the classification. How about adding a third picture and then deciding whether to merge the pictures or keep them separate?
  • Motivation for repeatable benchmarks: What we said has been validated by UFL (a small benchmark of traffic camera analysis), which has benchmark characteristics similar to the long-running earthquake prediction benchmark. The UFL and UVA A100 systems perform similarly.