March 22, 2023
Present
Gregg Barrett, Geoffrey Fox, Juri Papay, Christian Herwig, Mallikarjun Shankar, Wesley Brewer, Aristeidis Tsaris, Tom Gibbs, Gregor von Laszewski
Tentative Agenda
- Any new members
- Positioning of Science Benchmarks in MLCommons https://mlcommons-science.github.io/tags/ and https://docs.google.com/document/d/167m7FK6-Ud4M5gXta5cIc1hKqaRHkk2B1GyKasdeQLc/edit?pli=1#heading=h.1c3fuqvm4ia2
- AI Readiness of MLCommons Science (Continued) NOT STUDIED https://docs.google.com/document/d/1NbL-VdkrY9jzPxveOys2RCK8TdEJ7O5wgnxjAgzK-rE/edit?usp=sharing
- Using Benchmarking Data to Inform Decisions Related to Machine Learning Resource Efficiency (Continued) NOT STUDIED https://docs.google.com/document/d/1gOKA8BnlJnsTAELWFSmL7Fl7kJej_UrNH-FVXbZFxGI/edit?usp=sharing
- Discussion of new Benchmarks (Continued), focused on the presentation 230322 mlcommons science.pdf by Christian Herwig
- AOB
Positioning of Science Benchmarks
- After minor changes (from Arjun), the positioning statement https://docs.google.com/document/d/167m7FK6-Ud4M5gXta5cIc1hKqaRHkk2B1GyKasdeQLc/edit?pli=1#heading=h.1c3fuqvm4ia2 was agreed, with general consensus on a blog-like user interface illustrated at https://mlcommons-science.github.io/tags/ .
- The adjective "open" in "open division" is rather ambiguous: it refers to "open for change," not "open-source," as the latter is assumed.
- It was noted that we would need a separate leaderboard for each benchmark, since the success measure is not universally time as in the main MLPerf suite but varies across the benchmarks (see the sketch after this list).
- Accuracy is acceptable as a secondary metric.
- Gregg Barrett agreed with Arjun that clarity on what constitutes an "improvement" would be helpful.
- In a later meeting with David Kanter from MLCommons and Ritika Borkar from NVIDIA, our proposal was accepted, and we agreed to go through our policy and web material to make them consistent.
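As a concrete illustration of the per-benchmark leaderboard point above, here is a minimal sketch of how each benchmark might declare its own primary success measure, with accuracy as a secondary tie-breaking metric. All names here (BenchmarkSpec, latency_us, jet_trigger) are hypothetical illustrations, not part of the agreed policy.

```python
from dataclasses import dataclass

@dataclass
class BenchmarkSpec:
    """Per-benchmark leaderboard configuration (hypothetical fields)."""
    name: str
    primary_metric: str      # success measure varies per benchmark
    higher_is_better: bool   # direction of the primary metric
    secondary_metric: str = "accuracy"

@dataclass
class Submission:
    submitter: str
    metrics: dict            # e.g. {"latency_us": 3.2, "accuracy": 0.94}

def rank(spec: BenchmarkSpec, submissions: list) -> list:
    """Rank on the benchmark's own primary metric; break ties on accuracy."""
    return sorted(
        submissions,
        key=lambda s: (
            -s.metrics[spec.primary_metric] if spec.higher_is_better
            else s.metrics[spec.primary_metric],
            -s.metrics.get(spec.secondary_metric, 0.0),  # higher accuracy first
        ),
    )

# Each benchmark gets its own leaderboard with its own success measure.
trigger = BenchmarkSpec("jet_trigger", primary_metric="latency_us",
                        higher_is_better=False)
board = rank(trigger, [
    Submission("team_a", {"latency_us": 3.2, "accuracy": 0.94}),
    Submission("team_b", {"latency_us": 2.7, "accuracy": 0.91}),
])
print([s.submitter for s in board])  # -> ['team_b', 'team_a']
```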
FastML Presentation
- A fascinating presentation (230322 mlcommons science.pdf) was given by Christian Herwig on the FastML benchmarks described at https://arxiv.org/abs/2207.07958
- These are inference tasks typical of those needed at accelerators such as the LHC to select (trigger on) data of interest that will be preserved for later study
- The benchmarks, available at https://github.com/fastmachinelearning/fastml-science, were chosen to cover different interesting regions of latency/data-rate space
- Physics experiments probe different latency and data-rate regimes from current MLCommons benchmarks
- Supervised classification of particle jets that could signal interesting heavy particle production
- Sensor data compression
- Beam control by reinforcement learning
- Industry inference is dominated by smartphones and therefore targets lower performance regimes than physics experiments, although the experiments use industry networking and related components and so face industry-derived constraints
- The high-radiation environment and performance per watt are important considerations in LHC experiments
- An interesting "Nutrition Facts" label highlights the key features of each benchmark (see the sketch after this list)
- There are three current benchmarks, expanding to eight; the last slide summarizes their characteristics: data representation, ML type, latency, throughput, and target platform
- Another benchmark provides surrogates for detector simulators, which need to be reconfigurable as detectors change over time
- The group is eager to scale this effort to incorporate new scientific domains and new kinds of technical challenges.
- The FPGA vs ASIC comparison is interesting in itself
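To make the "Nutrition Facts" idea concrete, below is a minimal sketch of what a machine-readable label covering the slide's characteristics (data representation, ML type, latency, throughput, target platform) might look like. The class and field names, and the example values, are hypothetical illustrations, not figures from the talk.

```python
from dataclasses import dataclass, asdict

@dataclass
class NutritionFacts:
    """Hypothetical machine-readable "Nutrition Facts" label for a benchmark,
    covering the characteristics summarized on the final slide."""
    benchmark: str
    data_representation: str  # e.g. "tabular jet features", "sensor image"
    ml_type: str              # e.g. "supervised classification", "RL"
    latency: str              # target inference latency regime
    throughput: str           # target data rate
    target_platform: str      # e.g. "FPGA", "ASIC"

# Illustrative values only; see the benchmark repo for real figures.
label = NutritionFacts(
    benchmark="jet-classify",
    data_representation="tabular jet features",
    ml_type="supervised classification",
    latency="sub-microsecond (trigger path)",
    throughput="tens of MHz event rate",
    target_platform="FPGA / ASIC",
)
print(asdict(label))
```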