2022 Summary
In 2022, the MLCommons Science Working Group transitioned from foundational goal-setting to the active development, refinement, and standardization of a diverse suite of scientific benchmarks.
Key Initiatives
Standardization and Compliance
A primary focus of the year was ensuring that scientific benchmarks complied with MLCommons standards while remaining scientifically relevant. This included:
- Science Training Policy: Developing a tailored set of rules that mandate both performance and scientific accuracy, ensuring that "winning" a benchmark requires a scientifically valid result.
- Reference Implementations: Evaluating the compliance of the Science benchmark suite and refining the process for adding new benchmarks to the ecosystem.
Expansion of the Benchmark Suite
The group significantly expanded its portfolio of benchmarks, moving toward a "surrogate-heavy" approach:
- Refinement of Existing Benchmarks: Improvements were made to the Earthquake and Cloudmask benchmarks, and the STEMDL benchmark was enhanced through feature engineering and the exploration of Vision Transformers.
- New Surrogate Models: The group introduced a wide array of new surrogates, including:
  - Environmental & Geophysical: Tsunami inundation and diffusion solvers.
  - Engineering: CFD (Computational Fluid Dynamics) for helicopter rotor aerodynamics.
  - Physics & Energy: Fusion plasma simulations and autotuning surrogates for BLAS kernels.
  - Health: Epidemic (COVID) simulation surrogates.
Energy and Power Analysis
Building on the FAIR-principles work of 2021, the group emphasized the need to measure energy consumption alongside performance. Discussions highlighted that FLOP rates alone are insufficient, and that power-related metrics are critical for understanding the true cost and efficiency of scientific AI workloads.
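To make the idea concrete, run-level energy can be estimated by integrating sampled power draw over the duration of a benchmark. The sketch below is purely illustrative and not part of any MLCommons benchmark: the function name and the sample trace are hypothetical, and in practice the readings would come from vendor counters (e.g., NVIDIA's NVML or Intel's RAPL).

```python
# Illustrative sketch: estimating energy from sampled power readings.
# The sample trace is hypothetical; real measurements would come from
# hardware counters such as NVIDIA NVML or Intel RAPL.

def energy_joules(samples):
    """Integrate (timestamp_s, watts) samples with the trapezoid rule.

    Returns the total energy in joules over the sampled interval.
    """
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total

# A hypothetical 10-second run averaging roughly 300 W:
samples = [(0.0, 280.0), (5.0, 320.0), (10.0, 300.0)]
print(energy_joules(samples))  # 3050.0 joules for this trace
```

An energy figure like this, reported alongside a benchmark's scientific-quality metric and runtime, captures an efficiency dimension that FLOP rates alone cannot.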
Data Quality and FAIRness
The group conducted a rigorous assessment of existing MLCommons benchmarks, identifying inconsistencies in system definition files. This led to:
- Data Quality Improvements: Recommendations to MLCommons for better data standardization.
- FAIR4ML: Support for the formation of the RDA Interest Group "FAIR for Machine Learning" (FAIR4ML) to bridge the gap between the FAIR data community and the ML community.
Community Engagement
The group increased its visibility through:
- Conference Presence: A successful Birds of a Feather (BOF) session at SC22, which generated significant interest in scientific benchmarking.
- Public Outreach: Drafting blog posts and coordinating with MLCommons PR to announce the availability of the science benchmarks.
Summary of Progress
By the end of 2022, the Science WG had evolved from a small group of collaborators into a structured entity with a growing library of benchmarks, a clear policy for compliance, and a strong commitment to the intersection of scientific accuracy and computational performance.