2023 Summary
In 2023, the MLCommons Science Working Group expanded its scope from individual surrogate benchmarks to the broader landscape of Science Foundation Models and the integration of AI into complex HPC simulation workflows.
Key Initiatives
Science Foundation Models
The year saw a significant shift toward the study and cataloging of foundation models tailored for scientific domains:
- Science FM Hub: The group began tracking and describing a collection of science foundation models.
- Key Models: Discussions focused on high-impact models such as IBM-NASA's Prithvi and the Polymathic AI project from the Flatiron Institute, which aims to advance multi-disciplinary AI.
- Domain-Specific AI: Exploration of specialized models and benchmarks for weather forecasting (e.g., WeatherBench) and other physical sciences.
Hybrid AI-HPC Workflows
The group explored the "AI-in-the-loop" paradigm, in which ML models are embedded directly into the timesteps of traditional scientific simulations:
- HPE SmartSim: The introduction of the SmartSim library highlighted a data-first approach to coupling HPC and AI, using a distributed in-memory database (Redis) to manage communication between simulators and AI models.
- Application to Ocean Modeling: Use cases such as AI-based turbulence closure in the MOM6 ocean model demonstrated the potential for reducing computational overhead while maintaining scientific accuracy.
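The coupling pattern above can be sketched in a few lines of Python. This is a conceptual illustration only, not SmartSim's actual API: a plain dict stands in for the Redis database, and a hypothetical surrogate function plays the role of a trained closure model that corrects each simulation timestep.

```python
# Conceptual sketch of the "AI-in-the-loop" pattern: each timestep the
# simulator publishes its state to a shared store, an ML surrogate reads
# that state and writes back a correction (e.g., a turbulence closure
# term), and the simulator applies it before advancing.
# The in-process dict stands in for the Redis database SmartSim manages;
# all names here are illustrative, not SmartSim's API.

store = {}  # stand-in for a distributed in-memory database


def surrogate_closure(state):
    """Hypothetical trained model: returns a small damping correction."""
    return [-0.1 * v for v in state]


def timestep(state, dt=0.01):
    """One explicit step of a toy simulation with an AI correction."""
    store["state"] = state                            # simulator -> store
    store["correction"] = surrogate_closure(store["state"])  # model -> store
    return [v + dt * (v + c) for v, c in zip(state, store["correction"])]


state = [1.0, 2.0, 3.0]
for _ in range(10):
    state = timestep(state)
```

In the real SmartSim setup, the simulator and the model run as separate processes (often in different languages) and exchange tensors through the database, so neither side needs to link against the other's code.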
The Origins of Benchmark Carpentry
The conceptual framework for "Benchmark Carpentry" began to emerge in 2023. The group started discussing the need for a standardized "craft" of benchmarking—moving beyond just providing a dataset to defining the full methodology of benchmark creation.
Technical and Academic Contributions
Resource Efficiency and Energy
The group continued its focus on the environmental and computational cost of AI, working on publications related to machine learning resource efficiency and the energy footprint of scientific AI.
Reproducibility and Traceability
Recognizing the "reproducibility crisis" in science, the group discussed the implementation of containerized environments and standardized workflows to ensure that benchmark results are traceable and reproducible across different HPC systems.
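One lightweight ingredient of such traceability is attaching a machine-readable provenance record to every benchmark run. The sketch below is a generic illustration under our own assumptions (the field names and the example config are hypothetical, not a Science WG schema): it captures the platform, the interpreter version, and a hash of the benchmark configuration so a result can later be matched to the environment that produced it.

```python
import hashlib
import json
import platform


def provenance_stamp(config: dict) -> dict:
    """Build a record tying a benchmark result to its environment.

    Field names are illustrative, not a standardized schema. Hashing the
    canonicalized config lets two runs be compared for configuration
    drift without shipping the full config around.
    """
    config_json = json.dumps(config, sort_keys=True)
    return {
        "platform": platform.platform(),
        "python_version": platform.python_version(),
        "config_sha256": hashlib.sha256(config_json.encode()).hexdigest(),
    }


# Hypothetical benchmark configuration for illustration.
stamp = provenance_stamp({"benchmark": "cloudmask", "epochs": 30})
```

In a containerized workflow, a record like this would be written next to the results file, with the container image digest added as a further field.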
Community and Outreach
The group maintained a strong presence in the HPC community:
- SC23 (Denver): The group led multiple Birds of a Feather (BoF) sessions, presenting on science foundation models and the strategic direction of the Science WG within MLCommons.
- Collaborations: New partnerships were explored with industry leaders (e.g., HPE) and national laboratories to integrate software technologies such as SmartSim into community benchmarks.
Summary of Progress
By the end of 2023, the Science WG had evolved into a hub for discussing the intersection of foundation models, hybrid simulation-AI workflows, and the rigorous methodology of scientific benchmarking.