Papers

This page lists the white papers produced by the MLCommons Science Working Group and other academic papers referenced in our meeting notes.

Working Group Publications

AI Benchmarks Carpentry and Democratization (White Paper) - arXiv:2512.11588 | Overleaf Draft
Science Benchmarks and Ontologies - arXiv:2511.05614
Using Benchmarking Data to Inform Decisions Related to Machine Learning Resource Efficiency - Zenodo | Overleaf Draft
MLCommons Science FAIR Concept Paper (AI Readiness) - Google Doc

OceanGPT: A Large Language Model for Ocean Science Tasks - Papers With Code
K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization - arXiv:2306.05064
FORGE: Pre-Training Open Foundation Models for Science - ACM Digital Library
ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability - arXiv:2404.14712
ClimaX: A foundation model for weather and climate - arXiv:2301.10343
Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior - arXiv:2306.00258
On the Opportunities and Risks of Foundation Models - arXiv:2108.07258
Towards self-driving laboratories: The central role of density functional theory in the AI age - arXiv:2304.03272

MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI - arXiv:2410.12032
FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning - arXiv:2207.07958
Theoretical Physics Benchmark (TPBench): a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics - arXiv:2502.15815
GPQA: A Graduate-Level Google-Proof Q&A Benchmark - arXiv:2311.12022
Humanity's Last Exam - arXiv:2501.14249
NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems - arXiv:2304.04640
Evaluation Framework for AI Systems in "the Wild" - arXiv:2504.16778

HPC-GPT: Integrating Large Language Model for High-Performance Computing - ACM Digital Library
Digital Twins for Data Centers - IEEE Xplore
A digital twin framework for liquid-cooled supercomputers as demonstrated at exascale - arXiv:2410.05133
Towards the Development of an Exascale Network Digital Twin - OSTI
AI-coupled HPC Workflow Applications, Middleware and Performance - arXiv:2406.14315
In-Transit Data Transport Strategies for Coupled AI-Simulation Workflow Patterns - ACM Digital Library
Production Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC - IEEE Xplore

Time-series forecasting with deep learning: a survey - Royal Society Publishing
FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead - arXiv:2304.02948
RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks - arXiv:2401.09093
Time Series Foundation Models and Deep Learning Architectures for Earthquake Temporal and Spatial Nowcasting - arXiv:2408.11990
TSMixer: An All-MLP Architecture for Time Series Forecasting - arXiv:2303.06053
Knowledge-Guided Machine Learning: A New Framework for Accelerating Scientific Discovery and Addressing Global Environmental Challenges - IEEE Xplore

Rethinking LLM Bias Probing Using Lessons from the Social Sciences - arXiv:2503.00093
Dialect prejudice predicts AI decisions about people's character, employability, and criminality - arXiv:2403.00742
Large language models propagate race-based medicine - Nature
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research - arXiv:2402.00159
An Attention Free Transformer - arXiv:2105.14103
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model - JMLR
Patterns and networks of language control in bilingual language production - Springer
Neural Networks are Decision Trees - arXiv:2210.05189
ConvNets Match Vision Transformers at Scale - arXiv:2310.16764
Trends in AI Supercomputers - arXiv:2504.16026
LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers - arXiv:2511.00116
MCBound Framework - University of Bologna

The table below lists additional working group resources, grouped by category, with duplicates removed.

Resource Title	URL
Benchmarks & Technical
MLCommons Science Benchmarks	Link
Benchmark Outputs	Link
Context aware Scientific benchmark	Link
WeatherBench2 Research and Analysis	Link
Calorimeter	Link
Data aspect of Scientific AI Benchmarks	Link
Database Benchmarks	Link
benchmark_predictions_v2.xlsx	Link
COMBINED (Spreadsheet)	Link
SciMLBench_MLCommons_Science.pdf	Link
FoundationModelsPatternsBenchmarks.pdf	Link
MLCommons_SimAIBench_240723.pdf	Link
Charters & Policy
FAIR4ML-RDA-IG-Charter	Link
MLCommons Storage Working Group Description	Link
Research & Summaries
SciFM24 Summary	Link
Table 1: Foundation Model Summary	Link
LLNL D4DCT MLCommons Updated	Link
LLNL_D4DCT_MLCommons.pdf	Link
Workshop on Clusters, Clouds, and Data for Scientific Computing - 2024	Link
SONIC ML Commons.pdf	Link
AI for Science - The Fourth Paradigm.pdf	Link
DOE AI-Driven Science and Technology Initiatives.pdf	Link
Drafts & Integration
MLCommons Science Blog post draft	Link
mlcommons tasks for integration	Link
Presentations & Slides
MLCommons_Science.pptx	Link
MLCommons - Requirements	Link
AI4Science Foundation Models Overview	Link
MLCommons Benchmarks - Taxonomy	Link
Creating Open Data sets and Benchmarks	Link
Foundation Models and Patterns for Science Time Series	Link
Reference_Implementations_v1.pptx	Link
Other Resources
SBI-FAIR February 14 2022 Meeting (Resource)	Link
HPC WG Minutes (Resource)	Link
HPCFAIR Presentation Folder	Link