Papers
This page lists the white papers produced by the MLCommons Science Working Group, together with other academic papers referenced in our meeting notes.
Working Group Publications
- AI Benchmarks Carpentry and Democratization (White Paper) - arXiv:2512.11588 | Overleaf Draft
- Science Benchmarks and Ontologies - arXiv:2511.05614
- Using Benchmarking Data to Inform Decisions Related to Machine Learning Resource Efficiency - Zenodo | Overleaf Draft
- MLCommons Science FAIR Concept Paper (AI Readiness) - Google Doc
Referenced Papers
Foundation Models & LLMs for Science
- OceanGPT: A Large Language Model for Ocean Science Tasks - Papers With Code
- K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization - arXiv:2306.05064
- FORGE: Pre-Training Open Foundation Models for Science - ACM Digital Library
- ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability - arXiv:2404.14712
- ClimaX: A foundation model for weather and climate - arXiv:2301.10343
- Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior - arXiv:2306.00258
- On the Opportunities and Risks of Foundation Models - arXiv:2108.07258
- Towards self-driving laboratories: The central role of density functional theory in the AI age - arXiv:2304.03272
Benchmarking & Evaluation
- MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI - arXiv:2410.12032
- FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning - arXiv:2207.07958
- Theoretical Physics Benchmark (TPBench): a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics - arXiv:2502.15815
- GPQA: A Graduate-Level Google-Proof Q&A Benchmark - arXiv:2311.12022
- Humanity's Last Exam - arXiv:2501.14249
- NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems - arXiv:2304.04640
- Evaluation Framework for AI Systems in "the Wild" - arXiv:2504.16778
HPC & Digital Twins
- HPC-GPT: Integrating Large Language Model for High-Performance Computing - ACM Digital Library
- Digital Twins for Data Centers - IEEE Xplore
- A digital twin framework for liquid-cooled supercomputers as demonstrated at exascale - arXiv:2410.05133
- Towards the Development of an Exascale Network Digital Twin - OSTI
- AI-coupled HPC Workflow Applications, Middleware and Performance - arXiv:2406.14315
- In-Transit Data Transport Strategies for Coupled AI-Simulation Workflow Patterns - ACM Digital Library
- Production Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC - IEEE Xplore
Time Series & Other Science AI
- Time-series forecasting with deep learning: a survey - Royal Society Publishing
- FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead - arXiv:2304.02948
- RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks - arXiv:2401.09093
- Time Series Foundation Models and Deep Learning Architectures for Earthquake Temporal and Spatial Nowcasting - arXiv:2408.11990
- TSMixer: An All-MLP Architecture for Time Series Forecasting - arXiv:2303.06053
- Knowledge-Guided Machine Learning: A New Framework for Accelerating Scientific Discovery and Addressing Global Environmental Challenges - IEEE Xplore
AI Ethics & General AI
- Rethinking LLM Bias Probing Using Lessons from the Social Sciences - arXiv:2503.00093
- Dialect prejudice predicts AI decisions about people's character, employability, and criminality - arXiv:2403.00742
- Large language models propagate race-based medicine - Nature
- Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research - arXiv:2402.00159
- An Attention Free Transformer - arXiv:2105.14103
- Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model - JMLR
- Patterns and networks of language control in bilingual language production - Springer
- Neural Networks are Decision Trees - arXiv:2210.05189
- ConvNets Match Vision Transformers at Scale - arXiv:2310.16764
- Trends in AI Supercomputers - arXiv:2504.16026
- LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers - arXiv:2511.00116
- MCBound Framework - University of Bologna
Project & Resource Documents
📚 External Project Resources (Google Docs & Drive)
| Resource Title | URL |
|---|---|
| Benchmarks & Technical | |
| MLCommons Science Benchmarks | Link |
| Benchmark Outputs | Link |
| Context aware Scientific benchmark | Link |
| WeatherBench2 Research and Analysis | Link |
| Calorimeter | Link |
| Data aspect of Scientific AI Benchmarks | Link |
| Database Benchmarks | Link |
| benchmark_predictions_v2.xlsx | Link |
| COMBINED (Spreadsheet) | Link |
| SciMLBench_MLCommons_Science.pdf | Link |
| FoundationModelsPatternsBenchmarks.pdf | Link |
| MLCommons_SimAIBench_240723.pdf | Link |
| Charters & Policy | |
| FAIR4ML-RDA-IG-Charter | Link |
| MLCommons Storage Working Group Description | Link |
| Research & Summaries | |
| SciFM24 Summary | Link |
| Table 1: Foundation Model Summary | Link |
| LLNL D4DCT MLCommons Updated | Link |
| LLNL_D4DCT_MLCommons.pdf | Link |
| Workshop on Clusters, Clouds, and Data for Scientific Computing - 2024 | Link |
| SONIC ML Commons.pdf | Link |
| AI for Science - The Fourth Paradigm.pdf | Link |
| DOE AI-Driven Science and Technology Initiatives.pdf | Link |
| Drafts & Integration | |
| MLCommons Science Blog post draft | Link |
| mlcommons tasks for integration | Link |
| Presentations & Slides | |
| MLCommons_Science.pptx | Link |
| MLCommons - Requirements | Link |
| AI4Science Foundation Models Overview | Link |
| MLCommons Benchmarks - Taxonomy | Link |
| Creating Open Data sets and Benchmarks | Link |
| Foundation Models and Patterns for Science Time Series | Link |
| Reference_Implementations_v1.pptx | Link |
| Other Resources | |
| SBI-FAIR February 14 2022 Meeting (Resource) | Link |
| HPC WG Minutes (Resource) | Link |
| HPCFAIR Presentation Folder | Link |