
Papers

This page lists the white papers produced by the MLCommons Science Working Group and other academic papers referenced in our meeting notes.

Working Group Publications

Referenced Papers

Foundation Models & LLMs for Science

  • OceanGPT: A Large Language Model for Ocean Science Tasks - Papers With Code
  • K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization - arXiv:2306.05064
  • FORGE: Pre-Training Open Foundation Models for Science - ACM Digital Library
  • ORBIT: Oak Ridge Base Foundation Model for Earth System Predictability - arXiv:2404.14712
  • ClimaX: A foundation model for weather and climate - arXiv:2301.10343
  • Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior - arXiv:2306.00258
  • On the Opportunities and Risks of Foundation Models - arXiv:2108.07258
  • Towards self-driving laboratories: The central role of density functional theory in the AI age - arXiv:2304.03272

Benchmarking & Evaluation

  • MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from Microwatts to Megawatts for Sustainable AI - arXiv:2410.12032
  • FastML Science Benchmarks: Accelerating Real-Time Scientific Edge Machine Learning - arXiv:2207.07958
  • Theoretical Physics Benchmark (TPBench): a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics - arXiv:2502.15815
  • GPQA: A Graduate-Level Google-Proof Q&A Benchmark - arXiv:2311.12022
  • Humanity's Last Exam - arXiv:2501.14249
  • NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems - arXiv:2304.04640
  • Evaluation Framework for AI Systems in "the Wild" - arXiv:2504.16778

HPC & Digital Twins

  • HPC-GPT: Integrating Large Language Model for High-Performance Computing - ACM Digital Library
  • Digital Twins for Data Centers - IEEE Xplore
  • A digital twin framework for liquid-cooled supercomputers as demonstrated at exascale - arXiv:2410.05133
  • Towards the Development of an Exascale Network Digital Twin - OSTI
  • AI-coupled HPC Workflow Applications, Middleware and Performance - arXiv:2406.14315
  • In-Transit Data Transport Strategies for Coupled AI-Simulation Workflow Patterns - ACM Digital Library
  • Production Deployment of Machine-Learned Rotorcraft Surrogate Models on HPC - IEEE Xplore

Time Series & Other Science AI

  • Time-series forecasting with deep learning: a survey - Royal Society Publishing
  • FengWu: Pushing the Skillful Global Medium-range Weather Forecast beyond 10 Days Lead - arXiv:2304.02948
  • RWKV-TS: Beyond Traditional Recurrent Neural Network for Time Series Tasks - arXiv:2401.09093
  • Time Series Foundation Models and Deep Learning Architectures for Earthquake Temporal and Spatial Nowcasting - arXiv:2408.11990
  • TSMixer: An All-MLP Architecture for Time Series Forecasting - arXiv:2303.06053
  • Knowledge-Guided Machine Learning: A New Framework for Accelerating Scientific Discovery and Addressing Global Environmental Challenges - IEEE Xplore

AI Ethics & General AI

  • Rethinking LLM Bias Probing Using Lessons from the Social Sciences - arXiv:2503.00093
  • Dialect prejudice predicts AI decisions about people's character, employability, and criminality - arXiv:2403.00742
  • Large language models propagate race-based medicine - Nature
  • Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research - arXiv:2402.00159
  • An Attention Free Transformer - arXiv:2105.14103
  • Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model - JMLR
  • Patterns and networks of language control in bilingual language production - Springer
  • Neural Networks are Decision Trees - arXiv:2210.05189
  • ConvNets Match Vision Transformers at Scale - arXiv:2310.16764
  • Trends in AI Supercomputers - arXiv:2504.16026
  • LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers - arXiv:2511.00116
  • MCBound Framework - University of Bologna

Project & Resource Documents


📚 External Project Resources (Google Docs & Drive)

Benchmarks & Technical

  • MLCommons Science Benchmarks - Link
  • Benchmark Outputs - Link
  • Context aware Scientific benchmark - Link
  • WeatherBench2 Research and Analysis - Link
  • Calorimeter - Link
  • Data aspect of Scientific AI Benchmarks - Link
  • Database Benchmarks - Link
  • benchmark_predictions_v2.xlsx - Link
  • COMBINED (Spreadsheet) - Link
  • SciMLBench_MLCommons_Science.pdf - Link
  • FoundationModelsPatternsBenchmarks.pdf - Link
  • MLCommons_SimAIBench_240723.pdf - Link

Charters & Policy

  • FAIR4ML-RDA-IG-Charter - Link
  • MLCommons Storage Working Group Description - Link

Research & Summaries

  • SciFM24 Summary - Link
  • Table 1: Foundation Model Summary - Link
  • LLNL D4DCT MLCommons Updated - Link
  • LLNL_D4DCT_MLCommons.pdf - Link
  • Workshop on Clusters, Clouds, and Data for Scientific Computing - 2024 - Link
  • SONIC ML Commons.pdf - Link
  • AI for Science - The Fourth Paradigm.pdf - Link
  • DOE AI-Driven Science and Technology Initiatives.pdf - Link

Drafts & Integration

  • MLCommons Science Blog post draft - Link
  • mlcommons tasks for integration - Link

Presentations & Slides

  • MLCommons_Science.pptx - Link
  • MLCommons - Requirements - Link
  • AI4Science Foundation Models Overview - Link
  • MLCommons Benchmarks - Taxonomy - Link
  • Creating Open Data sets and Benchmarks - Link
  • Foundation Models and Patterns for Science Time Series - Link
  • Reference_Implementations_v1.pptx - Link

Other Resources

  • SBI-FAIR February 14 2022 Meeting (Resource) - Link
  • HPC WG Minutes (Resource) - Link
  • HPCFAIR Presentation Folder - Link