Index of Benchmarks
MMLU (Massive Multitask Language Understanding)
Computational Science & AI • Reasoning & Generalization
Avg rating: 3.83/5
ClimateLearn - Weather Forecasting
Climate & Earth Science • Sequence Prediction/Forecasting
Avg rating: 5.00/5
ClimateLearn - Downscaling
Climate & Earth Science • Regression
Avg rating: 5.00/5
ClimateLearn - Climate Projection
Climate & Earth Science • Regression
Avg rating: 5.00/5
CFDBench (Fluid Dynamics)
Mathematics • Regression
Avg rating: 3.33/5
SatImgNet
Climate & Earth Science • Multimodal Reasoning
Avg rating: 3.83/5
Quantum Computing Benchmarks (QML)
Computational Science & AI • Classification
Avg rating: 3.17/5
OCP (Open Catalyst Project)
Chemistry, Materials Science • Regression
Avg rating: 4.17/5
MedQA
Biology & Medicine • Reasoning & Generalization
Avg rating: 3.50/5
Materials Project
Materials Science • Regression
Avg rating: 1.92/5
GPQA Diamond
Biology & Medicine, Chemistry, High Energy Physics • Reasoning & Generalization
Avg rating: 3.83/5
ARC-Challenge (AI2 Reasoning Challenge)
Computational Science & AI • Reasoning & Generalization
Avg rating: 4.83/5
FrontierMath
Mathematics • Reasoning & Generalization
Avg rating: 2.50/5
SciCode
Computational Science & AI • Generative
Avg rating: 4.50/5
AIME (American Invitational Mathematics Examination)
Mathematics • Reasoning & Generalization
Avg rating: 2.33/5
PRM800K
Mathematics • Reasoning & Generalization
Avg rating: 3.83/5
CURIE (Scientific Long-Context Understanding, Reasoning and Information Extraction)
Materials Science, High Energy Physics, Biology & Medicine, Chemistry, Climate & Earth Science • Reasoning & Generalization
Avg rating: 3.33/5
FEABench (Finite Element Analysis Benchmark): Evaluating Language Models on Multiphysics Reasoning Ability
Mathematics • Reasoning & Generalization
Avg rating: 3.83/5
SPIQA (Scientific Paper Image Question Answering)
Computational Science & AI • Multimodal Reasoning
Avg rating: 3.67/5
BaisBench (Biological AI Scientist Benchmark) - Question Answering
Biology & Medicine • Reasoning & Generalization
Avg rating: 4.00/5
BaisBench (Biological AI Scientist Benchmark) - Cell Type Annotation
Biology & Medicine • Classification
Avg rating: 4.00/5
MOLGEN
Chemistry • Generative
Avg rating: 4.83/5
Open Graph Benchmark (OGB) - Biology
Biology & Medicine • Sequence Prediction/Forecasting
Avg rating: 4.83/5
Jet Classification
High Energy Physics • Classification
Avg rating: 4.17/5
Irregular Sensor Data Compression
High Energy Physics • Generative
Avg rating: 4.17/5
Beam Control
High Energy Physics • Reinforcement Learning/Control
Avg rating: 3.00/5
Ultrafast jet classification at the HL-LHC
High Energy Physics • Classification
Avg rating: 3.17/5
Quench detection
High Energy Physics • Anomaly Detection
Avg rating: 2.17/5
Intelligent experiments through real-time AI
High Energy Physics • Classification
Avg rating: 3.00/5
Neural Architecture Codesign for Fast Physics Applications
High Energy Physics • Classification
Avg rating: 3.83/5
Smart Pixels for LHC
High Energy Physics • Classification
Avg rating: 3.33/5
HEDM (BraggNN)
Materials Science • Classification
Avg rating: 3.17/5
4D-STEM
Materials Science • Classification
Avg rating: 3.17/5
In-Situ High-Speed Computer Vision
High Energy Physics • Classification
Avg rating: 1.50/5
MLPerf HPC - CosmoFlow
High Energy Physics • Regression
Avg rating: 4.17/5
MLPerf HPC - DeepCAM
Climate & Earth Science • Classification
Avg rating: 4.17/5
MLPerf HPC - Open Catalyst Project DimeNet++
Chemistry • Regression
Avg rating: 4.17/5
MLPerf HPC - OpenFold
Biology & Medicine • Sequence Prediction/Forecasting
Avg rating: 4.17/5
MLCommons Science - CloudMask
Climate & Earth Science • Classification
Avg rating: 5.00/5
MLCommons Science - Earthquake
Climate & Earth Science • Sequence Prediction/Forecasting
Avg rating: 5.00/5
MLCommons Science - Candle UNO
Biology & Medicine • Classification
Avg rating: 5.00/5
MLCommons Science - STEMDL
Materials Science • Classification
Avg rating: 5.00/5
LHC New Physics Dataset
High Energy Physics • Anomaly Detection
Avg rating: 3.33/5
MLCommons Medical AI - Pancreas Segmentation (DFCI)
Biology & Medicine • Classification
Avg rating: 4.33/5
MLCommons Medical AI - Brain Tumor Segmentation (BraTS)
Biology & Medicine • Classification
Avg rating: 4.33/5
MLCommons Medical AI - Surgical Workflow Phase Recognition (SurgMLCube)
Biology & Medicine • Classification
Avg rating: 4.33/5
CaloChallenge 2022
High Energy Physics • Generative
Avg rating: 4.50/5
PDEBench
Computational Science & AI, Climate & Earth Science, Mathematics • Regression
Avg rating: 4.50/5
Delta Squared-DFT
Chemistry, Materials Science • Regression
Avg rating: 3.83/5
Urban Data Layer (UDL) - PM2.5 Concentration Prediction
Climate & Earth Science • Regression
Avg rating: 4.50/5
Urban Data Layer (UDL) - Built-up Area Classification
Climate & Earth Science • Classification
Avg rating: 4.50/5
Urban Data Layer (UDL) - Administrative Boundaries Identification
Climate & Earth Science • Classification
Avg rating: 4.50/5
Urban Data Layer (UDL) - El Niño Anomaly Detection
Climate & Earth Science • Anomaly Detection
Avg rating: 4.50/5
LLMs for Crop Science
Climate & Earth Science • Reasoning & Generalization
Avg rating: 4.67/5
DUNE
High Energy Physics • Anomaly Detection
Avg rating: 2.83/5
MassSpecGym - De novo molecule generation
Chemistry • Generative
Avg rating: 3.75/5
MassSpecGym - Molecule Retrieval
Chemistry • Regression
Avg rating: 3.75/5
MassSpecGym - Spectrum Simulation
Chemistry • Regression
Avg rating: 3.75/5
HDR ML Anomaly Challenge - Gravitational Waves
High Energy Physics • Anomaly Detection
Avg rating: 4.17/5
HDR ML Anomaly Challenge - Butterfly
Biology & Medicine • Anomaly Detection
Avg rating: 3.00/5
HDR ML Anomaly Challenge - Sea Level Rise
Climate & Earth Science • Anomaly Detection
Avg rating: 3.83/5
Single Qubit Readout on QICK System
Computational Science & AI • Classification
Avg rating: 3.50/5
GPQA: A Graduate-Level Google-Proof Question and Answer Benchmark
Biology & Medicine, High Energy Physics, Chemistry • Reasoning & Generalization
Avg rating: 3.67/5
SeafloorAI
Climate & Earth Science • Classification
Avg rating: 4.33/5
SeafloorGenAI
Climate & Earth Science • Reasoning & Generalization
Avg rating: 4.33/5
SuperCon3D - Property Prediction
Materials Science • Regression
Avg rating: 4.17/5
SuperCon3D - Inverse Crystal Structure Generation
Materials Science • Generative
Avg rating: 4.17/5
GeSS - Track Pileup
High Energy Physics • Classification
Avg rating: 4.33/5
GeSS - Track Signal
High Energy Physics • Classification
Avg rating: 4.33/5
GeSS - DrugOOD
Biology & Medicine • Classification
Avg rating: 4.33/5
GeSS - QMOF
Materials Science • Regression
Avg rating: 4.33/5
Vocal Call Locator (VCL)
Biology & Medicine • Regression
Avg rating: 3.83/5
SPIQA (LLM)
Computational Science & AI • Multimodal Reasoning
Avg rating: 4.42/5
The Well
Biology & Medicine, Computational Science & AI, High Energy Physics • Sequence Prediction/Forecasting
Avg rating: 4.00/5