MLCommons Science Working Group AI Benchmarks Collection

Index of Benchmarks

4D-STEM
AIME (American Invitational Mathematics Examination)
ARC-Challenge (AI2 Reasoning Challenge)
BaisBench (Biological AI Scientist Benchmark) - Cell Type Annotation
BaisBench (Biological AI Scientist Benchmark) - Question Answering
Beam Control
CaloChallenge 2022
CFDBench (Fluid Dynamics)
ClimateLearn - Climate Projection
ClimateLearn - Downscaling
ClimateLearn - Weather Forecasting
CURIE (Scientific Long-Context Understanding, Reasoning and Information Extraction)
Delta Squared-DFT
DUNE
FEABench (Finite Element Analysis Benchmark): Evaluating Language Models on Multiphysics Reasoning Ability
FrontierMath
GeSS - DrugOOD
GeSS - QMOF
GeSS - Track Pileup
GeSS - Track Signal
GPQA Diamond
GPQA: A Graduate-Level Google-Proof Question and Answer Benchmark
HDR ML Anomaly Challenge - Butterfly
HDR ML Anomaly Challenge - Gravitational Waves
HDR ML Anomaly Challenge - Sea Level Rise
HEDM (BraggNN)
In-Situ High-Speed Computer Vision
Intelligent experiments through real-time AI
Irregular Sensor Data Compression
Jet Classification
LHC New Physics Dataset
LLMs for Crop Science
MassSpecGym - De novo molecule generation
MassSpecGym - Molecule Retrieval
MassSpecGym - Spectrum Simulation
Materials Project
MedQA
MLCommons Medical AI - Surgical Workflow Phase Recognition (SurgMLCube)
MLCommons Medical AI - Brain Tumor Segmentation (BraTS)
MLCommons Medical AI - Pancreas Segmentation (DFCI)
MLCommons Science - Candle UNO
MLCommons Science - CloudMask
MLCommons Science - Earthquake
MLCommons Science - STEMDL
MLPerf HPC - Cosmoflow
MLPerf HPC - DeepCAM
MLPerf HPC - Open Catalyst Project DimeNet++
MLPerf HPC - OpenFold
MMLU (Massive Multitask Language Understanding)
MOLGEN
Neural Architecture Codesign for Fast Physics Applications
OCP (Open Catalyst Project)
Open Graph Benchmark (OGB) - Biology
PDEBench
PRM800K
Quantum Computing Benchmarks (QML)
Quench detection
SatImgNet
SciCode
SeafloorAI
SeafloorGenAI
Single Qubit Readout on QICK System
Smart Pixels for LHC
SPIQA (LLM)
SPIQA (Scientific Paper Image Question Answering)
SuperCon3D - Inverse Crystal Structure Generation
SuperCon3D - Property Prediction
The Well
Ultrafast jet classification at the HL-LHC
Urban Data Layer (UDL) - Administrative Boundaries Identification
Urban Data Layer (UDL) - Built-up Area Classification
Urban Data Layer (UDL) - El Nino Anomaly Detection
Urban Data Layer (UDL) - PM2.5 Concentration Prediction
Vocal Call Locator (VCL)
