GPQA Diamond
Keywords
Citation
- David Rein, Betty Li Hou, and Asa Cooper Stickland. GPQA: A Graduate-Level Google-Proof Q and A Benchmark. 2023. URL: https://arxiv.org/abs/2311.12022.
@misc{rein2023gpqagraduatelevelgoogleproofqa,
title={GPQA: A Graduate-Level Google-Proof Q and A Benchmark},
author={Rein, David and Hou, Betty Li and Stickland, Asa Cooper},
year={2023},
url={https://arxiv.org/abs/2311.12022}
}
Ratings
| Category | Rating | Justification |
| --- | --- | --- |
| Software | 5.00 | Python version and requirements are specified on the GitHub site. |
| Specification | 2.00 | No system constraints or I/O are specified. |
| Dataset | 5.00 | The dataset is easy to access and comes with predefined splits, as mentioned in the paper. |
| Metrics | 5.00 | Each question has one correct answer, so the score reflects the tested model's performance. |
| Reference Solution | 1.00 | Common models such as GPT-3.5 were compared, but they are not open and do not state their requirements. |
| Documentation | 5.00 | All information is listed in the associated paper. |
Average rating: 3.83/5
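The average above is consistent with an unweighted mean of the six category ratings. The sketch below reproduces that arithmetic; the equal weighting and the names `ratings` and `average_rating` are assumptions made for illustration, not something the entry states.

```python
# Category ratings copied from the Ratings section of this entry.
ratings = {
    "Software": 5.00,
    "Specification": 2.00,
    "Dataset": 5.00,
    "Metrics": 5.00,
    "Reference Solution": 1.00,
    "Documentation": 5.00,
}


def average_rating(scores):
    """Return the unweighted mean of the scores, rounded to two decimals.

    Equal weighting across categories is an assumption of this sketch.
    """
    return round(sum(scores.values()) / len(scores), 2)


print(f"Average rating: {average_rating(ratings):.2f}/5")
```

The ratings sum to 23 across 6 categories, and 23/6 rounds to 3.83, which matches the reported figure.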
Radar plot