MedQA

← Back to all benchmarks

Date: 2020-09-28

Name: MedQA

Domain: Biology & Medicine

Focus: Medical board exam QA

Task Types: Multiple choice

Metrics: Accuracy

Models: Neural reader, Retrieval-based QA systems

AI/ML Motif: Reasoning & Generalization

Resources

Benchmark: Visit
Datasets: GitHub

Citation

  • Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, and Peter Szolovits. What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams. 2020. URL: https://arxiv.org/abs/2009.13081, arXiv:2009.13081.
@misc{jin2020diseasedoespatienthave,
    archiveprefix = {arXiv},
    author        = {Di Jin and Eileen Pan and Nassim Oufattole and Wei-Hung Weng and Hanyi Fang and Peter Szolovits},
    eprint        = {2009.13081},
    primaryclass  = {cs.CL},
    title         = {What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams},
    url           = {https://arxiv.org/abs/2009.13081},
    year          = {2020}
  }

Ratings

Software (5.00): All code is available on GitHub.

Specification (3.00): The task is clearly defined as multiple-choice QA for medical board exams; input and output formats are explicit, and the task scope is rigorous and structured. System constraints are not specified.

Dataset (4.00): The dataset is publicly available (GitHub, paper, Hugging Face) and well structured. However, versioning and metadata could be more standardized to fully meet the FAIR criteria.

Metrics (5.00): Uses a clear, quantitative metric (accuracy), standard for multiple-choice benchmarks and easily comparable across models.

Reference Solution (0.00): No reference solution is mentioned.

Documentation (4.00): The paper is available. Evaluation criteria are not mentioned.

Average rating: 3.50/5
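Since accuracy is the benchmark's only metric, scoring reduces to the fraction of questions where the predicted option matches the gold option. A minimal sketch follows; representing answers as option letters (A-E) is an assumption for illustration, not a format prescribed by this entry.

```python
# Minimal sketch of accuracy scoring for MedQA-style multiple-choice QA.
# Assumption: each prediction and gold answer is an option letter such as "A"-"E".

def accuracy(predictions, gold):
    """Fraction of questions where the predicted option matches the gold option."""
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    return correct / len(gold)

# Hypothetical example: three questions, two answered correctly.
gold_letters = ["C", "A", "E"]
model_letters = ["C", "B", "E"]
print(f"Accuracy: {accuracy(model_letters, gold_letters):.2f}")
```

Because every question has exactly one correct option, this single number is directly comparable across models, which is what the Metrics rating above rewards.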

Radar plot

[Figure: MedQA radar chart of the category ratings above]
