
SPIQA (LLM)


Date: 2024-12-13

Name: SPIQA LLM

Domain: Computational Science & AI

Focus: Evaluating LLM adapter performance on image-based question answering over scientific paper figures

Task Types: Multimodal QA

Metrics: Accuracy, F1 score

Models: LLaVA, MiniGPT-4, Owl-LLM adapter variants

AI/ML Motif: Multimodal Reasoning
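
The Metrics field above lists accuracy and F1. For reference, here is a minimal sketch of SQuAD-style exact-match accuracy and token-level F1 for free-form QA answers; the normalization rules and function names are illustrative assumptions, and SPIQA's official scoring scripts may differ.

# Minimal sketch of SQuAD-style QA scoring (illustrative; SPIQA's
# official evaluation may normalize or aggregate differently).
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, drop articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> float:
    # Per-example accuracy; average over a dataset for the reported score.
    return float(normalize(pred) == normalize(gold))

def token_f1(pred: str, gold: str) -> float:
    pred_toks, gold_toks = normalize(pred).split(), normalize(gold).split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(token_f1("The encoder has 12 layers.", "12 layers"))  # ≈ 0.667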

Resources

Benchmark: Visit

Citation

  • Shraman Pramanick, Rama Chellappa, and Subhashini Venugopalan. SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers. 2025. arXiv:2407.09413. URL: https://arxiv.org/abs/2407.09413.
@misc{pramanick2025spiqadatasetmultimodalquestion,
  title={SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers}, 
  author={Shraman Pramanick and Rama Chellappa and Subhashini Venugopalan},
  year={2025},
  eprint={2407.09413},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.09413}, 
}

Ratings

Category            Rating   Notes
Software            5.00     Well-documented codebase available on GitHub.
Specification       3.50     QA over scientific figures is adequately described but not fully formalized in input/output terms; no hardware constraints are stated.
Dataset             5.00     Full dataset available on Hugging Face with train/test/validation splits.
Metrics             4.00     Reports accuracy and F1; fair, but no visual-reasoning-specific metric.
Reference Solution  4.00     10 LLM adapter baselines; results included without stated constraints.
Documentation       5.00     Full paper available.

Average rating: 4.42/5
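
The Dataset row above notes that the full dataset is hosted on Hugging Face with standard splits. A minimal download sketch follows; the repository id "google/spiqa" is an assumption and should be verified against the dataset card before use.

# Minimal sketch: fetching the SPIQA files from the Hugging Face Hub.
# The repo id "google/spiqa" is an assumption; check the dataset card.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="google/spiqa", repo_type="dataset")
print(local_dir)  # local path containing the train/test/validation files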

Radar plot

[Radar chart of the SPIQA (LLM) category ratings listed above.]
