LLMs for Crop Science

Date: 2024-11-13

Name: LLMs for Crop Science

Domain: Climate & Earth Science

Focus: Evaluating LLMs on crop trait QA and textual inference tasks with domain-specific prompts

Task Types: Question Answering, Inference

Metrics: Accuracy, F1 score

Models: GPT-3.5, GPT-4, Claude-3-opus, Qwen-max, Llama3-8B, InternLM2-7B, Qwen1.5-7B

AI/ML Motif: Reasoning & Generalization

Keywords

Citation

  • Hang Zhang, Jiawei Sun, Renqi Chen, Wei Liu, Zhonghang Yuan, Xinzhe Zheng, Zhefan Wang, Zhiyuan Yang, Hang Yan, Han-Sen Zhong, Xiqing Wang, Wanli Ouyang, Fan Yang, and Nanqing Dong. Empowering and assessing the utility of large language models in crop science. In The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track. 2024. URL: https://openreview.net/forum?id=hMj6jZ6JWU.
@inproceedings{zhang2024empowering,
title={Empowering and Assessing the Utility of Large Language Models in Crop Science},
author={Hang Zhang and Jiawei Sun and Renqi Chen and Wei Liu and Zhonghang Yuan and Xinzhe Zheng and Zhefan Wang and Zhiyuan Yang and Hang Yan and Han-Sen Zhong and Xiqing Wang and Wanli Ouyang and Fan Yang and Nanqing Dong},
booktitle={The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2024},
url={https://openreview.net/forum?id=hMj6jZ6JWU}
}

Ratings

Category (rating out of 5): justification

  • Software (5.00): Code for evaluation and training of multiple models is available and well documented. Environment details are provided.
  • Specification (4.00): Tasks are clearly defined (QA, inference) with structured input/output formats, though no system constraints are provided.
  • Dataset (5.00): Dataset adheres to all FAIR principles, is well documented, and is publicly available on Hugging Face. Train/test splits are provided across two Hugging Face datasets.
  • Metrics (4.00): Accuracy is mentioned in the README and webpage as an evaluation metric.
  • Reference Solution (5.00): A reference solution is available and well documented. Training code is provided for multiple open-weight models.
  • Documentation (5.00): The benchmark is well documented with a detailed paper, README, and webpage. Instructions for reproducing results are clear.

Average rating: 4.67/5
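
The average can be checked from the six category ratings listed above. The following is a minimal sketch that assumes the overall score is an unweighted mean rounded to two decimals; the entry does not state the aggregation rule, and the `ratings` dictionary and `average_rating` helper are hypothetical names used only for illustration.

```python
from statistics import mean

# Per-category ratings copied from the ratings listed in this entry.
ratings = {
    "Software": 5.00,
    "Specification": 4.00,
    "Dataset": 5.00,
    "Metrics": 4.00,
    "Reference Solution": 5.00,
    "Documentation": 5.00,
}


def average_rating(scores):
    """Unweighted mean of the category scores, rounded to two decimals.

    Assumption: the entry aggregates with a simple mean (28 / 6).
    """
    return round(mean(scores.values()), 2)


print(f"Average rating: {average_rating(ratings):.2f}/5")
```

Under that assumption the six scores sum to 28, and 28 / 6 rounds to the 4.67 reported above.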

Radar plot

[Figure: radar plot of the LLMs for Crop Science category ratings]