What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks

Cited: 0
Authors
Guo, Taicheng [1 ]
Guo, Kehan [1 ]
Nan, Bozhao [1 ]
Liang, Zhenwen [1 ]
Guo, Zhichun [1 ]
Chawla, Nitesh V. [1 ]
Wiest, Olaf [1 ]
Zhang, Xiangliang [1 ]
Affiliation
[1] Univ Notre Dame, Notre Dame, IN 46556 USA
Funding
U.S. National Science Foundation
Keywords
GENERATION; SMILES;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) with strong natural language processing abilities have emerged and been applied in areas such as science, finance, and software engineering. However, the capability of LLMs to advance the field of chemistry remains unclear. In this paper, rather than pursuing state-of-the-art performance, we aim to evaluate the capabilities of LLMs across a wide range of tasks in the chemistry domain. We identify three key chemistry-related capabilities to explore in LLMs, namely understanding, reasoning, and explaining, and establish a benchmark containing eight chemistry tasks. Our analysis draws on widely recognized datasets, facilitating a broad exploration of the capacities of LLMs within the context of practical chemistry. Five LLMs (GPT-4, GPT-3.5, Davinci-003, Llama, and Galactica) are evaluated on each chemistry task in zero-shot and few-shot in-context learning settings, with carefully selected demonstration examples and specially crafted prompts. Our investigation found that GPT-4 outperformed the other models and that the LLMs exhibit different levels of competence across the eight chemistry tasks. Beyond the key findings from the comprehensive benchmark analysis, our work provides insights into the limitations of current LLMs and the impact of in-context learning settings on LLM performance across various chemistry tasks. The code and datasets used in this study are available at https://github.com/ChemFoundationModels/ChemLLMBench.
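The abstract describes evaluating each model in zero-shot and few-shot in-context learning settings with selected demonstration examples and crafted prompts. As a rough illustration of what such an evaluation setup involves, the sketch below assembles a zero- or few-shot prompt for a SMILES name-prediction task and leaves the model call as a stub. The task wording, the demonstration molecules, and the query_llm helper are illustrative assumptions, not the benchmark's actual code; the real prompts and datasets are in the linked ChemLLMBench repository.

```python
# Minimal sketch of a zero-/few-shot in-context-learning prompt builder,
# loosely following the setup described in the abstract. All task wording,
# demonstration examples, and the query_llm() stub are illustrative
# assumptions; the actual prompts live in the ChemLLMBench repository.

# Hypothetical demonstration pool: (SMILES, common name) pairs.
DEMONSTRATIONS = [
    ("CCO", "ethanol"),
    ("CC(=O)O", "acetic acid"),
    ("c1ccccc1", "benzene"),
]

def build_prompt(query_smiles: str, k: int = 0) -> str:
    """Build a zero-shot (k=0) or few-shot (k>0) prompt for name prediction."""
    lines = ["You are an expert chemist. Given a SMILES string, "
             "answer with the compound's common name only."]
    for smiles, name in DEMONSTRATIONS[:k]:        # k in-context examples
        lines.append(f"SMILES: {smiles}\nName: {name}")
    lines.append(f"SMILES: {query_smiles}\nName:")  # the actual query
    return "\n\n".join(lines)

def query_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., a GPT-4 or Llama client)."""
    raise NotImplementedError("wire up your model API here")

if __name__ == "__main__":
    print(build_prompt("CC(C)O", k=2))  # inspect a 2-shot prompt
```

The abstract notes that demonstration examples were carefully selected; the fixed list above is only the simplest stand-in for that selection step, and scoring would then compare each model completion against the ground-truth label for the task.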
Pages: 27