What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks

Authors
Guo, Taicheng [1 ]
Guo, Kehan [1 ]
Nan, Bozhao [1 ]
Liang, Zhenwen [1 ]
Guo, Zhichun [1 ]
Chawla, Nitesh V. [1 ]
Wiest, Olaf [1 ]
Zhang, Xiangliang [1 ]
Affiliations
[1] Univ Notre Dame, Notre Dame, IN 46556 USA
Funding
National Science Foundation (USA);
Keywords
GENERATION; SMILES;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Large Language Models (LLMs) with strong abilities in natural language processing have emerged and been applied in areas such as science, finance, and software engineering. However, the capability of LLMs to advance the field of chemistry remains unclear. In this paper, rather than pursuing state-of-the-art performance, we aim to evaluate the capabilities of LLMs across a wide range of tasks in the chemistry domain. We identify three key chemistry-related capabilities to explore in LLMs, namely understanding, reasoning, and explaining, and establish a benchmark comprising eight chemistry tasks. Our analysis draws on widely recognized datasets, facilitating a broad exploration of the capacities of LLMs within the context of practical chemistry. Five LLMs (GPT-4, GPT-3.5, Davinci-003, Llama, and Galactica) are evaluated on each chemistry task in zero-shot and few-shot in-context learning settings with carefully selected demonstration examples and specially crafted prompts. Our investigation found that GPT-4 outperformed the other models and that LLMs exhibit different levels of competitiveness across the eight chemistry tasks. Beyond the key findings of the benchmark analysis, our work provides insights into the limitations of current LLMs and the impact of in-context learning settings on their performance across various chemistry tasks. The code and datasets used in this study are available at https://github.com/ChemFoundationModels/ChemLLMBench.
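The zero-shot versus few-shot in-context learning setup described in the abstract can be sketched as follows. This is a minimal illustration of how such prompts are typically assembled, not the paper's actual prompt templates; the task, instruction wording, and SMILES/name pairs below are hypothetical examples chosen for clarity.

```python
def build_prompt(task_instruction, demonstrations, query):
    """Assemble an in-context-learning prompt for an LLM.

    demonstrations: list of (input, output) pairs shown to the model
    before the query. An empty list yields a zero-shot prompt; a
    non-empty list yields a few-shot prompt.
    """
    parts = [task_instruction]
    for x, y in demonstrations:
        parts.append(f"Input: {x}\nOutput: {y}")
    # The query is left with an empty Output slot for the model to fill.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical demonstrations for a SMILES -> common-name task.
demos = [
    ("CCO", "ethanol"),
    ("CC(=O)O", "acetic acid"),
]
instruction = "Convert the SMILES string to its common chemical name."

zero_shot = build_prompt(instruction, [], "c1ccccc1")
few_shot = build_prompt(instruction, demos, "c1ccccc1")
print(few_shot)
```

In the few-shot case the selected demonstrations are prepended before the query, which is the mechanism the benchmark varies between its zero-shot and few-shot settings.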
Pages: 27