What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks

Cited: 0
Authors
Guo, Taicheng [1 ]
Guo, Kehan [1 ]
Nan, Bozhao [1 ]
Liang, Zhenwen [1 ]
Guo, Zhichun [1 ]
Chawla, Nitesh V. [1 ]
Wiest, Olaf [1 ]
Zhang, Xiangliang [1 ]
Affiliations
[1] Univ Notre Dame, Notre Dame, IN 46556 USA
Funding
US National Science Foundation
Keywords
GENERATION; SMILES;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes (China)
081104; 0812; 0835; 1405
Abstract
Large Language Models (LLMs) with strong natural language processing abilities have emerged and been applied in various areas such as science, finance, and software engineering. However, the capability of LLMs to advance the field of chemistry remains unclear. In this paper, rather than pursuing state-of-the-art performance, we aim to evaluate the capabilities of LLMs across a wide range of tasks in the chemistry domain. We identify three key chemistry-related capabilities to explore in LLMs, namely understanding, reasoning, and explaining, and establish a benchmark containing eight chemistry tasks. Our analysis draws on widely recognized datasets, facilitating a broad exploration of the capacities of LLMs within the context of practical chemistry. Five LLMs (GPT-4, GPT-3.5, Davinci-003, Llama, and Galactica) are evaluated on each chemistry task in zero-shot and few-shot in-context learning settings with carefully selected demonstration examples and specially crafted prompts. Our investigation found that GPT-4 outperformed the other models and that the LLMs exhibit different levels of competitiveness across the eight chemistry tasks. Beyond the key findings from the comprehensive benchmark analysis, our work provides insights into the limitations of current LLMs and the impact of in-context learning settings on their performance across various chemistry tasks. The code and datasets used in this study are available at https://github.com/ChemFoundationModels/ChemLLMBench.
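The zero-shot versus few-shot in-context learning setup mentioned in the abstract can be sketched as follows. This is a minimal illustrative example only: the `build_prompt` helper, the task wording, and the demonstration pairs are assumptions for exposition, not the prompts actually used in the benchmark.

```python
def build_prompt(question: str, demonstrations=None) -> str:
    """Assemble an in-context-learning prompt for a chemistry task.

    With no demonstrations the prompt is zero-shot; passing a list of
    (input, output) pairs prepends them as few-shot examples.
    """
    header = "You are an expert chemist. Answer concisely.\n"
    shots = ""
    if demonstrations:
        for q, a in demonstrations:
            shots += f"Question: {q}\nAnswer: {a}\n\n"
    return f"{header}\n{shots}Question: {question}\nAnswer:"


# Zero-shot: the model sees only the task question.
zero_shot = build_prompt("What is the SMILES string for ethanol?")

# Few-shot: carefully selected demonstration examples are prepended,
# mirroring the in-context learning settings described above.
few_shot = build_prompt(
    "What is the SMILES string for ethanol?",
    demonstrations=[
        ("What is the SMILES string for methane?", "C"),
        ("What is the SMILES string for water?", "O"),
    ],
)
print(few_shot)
```

The completed prompt string would then be sent to each evaluated model; selecting which demonstration pairs to include is the part the paper reports as having a measurable impact on performance.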
Pages: 27