Unveiling the power of language models in chemical research question answering

Cited by: 0
Authors
Chen, Xiuying [1,2]
Wang, Tairan [2 ]
Guo, Taicheng [3 ]
Guo, Kehan [3 ]
Zhou, Juexiao [2 ]
Li, Haoyang [2 ]
Song, Zirui [1 ]
Gao, Xin [2 ]
Zhang, Xiangliang [2,3]
Affiliations
[1] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[2] King Abdullah Univ Sci & Technol, Jeddah, Saudi Arabia
[3] Univ Notre Dame, Notre Dame, IN USA
Source
COMMUNICATIONS CHEMISTRY | 2025, Vol. 8, No. 1
DOI
10.1038/s42004-024-01394-x
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
While the abilities of language models are thoroughly evaluated in areas like general domains and biomedicine, academic chemistry remains less explored. Chemical QA tools also play a crucial role in both education and research by effectively translating complex chemical information into an understandable format. Addressing this gap, we introduce ScholarChemQA, a large-scale QA dataset constructed from chemical papers. Specifically, the questions are from paper titles with a question mark, and the multi-choice answers are reasoned out based on the corresponding abstracts. This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful. Correspondingly, we introduce a ChemMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data. Experiments show that Large Language Models (LLMs) still have significant room for improvement in the field of chemistry. Moreover, ChemMatch significantly outperforms recent similar-scale baselines. The project repository is available at https://github.com/iriscxy/chemmatch.
Pages: 11