Unveiling the power of language models in chemical research question answering

Times cited: 0
Authors
Chen, Xiuying [1, 2]
Wang, Tairan [2]
Guo, Taicheng [3]
Guo, Kehan [3]
Zhou, Juexiao [2]
Li, Haoyang [2]
Song, Zirui [1]
Gao, Xin [2]
Zhang, Xiangliang [2, 3]
Affiliations
[1] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[2] King Abdullah Univ Sci & Technol, Jeddah, Saudi Arabia
[3] Univ Notre Dame, Notre Dame, IN USA
Source
COMMUNICATIONS CHEMISTRY | 2025 / Vol. 8 / Issue 1
DOI
10.1038/s42004-024-01394-x
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
While the abilities of language models have been thoroughly evaluated in the general domain and in biomedicine, academic chemistry remains less explored. Chemical question-answering (QA) tools play a crucial role in both education and research by translating complex chemical information into an understandable form. To address this gap, we introduce ScholarChemQA, a large-scale QA dataset constructed from chemical papers. Specifically, the questions are taken from paper titles that end with a question mark, and the multiple-choice answers are reasoned out from the corresponding abstracts. The dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that is potentially useful. Correspondingly, we introduce ChemMatch, a model designed to answer chemical questions by fully leveraging the collected data. Experiments show that large language models (LLMs) still have significant room for improvement in chemistry. Moreover, ChemMatch significantly outperforms recent baselines of similar scale. Resources are available at https://github.com/iriscxy/chemmatch.
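The construction step described in the abstract (questions taken from titles that end with a question mark, answers reasoned out from the abstract) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Paper` and `QAItem` records, the `build_qa_items` and `label_distribution` helpers, and the toy titles and `yes` labels are all hypothetical. The sketch also counts labeled versus unlabeled items, the two data properties the abstract highlights.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Paper:
    title: str
    abstract: str
    # Hypothetical gold answer; None means the paper carries no label.
    label: Optional[str] = None


@dataclass
class QAItem:
    question: str  # the paper title, which ends with "?"
    context: str   # the abstract the answer is reasoned from
    answer: Optional[str] = None  # None marks an unlabeled item


def build_qa_items(papers: List[Paper]) -> List[QAItem]:
    """Keep only papers whose title ends with a question mark (illustrative filter)."""
    items = []
    for paper in papers:
        title = paper.title.strip()
        if title.endswith("?"):
            items.append(
                QAItem(question=title, context=paper.abstract.strip(), answer=paper.label)
            )
    return items


def label_distribution(items: List[QAItem]) -> Dict[str, int]:
    """Count items per answer option; unlabeled items are grouped under 'unlabeled'."""
    counts = Counter(
        item.answer if item.answer is not None else "unlabeled" for item in items
    )
    return dict(counts)


if __name__ == "__main__":
    # Toy records invented for illustration only; they are not ScholarChemQA data.
    papers = [
        Paper("Is compound X a better catalyst than Y?", "We compare the two catalysts.", "yes"),
        Paper("Synthesis of a new ligand", "We report a new ligand."),
        Paper("Does solvent polarity control selectivity?", "We vary the solvent.", "yes"),
        Paper("Can ligand Z stabilize the intermediate?", "We test ligand Z."),
    ]
    items = build_qa_items(papers)
    print(len(items))                 # 3 of the 4 titles are questions
    print(label_distribution(items))  # {'yes': 2, 'unlabeled': 1}
```

Counting answers alongside the unlabeled bucket makes both the class imbalance and the share of unlabeled items visible before any model is trained.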
Pages: 11