DRAK: Unlocking Molecular Insights with Domain-Specific Retrieval-Augmented Knowledge in LLMs

Times cited: 0

Authors:

Liu, Jinzhe [1,2]

Huang, Xiangsheng [3]

Chen, Zhuo [4]

Fang, Yin [4]

Affiliations:

[1] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Beijing, Peoples R China

[3] Chinese Acad Sci, Xiongan Inst Innovat, Hebei Key Lab Cognit Intelligence, Baoding, Peoples R China

[4] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Peoples R China

Source:

NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT II, NLPCC 2024 | 2025 / Vol. 15360

Keywords:

Retrieval-augmented knowledge; Knowledge injection; Biomolecular domain; LANGUAGE;

DOI:

10.1007/978-981-97-9434-8_20

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Large Language Models (LLMs) typically exhibit knowledge gaps in specialized applications because they are pre-trained on general-purpose text corpora. Although fine-tuning and modality alignment attempt to bridge these gaps, neither provides comprehensive knowledge coverage, so LLMs still deliver imprecise responses. To address these challenges, we introduce Domain-specific Retrieval-Augmented Knowledge (DRAK), a scalable and adaptable non-parametric knowledge injection framework that strengthens LLMs' knowledge reasoning through in-context examples. DRAK combines retrieval augmentation with structured knowledge-graph recall to obtain high-quality instances, and uses the retrieved examples to unlock LLMs' context-relevant molecular learning capabilities, offering a universal solution for specific domains. We validate DRAK's effectiveness and generalizability in the biomolecular domain, where it achieves superior performance on twelve tasks spanning both molecule-oriented and bioinformatics texts in the Mol-Instructions dataset. By demonstrating DRAK's ability to unearth molecular insights, this work establishes a standardized approach for LLMs to navigate complex, knowledge-intensive challenges.
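The retrieve-then-prompt idea summarized in the abstract can be illustrated with a toy sketch. This is not DRAK's implementation: it assumes a hypothetical in-memory example pool and knowledge graph (EXAMPLE_POOL, KG, and all function names are illustrative), and it substitutes simple bag-of-words cosine similarity for whatever retriever the paper actually uses. The sketch retrieves the k most similar instruction-response pairs, recalls knowledge-graph triples for entities mentioned in the query, and assembles both into an in-context prompt for an LLM.

```python
import math
import re
from collections import Counter

# Hypothetical pool of (instruction, response) pairs; a stand-in for a
# domain corpus of high-quality instances (not DRAK's actual data).
EXAMPLE_POOL = [
    ("What is the boiling point of water?", "Water boils at 100 °C at 1 atm."),
    ("Describe the function of the protein kinase domain.",
     "Kinase domains transfer phosphate groups to substrate proteins."),
    ("What is the molecular weight of ethanol?",
     "Ethanol has a molecular weight of about 46.07 g/mol."),
]

# Hypothetical knowledge graph: entity -> list of (relation, object) triples.
KG = {
    "ethanol": [("is_a", "primary alcohol"), ("has_formula", "C2H6O")],
}


def tokenize(text: str) -> Counter:
    """Lower-case bag-of-words term counts (alphanumeric tokens only)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, pool, k: int = 2):
    """Return the k pool examples whose instructions are most similar to the query."""
    q = tokenize(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, tokenize(ex[0])), reverse=True)
    return ranked[:k]


def recall_facts(query: str, kg) -> list:
    """Collect triples for every KG entity mentioned in the query (naive substring match)."""
    text = query.lower()
    return [f"{entity} {rel} {obj}"
            for entity, triples in kg.items() if entity in text
            for rel, obj in triples]


def build_prompt(query: str, pool, kg, k: int = 2) -> str:
    """Assemble recalled facts and retrieved examples into a few-shot prompt."""
    sections = []
    facts = recall_facts(query, kg)
    if facts:
        sections.append("Known facts:\n" + "\n".join(f"- {f}" for f in facts))
    for instruction, response in retrieve(query, pool, k):
        sections.append(f"Instruction: {instruction}\nResponse: {response}")
    # The target question goes last so the LLM completes its response.
    sections.append(f"Instruction: {query}\nResponse:")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_prompt("What is the boiling point of ethanol?", EXAMPLE_POOL, KG))
```

Swapping the lexical scorer for a dense embedding model, or the dictionary for a real graph store, would not change the overall shape of this sketch: retrieve examples, recall structured facts, then condition the LLM on the assembled context.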


Pages: 255-267

Number of pages: 13
