An Accurate and Efficient Approach to Knowledge Extraction from Scientific Publications Using Structured Ontology Models, Graph Neural Networks, and Large Language Models

Cited by: 1
Authors
Ivanisenko, Timofey V. [1 ,2 ]
Demenkov, Pavel S. [1 ,2 ]
Ivanisenko, Vladimir A. [1 ,2 ]
Affiliations
[1] Novosibirsk State Univ, Artificial Intelligence Res Ctr, Pirogova St 1, Novosibirsk 630090, Russia
[2] Russian Acad Sci, Siberian Branch, Inst Cytol & Genet, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
Keywords
text-mining; ANDSystem; deep learning; GNN; LLM; knowledge graph; FUNCTIONAL MODULES; GENE NETWORKS; RECONSTRUCTION; ASSOCIATION; COMPLEXES; SEROTONIN; BIOLOGY; BINDING; SYSTEMS; SLEEP;
DOI
10.3390/ijms252111811
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
The rapid growth of biomedical literature makes it challenging for researchers to stay current. Integrating knowledge from various sources is crucial for studying complex biological systems. Traditional text-mining methods often have limited accuracy because they fail to capture semantic and contextual nuances. Deep-learning models can be computationally expensive and typically offer low interpretability, although work in explainable AI aims to mitigate this. Furthermore, transformer-based models tend to produce false or fabricated information, a problem known as hallucination, which is especially prevalent in large language models (LLMs). This study proposes a hybrid approach that combines text-mining techniques with graph neural networks (GNNs) and fine-tuned LLMs to extend biomedical knowledge graphs and interpret the predicted edges on the basis of the published literature. An LLM is used to validate predictions and provide explanations. Evaluated on a corpus of experimentally confirmed protein interactions, the approach achieved a Matthews correlation coefficient (MCC) of 0.772. Applied to insomnia, it identified 25 interactions among 32 human proteins that are absent from existing knowledge bases, including regulatory interactions between MAOA and 5-HT2C, binding between ADAM22 and 14-3-3 proteins, which is implicated in neurological diseases, and a circadian regulatory loop involving RORB and NR1D1. The hybrid GNN-LLM method efficiently analyzes biomedical literature to uncover potential molecular interactions relevant to complex disorders. It can accelerate therapeutic target discovery by focusing expert verification on the most relevant automatically extracted information.
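The abstract reports performance as a Matthews correlation coefficient (MCC) of 0.772 on a corpus of experimentally confirmed protein interactions. As an illustration only, and not the authors' code, the short Python sketch below computes MCC from the confusion-matrix counts of a binary edge-prediction task; the example counts are hypothetical and are not taken from the paper.

    import math

    def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
        """Matthews correlation coefficient for binary predictions.

        MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)),
        conventionally defined as 0.0 when the denominator is zero.
        """
        denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return (tp * tn - fp * fn) / denom if denom else 0.0

    # Hypothetical counts for predicted vs. experimentally confirmed
    # protein interactions, chosen only to illustrate the calculation.
    print(round(matthews_corrcoef(tp=420, tn=460, fp=55, fn=65), 3))

Unlike accuracy, MCC accounts for all four confusion-matrix cells, which makes it a common choice when positive and negative protein-interaction examples are imbalanced.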
Pages: 27
Related papers
50 records in total
  • [21] Bridging Domains in Chronic Lower Back Pain: Large Language Models and Ontology-Driven Strategies for Knowledge Graph Construction
    Anderson, Paul
    Lin, Damon
    Davidson, Jean
    Migler, Theresa
    Ho, Iris
    Koenig, Cooper
    Bittner, Madeline
    Kaplan, Samuel
    Paraiso, Mayumi
    Buhn, Nasreen
    Stokes, Emily
    Hunt, C. Anthony
    Ropella, Glen
    Lotz, Jeffrey
    BIOINFORMATICS AND BIOMEDICAL ENGINEERING, PT II, IWBBIO 2024, 2024, 14849 : 14 - 30
  • [22] Queryfy: from knowledge graphs to questions using open Large Language Models
    Brei, Felix
    Meyer, Lars-Peter
    Martin, Michael
    IT-INFORMATION TECHNOLOGY, 2025,
  • [23] Building footprint extraction from Digital Surface Models using Neural Networks
    Davydova, Ksenia
    Cui, Shiyong
    Reinartz, Peter
    IMAGE AND SIGNAL PROCESSING FOR REMOTE SENSING XXII, 2016, 10004
  • [24] Towards Minimal Edits in Automated Program Repair: A Hybrid Framework Integrating Graph Neural Networks and Large Language Models
    Xu, Zhenyu
    Sheng, Victor S.
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT V, 2024, 15020 : 402 - 416
  • [25] Hybrid-LLM-GNN: integrating large language models and graph neural networks for enhanced materials property prediction
    Li, Youjia
    Gupta, Vishu
    Kilic, Muhammed Nur Talha
    Choudhary, Kamal
    Wines, Daniel
    Liao, Wei-keng
    Choudhary, Alok
    Agrawal, Ankit
    DIGITAL DISCOVERY, 2025, 4 (02): : 376 - 383
  • [26] KRAGEN: a knowledge graph-enhanced RAG framework for biomedical problem solving using large language models
    Matsumoto, Nicholas
    Moran, Jay
    Choi, Hyunjun
    Hernandez, Miguel E.
    Venkatesan, Mythreye
    Wang, Paul
    Moore, Jason H.
    BIOINFORMATICS, 2024, 40 (06)
  • [27] Parameter-efficient fine-tuning of large language models using semantic knowledge tuning
    Prottasha, Nusrat Jahan
    Mahmud, Asif
    Sobuj, Md. Shohanur Islam
    Bhat, Prakash
    Kowsher, Md
    Yousefi, Niloofar
    Garibay, Ozlem Ozmen
    SCIENTIFIC REPORTS, 2024, 14 (01):
  • [28] ProtoCode: Leveraging large language models (LLMs) for automated generation of machine-readable PCR protocols from scientific publications
    Jiang, Shuo
    Evans-Yamamoto, Daniel
    Bersenev, Dennis
    Palaniappan, Sucheendra K.
    Yachie-Kinoshita, Ayako
    SLAS TECHNOLOGY, 2024, 29 (03): : 100134
  • [29] Event Extraction and Semantic Representation from Spanish Workers' Statute Using Large Language Models
    Terron, Gabriela Arguelles
    Chozas, Patricia Martin
    Doncel, Victor Rodriguez
    LEGAL KNOWLEDGE AND INFORMATION SYSTEMS, 2023, 379 : 329 - 334
  • [30] A case study for automated attribute extraction from legal documents using large language models
    Adhikary, Subinay
    Sen, Procheta
    Roy, Dwaipayan
    Ghosh, Kripabandhu
    ARTIFICIAL INTELLIGENCE AND LAW, 2024,