An Accurate and Efficient Approach to Knowledge Extraction from Scientific Publications Using Structured Ontology Models, Graph Neural Networks, and Large Language Models

Cited by: 1
Authors
Ivanisenko, Timofey V. [1, 2]
Demenkov, Pavel S. [1, 2]
Ivanisenko, Vladimir A. [1, 2]
Affiliations
[1] Novosibirsk State Univ, Artificial Intelligence Res Ctr, Pirogova St 1, Novosibirsk 630090, Russia
[2] Russian Acad Sci, Siberian Branch, Inst Cytol & Genet, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
Keywords
text-mining; ANDSystem; deep learning; GNN; LLM; knowledge graph; FUNCTIONAL MODULES; GENE NETWORKS; RECONSTRUCTION; ASSOCIATION; COMPLEXES; SEROTONIN; BIOLOGY; BINDING; SYSTEMS; SLEEP
DOI
10.3390/ijms252111811
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The rapid growth of biomedical literature makes it challenging for researchers to stay current. Integrating knowledge from various sources is crucial for studying complex biological systems. Traditional text-mining methods often have limited accuracy because they do not capture semantic and contextual nuances. Deep-learning models can be computationally expensive and typically have low interpretability, although efforts in explainable AI aim to mitigate this. Furthermore, transformer-based models tend to produce false or fabricated information, a problem known as hallucination, which is especially prevalent in large language models (LLMs). This study proposes a hybrid approach that combines text-mining techniques with graph neural networks (GNNs) and fine-tuned LLMs to extend biomedical knowledge graphs and to interpret predicted edges based on the published literature. An LLM is used to validate predictions and provide explanations. Evaluated on a corpus of experimentally confirmed protein interactions, the approach achieved a Matthews correlation coefficient (MCC) of 0.772. Applied to insomnia, the approach identified 25 interactions between 32 human proteins that are absent from existing knowledge bases. These include regulatory interactions between MAOA and 5-HT2C, binding between ADAM22 and 14-3-3 proteins (implicated in neurological diseases), and a circadian regulatory loop involving RORB and NR1D1. The hybrid GNN-LLM method efficiently analyzes biomedical literature to uncover potential molecular interactions in complex disorders. It can accelerate therapeutic target discovery by focusing expert verification on the most relevant automatically extracted information.
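The abstract reports performance as a Matthews correlation coefficient (MCC) of 0.772. As a reminder of what that number measures, the sketch below computes MCC from the four counts of a binary confusion matrix (true/false positives and negatives), using the standard formula MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function name and the counts in the usage line are hypothetical illustrations, not the authors' code or data. Returning 0.0 when a marginal sum is zero is one common convention. scikit-learn's `sklearn.metrics.matthews_corrcoef` offers an equivalent computation that works from label arrays.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Hypothetical helper for illustration; ranges from -1 (total disagreement)
    through 0 (no better than chance) to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any row or column of the matrix sums to zero;
    # by convention we return 0.0 in that case.
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for a set of predicted protein-protein interactions.
print(round(matthews_corrcoef_from_counts(tp=90, tn=85, fp=15, fn=10), 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix. It therefore stays informative when confirmed and unconfirmed interactions are heavily imbalanced, as is typical for candidate edges in a knowledge graph.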
Pages: 27