An Accurate and Efficient Approach to Knowledge Extraction from Scientific Publications Using Structured Ontology Models, Graph Neural Networks, and Large Language Models

Cited by: 1
Authors
Ivanisenko, Timofey V. [1 ,2 ]
Demenkov, Pavel S. [1 ,2 ]
Ivanisenko, Vladimir A. [1 ,2 ]
Affiliations
[1] Novosibirsk State Univ, Artificial Intelligence Res Ctr, Pirogova St 1, Novosibirsk 630090, Russia
[2] Russian Acad Sci, Siberian Branch, Inst Cytol & Genet, Prospekt Lavrentyeva 10, Novosibirsk 630090, Russia
Keywords
text-mining; ANDSystem; deep learning; GNN; LLM; knowledge graph; FUNCTIONAL MODULES; GENE NETWORKS; RECONSTRUCTION; ASSOCIATION; COMPLEXES; SEROTONIN; BIOLOGY; BINDING; SYSTEMS; SLEEP
DOI
10.3390/ijms252111811
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The rapid growth of biomedical literature makes it challenging for researchers to stay current. Integrating knowledge from various sources is crucial for studying complex biological systems. Traditional text-mining methods often have limited accuracy because they do not capture semantic and contextual nuances. Deep-learning models can be computationally expensive and typically have low interpretability, though efforts in explainable AI aim to mitigate this. Furthermore, transformer-based models tend to produce false or made-up information, a problem known as hallucination that is especially prevalent in large language models (LLMs). This study proposes a hybrid approach that combines text-mining techniques with graph neural networks (GNNs) and fine-tuned LLMs to extend biomedical knowledge graphs and to interpret predicted edges based on published literature. An LLM is used to validate predictions and provide explanations. Evaluated on a corpus of experimentally confirmed protein interactions, the approach achieved a Matthews correlation coefficient (MCC) of 0.772. Applied to insomnia, the approach identified 25 interactions between 32 human proteins that are absent from known knowledge bases, including regulatory interactions between MAOA and 5-HT2C; binding between ADAM22 and 14-3-3 proteins, which is implicated in neurological diseases; and a circadian regulatory loop involving RORB and NR1D1. The hybrid GNN-LLM method analyzes biomedical literature efficiently to uncover potential molecular interactions for complex disorders. It can accelerate therapeutic target discovery by focusing expert verification on the most relevant automatically extracted information.
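For readers unfamiliar with the metric reported in the abstract, the following is a minimal sketch of how a Matthews correlation coefficient is computed from binary confusion-matrix counts. The function name `matthews_corrcoef_from_counts` and the example counts are hypothetical illustrations; they are not taken from the paper and do not reproduce its reported value of 0.772.

```python
import math


def matthews_corrcoef_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Return the MCC for a binary classifier given confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    By common convention, 0.0 is returned when the denominator is zero.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts chosen only for illustration (not the paper's data).
example_score = matthews_corrcoef_from_counts(tp=90, tn=85, fp=15, fn=10)
print(f"MCC = {example_score:.3f}")
```

The MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it uses all four confusion-matrix cells, which makes it informative when positive and negative classes are imbalanced.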
Pages: 27