Integrating deep learning architectures for enhanced biomedical relation extraction: a pipeline approach

Authors
Sarol, M. Janina [1]
Hong, Gibong [2]
Guerra, Evan [2]
Kilicoglu, Halil [2]
Affiliations
[1] Univ Illinois, Informat Programs, 614 E Daniel St, Champaign, IL 61820 USA
[2] Univ Illinois, Sch Informat Sci, 501 Daniel St, Champaign, IL 61820 USA
Source
DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION | 2024 / Vol. 2024
Funding
National Institutes of Health (NIH), USA;
Keywords
NORMALIZATION; RECOGNITION; RESOURCE; CORPUS; ENTITY;
DOI
10.1093/database/baae079
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Biomedical relation extraction from scientific publications is a key task in biomedical natural language processing (NLP) and can facilitate the creation of large knowledge bases, enable more efficient knowledge discovery, and accelerate evidence synthesis. In this paper, building upon our previous effort in the BioCreative VIII BioRED Track, we propose an enhanced end-to-end pipeline approach for biomedical relation extraction (RE) and novelty detection (ND) that effectively leverages existing datasets and integrates state-of-the-art deep learning methods. Our pipeline consists of four tasks performed sequentially: named entity recognition (NER), entity linking (EL), RE, and ND. We trained models using the BioRED benchmark corpus that was the basis of the shared task. We explored several methods for each task and combinations thereof. For NER, we compared a BERT-based sequence labeling model that uses the BIO scheme with a span classification model. For EL, we trained a convolutional neural network model for diseases and chemicals and used an existing tool, PubTator 3.0, for mapping other entity types. For RE and ND, we adapted the BERT-based, sentence-bound PURE model to bidirectional and document-level extraction. We also performed extensive hyperparameter tuning to improve model performance. We obtained our best performance using BERT-based models for NER, RE, and ND, and the hybrid approach for EL. Our enhanced and optimized pipeline showed substantial improvement compared to our shared task submission: NER: 93.53 (+3.09), EL: 83.87 (+9.73), RE: 46.18 (+15.67), and ND: 38.86 (+14.9). While the performances of the NER and EL models are reasonably high, the RE and ND tasks remain challenging at the document level. Further enhancements to the dataset could enable more accurate and useful models for practical use. We provide our models and code at https://github.com/janinaj/e2eBioMedRE/.
Database URL: https://github.com/janinaj/e2eBioMedRE/
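The abstract mentions a sequence labeling model that uses the BIO scheme for NER. As a minimal illustration of what that scheme encodes, the sketch below converts a list of BIO tags into entity spans. It is not taken from the authors' repository: the function name `bio_to_spans`, the example sentence, and the short entity labels (`Chemical`, `Disease`) are hypothetical, and the lenient handling of a stray `I-` tag is an assumption.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags into (start, end_exclusive, entity_type) spans.

    Hypothetical helper for illustration only; not the authors' code.
    """
    spans: List[Tuple[int, int, str]] = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag continues the open span only if its type matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the currently open span, if there is one.
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        # B- opens a new span; a stray I- is also treated as a span start
        # (a lenient choice, assumed here).
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    # Close a span that runs to the end of the sequence.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


# Illustrative tokens and tags (made up for this example).
tokens = ["Aspirin", "reduces", "heart", "attack", "risk"]
tags = ["B-Chemical", "O", "B-Disease", "I-Disease", "O"]

for start, end, label in bio_to_spans(tags):
    print(label, " ".join(tokens[start:end]))
```

A span classification model, the alternative the abstract compares against, would instead score candidate (start, end) token ranges directly, without per-token tags.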
Pages: 12