Optimized biomedical entity relation extraction method with data augmentation and classification using GPT-4 and Gemini

Cited by: 0
Authors
Phan, Cong-Phuoc [1]
Phan, Ben [1]
Chiang, Jung-Hsien [1]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, 1 Univ Rd, Tainan 701, Taiwan
Keywords
DOI
10.1093/database/baae104
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
Although teams participating in BioCreative VIII Track 01 applied a variety of techniques to biomedical relation extraction, overall performance in this area still leaves substantial room for improvement. Large language models offer a new opportunity to improve existing techniques for natural language processing tasks. This paper presents our improved method for relation extraction, which integrates two well-known large language models: Gemini and GPT-4. Our approach uses GPT-4 to generate augmented training data, then applies an ensemble learning technique that combines the outputs of diverse models into a more precise prediction. We then use Gemini responses as input to fine-tune the BioNLP-PubMed-Bert classification model, which improves precision, recall, and F1 scores on the same test dataset used in the challenge evaluation.
Database URL: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-viii/track-1/
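The abstract does not say which ensemble technique combines the model outputs. As a minimal sketch only, assuming simple majority voting over per-example relation labels, the combination step might look like the following. The function name `majority_vote`, the tie-breaking rule (fall back to the first model), and the label strings are illustrative assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine aligned label predictions from several models.

    predictions[m][i] is the label that model m assigns to example i.
    Ties are resolved in favour of the first model's label (an assumption
    made for this sketch, not something stated in the abstract).
    """
    if not predictions:
        return []
    n_examples = len(predictions[0])
    if any(len(p) != n_examples for p in predictions):
        raise ValueError("all models must predict the same number of examples")

    combined = []
    for labels in zip(*predictions):  # labels for one example, one per model
        counts = Counter(labels)
        best_count = max(counts.values())
        winners = [label for label, c in counts.items() if c == best_count]
        # A unique winner is taken directly; otherwise use the first model.
        combined.append(winners[0] if len(winners) == 1 else labels[0])
    return combined


# Hypothetical outputs of three relation classifiers on three entity pairs.
model_a = ["Association", "Positive_Correlation", "None"]
model_b = ["Association", "None", "None"]
model_c = ["Bind", "Positive_Correlation", "Negative_Correlation"]
ensemble_labels = majority_vote([model_a, model_b, model_c])
```

Weighted voting or averaging of class probabilities would be equally plausible readings of "ensemble learning"; majority voting is shown here only because it needs nothing beyond the predicted labels.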
Pages: 8