PreAlgPro: Prediction of allergenic proteins with pre-trained protein language model and efficient neural network

Cited: 1
Authors
Zhang, Lingrong [1 ]
Liu, Taigang [1 ]
Affiliations
[1] Shanghai Ocean Univ, Coll Informat Technol, Shanghai 201306, Peoples R China
Keywords
Pre-trained protein language model; Allergenic proteins; Deep learning; Model interpretability; DATABASE
DOI
10.1016/j.ijbiomac.2024.135762
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
Allergy is a prevalent condition triggered by allergens such as nuts and milk. Avoiding exposure to allergens is the most effective preventive measure against allergic reactions. However, current homology-based methods for identifying allergenic proteins encounter challenges when dealing with non-homologous data. Traditional machine learning approaches rely on manually extracted features, which lack important protein functional characteristics, including evolutionary information. Consequently, there is still considerable room for improvement in existing methods. In this study, we present PreAlgPro, a method for identifying allergenic proteins based on pre-trained protein language models and deep learning techniques. Specifically, we employed the ProtT5 model to extract protein embedding features, replacing the manual feature extraction step. Furthermore, we devised an Attention-CNN neural network architecture to learn features that contribute to the classification of allergenic proteins. The performance of our model was evaluated on four independent test sets, and the experimental results demonstrate that PreAlgPro surpasses existing state-of-the-art methods. Additionally, we collected allergenic protein samples to validate the robustness of the model and conducted an analysis of model interpretability.
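The abstract describes a two-stage pipeline: ProtT5 embeddings in place of hand-crafted features, followed by an Attention-CNN classifier. Below is a minimal sketch of such a pipeline using the public Rostlab/prot_t5_xl_half_uniref50-enc checkpoint from Hugging Face; the layer sizes, pooling, and attention configuration of the Attention-CNN head are illustrative assumptions, not the authors' published architecture.

```python
# Sketch of a PreAlgPro-style pipeline: ProtT5 per-residue embeddings
# feeding an Attention-CNN binary classifier. Hyperparameters here are
# illustrative assumptions, not the paper's exact configuration.
import re
import torch
import torch.nn as nn
from transformers import T5Tokenizer, T5EncoderModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Step 1: ProtT5 embedding extraction (replaces manual feature engineering).
tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_half_uniref50-enc")
encoder = T5EncoderModel.from_pretrained(
    "Rostlab/prot_t5_xl_half_uniref50-enc").to(device).eval()

def embed(sequence: str) -> torch.Tensor:
    """Return per-residue ProtT5 embeddings of shape (L, 1024)."""
    # ProtT5 expects space-separated residues; rare amino acids map to X.
    seq = " ".join(re.sub(r"[UZOB]", "X", sequence.upper()))
    ids = tokenizer(seq, return_tensors="pt").to(device)
    with torch.no_grad():
        out = encoder(**ids).last_hidden_state  # (1, L+1, 1024), incl. </s>
    return out[0, :-1]  # drop the trailing special token

# Step 2: an Attention-CNN head (hypothetical layer sizes).
class AttentionCNN(nn.Module):
    def __init__(self, dim: int = 1024, channels: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv = nn.Conv1d(dim, channels, kernel_size=3, padding=1)
        self.head = nn.Sequential(nn.ReLU(), nn.AdaptiveMaxPool1d(1),
                                  nn.Flatten(), nn.Linear(channels, 2))

    def forward(self, x):                 # x: (B, L, 1024)
        h, _ = self.attn(x, x, x)         # self-attention over residues
        h = self.conv(h.transpose(1, 2))  # (B, channels, L)
        return self.head(h)               # allergen / non-allergen logits

model = AttentionCNN().to(device)
logits = model(embed("MKTLLLTLVVVTIVCLDLGYT").unsqueeze(0))
print(logits.softmax(-1))  # predicted class probabilities
```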
Pages: 11
Related papers
50 records in total
  • [21] Efficient word segmentation for enhancing Chinese spelling check in pre-trained language model
    Li, Fangfang
    Jiang, Jie
    Tang, Dafu
    Shan, Youran
    Duan, Junwen
    Zhang, Shichao
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (01) : 603 - 632
  • [22] DaDL-SChlo: protein subchloroplast localization prediction based on generative adversarial networks and pre-trained protein language model
    Wang, Xiao
    Han, Lijun
    Wang, Rong
    Chen, Haoran
    BRIEFINGS IN BIOINFORMATICS, 2023, 24 (03)
  • [23] A Technique to Pre-trained Neural Network Language Model Customization to Software Development Domain
    Dudarin, Pavel V.
    Tronin, Vadim G.
    Svyatov, Kirill V.
    ARTIFICIAL INTELLIGENCE (RCAI 2019), 2019, 1093 : 169 - 176
  • [24] Enhancing Language Generation with Effective Checkpoints of Pre-trained Language Model
    Park, Jeonghyeok
    Zhao, Hai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2686 - 2694
  • [25] Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs)
    Alanazi, Wafa
    Meng, Di
    Pollastri, Gianluca
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2025, 26 (01)
  • [26] POOE: predicting oomycete effectors based on a pre-trained large protein language model
    Zhao, Miao
    Lei, Chenping
    Zhou, Kewei
    Huang, Yan
    Fu, Chen
    Yang, Shiping
    Zhang, Ziding
    MSYSTEMS, 2024, 9 (01)
  • [27] EFFICIENT TEXT ANALYSIS WITH PRE-TRAINED NEURAL NETWORK MODELS
    Cui, Jia
    Lu, Heng
    Wang, Wenjie
    Kang, Shiyin
    He, Liqiang
    Li, Guangzhi
    Yu, Dong
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 671 - 676
  • [28] SsciBERT: a pre-trained language model for social science texts
    Shen, Si
    Liu, Jiangfeng
    Lin, Litao
    Huang, Ying
    Zhang, Lin
    Liu, Chang
    Feng, Yutong
    Wang, Dongbo
    SCIENTOMETRICS, 2023, 128 : 1241 - 1263
  • [29] A Pre-trained Clinical Language Model for Acute Kidney Injury
    Mao, Chengsheng
    Yao, Liang
    Luo, Yuan
    2020 8TH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2020), 2020, : 531 - 532
  • [30] Few-Shot NLG with Pre-Trained Language Model
    Chen, Zhiyu
    Eavani, Harini
    Chen, Wenhu
    Liu, Yinyin
    Wang, William Yang
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 183 - 190