PreAlgPro: Prediction of allergenic proteins with pre-trained protein language model and efficient neural network

Cited by: 1
Authors
Zhang, Lingrong [1 ]
Liu, Taigang [1 ]
Affiliations
[1] Shanghai Ocean Univ, Coll Informat Technol, Shanghai 201306, Peoples R China
Keywords
Pre-trained protein language model; Allergenic proteins; Deep learning; Model interpretability; Database
DOI
10.1016/j.ijbiomac.2024.135762
CLC Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject Classification Codes
071010; 081704
Abstract
Allergies are common and are triggered by allergens such as nuts and milk; avoiding exposure to these allergens is the most effective preventive measure against allergic reactions. However, current homology-based methods for identifying allergenic proteins struggle with non-homologous data, while traditional machine learning approaches rely on manually extracted features that omit important functional characteristics of proteins, such as evolutionary information. Consequently, existing methods leave considerable room for improvement. In this study, we present PreAlgPro, a method for identifying allergenic proteins that combines a pre-trained protein language model with deep learning. Specifically, we employed the ProtT5 model to extract protein embedding features, replacing the manual feature extraction step, and devised an Attention-CNN neural network architecture to identify latent features that contribute to the classification of allergenic proteins. The performance of our model was evaluated on four independent test sets, and the experimental results demonstrate that PreAlgPro surpasses existing state-of-the-art methods. Additionally, we collected allergenic protein samples to validate the robustness of the model and conducted an analysis of model interpretability.
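The abstract describes a two-stage pipeline: per-residue ProtT5 embeddings replace hand-crafted features, and an Attention-CNN head performs the binary allergen/non-allergen classification. Below is a minimal, hypothetical sketch of that pipeline in PyTorch. The Rostlab/prot_t5_xl_uniref50 checkpoint is the standard public ProtT5 release, but the head's layer sizes, pooling strategy, and other details are illustrative assumptions, not the authors' published implementation.

```python
# Hypothetical sketch of a PreAlgPro-style pipeline: ProtT5 embeddings feeding
# an attention-plus-CNN binary classifier. Layer sizes and pooling are
# assumptions for illustration, not the paper's exact architecture.
import re
import torch
import torch.nn as nn
from transformers import T5Tokenizer, T5EncoderModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the pre-trained ProtT5 encoder (1024-dim per-residue embeddings).
tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_uniref50", do_lower_case=False)
encoder = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_uniref50").to(device).eval()

def embed(sequence: str) -> torch.Tensor:
    """Return per-residue ProtT5 embeddings of shape (L, 1024)."""
    # ProtT5 expects space-separated residues; map rare amino acids to X.
    seq = " ".join(re.sub(r"[UZOB]", "X", sequence.upper()))
    batch = tokenizer(seq, return_tensors="pt").to(device)
    with torch.no_grad():
        out = encoder(**batch).last_hidden_state  # (1, L+1, 1024), incl. </s>
    return out[0, :-1]  # drop the trailing special token

class AttentionCNN(nn.Module):
    """Illustrative Attention-CNN head: self-attention over residues, then a
    1-D convolution and global pooling into a single allergenicity logit."""
    def __init__(self, dim: int = 1024, channels: int = 128):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.conv = nn.Conv1d(dim, channels, kernel_size=3, padding=1)
        self.head = nn.Linear(channels, 1)

    def forward(self, x):                             # x: (B, L, 1024)
        x, _ = self.attn(x, x, x)                     # residue-level self-attention
        x = torch.relu(self.conv(x.transpose(1, 2)))  # (B, channels, L)
        x = x.max(dim=2).values                       # global max pooling over residues
        return self.head(x).squeeze(-1)               # (B,) logits

# Example: score one sequence (weights are untrained here, so the
# probability is arbitrary until the head is fit on labeled data).
model = AttentionCNN().to(device)
logit = model(embed("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ").unsqueeze(0))
prob = torch.sigmoid(logit)
```

Keeping per-residue embeddings rather than mean-pooling them lets the attention and convolution layers localize sequence regions that drive the prediction, which is consistent with the interpretability analysis the abstract mentions.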
Pages: 11