PreAlgPro: Prediction of allergenic proteins with pre-trained protein language model and efficient neural network

Cited: 1
Authors
Zhang, Lingrong [1 ]
Liu, Taigang [1 ]
Affiliations
[1] Shanghai Ocean Univ, Coll Informat Technol, Shanghai 201306, Peoples R China
Keywords
Pre-trained protein language model; Allergenic proteins; Deep learning; Model interpretability; DATABASE
DOI
10.1016/j.ijbiomac.2024.135762
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
Allergy is a prevalent condition triggered by allergens such as nuts and milk. Avoiding exposure to allergens is the most effective preventive measure against allergic reactions. However, current homology-based methods for identifying allergenic proteins encounter challenges when dealing with non-homologous data. Traditional machine learning approaches rely on manually extracted features, which lack important protein functional characteristics, including evolutionary information. Consequently, there is still considerable room for improvement in existing methods. In this study, we present PreAlgPro, a method for identifying allergenic proteins based on pre-trained protein language models and deep learning techniques. Specifically, we employed the ProtT5 model to extract protein embedding features, replacing the manual feature extraction step. Furthermore, we devised an Attention-CNN neural network architecture to learn features that contribute to the classification of allergenic proteins. The performance of our model was evaluated on four independent test sets, and the experimental results demonstrate that PreAlgPro surpasses existing state-of-the-art methods. Additionally, we collected allergenic protein samples to validate the robustness of the model and conducted an analysis of model interpretability.
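The abstract describes a two-stage pipeline: ProtT5 embeddings in place of hand-crafted features, followed by an Attention-CNN classifier. Below is a minimal sketch of such a pipeline using the public Rostlab/prot_t5_xl_half_uniref50-enc checkpoint from Hugging Face; the layer sizes, pooling, and attention configuration of the Attention-CNN head are illustrative assumptions, not the authors' published architecture.

```python
# Sketch of a PreAlgPro-style pipeline: ProtT5 per-residue embeddings
# feeding an Attention-CNN binary classifier. Hyperparameters here are
# illustrative assumptions, not the paper's exact configuration.
import re
import torch
import torch.nn as nn
from transformers import T5Tokenizer, T5EncoderModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Step 1: ProtT5 embedding extraction (replaces manual feature engineering).
tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_half_uniref50-enc")
encoder = T5EncoderModel.from_pretrained(
    "Rostlab/prot_t5_xl_half_uniref50-enc").to(device).eval()

def embed(sequence: str) -> torch.Tensor:
    """Return per-residue ProtT5 embeddings of shape (L, 1024)."""
    # ProtT5 expects space-separated residues; rare amino acids map to X.
    seq = " ".join(re.sub(r"[UZOB]", "X", sequence.upper()))
    ids = tokenizer(seq, return_tensors="pt").to(device)
    with torch.no_grad():
        out = encoder(**ids).last_hidden_state  # (1, L+1, 1024), incl. </s>
    return out[0, :-1]  # drop the trailing special token

# Step 2: an Attention-CNN head (hypothetical layer sizes).
class AttentionCNN(nn.Module):
    def __init__(self, dim: int = 1024, channels: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv = nn.Conv1d(dim, channels, kernel_size=3, padding=1)
        self.head = nn.Sequential(nn.ReLU(), nn.AdaptiveMaxPool1d(1),
                                  nn.Flatten(), nn.Linear(channels, 2))

    def forward(self, x):                 # x: (B, L, 1024)
        h, _ = self.attn(x, x, x)         # self-attention over residues
        h = self.conv(h.transpose(1, 2))  # (B, channels, L)
        return self.head(h)               # allergen / non-allergen logits

model = AttentionCNN().to(device)
logits = model(embed("MKTLLLTLVVVTIVCLDLGYT").unsqueeze(0))
print(logits.softmax(-1))  # predicted class probabilities
```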
Pages: 11
Related papers
50 records in total
  • [21] Efficient word segmentation for enhancing Chinese spelling check in pre-trained language model
    Li, Fangfang
    Jiang, Jie
    Tang, Dafu
    Shan, Youran
    Duan, Junwen
    Zhang, Shichao
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (01) : 603 - 632
  • [22] DaDL-SChlo: protein subchloroplast localization prediction based on generative adversarial networks and pre-trained protein language model
    Wang, Xiao
    Han, Lijun
    Wang, Rong
    Chen, Haoran
    BRIEFINGS IN BIOINFORMATICS, 2023, 24 (03)
  • [23] A Technique to Pre-trained Neural Network Language Model Customization to Software Development Domain
    Dudarin, Pavel V.
    Tronin, Vadim G.
    Svyatov, Kirill V.
    ARTIFICIAL INTELLIGENCE (RCAI 2019), 2019, 1093 : 169 - 176
  • [24] Enhancing Language Generation with Effective Checkpoints of Pre-trained Language Model
    Park, Jeonghyeok
    Zhao, Hai
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 2686 - 2694
  • [25] Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs)
    Alanazi, Wafa
    Meng, Di
    Pollastri, Gianluca
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2025, 26 (01)
  • [26] POOE: predicting oomycete effectors based on a pre-trained large protein language model
    Zhao, Miao
    Lei, Chenping
    Zhou, Kewei
    Huang, Yan
    Fu, Chen
    Yang, Shiping
    Zhang, Ziding
    MSYSTEMS, 2024, 9 (01)
  • [27] EFFICIENT TEXT ANALYSIS WITH PRE-TRAINED NEURAL NETWORK MODELS
    Cui, Jia
    Lu, Heng
    Wang, Wenjie
    Kang, Shiyin
    He, Liqiang
    Li, Guangzhi
    Yu, Dong
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 671 - 676
  • [28] SsciBERT: a pre-trained language model for social science texts
    Shen, Si
    Liu, Jiangfeng
    Lin, Litao
    Huang, Ying
    Zhang, Lin
    Liu, Chang
    Feng, Yutong
    Wang, Dongbo
    SCIENTOMETRICS, 2023, 128 : 1241 - 1263
  • [29] A Pre-trained Clinical Language Model for Acute Kidney Injury
    Mao, Chengsheng
    Yao, Liang
    Luo, Yuan
    2020 8TH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2020), 2020, : 531 - 532
  • [30] Few-Shot NLG with Pre-Trained Language Model
    Chen, Zhiyu
    Eavani, Harini
    Chen, Wenhu
    Liu, Yinyin
    Wang, William Yang
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 183 - 190