pLM4CPPs: Protein Language Model-Based Predictor for Cell Penetrating Peptides

Authors
Kumar, Nandan [1 ]
Du, Zhenjiao [1 ]
Li, Yonghui [1 ]
Affiliation
[1] Kansas State Univ, Dept Grain Sci & Ind, Manhattan, KS 66506 USA
Keywords
RICH ANTIMICROBIAL PEPTIDES; DELIVERY; MECHANISMS; VEHICLES
DOI
10.1021/acs.jcim.4c01338
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Cell-penetrating peptides (CPPs) are short peptides capable of penetrating cell membranes, making them valuable for drug delivery and intracellular targeting. Accurate prediction of CPPs can streamline experimental validation in the lab. This study assesses how effectively pretrained protein language models (pLMs) represent CPPs and develops a reliable model for CPP classification. We evaluated peptide embeddings generated from BEPLER, CPCProt, SeqVec, various ESM variants (ESM, ESM-2 with expanded feature set, ESM-1b, and ESM-1v), ProtT5-XL UniRef50, ProtT5-XL BFD, and ProtBERT. We developed pLM4CPPs, a novel deep learning architecture that uses convolutional neural networks (CNNs) as the classifier for binary classification of CPPs. pLM4CPPs outperformed existing state-of-the-art CPP prediction models, improving accuracy (ACC) by 4.9-5.5%, Matthews correlation coefficient (MCC) by 9.3-10.2%, and sensitivity (Sn) by 14.1-19.6%. Among all the tested models, ESM-1280 and ProtT5-XL BFD demonstrated the highest overall performance on the kelm data set. ESM-1280 achieved an ACC of 0.896, an MCC of 0.796, an Sn of 0.844, and a specificity (Sp) of 0.978. ProtT5-XL BFD achieved an ACC of 0.901, an MCC of 0.802, an Sn of 0.885, and an Sp of 0.917. pLM4CPPs combines predictions from multiple models to provide a consensus on whether a given peptide sequence is classified as a CPP or non-CPP. This approach is intended to enhance prediction reliability by leveraging the strengths of each individual model. A user-friendly web server for bioactivity predictions, along with data sets, is available at https://ry2acnp6ep.us-east-1.awsapprunner.com. The source code and protocol for adapting pLM4CPPs can be accessed on GitHub at https://github.com/drkumarnandan/pLM4CPPs. This platform aims to advance CPP prediction and peptide functionality modeling, aiding researchers in exploring peptide functionality effectively.
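The abstract reports ACC, MCC, Sn, and Sp and describes a consensus across several models. The following Python sketch shows how those four metrics are conventionally computed from a binary confusion matrix, and one simple way a consensus could be formed. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the consensus is derived, so the majority-vote rule, the tie-breaking choice, and the names `binary_metrics`, `majority_vote`, `model_a`, `model_b`, and `model_c` are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Standard ACC, MCC, Sn, Sp for labels coded 1 (CPP) and 0 (non-CPP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    acc = (tp + tn) / len(y_true)
    sn = tp / (tp + fn) if (tp + fn) else 0.0  # sensitivity (recall on CPPs)
    sp = tn / (tn + fp) if (tn + fp) else 0.0  # specificity (recall on non-CPPs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "MCC": mcc, "Sn": sn, "Sp": sp}


def majority_vote(per_model_preds: Dict[str, Sequence[int]]) -> List[int]:
    """Assumed consensus rule: call a peptide a CPP only if a strict
    majority of models predict 1. Ties fall back to non-CPP (an assumption)."""
    n_models = len(per_model_preds)
    columns = zip(*per_model_preds.values())  # one tuple of votes per peptide
    return [1 if 2 * sum(votes) > n_models else 0 for votes in columns]


if __name__ == "__main__":
    # Toy labels and hypothetical per-model predictions (not data from the paper).
    y_true = [1, 1, 1, 0, 0, 0]
    preds = {
        "model_a": [1, 1, 0, 0, 0, 1],
        "model_b": [1, 0, 1, 0, 0, 0],
        "model_c": [1, 1, 1, 0, 1, 0],
    }
    consensus = majority_vote(preds)
    print(consensus)
    print(binary_metrics(y_true, consensus))
```

In this toy example each individual model makes at least one error, while the majority vote recovers every label, which is the kind of benefit a consensus scheme is meant to provide. Real results depend on how correlated the models' errors are.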
Pages: 1128-1139
Number of pages: 12