MetalTrans: A Biological Language Model-Based Approach for Predicting Disease-Associated Mutations in Protein Metal-Binding Sites

Cited by: 1
Authors
Zhang, Ming [1]
Wang, Xiaohua [1]
Xu, Shanruo [2]
Ge, Fang [3,4]
Paixao, Ian Costa [5,6,7]
Song, Jiangning [5,6,7]
Yu, Dong-Jun [8]
Affiliations
[1] Jiangsu Univ Sci & Technol, Sch Comp, Zhenjiang 212100, Peoples R China
[2] Duke Kunshan Univ, Kunshan 215316, Jiangsu, Peoples R China
[3] Nanjing Univ Posts & Telecommun, State Key Lab Organ Elect & Informat Displays, Nanjing 210023, Peoples R China
[4] Nanjing Univ Posts & Telecommun, Inst Adv Mat IAM, Nanjing 210023, Peoples R China
[5] Monash Univ, Monash Biomed Discovery Inst, Melbourne, Vic 3800, Australia
[6] Monash Univ, Dept Biochem & Mol Biol, Melbourne, Vic 3800, Australia
[7] Monash Univ, Monash Data Futures Inst, Melbourne, Vic 3800, Australia
[8] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
CAUSE NOONAN; METALLOPROTEINS; SELECTIVITY; RESOURCE; INSIGHTS; DATABASE; UNIPROT; UREE;
DOI
10.1021/acs.jcim.4c00739
Chinese Library Classification number
R914 [Medicinal Chemistry]
Discipline classification code
100701
Abstract
Accurate prediction of mutations in protein metal-binding sites is critical for advancing drug discovery and improving disease diagnosis. MetalTrans addresses this need as an accurate predictor of disease-associated mutations in protein metal-binding sites. Its core innovation is the integration of multifeature splicing with the Transformer framework to extract features thoroughly. MetalTrans relies on a deep feature combination strategy that merges evolutionary-scale modeling (ESM) amino acid embeddings with ProtTrans embeddings, capturing information about the biochemical properties of proteins. The Transformer component uses the self-attention mechanism to learn higher-level representations. Using mutation site information for feature fusion enriches the feature set and avoids the overestimation commonly associated with protein sequence-based predictions. Comparative analyses indicate that this feature fusion approach enables MetalTrans to significantly outperform existing methods. Across metal-binding site data sets (Zn, Ca, Mg, and Mix), MetalTrans achieved average AUC values of 0.971, 0.965, 0.980, and 0.945, respectively, over multiple rounds of 5-fold cross-validation. Compared with the multichannel convolutional neural network method on a benchmark independent test set, MetalTrans achieved an AUC score of 0.998 on multiple 5-fold cross-validation. Further examination of the predicted outcomes supports the effectiveness of the model.
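The abstract reports AUC values averaged over multiple rounds of 5-fold cross-validation. The following is a minimal sketch of that kind of evaluation protocol in plain Python, not the authors' code. The function names, the number of repeats, the seed, and the toy `fit_centroid` model (a hypothetical stand-in for the actual predictor) are all assumptions made for illustration.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random
    negative, with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with a similar class ratio in each."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv_auc(features, labels, fit, k=5, repeats=3, seed=0):
    """Average AUC over `repeats` independent rounds of k-fold cross-validation.

    `fit(train_x, train_y)` must return a scoring function for one sample.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(repeats):
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            predict = fit([features[i] for i in train_idx],
                          [labels[i] for i in train_idx])
            fold_scores = [predict(features[i]) for i in test_idx]
            aucs.append(roc_auc([labels[i] for i in test_idx], fold_scores))
    return mean(aucs)


def fit_centroid(train_x, train_y):
    """Hypothetical toy model: project a sample onto the direction from the
    negative-class centroid to the positive-class centroid."""
    dim = len(train_x[0])

    def centroid(cls):
        rows = [x for x, y in zip(train_x, train_y) if y == cls]
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    direction = [p - n for p, n in zip(centroid(1), centroid(0))]
    return lambda x: sum(w * v for w, v in zip(direction, x))
```

In practice `fit` would wrap whatever model is being evaluated; each class needs at least `k` samples so that every fold contains both positives and negatives.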
Pages: 6216-6229
Page count: 14