A sequence-based computational method for prediction of MoRFs

Cited by: 6
Authors
Wang, Yu [1]
Guo, Yanzhi [1]
Pu, Xuemei [1]
Li, Menglong [1]
Affiliations
[1] Sichuan Univ, Coll Chem, Chengdu 610064, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MOLECULAR RECOGNITION FEATURES; INTRINSICALLY DISORDERED PROTEINS; SECONDARY STRUCTURE; WEB SERVER; BINDING; REGIONS; KNN;
DOI
10.1039/c6ra27161h
Chinese Library Classification
O6 [Chemistry]
Discipline classification code
0703
Abstract
Molecular recognition features (MoRFs) are relatively short segments (10-70 residues) within intrinsically disordered regions (IDRs) that can undergo disorder-to-order transitions upon binding to partner proteins. Because MoRFs play key roles in important biological processes such as signaling and regulation, identifying them is crucial for a full understanding of the functional aspects of IDRs. However, given the relative sparseness of MoRFs in protein sequences, the accuracy of the available MoRF predictors is often inadequate for practical use, which leaves significant room for improvement. In this work, we developed a novel sequence-based predictor for MoRFs using a support vector machine (SVM) algorithm. First, we constructed a comprehensive dataset of annotated MoRFs covering the full length range of 10 to 70 residues. Next, we used the flanking regions to define the negative samples. Then, amino acid composition (AAC) and two previously unexplored features, namely composition, transition and distribution (CTD) and the K-nearest-neighbors (KNN) score, were used to characterize the sequence information of MoRFs. Finally, after feature evaluation and optimization, the method achieved an overall accuracy of 75.75% in five-fold cross-validation. When applied to an independent test set of 110 proteins, the method also yielded a promising accuracy of 64.98%. Additionally, in external validation on the negative samples, our method performed comparably to other existing methods. We believe that this study will be useful in elucidating the mechanism of MoRFs and in facilitating hypothesis-driven experimental design and validation.
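As an illustration of the simplest feature named above, amino acid composition (AAC) can be sketched as the fraction of each of the 20 standard residues in a segment. This is a minimal sketch and not the authors' implementation: the function name `amino_acid_composition`, the handling of non-standard residues, and the example sequence are all assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so that every segment
# maps to a feature vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(segment):
    """Return a 20-element list with the fraction of each standard residue.

    Assumption (not from the abstract): residues outside the 20 standard
    letters are ignored, and an empty segment yields an all-zero vector.
    """
    counts = Counter(r for r in segment.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical segment, used only to show the call.
features = amino_acid_composition("MKKLLPTAAAGLLLLAAQPAMA")
```

A vector like `features` could be concatenated with other descriptors (such as CTD or a KNN score) before being passed to a classifier; how those descriptors are computed and combined in the paper is not specified in the abstract.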
Pages: 18937-18945
Number of pages: 9