A sequence-based computational method for prediction of MoRFs

Cited by: 6
Authors
Wang, Yu [1 ]
Guo, Yanzhi [1 ]
Pu, Xuemei [1 ]
Li, Menglong [1 ]
Affiliations
[1] Sichuan Univ, Coll Chem, Chengdu 610064, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
MOLECULAR RECOGNITION FEATURES; INTRINSICALLY DISORDERED PROTEINS; SECONDARY STRUCTURE; WEB SERVER; BINDING; REGIONS; KNN;
DOI
10.1039/c6ra27161h
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Molecular recognition features (MoRFs) are relatively short segments (10-70 residues) within intrinsically disordered regions (IDRs) that can undergo disorder-to-order transitions upon binding to partner proteins. Since MoRFs play key roles in important biological processes such as signaling and regulation, identifying them is crucial for a full understanding of the functional aspects of IDRs. However, given the relative sparseness of MoRFs in protein sequences, the accuracy of the available MoRF predictors is often inadequate for practical use, which leaves significant need and room for improvement. In this work, we developed a novel sequence-based predictor for MoRFs using a support vector machine (SVM) algorithm. First, we constructed a comprehensive dataset of annotated MoRFs covering a wide length range of 10 to 70 residues. Our method first used the flanking regions to define the negative samples. Then, amino acid composition (AAC) and two previously unexplored features, namely composition, transition and distribution (CTD) and the K nearest neighbors (KNN) score, were used to characterize the sequence information of MoRFs. Finally, using five-fold cross-validation, an overall accuracy of 75.75% was achieved after feature evaluation and optimization. When evaluated on an independent test set of 110 proteins, the method also yielded a promising accuracy of 64.98%. Additionally, in external validation on the negative samples, our method still shows performance comparable to that of other existing methods. We believe that this study will be useful in elucidating the mechanism of MoRFs and in facilitating hypothesis-driven experimental design and validation.
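To make two of the steps named in the abstract concrete, the following is a minimal Python sketch of amino acid composition (AAC) and of cutting flanking regions around an annotated MoRF as candidate negative samples. It is an illustration under stated assumptions, not the authors' implementation: the function names `aac` and `flanking_negatives`, the `flank_len` parameter, the 0-based half-open coordinates, and the example sequence are all hypothetical, and the abstract does not state how long the flanking regions are or how they are filtered.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(segment):
    """Amino acid composition: fraction of each standard residue in the segment.

    Non-standard characters (e.g. 'X') are ignored when computing fractions.
    """
    counts = Counter(segment.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def flanking_negatives(sequence, start, end, flank_len):
    """Return the flanks around the MoRF at sequence[start:end].

    Assumption (not from the source): coordinates are 0-based and half-open,
    and each flank is at most flank_len residues long. Empty flanks are dropped.
    """
    upstream = sequence[max(0, start - flank_len):start]
    downstream = sequence[end:end + flank_len]
    return [flank for flank in (upstream, downstream) if flank]


if __name__ == "__main__":
    # Hypothetical sequence and MoRF position, for illustration only.
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    morf_start, morf_end = 10, 22
    positive = aac(seq[morf_start:morf_end])
    negatives = [aac(f) for f in flanking_negatives(seq, morf_start, morf_end, 10)]
    print(len(positive), len(negatives))
```

Each segment becomes a 20-dimensional vector; in a pipeline like the one described, such vectors would be combined with the CTD and KNN-score features before being passed to an SVM classifier.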
Pages: 18937-18945
Number of pages: 9