Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection

Cited by: 20
Authors
Le, Nguyen Quoc Khanh [1]
Li, Wanru [2]
Cao, Yanshuang [2]
Affiliations
[1] Taipei Med Univ, Coll Med, Profess Master Program Artificial Intelligence Med, Taipei 110, Taiwan
[2] Natl Univ Singapore, Inst Syst Sci, Singapore, Singapore
Keywords
crystallization; feature selection; machine learning; protein sequence; prediction model; support vector machine; NETWORK;
DOI
10.1093/bib/bbad319
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Protein crystallization is crucial for biology, but the steps involved are complex and demanding in terms of external factors and internal structure. To save on experimental costs and time, the tendency of proteins to crystallize can be initially determined and screened by modeling. This study therefore created a new pipeline that uses the protein sequence to predict protein crystallization propensity at three stages: protein material production, purification and crystal production. The pipeline proposes a new feature selection method, which combines chi-square (χ²) and recursive feature elimination together with the 12 selected features, followed by linear discriminant analysis for dimensionality reduction; finally, a support vector machine algorithm with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. The pipeline has been tested on three different datasets, and its accuracy rates are higher than those of existing pipelines. In conclusion, our model provides a new solution for predicting multistage protein crystallization propensity, which is a big challenge in computational biology.
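The first level of the feature selection described in the abstract, a chi-square filter, can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The abstract does not say which sequence features were used, so amino acid composition is an assumption here, and the sequences, labels and helper names (`composition`, `chi2_scores`, `select_top_k`) are hypothetical. The later stages (recursive feature elimination, linear discriminant analysis, and a tuned support vector machine with 10-fold cross-validation) are not shown; in practice they would typically come from a library such as scikit-learn.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20 amino acid frequencies of a sequence (assumed feature type)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature against the class labels.

    Observed value: sum of the feature over the samples of one class.
    Expected value: class prior times the feature's sum over all samples.
    """
    classes = sorted(set(y))
    n_samples = len(y)
    scores = []
    for j in range(len(X[0])):
        feature_total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            class_prob = sum(1 for label in y if label == c) / n_samples
            expected = class_prob * feature_total
            if expected > 0:  # skip features that never occur
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: 1 = crystallizable, 0 = not crystallizable.
sequences = ["KKKKAAGG", "KKKAAAGG", "DDDDAAGG", "DDDAAAGG"]
labels = [1, 1, 0, 0]

X = [composition(s) for s in sequences]
scores = chi2_scores(X, labels)
top = select_top_k(scores, k=2)
selected = sorted(AMINO_ACIDS[j] for j in top)
print(selected)
```

In this toy data only lysine (K) and aspartate (D) differ between the two classes, so they receive the highest chi-square scores, while alanine and glycine, which are equally frequent in both classes, score zero. A wrapper method such as recursive feature elimination would then refine the filtered subset using a trained model.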
Pages: 8
Related papers
50 in total
  • [21] Sequence-Based Classification Using Discriminatory Motif Feature Selection
    Xiong, Hao
    Capurso, Daniel
    Sen, Saunak
    Segal, Mark R.
    PLOS ONE, 2011, 6 (11):
  • [22] A Survey of Feature Selection for Vulnerability Prediction Using Feature-based Machine Learning
    Li, ZhanJun
    Shao, Yan
    ICMLC 2019: 2019 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND COMPUTING, 2019, : 30 - 36
  • [23] A Novel Sequence-Based Method for Phosphorylation Site Prediction with Feature Selection and Analysis
    He, Zhi-Song
    Shi, Xiao-He
    Kong, Xiang-Ying
    Zhu, Yu-Bei
    Chou, Kuo-Chen
    PROTEIN AND PEPTIDE LETTERS, 2012, 19 (01): 70 - 78
  • [24] SRPNet: stroke risk prediction based on two-level feature selection and deep fusion network
    Zhang, Daoliang
    Yu, Na
    Yang, Xiaodan
    De Marinis, Yang
    Liu, Zhi-Ping
    Gao, Rui
    FRONTIERS IN PHYSIOLOGY, 2024, 15
  • [25] A Gas Emission Prediction Model Based on Feature Selection and Improved Machine Learning
    Shao, Liangshan
    Zhang, Kun
    PROCESSES, 2023, 11 (03)
  • [26] Compact Genetic Algorithm-Based Feature Selection for Sequence-Based Prediction of Dengue-Human Protein Interactions
    Dey, Lopamudra
    Mukhopadhyay, Anirban
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2022, 19 (04) : 2137 - 2148
  • [27] VacPred: Sequence-based prediction of plant vacuole proteins using machine-learning techniques
    Yadav, Arvind Kumar
    Singla, Deepak
    JOURNAL OF BIOSCIENCES, 2020, 45 (01)
  • [28] VacPred: Sequence-based prediction of plant vacuole proteins using machine-learning techniques
    Arvind Kumar Yadav
    Deepak Singla
    Journal of Biosciences, 2020, 45
  • [29] Sequence-Based Prediction of Protein Phase Separation: The Role of Beta-Pairing Propensity
    Mullick, Pratik
    Trovato, Antonio
    BIOMOLECULES, 2022, 12 (12)
  • [30] A Two-Level Machine Learning Prediction Approach for RAC Compressive Strength
    Qi, Fei
    Li, Hangyu
    BUILDINGS, 2024, 14 (09)