Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection

Cited by: 20
Authors
Le, Nguyen Quoc Khanh [1]
Li, Wanru [2]
Cao, Yanshuang [2]
Affiliations
[1] Taipei Med Univ, Coll Med, Profess Master Program Artificial Intelligence Med, Taipei 110, Taiwan
[2] Natl Univ Singapore, Inst Syst Sci, Singapore, Singapore
Keywords
crystallization; feature selection; machine learning; protein sequence; prediction model; support vector machine; NETWORK;
DOI
10.1093/bib/bbad319
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Protein crystallization is crucial for biology, but the steps involved are complex and demanding in terms of external factors and internal structure. To save experimental cost and time, the tendency of a protein to crystallize can first be estimated and screened by modeling. This study therefore created a new pipeline that uses the protein sequence to predict crystallization propensity at three stages: protein material production, purification, and crystal production. The pipeline proposes a new feature selection method that combines Chi-square (χ²) and recursive feature elimination together with the 12 selected features, followed by linear discriminant analysis for dimensionality reduction. Finally, a support vector machine with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. The pipeline was tested on three different datasets, and its accuracy rates are higher than those of existing pipelines. In conclusion, the model provides a new solution for predicting multistage protein crystallization propensity, which is a major challenge in computational biology.
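The first level of the feature selection described in the abstract, Chi-square filtering, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `chi2_scores` and `select_top_k`, the toy feature matrix, and the choice of k are all hypothetical, and the later steps (recursive feature elimination, linear discriminant analysis, and the support vector machine) are not shown. The score follows the common formulation for non-negative features, comparing per-class feature sums with the sums expected if the feature were independent of the class.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class labels.

    Illustrative only: X is a list of rows (e.g. sequence-derived counts),
    y is a list of class labels of the same length.
    """
    n_samples = len(X)
    n_features = len(X[0])

    # Observed statistic: sum of each feature within each class.
    observed = defaultdict(lambda: [0.0] * n_features)
    class_counts = defaultdict(int)
    for row, label in zip(X, y):
        class_counts[label] += 1
        sums = observed[label]
        for j, value in enumerate(row):
            sums[j] += value

    # Total of each feature over all samples.
    totals = [sum(row[j] for row in X) for j in range(n_features)]

    scores = []
    for j in range(n_features):
        score = 0.0
        for label, count in class_counts.items():
            # Expected sum if the feature were independent of the class.
            expected = totals[j] * count / n_samples
            if expected > 0:
                score += (observed[label][j] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, in column order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 4 proteins, 3 non-negative features,
# label 1 = crystallizable, label 0 = not crystallizable.
X = [[3, 1, 2], [4, 1, 2], [0, 1, 3], [0, 1, 3]]
y = [1, 1, 0, 0]

scores = chi2_scores(X, y)
kept = select_top_k(scores, k=2)
print(scores, kept)
```

In this toy example the second feature is identical across classes, so it receives a score of zero and is dropped. In a pipeline like the one the abstract describes, the surviving columns would then be passed to the second selection level.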
Pages: 8