Machine learning approach to gene essentiality prediction: a review

Cited by: 61
Authors
Aromolaran, Olufemi [1, 2]
Aromolaran, Damilare [3]
Isewon, Itunuoluwa [1]
Oyelade, Jelili [1, 4]
Affiliations
[1] Covenant Univ, Dept Comp & Informat Sci, Ota, Nigeria
[2] Nigerian Bioinformat & Genom Network, Abuja, Nigeria
[3] Covenant Univ, Comp Sci Dept, Ota, Nigeria
[4] Covenant Univ, Bioinformat Res Cluster CUBRe, Dept Comp & Informat Sci, Ota, Nigeria
Keywords
essential genes; essential proteins; feature selection; supervised learning; conditional essentiality; conditionally essential genes; IDENTIFYING ESSENTIAL GENES; FLUX BALANCE ANALYSIS; GENOME-SCALE ANALYSIS; PLASMODIUM-FALCIPARUM; METABOLIC PATHWAYS; WHOLE-GENOME; IDENTIFICATION; DATABASE; PROTEINS; SEQUENCE;
DOI
10.1093/bib/bbab128
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Essential genes are critical for the growth and survival of any organism. The machine learning approach complements experimental methods and minimizes the resources required for essentiality assays. Previous studies revealed the need to discover relevant features that significantly classify essential genes, to improve the generalizability of prediction models across organisms, and to construct a robust gold standard as the class label for the training data to enhance prediction. Findings also show that a significant limitation of the machine learning approach is predicting conditionally essential genes, since the essentiality status of a gene can change under a specific condition of the organism. This review examines the various methods applied to the essential gene prediction task, their strengths and limitations, and the factors responsible for effective computational prediction of essential genes. We discuss categories of features and how they contribute to the classification performance of essentiality prediction models. Five categories of features, namely gene sequence, protein sequence, network topology, homology and gene ontology-based features, were generated for Caenorhabditis elegans to perform a comparative analysis of their essentiality prediction capacity. The gene ontology-based feature category outperformed the other categories, mainly due to its high correlation with the genes' biological functions. However, the topology feature category provided the highest discriminatory power, making it more suitable for essentiality prediction. The major factor limiting machine learning prediction of conditional gene essentiality is the unavailability of labeled data for the conditions of interest with which to train a classifier. Therefore, cooperative machine learning could further exploit models that can perform well in conditional essentiality predictions.
Short abstract: Identification of essential genes is imperative because it provides an understanding of the core structure and function of an organism and accelerates the discovery of drug targets, among other uses. Recent studies have applied machine learning to complement the experimental identification of essential genes. However, several factors limit the performance of machine learning approaches. This review aims to present the standard procedure and resources available for predicting essential genes in organisms, and to highlight the factors responsible for the current limitations in using machine learning for conditional gene essentiality prediction. The choice of features and ML technique was identified as an important factor in predicting essential genes effectively.
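The comparison of feature categories by discriminatory power described in the abstract can be illustrated with a small sketch. This is not the authors' pipeline: it assumes each category has been reduced to one score per gene and ranks the categories by rank-based ROC AUC. The function names (`roc_auc`, `rank_categories`), the labels, and all score values are hypothetical and invented for illustration; they do not reproduce any result from the review.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based ROC AUC: the probability that a randomly chosen essential
    gene (label 1) scores higher than a randomly chosen non-essential gene
    (label 0). Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one gene of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rank_categories(
    labels: Sequence[int], category_scores: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Return (category, AUC) pairs sorted from most to least discriminative."""
    results = [(name, roc_auc(labels, s)) for name, s in category_scores.items()]
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Invented toy data (assumption): 1 = essential, 0 = non-essential.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
category_scores = {
    "network_topology": [0.9, 0.8, 0.7, 0.6, 0.3, 0.2, 0.2, 0.1],
    "gene_sequence": [0.6, 0.4, 0.7, 0.5, 0.5, 0.3, 0.6, 0.2],
}

if __name__ == "__main__":
    for name, auc in rank_categories(labels, category_scores):
        print(f"{name}: AUC = {auc:.3f}")
```

In practice a classifier would be trained on each feature category and evaluated with cross-validation; the per-gene score here stands in for such a classifier's output.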
Pages: 19
Related papers
50 in total
  • [41] Machine Learning Approach for Preterm Birth Prediction Using Health Records: Systematic Review
    Sharifi-Heris, Zahra
    Laitala, Juho
    Airola, Antti
    Rahmani, Amir M.
    Bender, Miriam
    JMIR MEDICAL INFORMATICS, 2022, 10 (04) : 18 - 35
  • [42] Prediction of Protein Essentiality by the Support Vector Machine with Statistical Tests
    Hor, Chiou-Yi
    Yang, Chang-Biau
    Yang, Zih-Jie
    Tseng, Chiou-Ting
    EVOLUTIONARY BIOINFORMATICS, 2013, 9 : 387 - 416
  • [43] Machine learning approach to student performance prediction of online learning
    Wang, Jing
    Yu, Yun
    PLOS ONE, 2025, 20 (01)
  • [44] Prediction of Protein Essentiality by the Support Vector Machine with Statistical Tests
    Hor, Chiou-Yi
    Yang, Chang-Biau
    Yang, Zih-Jie
    Tseng, Chiou-Ting
    2012 11TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2012), VOL 1, 2012 : 96 - 101
  • [45] Review of machine learning and deep learning models for toxicity prediction
    Guo, Wenjing
    Liu, Jie
    Dong, Fan
    Song, Meng
    Li, Zoe
    Khan, Md Kamrul Hasan
    Patterson, Tucker A.
    Hong, Huixiao
    EXPERIMENTAL BIOLOGY AND MEDICINE, 2023, 248 (21) : 1952 - 1973
  • [46] Gene Essentiality Analyzed by In Vivo Transposon Mutagenesis and Machine Learning in a Stable Haploid Isolate of Candida albicans
    Segal, Ella Shtifman
    Gritsenko, Vladimir
    Levitan, Anton
    Yadav, Bhawna
    Dror, Naama
    Steenwyk, Jacob L.
    Silberberg, Yael
    Mielich, Kevin
    Rokas, Antonis
    Gow, Neil A. R.
    Kunze, Reinhard
    Sharan, Roded
    Berman, Judith
    MBIO, 2018, 9 (05) : 1 - 21
  • [47] The Effect of Machine Learning Algorithms on Metagenomics Gene Prediction
    Al-Ajlan, Amani
    El Allali, Achraf
    ICBRA 2018: PROCEEDINGS OF 2018 5TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS RESEARCH AND APPLICATIONS, 2018 : 16 - 21
  • [48] Prediction of Drug Targets for Specific Diseases Leveraging Gene Perturbation Data: A Machine Learning Approach
    Zhao, Kai
    Shi, Yujia
    So, Hon-Cheong
    PHARMACEUTICS, 2022, 14 (02)
  • [49] Editorial: Machine Learning Techniques on Gene Function Prediction
    Zou, Quan
    Sangaiah, Arun Kumar
    Mrozek, Dariusz
    FRONTIERS IN GENETICS, 2019, 10
  • [50] miES: predicting the essentiality of miRNAs with machine learning and sequence features
    Song, Fei
    Cui, Chunmei
    Gao, Lin
    Cui, Qinghua
    BIOINFORMATICS, 2019, 35 (06) : 1053 - 1054