Predicting essential genes in prokaryotic genomes using a linear method: ZUPLS

Cited by: 20
Authors
Song, Kai [1 ]
Tong, Tuopong [1 ]
Wu, Fang [1 ]
Affiliations
[1] Tianjin Univ, Sch Chem Engn & Technol, Tianjin 300072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
UNINFORMATIVE VARIABLE ELIMINATION; SHORT CODING SEQUENCES; MYCOBACTERIUM-TUBERCULOSIS; GRAM STAIN; RECOGNITION; SELECTION; PATTERN; BIOLOGY; BIAS
DOI
10.1039/c3ib40241j
Chinese Library Classification number
Q2 [Cell biology]
Discipline classification codes
071009; 090102
Abstract
An effective linear method, ZUPLS, was developed to improve the accuracy and speed of prokaryotic essential gene identification. ZUPLS uses only the Z-curve and other sequence-based features, which can be calculated readily from DNA and amino acid sequences. It therefore requires no well-studied biological network knowledge, which significantly simplifies essential gene identification, especially for newly sequenced species. ZUPLS also selects the necessary features automatically by embedding the uninformative variable elimination tool into the partial least squares classifier. No optimized modelling parameters are needed. To test its performance, ZUPLS was used to predict the essential genes of 12 remotely related prokaryotes. Cross-organism predictions yielded AUC (area under the curve) scores between 0.8042 and 0.9319 with E. coli genes as the training samples, and between 0.8111 and 0.9371 with B. subtilis genes as the training samples. ZUPLS was also compared with the best available results of existing approaches. The AUC score improved by 0.13 in predicting B. subtilis essential genes using E. coli genes, and by 0.10 in predicting E. coli essential genes using P. aeruginosa genes. The average accuracy for M. pulmonis, using M. genitalium and M. pulmonis genes, improved by 14.7%. The combined feature extraction and selection power of ZUPLS enables it to give reliable predictions of essential genes for both Gram-positive and Gram-negative organisms and for both rich and poor culture media.
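As an illustration of the kind of sequence-based feature the abstract refers to, the following is a minimal sketch of the standard phase-specific Z-curve transform for a coding sequence. The abstract does not list the exact feature set used by ZUPLS, so this shows only the basic nine-component form; the function name `z_curve_features` is hypothetical and not taken from the paper.

```python
from collections import Counter


def z_curve_features(seq):
    """Return 9 phase-specific Z-curve components for a DNA sequence.

    Hypothetical helper for illustration; not the authors' code.
    For each codon position (phase 0, 1, 2) it computes base
    frequencies and the three standard Z-curve components.
    """
    seq = seq.upper()
    features = []
    for phase in range(3):
        # Bases at this codon position: indices phase, phase+3, ...
        counts = Counter(seq[phase::3])
        # Only unambiguous bases contribute to the frequencies.
        n = sum(counts[b] for b in "ACGT")
        if n == 0:
            features.extend([0.0, 0.0, 0.0])
            continue
        a, c, g, t = (counts[b] / n for b in "ACGT")
        features.extend([
            (a + g) - (c + t),  # x: purine vs. pyrimidine
            (a + c) - (g + t),  # y: amino vs. keto
            (a + t) - (g + c),  # z: weak vs. strong hydrogen bonding
        ])
    return features


if __name__ == "__main__":
    print(z_curve_features("ATGGCCAAATAA"))
```

Each gene is mapped to a fixed-length numeric vector regardless of its length, which is what allows a linear classifier such as partial least squares to be trained on genes from one organism and applied to another.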
Pages: 460-469
Page count: 10