Classification of Retroviruses Based on Genomic Data Using RVGC

Authors
Aamir, Khalid Mahmood [1 ]
Bilal, Muhammad [2 ]
Ramzan, Muhammad [1 ,3 ]
Khan, Muhammad Attique [4 ]
Nam, Yunyoung [5 ]
Kadry, Seifedine [6 ]
Affiliations
[1] Univ Sargodha, Dept CS & IT, Sargodha 40100, Pakistan
[2] Univ Mianwali, Dept CS & IT, Mianwali 42200, Pakistan
[3] Univ Management & Technol, Sch Syst & Technol, Lahore 54782, Pakistan
[4] HITEC Univ Taxila, Dept Comp Sci, Taxila, Pakistan
[5] Soonchunhyang Univ, Dept Comp Sci & Engn, Asan, South Korea
[6] Noroff Univ Coll, Fac Appl Comp & Technol, Kristiansand, Norway
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2021 / Vol. 69 / Issue 03
Keywords
Retroviruses; machine learning; bioinformatics; classification; MULTIPLE SEQUENCE ALIGNMENT; SEARCH; EXPRESSION;
DOI
10.32604/cmc.2021.017835
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Retroviruses are a large group of infectious agents with similar virion structures and replication mechanisms. Retrovirus infections can cause AIDS, cancer, neurologic disorders, and other clinical conditions, all of which can be fatal. Detection of retroviruses by genome sequence is a biological problem that benefits from computational methods. The National Center for Biotechnology Information (NCBI) promotes science and health by making biomedical and genomic data available to the public. This research aims to classify the different types of retrovirus genome sequences available at the NCBI. First, nucleotide pattern occurrences are counted in the given genome sequences at the preprocessing stage. Based on some significant results, the number of features used for classification is reduced to five. The classification is carried out in two phases. The first phase uses only two features. Data left unclassified in the first phase is passed to the second phase, where the final decision is made with the remaining three features. Three data sets of animal and human retroviruses are selected: the training data set is used to minimize the classifier's number and training; the validation data set is used to validate the models; and the test data set is used to analyze the performance of the classifier. For comparison, we also use decision tree, naive Bayes, k-nearest neighbors, and support vector machine classifiers. The results show that the proposed approach performs better than the existing methods on the imbalanced retrovirus genome-sequence dataset.
Pages: 3829-3844
Page count: 16
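
The abstract describes two steps that a short sketch can make concrete: counting nucleotide pattern occurrences during preprocessing, and a two-phase decision in which sequences left unclassified by the first two features fall through to the remaining three. The abstract does not give the pattern length, the five selected features, or the decision rule, so everything below is an illustrative assumption: the function names, the pattern length `k`, the `margin` threshold, and the nearest-centroid rule are hypothetical stand-ins, not the RVGC method itself.

```python
import math
from collections import Counter


def count_patterns(sequence, k=2):
    """Count overlapping length-k nucleotide patterns (k is an assumed value)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pattern_frequencies(sequence, patterns):
    """Turn raw counts into relative frequencies for the chosen patterns."""
    k = len(patterns[0])
    counts = count_patterns(sequence, k)
    total = max(len(sequence) - k + 1, 1)  # number of length-k windows
    return [counts[p] / total for p in patterns]


def nearest_centroid(x, centroids, margin):
    """Return the closest class label, or None when the two closest
    centroids are within `margin` of each other (treated as undecided).
    This rule is a stand-in; the source does not state the actual one."""
    dists = sorted((math.dist(x, c), label) for label, c in centroids.items())
    if len(dists) > 1 and dists[1][0] - dists[0][0] < margin:
        return None
    return dists[0][1]


def two_phase_classify(features, phase1, phase2, margin=0.05):
    """Phase 1 uses the first two features; undecided samples fall
    through to phase 2, which uses the remaining three features."""
    label = nearest_centroid(features[:2], phase1, margin)
    if label is not None:
        return label
    # margin=0.0 forces a final decision in the second phase
    return nearest_centroid(features[2:], phase2, margin=0.0)


if __name__ == "__main__":
    # Hypothetical class centroids, for illustration only
    phase1 = {"A": (0.0, 0.0), "B": (1.0, 1.0)}
    phase2 = {"A": (0.0, 0.0, 0.0), "B": (1.0, 1.0, 1.0)}
    print(count_patterns("ACGTAC"))
    print(two_phase_classify([0.1, 0.1, 0.9, 0.9, 0.9], phase1, phase2))
    print(two_phase_classify([0.5, 0.5, 0.9, 0.9, 0.9], phase1, phase2))
```

In this sketch a sample is settled in the first phase only when one class is clearly closer than the other; the second call in the demo sits exactly between the two phase-one centroids, so it is decided by the last three features instead.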