Classification of Retroviruses Based on Genomic Data Using RVGC

Authors
Aamir, Khalid Mahmood [1]
Bilal, Muhammad [2]
Ramzan, Muhammad [1, 3]
Khan, Muhammad Attique [4]
Nam, Yunyoung [5]
Kadry, Seifedine [6]
Affiliations
[1] Univ Sargodha, Dept CS & IT, Sargodha 40100, Pakistan
[2] Univ Mianwali, Dept CS & IT, Mianwali 42200, Pakistan
[3] Univ Management & Technol, Sch Syst & Technol, Lahore 54782, Pakistan
[4] HITEC Univ Taxila, Dept Comp Sci, Taxila, Pakistan
[5] Soonchunhyang Univ, Dept Comp Sci & Engn, Asan, South Korea
[6] Noroff Univ Coll, Fac Appl Comp & Technol, Kristiansand, Norway
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2021 / Vol. 69 / No. 03
Keywords
Retroviruses; machine learning; bioinformatics; classification; MULTIPLE SEQUENCE ALIGNMENT; SEARCH; EXPRESSION
DOI
10.32604/cmc.2021.017835
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Retroviruses are a large group of infectious agents with similar virion structures and replication mechanisms. Retrovirus infections can cause AIDS, cancer, neurologic disorders, and other clinical conditions, all of which can be fatal. Detection of retroviruses from their genome sequences is a biological problem that benefits from computational methods. The National Center for Biotechnology Information (NCBI) promotes science and health by making biomedical and genomic data available to the public. This research aims to classify the different types of retrovirus genome sequences available at the NCBI. First, at the preprocessing stage, occurrences of nucleotide patterns are counted in the given genome sequences. Based on the significant results, the number of features used for classification is reduced to five. Classification is carried out in two phases. The first phase uses only two features. Data left unclassified in the first phase is passed to the second phase, where the final decision is made with the remaining three features. Three data sets of animal and human retroviruses are selected: the training data set is used to reduce the number of features and train the classifier, the validation data set is used to validate the models, and the test data set is used to analyze the classifier's performance. Decision tree, naive Bayes, k-nearest neighbors, and support vector machines are used for comparison. The results show that the proposed approach performs better than the existing methods on the imbalanced retrovirus genome-sequence dataset.
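The abstract describes counting nucleotide patterns and then classifying in two phases, but it does not state the pattern length, the five selected features, or the decision rules. The following is a minimal sketch of that general flow, not the authors' RVGC implementation: the pattern length `k=2`, the function names, the feature choices, the thresholds, and the class labels are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def count_patterns(sequence, k=2):
    """Count overlapping occurrences of each length-k pattern over A, C, G, T."""
    sequence = sequence.upper()
    seen = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Report every possible pattern, including those that never occur.
    return {"".join(p): seen["".join(p)] for p in product("ACGT", repeat=k)}


def two_phase_classify(features, phase_one, phase_two):
    """Return phase_one's label, or defer to phase_two when it returns None."""
    label = phase_one(features)
    return label if label is not None else phase_two(features)


# Hypothetical rules: features, thresholds, and class names are placeholders.
def phase_one(features):
    # Phase one looks at only two features.
    if features["AC"] >= 2 and features["CG"] >= 1:
        return "class_a"
    return None  # undecided, pass the sample on to phase two


def phase_two(features):
    # Phase two makes the final decision using three other features.
    score = features["GT"] + features["TA"] + features["AA"]
    return "class_b" if score >= 2 else "class_c"


counts = count_patterns("ACGTAC")
print(counts["AC"], counts["CG"])                        # 2 1
print(two_phase_classify(counts, phase_one, phase_two))  # class_a
```

Returning `None` from the first phase keeps the hand-off explicit: any sample the two-feature rule cannot decide is passed unchanged to the three-feature rule, which always returns a label.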
Pages: 3829-3844
Page count: 16