Classification of Retroviruses Based on Genomic Data Using RVGC

Cited by: 0
Authors
Aamir, Khalid Mahmood [1]
Bilal, Muhammad [2]
Ramzan, Muhammad [1, 3]
Khan, Muhammad Attique [4]
Nam, Yunyoung [5]
Kadry, Seifedine [6]
Affiliations
[1] Univ Sargodha, Dept CS & IT, Sargodha 40100, Pakistan
[2] Univ Mianwali, Dept CS & IT, Mianwali 42200, Pakistan
[3] Univ Management & Technol, Sch Syst & Technol, Lahore 54782, Pakistan
[4] HITEC Univ Taxila, Dept Comp Sci, Taxila, Pakistan
[5] Soonchunhyang Univ, Dept Comp Sci & Engn, Asan, South Korea
[6] Noroff Univ Coll, Fac Appl Comp & Technol, Kristiansand, Norway
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2021 / Vol. 69 / No. 03
Keywords
Retroviruses; machine learning; bioinformatics; classification; multiple sequence alignment; search; expression
DOI
10.32604/cmc.2021.017835
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Retroviruses are a large group of infectious agents with similar virion structures and replication mechanisms. Retrovirus infections can lead to AIDS, cancer, neurologic disorders, and other clinical conditions, all of which can be fatal. Detection of retroviruses by genome sequence is a biological problem that benefits from computational methods. The National Center for Biotechnology Information (NCBI) promotes science and health by making biomedical and genomic data available to the public. This research aims to classify the different types of retrovirus genome sequences available at the NCBI. First, nucleotide pattern occurrences are counted in the given genome sequences at the preprocessing stage. Based on some significant results, the number of features used for classification is reduced to five. Classification is carried out in two phases. The first phase uses only two features. Data left unclassified in the first phase is passed to the second phase, where the final decision is made with the remaining three features. Three data sets of animal and human retroviruses are selected: the training data set is used to minimize the classifier's number and training; the validation data set is used to validate the models; and the test data set is used to analyze the classifier's performance. For comparison, we also use decision tree, naive Bayes, k-nearest neighbors, and support vector machine classifiers. The results show that the proposed approach performs better than the existing methods on the imbalanced retrovirus genome-sequence dataset.
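The pipeline outlined in the abstract (pattern counting, five features, a two-phase decision) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the RVGC method itself: the abstract does not name the five patterns, the per-phase decision rule, or the criterion for leaving a sample unclassified, so the `PATTERNS` list, the nearest-centroid rule, and the `min_margin` threshold below are all hypothetical.

```python
from math import dist

# Hypothetical patterns; the abstract does not name the five features it keeps.
PATTERNS = ["A", "C", "G", "T", "CG"]

def count_pattern(seq, pattern):
    """Count overlapping occurrences of pattern in seq."""
    k = len(pattern)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == pattern)

def features(seq, patterns=PATTERNS):
    """Pattern counts normalised by sequence length (an assumed normalisation)."""
    seq = seq.upper()
    n = max(len(seq), 1)
    return [count_pattern(seq, p) / n for p in patterns]

def centroids(X, y):
    """Mean feature vector per class label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}

def nearest(x, cents):
    """Return (label, margin): closest centroid and its lead over the runner-up."""
    ranked = sorted((dist(x, c), label) for label, c in cents.items())
    margin = ranked[1][0] - ranked[0][0] if len(ranked) > 1 else float("inf")
    return ranked[0][1], margin

def fit(X, y):
    """Phase 1 sees features 0-1, phase 2 sees features 2-4 (assumed split)."""
    return (centroids([x[:2] for x in X], y),
            centroids([x[2:] for x in X], y))

def predict(x, model, min_margin=0.05):
    """Decide in phase 1 if the margin is large enough, else defer to phase 2."""
    phase1, phase2 = model
    label, margin = nearest(x[:2], phase1)
    if margin >= min_margin:
        return label, 1
    return nearest(x[2:], phase2)[0], 2
```

`predict` returns the label together with the phase that produced it, which makes it easy to check how many samples the first phase leaves undecided. Any real use would replace the centroid rule and threshold with whatever the paper actually specifies.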
Pages: 3829-3844
Page count: 16