Classification of Retroviruses Based on Genomic Data Using RVGC

Times cited: 0
Authors
Aamir, Khalid Mahmood [1]
Bilal, Muhammad [2]
Ramzan, Muhammad [1, 3]
Khan, Muhammad Attique [4]
Nam, Yunyoung [5]
Kadry, Seifedine [6]
Affiliations
[1] Univ Sargodha, Dept CS & IT, Sargodha 40100, Pakistan
[2] Univ Mianwali, Dept CS & IT, Mianwali 42200, Pakistan
[3] Univ Management & Technol, Sch Syst & Technol, Lahore 54782, Pakistan
[4] HITEC Univ Taxila, Dept Comp Sci, Taxila, Pakistan
[5] Soonchunhyang Univ, Dept Comp Sci & Engn, Asan, South Korea
[6] Noroff Univ Coll, Fac Appl Comp & Technol, Kristiansand, Norway
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2021 / Vol. 69 / No. 3
Keywords
Retroviruses; machine learning; bioinformatics; classification; multiple sequence alignment; search; expression
DOI
10.32604/cmc.2021.017835
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Retroviruses are a large group of infectious agents with similar virion structures and replication mechanisms. Retrovirus infections can cause AIDS, cancer, neurologic disorders, and other clinical conditions, all of which can be fatal. Detection of retroviruses by genome sequence is a biological problem that benefits from computational methods. The National Center for Biotechnology Information (NCBI) promotes science and health by making biomedical and genomic data available to the public. This research aims to classify the different types of retrovirus genome sequences available at the NCBI. First, nucleotide pattern occurrences are counted in the given genome sequences at the preprocessing stage. Based on some significant results, the number of features used for classification is reduced to five. The classification is carried out in two phases. The first phase of classification uses only two features. Data left unclassified in the first phase are passed to the next phase, where the final decision is taken with the remaining three features. Three data sets of animal and human retroviruses are selected; the training data set is used to minimize the classifier's number and training; the validation data set is used to validate the models. The performance of the classifier is analyzed using the test data set. Also, we use decision tree, naive Bayes, k-nearest neighbors, and support vector machines to compare results. The results show that the proposed approach performs better than the existing methods on the imbalanced retrovirus genome-sequence dataset.
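
The two-stage idea described in the abstract (count nucleotide patterns, decide with two features where possible, otherwise fall back to the remaining three) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' RVGC implementation: the abstract does not name the five patterns or the decision rules, so the pattern list, the thresholds, the class labels, and the function names below are all hypothetical.

```python
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical feature set: the abstract says five features are kept,
# but does not say which nucleotide patterns they are.
PATTERNS = ("AA", "CG", "GC", "TT", "ACG")


def count_patterns(sequence: str, patterns: Sequence[str]) -> dict[str, int]:
    """Count overlapping occurrences of each pattern in a genome sequence."""
    seq = sequence.upper()
    counts: dict[str, int] = {}
    for pattern in patterns:
        k = len(pattern)
        counts[pattern] = sum(
            1 for i in range(len(seq) - k + 1) if seq[i:i + k] == pattern
        )
    return counts


def phase1_rule(f: Mapping[str, int]) -> Optional[str]:
    """Phase 1: use two features only; return None when undecided.

    The thresholds are made up for illustration.
    """
    if f["CG"] == 0 and f["AA"] >= 3:
        return "class_A"
    if f["AA"] == 0 and f["CG"] >= 3:
        return "class_B"
    return None


def phase2_rule(f: Mapping[str, int]) -> str:
    """Phase 2: always decide, using the remaining three features."""
    return "class_A" if f["TT"] + f["ACG"] > f["GC"] else "class_B"


def two_phase_classify(
    sequence: str,
    phase1: Callable[[Mapping[str, int]], Optional[str]] = phase1_rule,
    phase2: Callable[[Mapping[str, int]], str] = phase2_rule,
) -> str:
    """Classify a sequence; fall through to phase 2 only if phase 1 abstains."""
    features = count_patterns(sequence, PATTERNS)
    label = phase1(features)
    return label if label is not None else phase2(features)
```

For example, `two_phase_classify("AAAATT")` is settled in the first phase, while `two_phase_classify("ACGTTGC")` is left undecided there and is resolved by the second-phase rule. In a real setting the two rules would be learned from the training set rather than hand-written.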
Pages: 3829-3844
Page count: 16