Retroviruses are a large group of infectious agents with similar virion structures and replication mechanisms. Retrovirus infections can cause AIDS, cancer, neurologic disorders, and other clinical conditions, all of which can be fatal. Detecting retroviruses from their genome sequences is a biological problem that benefits from computational methods. The National Center for Biotechnology Information (NCBI) promotes science and health by making biomedical and genomic data available to the public. This research aims to classify the different types of retrovirus genome sequences available at the NCBI. First, at the preprocessing stage, nucleotide pattern occurrences are counted in the given genome sequences. Based on the most significant results, the number of features used for classification is reduced to five. Classification is carried out in two phases. The first phase uses only two features. Data left unclassified in the first phase is passed to the second phase, where the final decision is made with the remaining three features. Three data sets of animal and human retroviruses are selected: the training data set is used to reduce the number of features and train the classifier, the validation data set is used to validate the models, and the test data set is used to analyze the performance of the classifier. We also use decision tree, naive Bayes, k-nearest neighbors, and support vector machine classifiers to compare results. The results show that the proposed approach performs better than the existing methods on the imbalanced retrovirus genome-sequence dataset.
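The pattern counting and two-phase decision described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the pattern length, the five selected features, or the classifier used in each phase, so the pattern list, the nearest-centroid rule, and the `margin` threshold below are all hypothetical stand-ins.

```python
from collections import Counter
from math import dist


def count_patterns(sequence, k=2):
    """Count overlapping nucleotide patterns of length k in a sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pattern_features(sequence, patterns):
    """Relative frequency of each chosen pattern (all of the same length)."""
    k = len(patterns[0])
    counts = count_patterns(sequence, k)
    total = max(sum(counts.values()), 1)
    return [counts[p] / total for p in patterns]


def nearest_centroid(features, centroids, margin):
    """Return the closest label, or None if the two best are within margin."""
    ranked = sorted((dist(features, c), label) for label, c in centroids.items())
    if len(ranked) > 1 and ranked[1][0] - ranked[0][0] < margin:
        return None  # too ambiguous: leave the sample unclassified
    return ranked[0][1]


def two_phase_classify(features, phase1, phase2, margin=0.05):
    """Phase 1 decides on the first two features; samples it leaves
    unclassified are decided in phase 2 on the remaining three."""
    label = nearest_centroid(features[:2], phase1, margin)
    if label is not None:
        return label, 1
    return nearest_centroid(features[2:], phase2, margin=0.0), 2
```

Here `phase1` and `phase2` are dictionaries mapping a class label to a centroid with two and three values respectively, estimated from the training data set. Any classifier that can abstain could take the place of the nearest-centroid rule in phase 1.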
Affiliation: Department of Computer Science and Engineering, National Institute of Technology, Patna