Gene Classification Using Codon Usage and Support Vector Machines

Cited by: 28
Authors
Ma, Jianmin [1 ]
Nguyen, Minh N. [1 ]
Rajapakse, Jagath C. [1 ]
Affiliations
[1] Nanyang Technol Univ, Bioinformat Res Ctr, Singapore 637553, Singapore
Keywords
Codon usage bias; gene classification; Human Leukocyte Antigen (HLA); Major Histocompatibility Complex (MHC); relative synonymous codon usage (RSCU); Support Vector Machines (SVMs); MAJOR HISTOCOMPATIBILITY COMPLEX; INDEPENDENT COMPONENT ANALYSIS; MULTIPLE-SEQUENCE ALIGNMENT; ESCHERICHIA-COLI; CLUSTER-ANALYSIS; SACCHAROMYCES-CEREVISIAE; CANCER CLASSIFICATION; ARABIDOPSIS-THALIANA; BACILLUS-SUBTILIS; BINDING PEPTIDES;
DOI
10.1109/TCBB.2007.70240
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
A novel approach for gene classification, which adopts codon usage bias as the feature input to support vector machines (SVMs), is proposed. The DNA sequence is first converted to a 59-dimensional feature vector, where each element corresponds to the relative synonymous codon usage (RSCU) frequency of a codon. Since the input to the classifier is independent of sequence length, the approach is especially useful when the sequences to be classified are of differing lengths and homology-based methods tend to fail. The method is demonstrated with 1,841 Human Leukocyte Antigen (HLA) sequences, which are classified into two major classes, HLA-I and HLA-II. Each major class is further classified into subgroups. Using codon usage frequencies, a binary SVM achieved an accuracy of 99.3 percent for HLA major class classification, and a multiclass SVM achieved accuracies of 99.73 percent and 98.38 percent for the subclass classification of HLA-I and HLA-II molecules, respectively. Comparisons with K-means clustering, other classifiers, and homology-based features are given. Results indicate that the classification based on codon usage bias is consistent with the biological functions of HLA molecules.
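The 59-dimensional feature vector mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard genetic code and the common RSCU definition (a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally); the abstract does not spell out the paper's exact normalization. The 59 codons are the 64 codons minus the three stop codons and the single-codon amino acids Met (ATG) and Trp (TGG). The names `rscu_vector`, `FAMILIES`, and `RSCU_CODONS` are hypothetical, and assigning 0.0 to amino acids absent from a sequence is an assumption of this sketch.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous codon families, excluding stop codons (*), Met (M) and Trp (W),
# which have no synonymous alternatives. This leaves 59 codons.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES.setdefault(aa, []).append(codon)
RSCU_CODONS = sorted(c for fam in FAMILIES.values() for c in fam)


def rscu_vector(seq):
    """Return the 59 RSCU values of an in-frame coding sequence.

    Assumption: amino acids absent from the sequence get RSCU 0.0.
    """
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    rscu = {}
    for fam in FAMILIES.values():
        total = sum(counts[c] for c in fam)
        for c in fam:
            # observed count / (family total / family size)
            rscu[c] = counts[c] * len(fam) / total if total else 0.0
    return [rscu[c] for c in RSCU_CODONS]


features = rscu_vector("ATGGCTGCTGCCTAA")  # Met, Ala, Ala, Ala, stop
print(len(features))
```

Because each value is normalized within its synonymous family, the vector has the same length for any input sequence, which is the length-independence property the abstract relies on. Such vectors could then be passed to any SVM implementation as ordinary numeric features.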
Pages: 134-143
Page count: 10