A novel approach to extracting features from motif content and protein composition for protein sequence classification

Cited by: 44
Authors
Zhao, XM
Cheung, YM
Huang, DS
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, Intelligent Comp Lab, Hefei 230031, Anhui, Peoples R China
[2] Univ Sci & Technol China, Dept Automat, Hefei 230026, Anhui, Peoples R China
[3] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
genetic algorithm; motif content; protein composition; protein sequence classification; support vector machine;
DOI
10.1016/j.neunet.2005.07.002
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a novel approach to extracting features from motif content and protein composition for protein sequence classification. First, we represent each protein sequence as a fixed-dimensional vector built from its motif content and protein composition. We then project these vectors into a low-dimensional space using Principal Component Analysis (PCA), so that each one is represented by a combination of the eigenvectors of the covariance matrix of the vectors. Next, a Genetic Algorithm (GA) is used to select a subset of biological and functional sequence features from the eigenspace and, simultaneously, to optimize the regularization parameter of the Support Vector Machine (SVM). Finally, SVM classifiers assign protein sequences to their corresponding families based on the selected feature subsets. In experiments, the approach shows promising results in comparison with the existing PSI-BLAST and SVM-pairwise methods. © 2005 Elsevier Ltd. All rights reserved.
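The first step described in the abstract, turning a variable-length sequence into a fixed-dimensional vector, can be illustrated with a minimal sketch. The abstract does not say how motif content is encoded or which motif database is used, so everything here is an assumption for illustration: the motif patterns in `MOTIFS` are hypothetical regular expressions, motif content is encoded as binary presence, and the function names are invented. The later PCA, GA, and SVM stages are not shown.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical motif patterns, written as regular expressions.
# Assumption: these are placeholders, not the motifs used in the paper.
MOTIFS = [
    r"N[^P][ST][^P]",
    r"[ST].[RK]",
    r"C..C",
]


def composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def motif_content(seq, motifs=MOTIFS):
    """Return 1.0 for each motif that occurs at least once in seq, else 0.0."""
    seq = seq.upper()
    return [1.0 if re.search(m, seq) else 0.0 for m in motifs]


def feature_vector(seq, motifs=MOTIFS):
    """Concatenate composition and motif content into one fixed-length vector."""
    return composition(seq) + motif_content(seq, motifs)


if __name__ == "__main__":
    vec = feature_vector("MNGTCAACSKR")
    print(len(vec), vec[-len(MOTIFS):])
```

Every sequence maps to a vector of length `20 + len(MOTIFS)` regardless of its own length, which is the property that dimensionality reduction and a classifier need downstream.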
Pages: 1019-1028
Page count: 10
Related papers
50 in total
  • [21] A genetic programming method for protein motif discovery and protein classification
    Denise Fukumi Tsunoda
    Alex Alves Freitas
    Heitor Silvério Lopes
    Soft Computing, 2011, 15 : 1897 - 1908
  • [22] A genetic programming method for protein motif discovery and protein classification
    Tsunoda, Denise Fukumi
    Freitas, Alex Alves
    Lopes, Heitor Silverio
    SOFT COMPUTING, 2011, 15 (10) : 1897 - 1908
  • [23] Discovering Interesting Motif-Sets for Multi-Class Protein Sequence Classification
    Ma, Patrick C. H.
    Chan, Keith C. C.
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2010, 17 (05) : 733 - 743
  • [24] A novel protein motif that targets misfolded protein assemblies
    Krishnan, Rajaraman
    PRION, 2013, 7 : 97 - 97
  • [25] IDENTIFICATION OF A NOVEL PROTEIN-SEQUENCE MOTIF - THE RAN GTPASE BINDING DOMAIN
    MACARA, I
    LOUNSBURY, K
    OREM, N
    PERLUNGHEI, R
    RICHARDS, S
    BEDDOW, A
    JOURNAL OF CELLULAR BIOCHEMISTRY, 1995, : 55 - 55
  • [26] A Novel Approach of Protein Secondary Structure Prediction by SVM Using PSSM Combined by Sequence Features
    Chen, Yehong
    Cheng, Jinyong
    Liu, Yihui
    Park, Pil Seong
    PROCEEDINGS OF SAI INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS) 2016, VOL 1, 2018, 15 : 1074 - 1084
  • [27] On the use of structure and sequence-based features for protein classification and retrieval
    Keith Marsolo
    Srinivasan Parthasarathy
    Knowledge and Information Systems, 2008, 14 : 59 - 80
  • [28] On the use of structure and sequence-based features for protein classification and retrieval
    Marsolo, Keith
    Parthasarathy, Srinivasan
    ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 394 - +
  • [29] Enzyme Function Classification using Protein Sequence Features and Random Forest
    Kumar, Chetan
    Li, Gang
    Choudhary, Alok
    2009 3RD INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING, VOLS 1-11, 2009, : 764 - 767
  • [30] On the use of structure and sequence-based features for protein classification and retrieval
    Marsolo, Keith
    Parthasarathy, Srinivasan
    KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 14 (01) : 59 - 80