A novel approach to extracting features from motif content and protein composition for protein sequence classification

Cited by: 44
Authors
Zhao, XM
Cheung, YM
Huang, DS
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, Intelligent Comp Lab, Hefei 230031, Anhui, Peoples R China
[2] Univ Sci & Technol China, Dept Automat, Hefei 230026, Anhui, Peoples R China
[3] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
genetic algorithm; motif content; protein composition; protein sequence classification; support vector machine
DOI
10.1016/j.neunet.2005.07.002
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel approach to extracting features from motif content and protein composition for protein sequence classification. First, we represent each protein sequence as a fixed-dimensional vector built from its motif content and protein composition. We then project these vectors into a low-dimensional space using Principal Component Analysis (PCA), so that each one is expressed as a combination of the eigenvectors of the vectors' covariance matrix. Next, a Genetic Algorithm (GA) simultaneously selects a subset of biological and functional sequence features from the eigenspace and optimizes the regularization parameter of the Support Vector Machine (SVM). Finally, SVM classifiers assign protein sequences to their corresponding families based on the selected feature subsets. Experiments show promising results for our approach in comparison with the existing PSI-BLAST and SVM-pairwise methods. © 2005 Elsevier Ltd. All rights reserved.
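As a rough illustration of the first step only (turning a sequence into a fixed-dimensional vector), the sketch below concatenates a binary motif-content vector with a 20-dimensional amino-acid composition vector. It is not the authors' implementation: the abstract does not say which motif database or composition scheme the paper uses, so the motif patterns, the presence/absence encoding, the single-residue frequencies, and all function names here are assumptions made for illustration.

```python
import re
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector has the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical motif patterns written as regular expressions.
# These are placeholders for illustration, not the motifs used in the paper.
MOTIFS = [r"N[^P][ST]", r"RGD", r"C..C"]


def composition(seq):
    """Return the relative frequency of each standard amino acid in seq."""
    counts = Counter(seq)
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def motif_content(seq, motifs=MOTIFS):
    """Return 1.0 for each motif that occurs in seq and 0.0 otherwise."""
    return [1.0 if re.search(m, seq) else 0.0 for m in motifs]


def feature_vector(seq):
    """Concatenate motif content and composition into one fixed-length vector."""
    seq = seq.upper()
    return motif_content(seq) + composition(seq)


vec = feature_vector("MKRGDNGSAC")
print(len(vec))
```

Every sequence maps to a vector of the same length (here 3 motif entries plus 20 composition entries), regardless of sequence length. Vectors of this kind could then be passed to PCA, GA-based feature selection, and an SVM, as the abstract outlines; those later steps are not sketched here.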
Pages: 1019-1028
Page count: 10