A novel approach to extracting features from motif content and protein composition for protein sequence classification

Cited by: 44
Authors
Zhao, XM
Cheung, YM
Huang, DS
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, Intelligent Comp Lab, Hefei 230031, Anhui, Peoples R China
[2] Univ Sci & Technol China, Dept Automat, Hefei 230026, Anhui, Peoples R China
[3] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
genetic algorithm; motif content; protein composition; protein sequence classification; support vector machine;
DOI
10.1016/j.neunet.2005.07.002
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a novel approach to extracting features from motif content and protein composition for protein sequence classification. First, we represent each protein sequence as a fixed-dimensional vector built from its motif content and protein composition. Next, we project these vectors into a low-dimensional space using Principal Component Analysis (PCA), so that each one is expressed as a combination of the eigenvectors of the covariance matrix of the vectors. A Genetic Algorithm (GA) is then used to select a subset of biological and functional sequence features from the eigen-space while simultaneously optimizing the regularization parameter of the Support Vector Machine (SVM). Finally, SVM classifiers assign protein sequences to their corresponding families based on the selected feature subsets. Experiments show promising results for our approach in comparison with the existing PSI-BLAST and SVM-pairwise methods. (c) 2005 Elsevier Ltd. All rights reserved.
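The first step of the abstract, turning a sequence into a fixed-dimensional vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the two motif patterns, the binary presence encoding, and the function names `composition`, `motif_content`, and `feature_vector` are assumptions chosen for illustration, and the abstract does not state which motif set or encoding the paper uses. The PCA, GA, and SVM stages are not shown.

```python
import re
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical motif set for illustration only; the paper's actual
# motif database is not given in the abstract.
EXAMPLE_MOTIFS = {
    "n_glycosylation": r"N[^P][ST][^P]",
    "p_loop": r"[AG].{4}GK[ST]",
}

def composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]

def motif_content(seq, motifs=EXAMPLE_MOTIFS):
    """Return 1.0 for each motif found in seq, else 0.0 (assumed encoding)."""
    seq = seq.upper()
    return [1.0 if re.search(pattern, seq) else 0.0
            for pattern in motifs.values()]

def feature_vector(seq, motifs=EXAMPLE_MOTIFS):
    """Concatenate motif content and composition into one fixed-length vector."""
    return motif_content(seq, motifs) + composition(seq)

if __name__ == "__main__":
    vec = feature_vector("GAAAAGKSNGTA")
    print(len(vec), vec[:2])
```

Every sequence maps to a vector of the same length (number of motifs plus 20) regardless of sequence length, which is what allows the later projection and classification stages to operate on a common space.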
Pages: 1019-1028
Page count: 10
Related papers
50 in total
  • [31] A Novel Particle Swarm-Based Approach for 3D Motif Matching and Protein Structure Classification
    Ahmed, Hazem Radwan
    Glasgow, Janice
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2014, 2014, 8436 : 1 - 12
  • [32] Protein sequence motif discovery on distributed supercomputer
    Challa, Santan
    Thulasiraman, Parimala
    ADVANCES IN GRID AND PERVASIVE COMPUTING, PROCEEDINGS, 2008, 5036 : 232 - 243
  • [33] THE ELUCIDATION OF PROTEIN FUNCTION BY SEQUENCE MOTIF ANALYSIS
    HODGMAN, TC
    COMPUTER APPLICATIONS IN THE BIOSCIENCES, 1989, 5 (01): : 1 - 13
  • [34] Fast motif search in protein sequence databases
    Zheleva, Elena
    Arslan, Abdullah N.
    COMPUTER SCIENCE - THEORY AND APPLICATIONS, 2006, 3967 : 670 - 681
  • [35] Prediction of Functional WXXF-Like Protein Motif from Sequence
    Dalafave, D. S.
    BIOPHYSICAL JOURNAL, 2010, 98 (03) : 197A - 197A
  • [36] Identification of protein superfamily from structure-based sequence motif
    HUANG Jingfei & LIU Ciquan (Key Laboratory of Cellular and Molecular Evolution)
    Chinese Science Bulletin, 2002, (16) : 1377 - 1381
  • [37] Identification of protein superfamily from structure-based sequence motif
    Huang, JF
    Liu, CQ
    CHINESE SCIENCE BULLETIN, 2002, 47 (16): : 1377 - 1381
  • [38] Extracting fractal features for analyzing protein structure
    Tao, Y
    Ioerger, TR
    Sacchettini, JC
    16TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL II, PROCEEDINGS, 2002, : 482 - 485
  • [39] Protein Sequence Classification Using Bidirectional Encoder Representations from Transformers (BERT) Approach
    Balamurugan R.
    Mohite S.
    Raja S.P.
    SN Computer Science, 4 (5)
  • [40] FlexSLiM: a Novel Approach for Short Linear Motif Discovery in Protein Sequences
    Li, Xiaoman
    Ge, Ping
    Hu, Haiyan
    PROCEEDINGS OF 2018 6TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (ICBCB 2018), 2018, : 32 - 39