A novel approach to extracting features from motif content and protein composition for protein sequence classification

Cited by: 44
Authors
Zhao, XM
Cheung, YM
Huang, DS
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, Intelligent Comp Lab, Hefei 230031, Anhui, Peoples R China
[2] Univ Sci & Technol China, Dept Automat, Hefei 230026, Anhui, Peoples R China
[3] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
genetic algorithm; motif content; protein composition; protein sequence classification; support vector machine;
DOI
10.1016/j.neunet.2005.07.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a novel approach to extracting features from motif content and protein composition for protein sequence classification. First, we formulate a protein sequence as a fixed-dimensional vector using the motif content and protein composition. Then, we further project the vectors into a low-dimensional space by the Principal Component Analysis (PCA) so that they can be represented by a combination of the eigenvectors of the covariance matrix of these vectors. Subsequently, the Genetic Algorithm (GA) is used to extract a subset of biological and functional sequence features from the eigen-space and to optimize the regularization parameter of the Support Vector Machine (SVM) simultaneously. Finally, we utilize the SVM classifiers to classify protein sequences into corresponding families based on the selected feature subsets. In comparison with the existing PSI-BLAST and SVM-pairwise methods, the experiments show the promising results of our approach. (c) 2005 Elsevier Ltd. All rights reserved.
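The first step of the abstract, encoding a sequence as a fixed-dimensional vector from motif content and protein composition, can be illustrated with a minimal sketch. The abstract does not give the exact encoding, so this assumes that composition means the relative frequency of the 20 standard amino acids and that motif content is a binary flag per motif. The motif patterns and the names `EXAMPLE_MOTIFS`, `composition`, `motif_content`, and `feature_vector` are hypothetical and are not taken from the paper. The later PCA, GA, and SVM stages are not shown.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Hypothetical motif patterns, for illustration only (not from the paper).
EXAMPLE_MOTIFS = [r"N[^P][ST]", r"C..C", r"GKS"]


def composition(sequence):
    """Relative frequency of each standard amino acid (20 values).

    Assumption: non-standard symbols are ignored when normalizing.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def motif_content(sequence, motifs=EXAMPLE_MOTIFS):
    """One flag per motif: 1.0 if it occurs anywhere in the sequence, else 0.0."""
    seq = sequence.upper()
    return [1.0 if re.search(m, seq) else 0.0 for m in motifs]


def feature_vector(sequence, motifs=EXAMPLE_MOTIFS):
    """Concatenate motif flags and composition into one fixed-length vector."""
    return motif_content(sequence, motifs) + composition(sequence)


if __name__ == "__main__":
    vec = feature_vector("MNGSACKLCGKSA")
    print(len(vec))  # 3 motif flags + 20 composition values = 23
```

Every sequence maps to a vector of the same length regardless of how long the sequence is, which is what allows a projection such as PCA and a classifier such as an SVM to be applied afterwards.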
Pages: 1019-1028
Number of pages: 10