Identification of Mammalian Enzymatic Proteins Based on Sequence-Derived Features and Species-Specific Scheme

Cited by: 7
Authors
Chai, Haiting [1]
Zhang, Jian [2]
Affiliations
[1] Univ Glasgow, Coll Med Vet & Life Sci, Glasgow G12 8QQ, Lanark, Scotland
[2] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Peoples R China
Source
IEEE ACCESS | 2018 / Vol. 6
Funding
National Natural Science Foundation of China;
Keywords
Enzymatic proteins; species-specific; sequence-based; feature selection; REPLACEMENT THERAPY; ENZYMES; CLASSIFICATION; INSUFFICIENCY;
DOI
10.1109/ACCESS.2018.2798284
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Enzymatic proteins (EPs) are widely distributed in organisms and cells and are implicated in biochemical processes. Without these proteins, most biochemical reactions would proceed only slowly at the mild temperatures and pressures found in living bodies. Because these proteins are widely applied in drug discovery and disease therapy, they need to be identified accurately, yet no dedicated method has been reported for identifying EPs from primary sequences. In this paper, we propose a novel method for predicting mammalian EPs. We collect a series of sequence-based features observed in EPs and perform detailed analyses to investigate the intrinsic properties of EPs and non-EPs. To remove redundant features and select an optimal feature subset, we introduce the Fisher Markov selector and incremental feature selection. Based on the optimal feature subset, our method achieves area under the curve (AUC) values of 0.731, 0.820, and 0.822 on three training datasets using fivefold cross-validation. Our strategy also shows good generalization capability on independent testing datasets. We further compare our species-specific models with universal models, and the comparison confirms the effectiveness of introducing the species-specific scheme. We believe that our method is useful for biomedical research on EPs. Our proposed method is implemented in a user-friendly Web server named predict EPs, which is freely available for academic use at http://www.inforstation.com/webservers/PEP/.
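The feature-selection step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain per-feature Fisher score stands in for the Fisher Markov selector, a nearest-centroid scorer stands in for the classifier (the abstract does not name one), and the data are synthetic. All function names are hypothetical. The sketch ranks the features, then grows the subset one feature at a time and keeps the prefix with the highest fivefold cross-validated AUC.

```python
import random
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Squared gap between class means divided by the summed class variances."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (mean(pos) - mean(neg)) ** 2 / (pvariance(pos) + pvariance(neg) + 1e-12)


def auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scores(train_X, train_y, test_X, cols):
    """Assumed stand-in classifier: higher score means closer to the positive centroid."""
    def centroid(label):
        rows = [x for x, y in zip(train_X, train_y) if y == label]
        return [mean(r[c] for r in rows) for c in cols]

    def sqdist(x, center):
        return sum((x[c] - m) ** 2 for c, m in zip(cols, center))

    c_pos, c_neg = centroid(1), centroid(0)
    return [sqdist(x, c_neg) - sqdist(x, c_pos) for x in test_X]


def cv_auc(X, y, cols, k=5):
    """Mean AUC over k interleaved folds, using only the columns in cols."""
    aucs = []
    for start in range(k):
        test_idx = list(range(start, len(X), k))
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        scores = centroid_scores([X[i] for i in train_idx],
                                 [y[i] for i in train_idx],
                                 [X[i] for i in test_idx], cols)
        aucs.append(auc(scores, [y[i] for i in test_idx]))
    return mean(aucs)


def incremental_feature_selection(X, y, k=5):
    """Rank features, then evaluate growing prefixes of the ranking."""
    ranked = sorted(range(len(X[0])),
                    key=lambda c: fisher_score([row[c] for row in X], y),
                    reverse=True)
    best_auc, best_subset = -1.0, []
    for size in range(1, len(ranked) + 1):
        score = cv_auc(X, y, ranked[:size], k)
        if score > best_auc:
            best_auc, best_subset = score, ranked[:size]
    return ranked, best_subset, best_auc


# Synthetic data: feature 0 is informative, feature 1 weakly so, 2 and 3 are noise.
rng = random.Random(0)
y = [i % 2 for i in range(100)]
X = [[rng.gauss(2.0 * label, 1.0), rng.gauss(0.5 * label, 1.0),
      rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for label in y]
ranked, best_subset, best_auc = incremental_feature_selection(X, y)
print(ranked, best_subset, round(best_auc, 3))
```

Evaluating only prefixes of a fixed ranking keeps the search linear in the number of features, at the cost of never revisiting a low-ranked feature that might help in combination with others.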
Pages: 8452-8458
Page count: 7