Prediction of bioluminescent proteins by using sequence-derived features and lineage-specific scheme

Cited by: 20
Authors
Zhang, Jian [1, 2]
Chai, Haiting [1 ]
Yang, Guifu [1 ]
Ma, Zhiqiang [1 ]
Affiliations
[1] Northeast Normal Univ, Sch Comp Sci & Informat Technol, Changchun 130117, Jilin Province, Peoples R China
[2] Xinyang Normal Univ, Sch Comp & Informat Technol, Xinyang 464000, Henan Province, Peoples R China
Source
BMC BIOINFORMATICS | 2017 / Vol. 18
Funding
National Natural Science Foundation of China
Keywords
Bioluminescent proteins; Sequence-derived; Feature analysis; Lineage-specific; SUPPORT VECTOR MACHINES; COLOR; CLASSIFICATION; RESIDUES;
DOI
10.1186/s12859-017-1709-6
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Bioluminescent proteins (BLPs) exist widely in many living organisms. Because BLPs are characterized by their ability to emit light, they can serve as easily detected biomarkers in biomedical research, such as gene expression analysis and the study of signal transduction pathways. Accurate identification of BLPs is therefore important for disease diagnosis and biomedical engineering. In this paper, we propose a novel, accurate sequence-based method named PredBLP (Prediction of BioLuminescent Proteins) to predict BLPs. Results: We collect a series of sequence-derived features that have been shown to be involved in the structure and function of BLPs. These features include amino acid composition, dipeptide composition, sequence motifs and physicochemical properties. We further show that the combination of all four feature types outperforms any other combination or any individual feature type. To remove potentially irrelevant or redundant features, we also introduce the Fisher Markov Selector together with a Sequential Backward Selection strategy to select the optimal feature subsets. Additionally, we design a lineage-specific scheme, which proves to be more effective than traditional universal approaches. Conclusion: Experiments on benchmark datasets demonstrate the robustness of PredBLP. We demonstrate that lineage-specific models significantly outperform universal ones. We also test the generalization capability of PredBLP on independent testing datasets as well as on newly deposited BLPs in UniProt. PredBLP is shown to outperform many state-of-the-art methods. A web server named PredBLP, which implements the proposed method, is freely available for academic use.
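Two of the feature types named in the abstract, amino acid composition and dipeptide composition, have standard definitions: the relative frequency of each of the 20 standard residues, and the relative frequency of each of the 400 ordered residue pairs. The following is a minimal sketch of those standard definitions only; it is not taken from PredBLP, and the function names and the choice to skip non-standard residues are assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, e.g. "AA", "AC", ..., "YY".
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue frequencies.

    Assumption: characters outside the 20 standard residues are ignored.
    """
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    total = len(residues)
    return [residues.count(a) / total for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies.

    Assumption: a pair is counted only if both residues are standard.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    valid = [p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS]
    if not valid:
        return [0.0] * len(DIPEPTIDES)
    total = len(valid)
    return [valid.count(d) / total for d in DIPEPTIDES]


def sequence_features(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)
```

A vector built this way could be passed to any standard classifier. The motif features, physicochemical properties, feature selection and lineage-specific models described in the abstract are not covered by this sketch.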
Pages: 13