Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences

Cited by: 10
Authors
Wang, Wei [1, 2]
Sun, Lin [1 ]
Zhang, Shiguang [1 ]
Zhang, Hongjun [3 ]
Shi, Jinling [4 ]
Xu, Tianhe [1 ]
Li, Keliang [1 ]
Affiliations
[1] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Henan Province, Peoples R China
[2] Engn Technol Res Ctr Comp Intelligence & Data Min, Lab Computat Intelligence & Informat Proc, Xinxiang 453007, Henan Province, Peoples R China
[3] Anyang Univ, Sch Aviat Engn, Anyang 455000, Henan Province, Peoples R China
[4] Xuchang Univ, Sch Int Educ, Xuchang 461000, Henan Province, Peoples R China
Source
BMC Bioinformatics | 2017 / Vol. 18
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
SSBs (Single-stranded DNA-binding proteins); DSBs (Double-stranded DNA-binding proteins); Binding specificity; Protein sequence; SUBCELLULAR-LOCALIZATION; OB-FOLD; EVOLUTIONARY; RECOGNITION; SPECIFICITY; FEATURES; SITES; IDENTIFICATION; INTERFACE; DOMAINS;
DOI
10.1186/s12859-017-1715-8
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: DNA-binding proteins perform important functions in many biological activities. They can interact with single-stranded DNA (ssDNA) or double-stranded DNA (dsDNA) and can accordingly be categorized as single-stranded DNA-binding proteins (SSBs) or double-stranded DNA-binding proteins (DSBs). Identifying DNA-binding proteins from amino acid sequences can help to annotate protein functions and to understand binding specificity. In this study, we systematically consider a variety of schemes to represent protein sequences: overall amino acid composition (OAAC) features, dipeptide compositions, position-specific scoring matrix (PSSM) profiles, and split amino acid composition (SAA). We then adopt support vector machine (SVM) and random forest (RF) classification models to distinguish SSBs from DSBs. Results: Our results suggest that some sequence features can significantly differentiate DSBs and SSBs. Evaluated by 10-fold cross-validation on the benchmark datasets, our prediction method achieves an accuracy of 88.7% and an AUC (area under the curve) of 0.919. Moreover, our method performs well in independent testing. Conclusions: Using various sequence-derived features, a novel method is proposed to distinguish DSBs and SSBs accurately. The method also explores novel features, which could be helpful in discovering the binding specificity of DNA-binding proteins.
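As an illustration of two of the sequence representations named in the abstract, the following is a minimal sketch of amino acid composition and dipeptide composition features. It is not the authors' implementation: the function names are hypothetical, non-standard residue letters are simply skipped (an assumption), and the PSSM and SAA features as well as the SVM/RF classifiers are omitted.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aa_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)  # non-standard letters are ignored (assumption)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def dipeptide_composition(seq):
    """Return the fraction of each ordered residue pair in seq (400 values)."""
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard letters
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[p] / total for p in DIPEPTIDES]


def sequence_features(seq):
    """Concatenate both compositions into one 420-dimensional vector."""
    return aa_composition(seq) + dipeptide_composition(seq)
```

A vector produced by the hypothetical `sequence_features` could then be passed to any standard classifier; the abstract names SVM and RF, evaluated with 10-fold cross-validation.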
Pages: 10
Related papers
50 in total
  • [1] Analysis and prediction of single-stranded and double-stranded DNA binding proteins based on protein sequences
    Wei Wang
    Lin Sun
    Shiguang Zhang
    Hongjun Zhang
    Jinling Shi
    Tianhe Xu
    Keliang Li
    BMC Bioinformatics, 18
  • [2] Identification of single-stranded and double-stranded dna binding proteins based on protein structure
    Wei Wang
    Juan Liu
    Xionghui Zhou
    BMC Bioinformatics, 15
  • [3] Identification of single-stranded and double-stranded dna binding proteins based on protein structure
    Wang, Wei
    Liu, Juan
    Zhou, Xionghui
    BMC BIOINFORMATICS, 2014, 15
  • [4] Distinguishing Single-Stranded and Double-Stranded DNA binding Proteins Based on Structural Information
    Wang, Wei
    Liu, Juan
    2013 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2013,
  • [5] Single-stranded and double-stranded DNA-binding protein prediction using HMM profiles
    Sharma, Ronesh
    Kumar, Shiu
    Tsunoda, Tatsuhiko
    Kumarevel, Thirumananseri
    Sharma, Alok
    ANALYTICAL BIOCHEMISTRY, 2021, 612
  • [6] Analysis and classification of DNA-binding sites in single-stranded and double-stranded DNA-binding proteins using protein information
    Wang, Wei
    Liu, Juan
    Xiong, Yi
    Zhu, Lida
    Zhou, Xionghui
    IET SYSTEMS BIOLOGY, 2014, 8 (04) : 176 - 183
  • [7] PredPSD: A Gradient Tree Boosting Approach for Single-Stranded and Double-Stranded DNA Binding Protein Prediction
    Tan, Changgeng
    Wang, Tong
    Yang, Wenyi
    Deng, Lei
    MOLECULES, 2020, 25 (01):
  • [8] BINDING OF THE RECA PROTEIN OF ESCHERICHIA-COLI TO SINGLE-STRANDED AND DOUBLE-STRANDED DNA
    MCENTEE, K
    WEINSTOCK, GM
    LEHMAN, IR
    JOURNAL OF BIOLOGICAL CHEMISTRY, 1981, 256 (16) : 8835 - 8844
  • [9] Coordinated Binding of Single-Stranded and Double-Stranded DNA by UvsX Recombinase
    Maher, Robyn L.
    Morrical, Scott W.
    PLOS ONE, 2013, 8 (06):
  • [10] Single-Stranded DNA Binding Proteins Unwind the Newly Synthesized Double-Stranded DNA of Model Miniforks
    Delagoutte, Emmanuelle
    Heneman-Masurel, Amelie
    Baldacci, Giuseppe
    BIOCHEMISTRY, 2011, 50 (06) : 932 - 944