A systematic, large-scale comparison of transcription factor binding site models

Cited by: 14
Authors
Hombach, Daniela [1, 2]
Schwarz, Jana Marie [1, 2]
Robinson, Peter N. [3]
Schuelke, Markus [1, 2]
Seelow, Dominik [1, 2, 4]
Affiliations
[1] Charite, Dept Neuropaediat, D-13353 Berlin, Germany
[2] Charite, NeuroCure Clin Res Ctr, D-13353 Berlin, Germany
[3] Charite, Inst Med Genet & Human Genet, D-13353 Berlin, Germany
[4] Berlin Inst Hlth, Berliner Inst Gesundheitsforsch, Berlin, Germany
Source
BMC GENOMICS | 2016 / Vol. 17
Keywords
Transcription factor binding sites; TFBS prediction; PSSM; Genetic variation; RAPID EVOLUTION; GENE-REGULATION; DNA; DATABASE; SEQUENCES; IDENTIFICATION; REVEALS;
DOI
10.1186/s12864-016-2729-8
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Subject classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
Background: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs), and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified "real" in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristic (ROC) analysis.
Results: While the area under the curve was low for most of the tested models, with only 47% reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices, with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was high variability between individual TFBSs.
Conclusions: Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/), which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites.
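The evaluation idea in the abstract (score known binding sites and negative control sequences with a position-specific scoring matrix, then summarise the separation as an area under the ROC curve) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the pseudocount, the uniform background, the single-strand scan and the toy count matrix and sequences are all assumptions made up for illustration.

```python
import math

BASES = "ACGT"


def log_odds_pssm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores (assumed uniform background)."""
    pssm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pssm.append(
            {b: math.log2(((column[b] + pseudocount) / total) / background) for b in BASES}
        )
    return pssm


def best_window_score(pssm, sequence):
    """Highest PSSM score over all windows of the sequence (forward strand only)."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the matrix")
    return max(
        sum(pssm[i][sequence[start + i]] for i in range(width))
        for start in range(len(sequence) - width + 1)
    )


def roc_auc(positive_scores, negative_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy count matrix for a hypothetical 4-bp motif "TGAC" (invented data).
counts = [
    {"A": 0, "C": 0, "G": 0, "T": 10},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
]
pssm = log_odds_pssm(counts)

# Invented stand-ins for ChIP-seq-derived sites and exonic negative controls.
positives = ["AATGACGG", "CCTGACTT", "TGACAAAA"]
negatives = ["AAAAAAAA", "CCCCGGGG", "GTGTGTGT"]

auc = roc_auc(
    [best_window_score(pssm, s) for s in positives],
    [best_window_score(pssm, s) for s in negatives],
)
print(f"AUC = {auc:.2f}")
```

On this toy data every positive contains an exact motif match, so the separation is perfect; real matrices scored against genomic sequence would typically separate the two sets far less cleanly, which is the effect the study quantifies with its 0.7 AUC threshold.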
Pages: 10