A systematic, large-scale comparison of transcription factor binding site models

Cited by: 14

Authors:

Hombach, Daniela [1,2]

Schwarz, Jana Marie [1,2]

Robinson, Peter N. [3]

Schuelke, Markus [1,2]

Seelow, Dominik [1,2,4]

Affiliations:

[1] Charite, Dept Neuropaediat, D-13353 Berlin, Germany

[2] Charite, NeuroCure Clin Res Ctr, D-13353 Berlin, Germany

[3] Charite, Inst Med Genet & Human Genet, D-13353 Berlin, Germany

[4] Berlin Inst Hlth, Berliner Inst Gesundheitsforsch, Berlin, Germany

Source:

BMC GENOMICS | 2016 / Vol. 17

Keywords:

Transcription factor binding sites; TFBS prediction; PSSM; Genetic variation; RAPID EVOLUTION; GENE-REGULATION; DNA; DATABASE; SEQUENCES; IDENTIFICATION; REVEALS;

DOI:

10.1186/s12864-016-2729-8

Chinese Library Classification (CLC) number:

Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];

Discipline classification codes:

071005 ; 0836 ; 090102 ; 100705 ;

Abstract:

Background: The modelling of gene regulation is a major challenge in biomedical research. This process is dominated by transcription factors (TFs) and mutations in their binding sites (TFBSs) may cause the misregulation of genes, eventually leading to disease. The consequences of DNA variants on TF binding are modelled in silico using binding matrices, but it remains unclear whether these are capable of accurately representing in vivo binding. In this study, we present a systematic comparison of binding models for 82 human TFs from three freely available sources: JASPAR matrices, HT-SELEX-generated models and matrices derived from protein binding microarrays (PBMs). We determined their ability to detect experimentally verified "real" in vivo TFBSs derived from ENCODE ChIP-seq data. As negative controls we chose random downstream exonic sequences, which are unlikely to harbour TFBSs. All models were assessed by receiver operating characteristic (ROC) analysis.

Results: While the area-under-curve was low for most of the tested models, with only 47% reaching a score of 0.7 or higher, we noticed strong differences between the various position-specific scoring matrices, with JASPAR and HT-SELEX models showing higher success rates than PBM-derived models. In addition, we found that while TFBS sequences showed a higher degree of conservation than randomly chosen sequences, there was a high variability between individual TFBSs.

Conclusions: Our results show that only a few of the matrix-based models used to predict potential TFBSs are able to reliably detect experimentally confirmed TFBSs. We compiled our findings in a freely accessible web application called ePOSSUM (http://mutationtaster.charite.de/ePOSSUM/) which uses a Bayes classifier to assess the impact of genetic alterations on TF binding in user-defined sequences. Additionally, ePOSSUM provides information on the reliability of the prediction using our test set of experimentally confirmed binding sites.
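The evaluation described in the abstract (scoring known binding sites and negative control sequences with a position-specific scoring matrix, then summarising the separation as an area under the ROC curve) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the pseudocount, the uniform background and the toy sequences are all assumptions made for illustration, whereas the study used published JASPAR, HT-SELEX and PBM matrices with ENCODE ChIP-seq sites as positives.

```python
import math

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Log-odds PSSM from equal-length aligned sites (toy construction, assumed)."""
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pssm

def best_score(pssm, seq):
    """Highest PSSM window score along seq (forward strand only, for brevity)."""
    width = len(pssm)
    return max(
        sum(pssm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def roc_auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical toy data: aligned sites, "bound" sequences and negative controls.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTAA"]
positives = ["CCTGACTCAGG", "AATGAGTCATT"]
negatives = ["CCCCCCCCCCC", "AGAGAGAGAGA"]

pssm = build_pssm(sites)
auc = roc_auc([best_score(pssm, s) for s in positives],
              [best_score(pssm, s) for s in negatives])
print(f"AUC = {auc:.2f}")
```

In this reading, a model whose AUC falls below a chosen cut-off such as the 0.7 mentioned in the abstract would be treated as separating real sites from controls poorly. A realistic evaluation would also scan the reverse strand and use far larger positive and negative sets.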

Pages: 10

50 records in total

[1] A systematic, large-scale comparison of transcription factor binding site models
Daniela Hombach
Jana Marie Schwarz
Peter N. Robinson
Markus Schuelke
Dominik Seelow
BMC Genomics, 17
[2] Erratum to: A systematic, large-scale comparison of transcription factor binding site models
Daniela Hombach
Jana Marie Schwarz
Peter N. Robinson
Markus Schuelke
Dominik Seelow
BMC Genomics, 17
[3] A systematic, large-scale comparison of transcription factor binding site models (vol 17, pg 388, 2016)
Hombach, Daniela
Schwarz, Jana Marie
Robinson, Peter N.
Schuelke, Markus
Seelow, Dominik
BMC GENOMICS, 2016, 17
[4] Large-scale transcription factor binding prediction
Nature Methods, 2014, 11 (11) : 1091 - 1091
[5] Large-Scale Comparison of Four Binding Site Detection Algorithms
Schmidtke, Peter
Souaille, Catherine
Estienne, Frederic
Baurin, Nicolas
Kroemer, Romano T.
JOURNAL OF CHEMICAL INFORMATION AND MODELING, 2010, 50 (12) : 2191 - 2200
[6] Large-scale turnover of functional transcription factor binding sites in Drosophila
Moses, Alan M.
Pollard, Daniel A.
Nix, David A.
Iyer, Venky N.
Li, Xiao-Yong
Biggin, Mark D.
Eisen, Michael B.
PLOS COMPUTATIONAL BIOLOGY, 2006, 2 (10) : 1219 - 1231
[7] BindSpace decodes transcription factor binding signals by large-scale sequence embedding
Yuan, Han
Kshirsagar, Meghana
Zamparo, Lee
Lu, Yuheng
Leslie, Christina S.
NATURE METHODS, 2019, 16 (09) : 858 - +
[8] BindSpace decodes transcription factor binding signals by large-scale sequence embedding
Han Yuan
Meghana Kshirsagar
Lee Zamparo
Yuheng Lu
Christina S. Leslie
Nature Methods, 2019, 16 : 858 - 861
[9] Mocap: large-scale inference of transcription factor binding sites from chromatin accessibility
Chen, Xi
Yu, Bowen
Carriero, Nicholas
Silva, Claudio
Bonneau, Richard
NUCLEIC ACIDS RESEARCH, 2017, 45 (08) : 4315 - 4329
[10] The Large-Scale, Systematic and Iterated Comparison of Agent-Based Policy Models
Bithell, Mike
Chattoe-Brown, Edmund
Edmonds, Bruce
ADVANCES IN SOCIAL SIMULATION, 2022, : 367 - 378