Discovering unknown human and mouse transcription factor binding sites and their characteristics from ChIP-seq data

Cited by: 6
Authors
Yu, Chun-Ping [1]
Kuo, Chen-Hao [1]
Nelson, Chase W. [1, 2]
Chen, Chi-An [1]
Soh, Zhi Thong [1]
Lin, Jinn-Jy [1]
Hsiao, Ru-Xiu [1]
Chang, Chih-Yao [1]
Li, Wen-Hsiung [1, 3]
Affiliations
[1] Acad Sinica, Biodivers Res Ctr, Taipei 115, Taiwan
[2] Amer Museum Nat Hist, Inst Comparat Genom, New York, NY 10024 USA
[3] Univ Chicago, Dept Ecol & Evolut, 940 E 57th St, Chicago, IL 60637 USA
Keywords
ChIP-seq; transcription factor; binding site; promoter; position weight matrix; CHROMATIN; ENCODE; ALIGNMENT; PROTEINS; FEATURES
DOI
10.1073/pnas.2026754118
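Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract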
Transcription factor binding sites (TFBSs) are essential for gene regulation, but the number of known TFBSs remains limited. We aimed to discover and characterize unknown TFBSs by developing a computational pipeline for analyzing ChIP-seq (chromatin immunoprecipitation followed by sequencing) data. Applying it to the latest ENCODE ChIP-seq data for human and mouse, we found that using the irreproducible discovery rate as a quality-control criterion resulted in many experiments being unnecessarily discarded. By contrast, the number of motif occurrences in ChIP-seq peak regions provides a highly effective criterion, which is reliable even if supported by only one experimental replicate. In total, we obtained 2,058 motifs from 1,089 experiments for 354 human TFs and 163 motifs from 101 experiments for 34 mouse TFs. Among these motifs, 487 have not previously been reported. Mapping the canonical motifs to the human genome reveals a high TFBS density within ±2 kb of transcription start sites (TSSs), with a peak at -50 bp. On average, a promoter contains 5.7 TFBSs. However, 70% of TFBSs are in introns (41%) and intergenic regions (29%), whereas only 12% are in promoters (-1 kb to +100 bp from TSSs). Notably, some TFs (e.g., CTCF, JUN, JUNB, and NFE2) have motifs enriched in intergenic regions, including enhancers. We inferred 142 cobinding TF pairs and 186 tethered-binding TF pairs (115 of them completely tethered), indicating frequent interactions between TFs and a higher frequency of tethered binding than cobinding. This study provides a large number of previously undocumented motifs and insights into the biological and genomic features of TFBSs.
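The promoter definition used in the abstract (-1 kb to +100 bp from a TSS) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the names `Tss`, `offset_from_tss`, and `in_promoter` are hypothetical, and the single-coordinate representation of a binding site and the inclusive window boundaries are assumptions made for illustration only.

```python
from dataclasses import dataclass

# Promoter window from the abstract: -1 kb to +100 bp relative to the TSS.
# Treating both boundaries as inclusive is an assumption.
PROMOTER_UPSTREAM = 1000
PROMOTER_DOWNSTREAM = 100


@dataclass(frozen=True)
class Tss:
    """Hypothetical record for a transcription start site."""
    position: int  # genomic coordinate of the TSS
    strand: str    # "+" or "-"


def offset_from_tss(site_pos: int, tss: Tss) -> int:
    """Signed distance of a site from the TSS in transcript orientation.

    Negative values are upstream of the TSS, positive values downstream.
    On the minus strand, larger genomic coordinates are upstream.
    """
    delta = site_pos - tss.position
    return delta if tss.strand == "+" else -delta


def in_promoter(site_pos: int, tss: Tss) -> bool:
    """True if the site lies within -1 kb to +100 bp of the TSS."""
    offset = offset_from_tss(site_pos, tss)
    return -PROMOTER_UPSTREAM <= offset <= PROMOTER_DOWNSTREAM
```

For instance, a site at coordinate 9950 relative to a plus-strand TSS at 10000 has an offset of -50 bp, the position where the abstract reports the density peak, and falls inside the promoter window. A site at 10500 lies outside the window for that plus-strand TSS but inside it for a minus-strand TSS at the same coordinate, which is why the strand has to be taken into account.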
Pages: 10