Discovering unknown human and mouse transcription factor binding sites and their characteristics from ChIP-seq data

Cited by: 6
Authors
Yu, Chun-Ping [1]
Kuo, Chen-Hao [1]
Nelson, Chase W. [1,2]
Chen, Chi-An [1]
Soh, Zhi Thong [1]
Lin, Jinn-Jy [1]
Hsiao, Ru-Xiu [1]
Chang, Chih-Yao [1]
Li, Wen-Hsiung [1,3]
Affiliations
[1] Acad Sinica, Biodivers Res Ctr, Taipei 115, Taiwan
[2] Amer Museum Nat Hist, Inst Comparat Genom, New York, NY 10024 USA
[3] Univ Chicago, Dept Ecol & Evolut, 940 E 57th St, Chicago, IL 60637 USA
Keywords
ChIP-seq; transcription factor; binding site; promoter; position weight matrix; CHROMATIN; ENCODE; ALIGNMENT; PROTEINS; FEATURES
DOI
10.1073/pnas.2026754118
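Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract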
Transcription factor binding sites (TFBSs) are essential for gene regulation, but the number of known TFBSs remains limited. We aimed to discover and characterize unknown TFBSs by developing a computational pipeline for analyzing ChIP-seq (chromatin immunoprecipitation followed by sequencing) data. Applying it to the latest ENCODE ChIP-seq data for human and mouse, we found that using the irreproducible discovery rate as a quality-control criterion resulted in many experiments being unnecessarily discarded. By contrast, the number of motif occurrences in ChIP-seq peak regions provides a highly effective criterion, which is reliable even if supported by only one experimental replicate. In total, we obtained 2,058 motifs from 1,089 experiments for 354 human TFs and 163 motifs from 101 experiments for 34 mouse TFs. Among these motifs, 487 have not previously been reported. Mapping the canonical motifs to the human genome reveals a high TFBS density within ±2 kb of transcription start sites (TSSs), with a peak at -50 bp. On average, a promoter contains 5.7 TFBSs. However, 70% of TFBSs are in introns (41%) and intergenic regions (29%), whereas only 12% are in promoters (-1 kb to +100 bp from TSSs). Notably, some TFs (e.g., CTCF, JUN, JUNB, and NFE2) have motifs enriched in intergenic regions, including enhancers. We inferred 142 cobinding TF pairs and 186 (including 115 completely) tethered binding TF pairs, indicating frequent interactions between TFs and a higher frequency of tethered binding than cobinding. This study provides a large number of previously undocumented motifs and insights into the biological and genomic features of TFBSs.
Pages: 10
Related papers
50 in total
  • [41] Studying the evolution of transcription factor binding events using multi-species ChIP-Seq data
    Zheng, Wei
    Zhao, Hongyu
    STATISTICAL APPLICATIONS IN GENETICS AND MOLECULAR BIOLOGY, 2013, 12 (01) : 1 - 15
  • [42] Inferring intra-motif dependencies of DNA binding sites from ChIP-seq data
    Eggeling, Ralf
    Roos, Teemu
    Myllymaki, Petri
    Grosse, Ivo
    BMC BIOINFORMATICS, 2015, 16
  • [44] HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-Seq analysis
    Kulakovskiy, Ivan V.
    Vorontsov, Ilya E.
    Yevshin, Ivan S.
    Sharipov, Ruslan N.
    Fedorova, Alla D.
    Rumynskiy, Eugene I.
    Medvedeva, Yulia A.
    Magana-Mora, Arturo
    Bajic, Vladimir B.
    Papatsenko, Dmitry A.
    Kolpakov, Fedor A.
    Makeev, Vsevolod J.
    NUCLEIC ACIDS RESEARCH, 2018, 46 (D1) : D252 - D259
  • [45] Optimized ChIP-seq method facilitates transcription factor profiling in human tumors
    Singh, Abhishek A.
    Schuurman, Karianne
    Nevedomskaya, Ekaterina
    Stelloo, Suzan
    Linder, Simon
    Droog, Marjolein
    Kim, Yongsoo
    Sanders, Joyce
    van der Poel, Henk
    Bergman, Andries M.
    Wessels, Lodewyk F. A.
    Zwart, Wilbert
    LIFE SCIENCE ALLIANCE, 2019, 2 (01)
  • [46] Role of ChIP-seq in the discovery of transcription factor binding sites, differential gene regulation mechanism, epigenetic marks and beyond
    Mundade, Rasika
    Ozer, Hatice Gulcin
    Wei, Han
    Prabhu, Lakshmi
    Lu, Tao
    CELL CYCLE, 2014, 13 (18) : 2847 - 2852
  • [47] Chromatin Immunoprecipitation and Multiplex Sequencing (ChIP-Seq) to Identify Global Transcription Factor Binding Sites in the Nematode Caenorhabditis Elegans
    Brdlik, Cathleen M.
    Niu, Wei
    Snyder, Michael
    LABORATORY METHODS IN ENZYMOLOGY: PROTEIN, PT B, 2014, 539 : 89 - 111
  • [48] Pooled ChIP-Seq Links Variation in Transcription Factor Binding to Complex Disease Risk
    Tehranchi, Ashley K.
    Myrthil, Marsha
    Martin, Trevor
    Hie, Brian L.
    Golan, David
    Fraser, Hunter B.
    CELL, 2016, 165 (03) : 730 - 741
  • [49] Five-Vertebrate ChIP-seq Reveals the Evolutionary Dynamics of Transcription Factor Binding
    Schmidt, Dominic
    Wilson, Michael D.
    Ballester, Benoit
    Schwalie, Petra C.
    Brown, Gordon D.
    Marshall, Aileen
    Kutter, Claudia
    Watt, Stephen
    Martinez-Jimenez, Celia P.
    Mackay, Sarah
    Talianidis, Iannis
    Flicek, Paul
    Odom, Duncan T.
    SCIENCE, 2010, 328 (5981) : 1036 - 1040
  • [50] TIP: A probabilistic method for identifying transcription factor target genes from ChIP-seq binding profiles
    Cheng, Chao
    Min, Renqiang
    Gerstein, Mark
    BIOINFORMATICS, 2011, 27 (23) : 3221 - 3227