Discovering unknown human and mouse transcription factor binding sites and their characteristics from ChIP-seq data

Cited by: 6
Authors
Yu, Chun-Ping [1]
Kuo, Chen-Hao [1]
Nelson, Chase W. [1,2]
Chen, Chi-An [1]
Soh, Zhi Thong [1]
Lin, Jinn-Jy [1]
Hsiao, Ru-Xiu [1]
Chang, Chih-Yao [1]
Li, Wen-Hsiung [1,3]
Affiliations
[1] Acad Sinica, Biodivers Res Ctr, Taipei 115, Taiwan
[2] Amer Museum Nat Hist, Inst Comparat Genom, New York, NY 10024 USA
[3] Univ Chicago, Dept Ecol & Evolut, 940 E 57th St, Chicago, IL 60637 USA
Keywords
ChIP-seq; transcription factor; binding site; promoter; position weight matrix; chromatin; ENCODE; alignment; proteins; features
DOI
10.1073/pnas.2026754118
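Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [Natural Sciences (General)]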
Discipline classification codes
07; 0710; 09
Abstract
Transcription factor binding sites (TFBSs) are essential for gene regulation, but the number of known TFBSs remains limited. We aimed to discover and characterize unknown TFBSs by developing a computational pipeline for analyzing ChIP-seq (chromatin immunoprecipitation followed by sequencing) data. Applying it to the latest ENCODE ChIP-seq data for human and mouse, we found that using the irreproducible discovery rate as a quality-control criterion resulted in many experiments being unnecessarily discarded. By contrast, the number of motif occurrences in ChIP-seq peak regions provides a highly effective criterion, which is reliable even if supported by only one experimental replicate. In total, we obtained 2,058 motifs from 1,089 experiments for 354 human TFs and 163 motifs from 101 experiments for 34 mouse TFs. Among these motifs, 487 have not previously been reported. Mapping the canonical motifs to the human genome reveals a high TFBS density within ±2 kb of transcription start sites (TSSs), peaking at -50 bp. On average, a promoter contains 5.7 TFBSs. However, 70% of TFBSs are in introns (41%) and intergenic regions (29%), whereas only 12% are in promoters (-1 kb to +100 bp from TSSs). Notably, some TFs (e.g., CTCF, JUN, JUNB, and NFE2) have motifs enriched in intergenic regions, including enhancers. We inferred 142 cobinding TF pairs and 186 tethered-binding TF pairs (115 of them completely tethered), indicating frequent interactions between TFs and a higher frequency of tethered binding than cobinding. This study provides a large number of previously undocumented motifs and insights into the biological and genomic features of TFBSs.
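The abstract's quality-control criterion counts motif occurrences in ChIP-seq peak regions, and the keywords mention position weight matrices (PWMs). A generic log-odds PWM scan that counts such occurrences on both strands can be sketched as follows. This is not the authors' pipeline: the function names, the 0.5 pseudocount, the uniform 0.25 background, the score threshold, and the toy "GATA" count matrix are all illustrative assumptions.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")  # non-ACGT characters pass through unchanged


def log_odds_matrix(counts, pseudocount=0.5, background=0.25):
    """Convert a count matrix (list of {base: count} columns) to log2-odds scores.

    Assumption: a uniform background and a fixed pseudocount per base.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def score_window(pwm, window):
    """Sum the per-position log-odds scores of a window as long as the PWM."""
    return sum(col[base] for col, base in zip(pwm, window))


def count_occurrences(pwm, seq, threshold):
    """Count windows on either strand scoring at or above `threshold`.

    Windows containing non-ACGT characters (e.g. N) are skipped. A palindromic
    site matches on both strands and is therefore counted twice here.
    """
    width = len(pwm)
    seq = seq.upper()
    hits = 0
    for strand_seq in (seq, reverse_complement(seq)):
        for i in range(len(strand_seq) - width + 1):
            window = strand_seq[i:i + width]
            if set(window) <= set(BASES) and score_window(pwm, window) >= threshold:
                hits += 1
    return hits


# Hypothetical toy motif: 4 columns of 10 counts each, strongly favoring "GATA".
toy_counts = [{b: (10 if b == m else 0) for b in BASES} for m in "GATA"]
toy_pwm = log_odds_matrix(toy_counts)
print(count_occurrences(toy_pwm, "GATACCTATC", threshold=6.0))  # 2: one hit per strand
```

With this toy matrix a perfect match scores about 7.23 bits and a single mismatch about 2.84, so the assumed threshold of 6.0 accepts only exact matches; real pipelines typically calibrate the threshold against a background or p-value instead.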
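The abstract defines promoters as -1 kb to +100 bp from TSSs and reports how TFBSs are distributed among promoters, introns, and intergenic regions. A minimal, strand-aware version of that assignment might look like the sketch below. It assumes a hypothetical `Transcript` record, 0-based half-open coordinates with the TSS at offset 0, inclusive window bounds, and a promoter > exon > intron > intergenic precedence when annotations overlap; the paper's actual annotation rules may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    """Hypothetical minimal gene model (0-based, half-open coordinates)."""
    start: int                    # leftmost genomic coordinate (inclusive)
    end: int                      # rightmost coordinate (exclusive)
    strand: str                   # "+" or "-"
    exons: List[Tuple[int, int]]  # (start, end) half-open exon intervals

    @property
    def tss(self) -> int:
        # The TSS is the 5' end: `start` on "+", the last base (end - 1) on "-".
        return self.start if self.strand == "+" else self.end - 1


def offset_from_tss(pos: int, tx: Transcript) -> int:
    """Signed distance from the TSS: negative upstream, positive downstream."""
    d = pos - tx.tss
    return d if tx.strand == "+" else -d


def classify_site(pos: int, transcripts: List[Transcript],
                  upstream: int = 1000, downstream: int = 100) -> str:
    """Assign a site position to promoter, exon, intron, or intergenic."""
    labels = set()
    for tx in transcripts:
        off = offset_from_tss(pos, tx)
        if -upstream <= off <= downstream:
            labels.add("promoter")
        elif tx.start <= pos < tx.end:
            in_exon = any(s <= pos < e for s, e in tx.exons)
            labels.add("exon" if in_exon else "intron")
    # Assumed precedence when a site overlaps several annotations.
    for label in ("promoter", "exon", "intron"):
        if label in labels:
            return label
    return "intergenic"


# Usage with one hypothetical plus-strand transcript (TSS at 10,000).
genes = [Transcript(10_000, 20_000, "+", [(10_000, 10_500), (15_000, 20_000)])]
for pos in (9_500, 10_300, 12_000, 50_000):
    print(pos, classify_site(pos, genes))  # promoter, exon, intron, intergenic
```

Because the offset is negated on the minus strand, "upstream" always means 5' of the TSS in transcript orientation, which is what a window such as -1 kb to +100 bp implies.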
Pages: 10