Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2

Cited by: 117
Authors
Kaul, Arya [1, 4]
Bhattacharyya, Sourya [2]
Ay, Ferhat [2, 3]
Affiliations
[1] Univ Calif San Diego, Dept Bioengn, La Jolla, CA 92093 USA
[2] La Jolla Inst Immunol, Div Vaccine Discovery, La Jolla, CA 92037 USA
[3] Univ Calif San Diego, Sch Med, La Jolla, CA 92093 USA
[4] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
Keywords
REVEALS; GENOME; ORGANIZATION; PRINCIPLES; MODEL; MAP;
DOI
10.1038/s41596-019-0273-0
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Fit-Hi-C is a software application that computes statistical confidence estimates for Hi-C contact maps in order to identify significant chromatin contacts. By fitting a monotonically non-increasing spline, Fit-Hi-C captures the relationship between genomic distance and contact probability without any parametric assumption. The spline fit, together with the correction of contact probabilities for bin- or locus-specific biases, accounts for previously characterized covariates that affect Hi-C contact counts. Fit-Hi-C is best suited to the study of mid-range (e.g., 20 kb to 2 Mb for the human genome) intra-chromosomal contacts; however, with the latest reimplementation, named FitHiC2, it is possible to perform genome-wide analysis of high-resolution Hi-C data, including all intra-chromosomal distances and inter-chromosomal contacts. FitHiC2 also offers a merging filter module, which eliminates indirect/bystander interactions, leading to a significant reduction in the number of reported contacts without sacrificing recovery of key loops such as those between convergent CTCF binding sites. Here, we describe how to apply the FitHiC2 protocol to three use cases: (i) 5-kb resolution Hi-C data of chromosome 5 from GM12878 (a human lymphoblastoid cell line), (ii) 40-kb resolution whole-genome Hi-C data from IMR90 (human lung fibroblast), and (iii) budding yeast whole-genome Hi-C data at single restriction cut site (EcoRI) resolution. The procedure takes 12 h with preprocessing when all use cases are run sequentially (4 h when run in parallel). With the recent improvements in its implementation, FitHiC2 (8 processors and 16 GB memory) also scales to genome-wide analysis of the highest resolution (1 kb) Hi-C data available to date (48 h with 32 GB peak memory). FitHiC2 is available through Bioconda, GitHub and the Python Package Index.

Fit-Hi-C is a computational tool for identifying statistically significant contacts from Hi-C data. This protocol describes how to apply the new version, called FitHiC2, to high-resolution Hi-C data and demonstrates its added functionality.
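The idea summarized in the abstract (a distance-dependent null model that never increases with genomic distance, followed by a per-contact significance test) can be illustrated with a small sketch. This is not the FitHiC2 implementation: it swaps the spline for a plain isotonic (pool-adjacent-violators) fit, omits bias correction and any refinement passes, and assumes a binomial test for the p-value. All function names and the toy data are hypothetical.

```python
import math


def fit_non_increasing(values, weights):
    """Weighted pool-adjacent-violators fit that never increases along the list.

    Stand-in for the spline named in the abstract (an assumption of this sketch).
    """
    blocks = []  # each block holds [weighted mean, total weight, point count]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge neighbours while a later block sits above the one before it.
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p); a direct sum, fine for toy sizes only."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def contact_p_values(contacts, possible_pairs):
    """Return a p-value per bin pair under the distance-only null model.

    contacts: {(bin_i, bin_j): read count}; distance is |bin_i - bin_j| in bins.
    possible_pairs: {distance: number of bin pairs that exist at that distance}.
    """
    total_reads = sum(contacts.values())
    reads_at = {}
    for (i, j), count in contacts.items():
        reads_at[abs(i - j)] = reads_at.get(abs(i - j), 0) + count
    distances = sorted(possible_pairs)
    # Average contact probability of a single pair at each distance.
    raw = [reads_at.get(d, 0) / total_reads / possible_pairs[d] for d in distances]
    fitted = fit_non_increasing(raw, [possible_pairs[d] for d in distances])
    prior = dict(zip(distances, fitted))
    return {pair: binomial_sf(count, total_reads, prior[abs(pair[0] - pair[1])])
            for pair, count in contacts.items()}


# Toy 4-bin chromosome: pair (1, 3) has more reads than expected at distance 2.
toy_contacts = {(0, 1): 40, (1, 2): 38, (2, 3): 42, (0, 2): 10, (1, 3): 30, (0, 3): 5}
toy_possible = {1: 3, 2: 2, 3: 1}
toy_p_values = contact_p_values(toy_contacts, toy_possible)
```

In this toy example the pair (1, 3) receives a smaller p-value than (0, 2), although both lie at the same distance, because its count exceeds what the distance-only expectation predicts. A real analysis would also need multiple-testing correction and the bias handling the abstract describes.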
Pages: 991-1012
Number of pages: 22
Related papers
  • [1] Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2
    Arya Kaul
    Sourya Bhattacharyya
    Ferhat Ay
    Nature Protocols, 2020, 15 : 991 - 1012
  • [2] Fine mapping chromatin contacts in capture Hi-C data
    Eijsbouts, Christiaan Q.
    Burren, Oliver S.
    Newcombe, Paul J.
    Wallace, Chris
    BMC GENOMICS, 2019, 20 (1)
  • [3] Fine mapping chromatin contacts in capture Hi-C data
    Christiaan Q Eijsbouts
    Oliver S Burren
    Paul J Newcombe
    Chris Wallace
    BMC Genomics, 20
  • [4] Extracting multi-way chromatin contacts from Hi-C data
    Liu, Lei
    Zhang, Bokai
    Hyeon, Changbong
    PLOS COMPUTATIONAL BIOLOGY, 2021, 17 (12)
  • [5] HiCEnterprise: identifying long range chromosomal contacts in Hi-C data
    Kranas, Hanna
    Tuszynska, Irina
    Wilczynski, Bartek
    PEERJ, 2021, 9
  • [6] Statistical confidence estimation for Hi-C data reveals regulatory chromatin contacts
    Ay, Ferhat
    Bailey, Timothy L.
    Noble, William Stafford
    GENOME RESEARCH, 2014, 24 (06) : 999 - 1011
  • [7] Rich Chromatin Structure Prediction from Hi-C Data
    Malik, Laraib
    Patro, Rob
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (05) : 1448 - 1458
  • [8] Rich Chromatin Structure Prediction from Hi-C Data
    Malik, Laraib
    Patro, Rob
    ACM-BCB' 2017: PROCEEDINGS OF THE 8TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY,AND HEALTH INFORMATICS, 2017, : 184 - 193
  • [9] An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data
    Mark Carty
    Lee Zamparo
    Merve Sahin
    Alvaro González
    Raphael Pelossof
    Olivier Elemento
    Christina S. Leslie
    Nature Communications, 8
  • [10] An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data
    Carty, Mark
    Zamparo, Lee
    Sahin, Merve
    Gonzalez, Alvaro
    Pelossof, Raphael
    Elemento, Olivier
    Leslie, Christina S.
    NATURE COMMUNICATIONS, 2017, 8