Identifying statistically significant chromatin contacts from Hi-C data with FitHiC2

Cited by: 117
Authors
Kaul, Arya [1, 4]
Bhattacharyya, Sourya [2]
Ay, Ferhat [2, 3]
Affiliations
[1] Univ Calif San Diego, Dept Bioengn, La Jolla, CA 92093 USA
[2] La Jolla Inst Immunol, Div Vaccine Discovery, La Jolla, CA 92037 USA
[3] Univ Calif San Diego, Sch Med, La Jolla, CA 92093 USA
[4] Harvard Med Sch, Dept Biomed Informat, Boston, MA 02115 USA
Keywords
REVEALS; GENOME; ORGANIZATION; PRINCIPLES; MODEL; MAP;
DOI
10.1038/s41596-019-0273-0
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Fit-Hi-C is a programming application to compute statistical confidence estimates for Hi-C contact maps to identify significant chromatin contacts. By fitting a monotonically non-increasing spline, Fit-Hi-C captures the relationship between genomic distance and contact probability without any parametric assumption. The spline fit, together with the correction of contact probabilities with respect to bin- or locus-specific biases, accounts for previously characterized covariates impacting Hi-C contact counts. Fit-Hi-C is best applied for the study of mid-range (e.g., 20 kb-2 Mb for human genome) intra-chromosomal contacts; however, with the latest reimplementation, named FitHiC2, it is possible to perform genome-wide analysis for high-resolution Hi-C data, including all intra-chromosomal distances and inter-chromosomal contacts. FitHiC2 also offers a merging filter module, which eliminates indirect/bystander interactions, leading to a significant reduction in the number of reported contacts without sacrificing recovery of key loops such as those between convergent CTCF binding sites. Here, we describe how to apply the FitHiC2 protocol to three use cases: (i) 5-kb resolution Hi-C data of chromosome 5 from GM12878 (a human lymphoblastoid cell line), (ii) 40-kb resolution whole-genome Hi-C data from IMR90 (human lung fibroblast), and (iii) budding yeast whole-genome Hi-C data at a single restriction cut site (EcoRI) resolution. The procedure takes 12 h with preprocessing when all use cases are run sequentially (4 h when run in parallel). With the recent improvements in its implementation, FitHiC2 (8 processors and 16 GB memory) is also scalable to genome-wide analysis of the highest resolution (1 kb) Hi-C data available to date (48 h with 32 GB peak memory). FitHiC2 is available through Bioconda, GitHub and the Python Package Index.
Fit-Hi-C is a computational tool for identifying statistically significant contacts from Hi-C data. This protocol describes how to apply the new version, called FitHiC2, on high-resolution Hi-C data, demonstrating the added functionalities.
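The idea in the abstract (a monotonically non-increasing fit of contact probability against genomic distance, followed by a per-contact significance estimate) can be illustrated with a small sketch. This is not the FitHiC2 implementation or its API. As assumptions made only for illustration, it substitutes a step-function isotonic fit (pool-adjacent-violators) for the spline, bins locus pairs by equal pair count, skips the bias correction and the merging filter, and scores each contact with a binomial upper-tail probability. All function names are hypothetical.

```python
import math


def pava_non_increasing(values, weights):
    """Weighted pool-adjacent-violators fit constrained to be non-increasing."""
    blocks = []  # each block: [weighted mean, total weight, points pooled]
    for v, w in zip(values, weights):
        blocks.append([v, w, 1])
        # Pool neighbours while a later block is larger than the one before it.
        while len(blocks) > 1 and blocks[-1][0] > blocks[-2][0]:
            v2, w2, n2 = blocks.pop()
            v1, w1, n1 = blocks.pop()
            blocks.append([(v1 * w1 + v2 * w2) / (w1 + w2), w1 + w2, n1 + n2])
    fitted = []
    for mean, _, n in blocks:
        fitted.extend([mean] * n)
    return fitted


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def score_contacts(contacts, n_bins=3):
    """contacts: list of (genomic_distance, contact_count) per locus pair.

    Returns a list of (distance, count, p_value) tuples.
    """
    total = sum(count for _, count in contacts)
    ordered = sorted(contacts)
    # Simplified binning (assumption): equal number of locus pairs per bin.
    size = math.ceil(len(ordered) / n_bins)
    bins = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    # Mean contact probability of a single locus pair in each distance bin.
    probs = [sum(c for _, c in b) / len(b) / total for b in bins]
    fitted = pava_non_increasing(probs, [len(b) for b in bins])
    return [(d, c, binomial_upper_tail(c, total, p))
            for b, p in zip(bins, fitted) for d, c in b]


if __name__ == "__main__":
    # Toy data: the pair at distance 30 with 40 contacts is far above
    # what the distance trend predicts.
    toy = [(10, 20), (10, 22), (20, 10), (20, 11), (30, 5), (30, 40)]
    for distance, count, p_value in score_contacts(toy):
        print(distance, count, p_value)
```

In this toy input the long-range pair with 40 contacts receives the smallest p-value, because its count is well above the level the non-increasing distance trend allows at that distance. A real analysis would use the FitHiC2 software itself and its documented inputs rather than this sketch.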
Pages: 991-1012
Page count: 22