An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data

Cited by: 14
Authors
Jenkinson, Garrett [1, 2]
Abante, Jordi [1]
Feinberg, Andrew P. [2, 3, 4]
Goutsias, John [1]
Affiliations
[1] Johns Hopkins Univ, Whitaker Biomed Engn Inst, Baltimore, MD 21218 USA
[2] Johns Hopkins Sch Med, Ctr Epigenet, Baltimore, MD USA
[3] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD USA
[4] Johns Hopkins Sch Med, Dept Med, Baltimore, MD USA
Source
BMC Bioinformatics | 2018 / Vol. 19
Keywords
DNA methylation; Genome analysis; Information theory; Ising model; Methylation analysis; WGBS data modeling and analysis; Differentially methylated regions; False discovery rate; CpG islands; Optimization; Powerful; Genes
DOI
10.1186/s12859-018-2086-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: DNA methylation is a stable form of epigenetic memory that cells use to control gene expression. Whole-genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation because it produces high-resolution, genome-wide methylation profiles. Statistical modeling and analysis are used to computationally extract and quantify information from these profiles, with the aim of identifying regions of the genome that show crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical …

Results: We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach accounts for correlations in methylation by using a joint probability model that encapsulates all information available in WGBS methylation reads. It produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous genome-wide quantification of methylation stochasticity in individual WGBS samples. Furthermore, it uses the Jensen-Shannon distance to evaluate differences in methylation distributions between a test sample and a reference sample. Differential performance assessment using simulated data and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data.

Conclusions: This contribution demonstrates the clear benefits, and the necessity, of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. This methodology can substantially improve DNA methylation analysis by effectively accounting for the massive amount of statistical information available in WGBS data, much of which existing methods ignore.
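The three ingredients named in the abstract can be illustrated with a toy calculation. These are a joint 1D Ising distribution over neighboring CpG sites, a Shannon entropy of methylation, and a Jensen-Shannon distance between a test and a reference sample. The following is a minimal sketch, not the authors' implementation, and it rests on several assumptions:

* It uses a homogeneous Ising model with hypothetical parameters `a` (bias toward methylation) and `c` (nearest-neighbor coupling).
* It covers only a handful of sites and enumerates all 2^N patterns by brute force. A genome-wide method would need parameter estimation from reads and a scalable way to compute the distribution.
* Entropy is computed on the distribution of the number of methylated sites and normalized by log2(N+1) so that it lies in [0, 1].
* The Jensen-Shannon distance uses base-2 logarithms, so it also lies in [0, 1].

All function names are hypothetical.

```python
import itertools
import math


def ising_joint(n_sites, a, c):
    """Enumerate all 2**n_sites methylation patterns and their probabilities
    under a homogeneous 1D Ising model (hypothetical parameters: a = bias,
    c = nearest-neighbor coupling). +1 = methylated, -1 = unmethylated."""
    patterns = list(itertools.product((-1, 1), repeat=n_sites))
    weights = []
    for x in patterns:
        field = a * sum(x)
        coupling = c * sum(x[i] * x[i + 1] for i in range(n_sites - 1))
        weights.append(math.exp(field + coupling))
    z = sum(weights)  # partition function (normalizing constant)
    return patterns, [w / z for w in weights]


def level_distribution(patterns, probs):
    """Probability of observing k methylated sites, for k = 0..N."""
    n_sites = len(patterns[0])
    dist = [0.0] * (n_sites + 1)
    for x, p in zip(patterns, probs):
        dist[x.count(1)] += p
    return dist


def normalized_entropy(dist):
    """Shannon entropy (bits) of dist, divided by log2(len(dist)) to lie in [0, 1]."""
    h = -sum(p * math.log2(p) for p in dist if p > 0)
    return h / math.log2(len(dist))


def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(u, v):
        # Terms with u_i == 0 contribute nothing; u_i > 0 implies m_i > 0.
        return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative round-off


if __name__ == "__main__":
    patterns, ref_probs = ising_joint(4, a=0.0, c=0.0)  # toy reference sample
    _, test_probs = ising_joint(4, a=1.0, c=0.5)        # toy test sample
    ref = level_distribution(patterns, ref_probs)
    test = level_distribution(patterns, test_probs)
    print("normalized entropy, reference:", normalized_entropy(ref))
    print("normalized entropy, test:     ", normalized_entropy(test))
    print("Jensen-Shannon distance:      ", js_distance(test, ref))
```

Setting `c` to zero makes the sites independent, which is the situation in which a marginal (per-site) description loses nothing. With a nonzero coupling, the joint distribution carries correlation information that per-site methylation levels alone do not capture.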
Pages: 23