An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data

Cited by: 14
Authors
Jenkinson, Garrett [1, 2]
Abante, Jordi [1]
Feinberg, Andrew P. [2, 3, 4]
Goutsias, John [1]
Affiliations
[1] Johns Hopkins Univ, Whitaker Biomed Engn Inst, Baltimore, MD 21218 USA
[2] Johns Hopkins Sch Med, Ctr Epigenet, Baltimore, MD USA
[3] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD USA
[4] Johns Hopkins Sch Med, Dept Med, Baltimore, MD USA
Source
BMC BIOINFORMATICS | 2018 / Vol. 19
Keywords
DNA methylation; Genome analysis; Information theory; Ising model; Methylation analysis; WGBS data modeling and analysis; DIFFERENTIALLY METHYLATED REGIONS; FALSE DISCOVERY RATE; DNA METHYLATION; CPG ISLANDS; OPTIMIZATION; POWERFUL; GENES;
DOI
10.1186/s12859-018-2086-5
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical [...]
Results: We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates the clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data.
Conclusions: This contribution demonstrates the clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
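The two information-theoretic quantities named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: their method first fits a 1D Ising model to the WGBS reads, which is not reproduced here. The function names and the example distributions over a discretized methylation level are hypothetical, chosen only to show how the Shannon entropy and the Jensen-Shannon distance (taken here as the square root of the Jensen-Shannon divergence, with base-2 logarithms) are computed for discrete distributions.

```python
import math


def shannon_entropy(p):
    """Shannon entropy, in bits, of a discrete distribution p (sums to 1)."""
    # Terms with zero probability contribute nothing (0 * log 0 := 0).
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q.

    With base-2 logarithms the result lies in [0, 1]: 0 for identical
    distributions, 1 for distributions with disjoint support.
    """
    # Mixture distribution m = (p + q) / 2.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(jsd, 0.0))


# Hypothetical distributions over three methylation-level bins
# (low, intermediate, high) in a reference and a test sample.
reference = [0.7, 0.2, 0.1]
test = [0.1, 0.2, 0.7]

print("entropy (reference):", shannon_entropy(reference))
print("entropy (test):     ", shannon_entropy(test))
print("JS distance:        ", jensen_shannon_distance(reference, test))
```

In this toy example both samples have the same entropy because one distribution is a mirror image of the other, yet the Jensen-Shannon distance between them is nonzero. The two quantities therefore capture different things: stochasticity within one sample versus the difference between two samples.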
Pages: 23