An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data

Cited by: 14
Authors
Jenkinson, Garrett [1, 2]
Abante, Jordi [1]
Feinberg, Andrew P. [2, 3, 4]
Goutsias, John [1]
Affiliations
[1] Johns Hopkins Univ, Whitaker Biomed Engn Inst, Baltimore, MD 21218 USA
[2] Johns Hopkins Sch Med, Ctr Epigenet, Baltimore, MD USA
[3] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD USA
[4] Johns Hopkins Sch Med, Dept Med, Baltimore, MD USA
Source
BMC BIOINFORMATICS | 2018 / Vol. 19
Keywords
DNA methylation; Genome analysis; Information theory; Ising model; Methylation analysis; WGBS data modeling and analysis; DIFFERENTIALLY METHYLATED REGIONS; FALSE DISCOVERY RATE; DNA METHYLATION; CPG ISLANDS; OPTIMIZATION; POWERFUL; GENES;
DOI
10.1186/s12859-018-2086-5
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high-resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical …
Results: We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data.
Conclusions: This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
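The three quantities named in the abstract (a 1D Ising joint distribution, its Shannon entropy, and the Jensen-Shannon distance between two samples) can be illustrated with a small sketch. This is not the authors' implementation: the parameter names `a` (per-site bias) and `c` (neighbor coupling), the ±1 spin encoding, and the brute-force enumeration over all patterns are assumptions made for illustration, and the sketch compares full pattern distributions over a handful of CpG sites rather than whatever regional summaries the paper uses.

```python
import itertools
import math


def ising_distribution(a, c):
    """Joint distribution over binary methylation patterns of N CpG sites.

    Assumed form (illustrative): P(x) is proportional to
    exp(sum_n a[n]*s_n + sum_n c[n]*s_n*s_{n+1}), with s_n = 2*x_n - 1.
    Enumerates all 2**N patterns, so it is only practical for small N.
    """
    n = len(a)
    patterns = list(itertools.product((0, 1), repeat=n))
    weights = []
    for x in patterns:
        s = [2 * xi - 1 for xi in x]  # map {0, 1} to {-1, +1}
        energy = sum(a[i] * s[i] for i in range(n))
        energy += sum(c[i] * s[i] * s[i + 1] for i in range(n - 1))
        weights.append(math.exp(energy))
    z = sum(weights)  # partition function (normalizing constant)
    return {x: w / z for x, w in zip(patterns, weights)}


def shannon_entropy(p):
    """Shannon entropy in bits of a distribution given as {outcome: prob}."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), in [0, 1]."""
    m = {k: 0.5 * (p[k] + q[k]) for k in p}
    jsd = shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors


if __name__ == "__main__":
    # Hypothetical parameters for a 4-site region in two samples.
    reference = ising_distribution([1.0, 1.0, 1.0, 1.0], [0.5, 0.5, 0.5])
    test = ising_distribution([-1.0, -1.0, -1.0, -1.0], [0.5, 0.5, 0.5])
    print("entropy (reference):", shannon_entropy(reference))
    print("JS distance:", jensen_shannon_distance(reference, test))
```

With all parameters set to zero the model reduces to independent, unbiased sites and the entropy equals N bits; a positive coupling `c` makes neighboring sites agree more often, which is the kind of correlation that a purely marginal (site-by-site) analysis does not capture.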
Pages: 23