An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data

Cited by: 14
Authors
Jenkinson, Garrett [1, 2]
Abante, Jordi [1]
Feinberg, Andrew P. [2, 3, 4]
Goutsias, John [1]
Affiliations
[1] Johns Hopkins Univ, Whitaker Biomed Engn Inst, Baltimore, MD 21218 USA
[2] Johns Hopkins Sch Med, Ctr Epigenet, Baltimore, MD USA
[3] Johns Hopkins Univ, Dept Biomed Engn, Baltimore, MD USA
[4] Johns Hopkins Sch Med, Dept Med, Baltimore, MD USA
Source
BMC BIOINFORMATICS | 2018 / Volume 19
Keywords
DNA methylation; Genome analysis; Information theory; Ising model; Methylation analysis; WGBS data modeling and analysis; DIFFERENTIALLY METHYLATED REGIONS; FALSE DISCOVERY RATE; DNA METHYLATION; CPG ISLANDS; OPTIMIZATION; POWERFUL; GENES;
DOI
10.1186/s12859-018-2086-5
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high-resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical [...]

Results: We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data.

Conclusions: This contribution demonstrates the clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
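The two information-theoretic quantities named in the abstract, the Shannon entropy and the Jensen-Shannon distance, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names and the toy distributions are assumptions made for illustration only, and the sketch works on plain discrete probability vectors rather than on distributions derived from an Ising model fitted to WGBS reads. Logarithms are taken in base 2, so the entropy is in bits and the distance lies between 0 and 1.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete probability distribution p."""
    # Terms with zero probability contribute nothing to the sum.
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance between two distributions on the same support.

    Defined as the square root of the Jensen-Shannon divergence
    JSD(p, q) = H(m) - (H(p) + H(q)) / 2, with m = (p + q) / 2.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(jsd, 0.0))


# Hypothetical toy distributions over a binned methylation level
# (illustrative values only, not data from the paper).
reference = [0.70, 0.20, 0.10]
test = [0.10, 0.20, 0.70]

entropy_reference = shannon_entropy(reference)
distance = jensen_shannon_distance(reference, test)
```

A distance of 0 means the test and reference distributions are identical, while a distance of 1 means they have no overlapping support.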
Pages: 23
    NATURE REVIEWS GENETICS, 2024, 25 (04) : 236 - 236