CpGcluster: a distance-based algorithm for CpG-island detection

Cited by: 132
Authors
Hackenberg, Michael
Previti, Christopher
Luque-Escamilla, Pedro Luis
Carpena, Pedro
Martinez-Aroza, Jose
Oliver, Jose L. [1]
Affiliations
[1] Univ Granada, Fac Ciencias, Dept Genet, Granada, Spain
[2] Univ Jaen, Dpto Ingn Mecan & Minera, Jaen, Spain
[3] Univ Malaga, Dpto Fis Aplicada 2, E-29071 Malaga, Spain
[4] Univ Granada, Fac Ciencias, Dpto Matemat Aplicada, Granada, Spain
[5] German Canc Res Ctr, Dept Mol Biophys, D-6900 Heidelberg, Germany
DOI
10.1186/1471-2105-7-446
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Despite their involvement in the regulation of gene expression and their importance as genomic markers for promoter prediction, no objective standard exists for defining CpG islands (CGIs), since all current approaches rely on a large parameter space formed by the thresholds of length, CpG fraction and G+C content.

Results: Given the higher frequency of CpG dinucleotides at CGIs, as compared to bulk DNA, the distance distributions between neighboring CpGs should differ for bulk and island CpGs. A new algorithm (CpGcluster) is presented, based on the physical distance between neighboring CpGs on the chromosome and able to directly predict clusters of CpGs without depending on the subjective criteria mentioned above. By assigning a p-value to each of these clusters, the most statistically significant ones can be predicted as CGIs. CpGcluster was benchmarked against five other CGI finders using a test sequence set assembled from an experimental CGI library. CpGcluster reached the highest overall accuracy values while showing the lowest rate of false-positive predictions. Since a minimum-length threshold is not required, CpGcluster can find short but fully functional CGIs usually missed by other algorithms. The CGIs predicted by CpGcluster present the lowest degree of overlap with Alu retrotransposons and, simultaneously, the highest overlap with vertebrate Phylogenetic Conserved Elements (PhastCons). CpGcluster's CGIs overlapping with the Transcription Start Site (TSS) show the highest statistical significance, as compared to the islands in other genome locations, thus qualifying CpGcluster as a valuable tool for discriminating functional CGIs from the remaining islands in the bulk genome.

Conclusion: CpGcluster uses only integer arithmetic, making it a fast and computationally efficient algorithm able to predict statistically significant clusters of CpG dinucleotides. Another outstanding feature is that all predicted CGIs start and end with a CpG dinucleotide, which should be appropriate for a genomic feature whose functionality is based precisely on CpG dinucleotides. In contrast to previous algorithms, the only search parameter in CpGcluster is the distance between two consecutive CpGs. Therefore, none of the main statistical properties of CpG islands (G+C content, CpG fraction or length threshold) are needed as search parameters, which may explain the high specificity and low overlap with spurious Alu elements observed for CpGcluster predictions.
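
The distance-based idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`cpg_positions`, `cluster_cpgs`, `predict_clusters`) are hypothetical, the distance is taken here as the difference between CpG start coordinates, the default threshold (the low median of the observed distances) is an assumption of this sketch, and the p-value step mentioned in the abstract is omitted. It only shows how a single distance parameter can yield clusters that start and end with a CpG, using integer arithmetic throughout.

```python
from statistics import median_low


def cpg_positions(seq):
    """Return the 0-based start position of every CpG dinucleotide in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def cluster_cpgs(positions, max_dist):
    """Group consecutive CpGs whose spacing is at most max_dist.

    Returns (start, end, n_cpgs) tuples with 0-based inclusive coordinates:
    start is the C of the first CpG, end is the G of the last CpG.
    Isolated CpGs (clusters of size 1) are dropped; this is a choice made
    for the sketch, not something stated in the abstract.
    """
    clusters = []
    if not positions:
        return clusters
    first = last = positions[0]
    count = 1
    for pos in positions[1:]:
        if pos - last <= max_dist:
            # Close enough to the previous CpG: extend the current cluster.
            last = pos
            count += 1
        else:
            if count > 1:
                clusters.append((first, last + 1, count))
            first = last = pos
            count = 1
    if count > 1:
        clusters.append((first, last + 1, count))
    return clusters


def predict_clusters(seq, max_dist=None):
    """Find CpG clusters in seq; max_dist defaults to the low median gap (assumed)."""
    positions = cpg_positions(seq)
    if max_dist is None:
        gaps = [b - a for a, b in zip(positions, positions[1:])]
        if not gaps:
            return []
        max_dist = median_low(gaps)  # always one of the observed integer gaps
    return cluster_cpgs(positions, max_dist)
```

For example, `cluster_cpgs([1, 3, 7, 40, 42], max_dist=5)` groups the first three CpGs into one cluster and the last two into another, because only the gap of 33 between positions 7 and 40 exceeds the threshold.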
Pages: 13