CART variance stabilization and regularization for high-throughput genomic data

Cited by: 6
Authors
Papana, Ariadni
Ishwaran, Hemant
Affiliations
[1] Cleveland Clin, Dept Quantitat Hlth Sci, Cleveland, OH 44195 USA
[2] Case Western Reserve Univ, Dept Stat, Cleveland, OH 44106 USA
Funding
National Science Foundation (USA)
DOI
10.1093/bioinformatics/btl384
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: mRNA expression data obtained from high-throughput DNA microarrays exhibit strong departures from homogeneity of variances, and a complex relationship between mean expression value and variance is often seen. Variance stabilization of such data is crucial for many types of statistical analyses, while regularization of variances (pooling of information) can greatly improve the overall accuracy of test statistics. Results: A Classification and Regression Tree (CART) procedure is introduced for variance stabilization as well as regularization. The CART procedure adaptively clusters genes by variances. Using both local and cluster-wide information leads to improved estimation of population variances, which in turn improves test statistics, whereas making use of cluster-wide information allows for variance stabilization of the data.
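The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a one-dimensional regression tree grown on log sample variances (greedy binary splits that minimize within-node squared error) and a fixed-weight average of each gene's own variance with its cluster's mean variance. The function names, the stopping parameters `min_size` and `min_gain`, the shrinkage `weight`, and the simulated data are all hypothetical.

```python
import math
import random
import statistics


def sse(xs):
    """Sum of squared deviations from the mean (regression-tree node impurity)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)


def split_node(xs, min_size):
    """Best binary split (position, cost) of a sorted list, or None if too small."""
    best = None
    for k in range(min_size, len(xs) - min_size + 1):
        cost = sse(xs[:k]) + sse(xs[k:])
        if best is None or cost < best[1]:
            best = (k, cost)
    return best


def grow(xs, lo, min_size, min_gain, leaves):
    """Recursively split sorted values; append (start, end) index ranges of leaves."""
    found = split_node(xs, min_size)
    if found is None or sse(xs) - found[1] < min_gain:
        leaves.append((lo, lo + len(xs)))
        return
    k = found[0]
    grow(xs[:k], lo, min_size, min_gain, leaves)
    grow(xs[k:], lo + k, min_size, min_gain, leaves)


def cluster_and_regularize(data, min_size=10, min_gain=5.0, weight=0.5):
    """Cluster genes by variance, then shrink each variance toward its cluster mean.

    Assumes every gene has a strictly positive sample variance (log is taken).
    """
    variances = [statistics.variance(row) for row in data]  # local information
    order = sorted(range(len(data)), key=lambda g: variances[g])
    log_vars = [math.log(variances[g]) for g in order]
    leaves = []
    grow(log_vars, 0, min_size, min_gain, leaves)

    cluster = [None] * len(data)
    pooled = [None] * len(data)
    regularized = [None] * len(data)
    for c, (lo, hi) in enumerate(leaves):
        members = order[lo:hi]
        mean_var = sum(variances[g] for g in members) / len(members)  # cluster-wide
        for g in members:
            cluster[g] = c
            pooled[g] = mean_var
            # Assumed shrinkage rule: fixed-weight average of local and cluster values.
            regularized[g] = weight * variances[g] + (1 - weight) * mean_var
    return cluster, variances, pooled, regularized


# Simulated example: 60 low-variance and 60 high-variance genes, 8 replicates each.
random.seed(0)
data = [[random.gauss(0.0, sd) for _ in range(8)] for sd in [1.0] * 60 + [5.0] * 60]
cluster, variances, pooled, regularized = cluster_and_regularize(data)
print("number of clusters:", len(set(cluster)))
```

In this sketch, dividing a gene's values by the square root of its `pooled` entry would be one assumed way to put genes on a comparable variance scale. The number of clusters depends entirely on the chosen `min_size` and `min_gain`.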
Pages: 2254-2261
Page count: 8