CART variance stabilization and regularization for high-throughput genomic data

Cited by: 6
Authors
Papana, Ariadni
Ishwaran, Hemant
Affiliations
[1] Cleveland Clin, Dept Quantitat Hlth Sci, Cleveland, OH 44195 USA
[2] Case Western Reserve Univ, Dept Stat, Cleveland, OH 44106 USA
Funding
U.S. National Science Foundation;
DOI
10.1093/bioinformatics/btl384
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: mRNA expression data obtained from high-throughput DNA microarrays exhibit strong departures from homogeneity of variances. Often a complex relationship between mean expression value and variance is seen. Variance stabilization of such data is crucial for many types of statistical analyses, while regularization of variances (pooling of information) can greatly improve the overall accuracy of test statistics.
Results: A Classification and Regression Tree (CART) procedure is introduced for variance stabilization as well as regularization. The CART procedure adaptively clusters genes by variances. Using both local and cluster-wide information leads to improved estimation of population variances, which in turn improves test statistics, whereas making use of cluster-wide information allows for variance stabilization of the data.
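The abstract describes the method only at a high level, so the following is a rough illustration of the general idea rather than the authors' algorithm. It assumes a small regression tree that splits genes on mean expression with log sample variance as the response, treats each leaf as a variance cluster, shrinks each gene's variance toward its cluster's average, and rescales the data by the cluster standard deviation. The function names (`grow`, `regularize`), the shrinkage `weight`, the stopping rule, and the simulated data are all hypothetical choices made for this sketch.

```python
import math
import random
import statistics


def sse(values):
    """Sum of squared deviations from the mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def grow(idx, xs, ys, min_leaf, depth):
    """Greedy binary partition of gene indices; returns a list of leaves.

    Assumption: split on xs (mean expression) to reduce the squared error
    of ys (log variance). Real CART would also prune the tree.
    """
    if depth == 0 or len(idx) < 2 * min_leaf:
        return [idx]
    order = sorted(idx, key=lambda i: xs[i])
    best_cost, best_k = sse([ys[i] for i in order]), None
    for k in range(min_leaf, len(order) - min_leaf + 1):
        if xs[order[k - 1]] == xs[order[k]]:
            continue  # cannot split between tied predictor values
        cost = sse([ys[i] for i in order[:k]]) + sse([ys[i] for i in order[k:]])
        if cost < best_cost:
            best_cost, best_k = cost, k
    if best_k is None:
        return [idx]
    left = grow(order[:best_k], xs, ys, min_leaf, depth - 1)
    right = grow(order[best_k:], xs, ys, min_leaf, depth - 1)
    return left + right


def regularize(data, min_leaf=5, depth=3, weight=0.5):
    """Cluster genes by variance, then regularize and stabilize.

    data: list of rows, one row of expression values per gene.
    weight: hypothetical shrinkage weight on the gene's own variance.
    Requires every gene to have a strictly positive sample variance.
    """
    means = [statistics.fmean(row) for row in data]
    variances = [statistics.variance(row) for row in data]
    log_var = [math.log(v) for v in variances]
    leaves = grow(list(range(len(data))), means, log_var, min_leaf, depth)

    cluster_var = [0.0] * len(data)
    for leaf in leaves:
        pooled = statistics.fmean([variances[i] for i in leaf])
        for i in leaf:
            cluster_var[i] = pooled

    # Regularization: combine local (gene) and cluster-wide information.
    reg_var = [weight * variances[i] + (1 - weight) * cluster_var[i]
               for i in range(len(data))]
    # Stabilization: rescale each gene using cluster-wide information only.
    stabilized = [[x / math.sqrt(cluster_var[i]) for x in row]
                  for i, row in enumerate(data)]
    return leaves, reg_var, stabilized


# Simulated data (not from the paper): 20 low-variance genes, 20 high-variance genes.
random.seed(0)
data = [[random.gauss(0, 1) for _ in range(6)] for _ in range(20)]
data += [[random.gauss(10, 5) for _ in range(6)] for _ in range(20)]

leaves, reg_var, stabilized = regularize(data)
print(len(leaves), "clusters")
```

The regularized variances could then replace the per-gene sample variances in the denominator of a t-type statistic, which is the kind of use the abstract points to when it says pooling improves test statistics.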
Pages: 2254-2261
Page count: 8
Related papers
50 in total
  • [1] High-Throughput Genomic Data in Systematics and Phylogenetics
    Lemmon, Emily Moriarty
    Lemmon, Alan R.
    ANNUAL REVIEW OF ECOLOGY, EVOLUTION, AND SYSTEMATICS, VOL 44, 2013, 44 : 99 - +
  • [2] Joint adaptive mean-variance regularization and variance stabilization of high dimensional data
    Dazard, Jean-Eudes
    Rao, J. Sunil
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2012, 56 (07) : 2317 - 2333
  • [3] High-throughput DNA sequencing: A genomic data manufacturing process
    Huang, GM
    DNA SEQUENCE, 1999, 10 (03) : 149 - 153
  • [4] NCBI GEO: archive for high-throughput functional genomic data
    Barrett, Tanya
    Troup, Dennis B.
    Wilhite, Stephen E.
    Ledoux, Pierre
    Rudnev, Dmitry
    Evangelista, Carlos
    Kim, Irene F.
    Soboleva, Alexandra
    Tomashevsky, Maxim
    Marshall, Kimberly A.
    Phillippy, Katherine H.
    Sherman, Patti M.
    Muertter, Rolf N.
    Edgar, Ron
    NUCLEIC ACIDS RESEARCH, 2009, 37 : D885 - D890
  • [5] Antisense for high-throughput genomic studies
    Hackett, PB
    Essner, JJ
    GENETIC ENGINEERING NEWS, 2003, 23 (05) : 34 - +
  • [6] Latent Feature Decompositions for Integrative Analysis of Diverse High-throughput Genomic Data
    Gregory, Karl B.
    Coombes, Kevin R.
    Momin, Amin
    Girard, Luc
    Byers, Lauren A.
    Lin, Steven
    Peyton, Michael
    Heymach, John V.
    Minna, John D.
    Baladandayuthapani, Veerabhadran
    2012 IEEE INTERNATIONAL WORKSHOP ON GENOMIC SIGNAL PROCESSING AND STATISTICS (GENSIPS), 2012 : 130 - 134
  • [7] Super-sparse principal component analyses for high-throughput genomic data
    Lee, Donghwan
    Lee, Woojoo
    Lee, Youngjo
    Pawitan, Yudi
    BMC BIOINFORMATICS, 2010, 11
  • [8] Efficient high-throughput resequencing of genomic DNA
    Miller, RD
    Duan, S
    Lovins, EG
    Kloss, EF
    Kwok, PY
    GENOME RESEARCH, 2003, 13 (04) : 717 - 720
  • [9] Super-sparse principal component analyses for high-throughput genomic data
    Donghwan Lee
    Woojoo Lee
    Youngjo Lee
    Yudi Pawitan
    BMC Bioinformatics, 11
  • [10] Detecting genomic deletions from high-throughput sequence data with unsupervised learning
    Li X.
    Wu Y.
    BMC Bioinformatics, 2022, 23 (Suppl 8)