CART variance stabilization and regularization for high-throughput genomic data

Cited by: 6

Authors:

Papana, Ariadni

Ishwaran, Hemant

Affiliations:

[1] Cleveland Clin, Dept Quantitat Hlth Sci, Cleveland, OH 44195 USA

[2] Case Western Reserve Univ, Dept Stat, Cleveland, OH 44106 USA

Source:

BIOINFORMATICS | 2006 / Vol. 22 / Issue 18

Funding:

US National Science Foundation;

Keywords:

DOI:

10.1093/bioinformatics/btl384

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Motivation: mRNA expression data obtained from high-throughput DNA microarrays exhibit strong departures from homogeneity of variances, and the relationship between mean expression value and variance is often complex. Variance stabilization of such data is crucial for many types of statistical analyses, while regularization of variances (pooling of information) can greatly improve the overall accuracy of test statistics. Results: A Classification and Regression Tree (CART) procedure is introduced for both variance stabilization and regularization. The CART procedure adaptively clusters genes by their variances. Combining local and cluster-wide information improves the estimation of population variances, which in turn improves test statistics, while the cluster-wide information allows for variance stabilization of the data.
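The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, and the abstract does not specify the tree's predictors, the splitting rule, or the pooling weights. The sketch assumes a one-dimensional regression tree that splits genes on mean expression to predict log sample variance, treats each leaf as a cluster, pools each gene's variance with its cluster average using a fixed weight, and rescales each gene by its cluster standard deviation. All function names, the `weight`, `max_depth`, and `min_leaf` parameters, and the simulated data are hypothetical.

```python
import numpy as np

def best_split(y, min_leaf):
    """Return the cut index k minimizing SSE(y[:k]) + SSE(y[k:]), or None if the segment is too small."""
    n = len(y)
    if n < 2 * min_leaf:
        return None
    csum, csq = np.cumsum(y), np.cumsum(y ** 2)
    ks = np.arange(min_leaf, n - min_leaf + 1)          # candidate left-segment sizes
    left = csq[ks - 1] - csum[ks - 1] ** 2 / ks
    right = (csq[-1] - csq[ks - 1]) - (csum[-1] - csum[ks - 1]) ** 2 / (n - ks)
    return int(ks[np.argmin(left + right)])

def tree_clusters(x, y, max_depth=3, min_leaf=20):
    """Leaf labels of a 1-D regression tree of y on x (assumed setup, grown to a fixed depth)."""
    order = np.argsort(x)
    segments = [(0, len(x))]
    for _ in range(max_depth):
        grown = []
        for lo, hi in segments:
            k = best_split(y[order[lo:hi]], min_leaf)
            grown += [(lo, hi)] if k is None else [(lo, lo + k), (lo + k, hi)]
        segments = grown
    labels = np.zeros(len(x), dtype=int)
    for label, (lo, hi) in enumerate(segments):
        labels[order[lo:hi]] = label
    return labels

def regularize_and_stabilize(expr, weight=0.5, max_depth=3, min_leaf=20):
    """expr: genes x samples array. Returns cluster labels, pooled variances, rescaled data."""
    means = expr.mean(axis=1)
    gene_var = expr.var(axis=1, ddof=1)                  # local (per-gene) variance
    labels = tree_clusters(means, np.log(gene_var + 1e-12), max_depth, min_leaf)
    per_cluster = np.array([gene_var[labels == c].mean() for c in range(labels.max() + 1)])
    cluster_var = per_cluster[labels]                    # cluster-wide variance for each gene
    regularized = weight * gene_var + (1 - weight) * cluster_var   # fixed-weight pooling (assumption)
    stabilized = expr / np.sqrt(cluster_var)[:, None]    # rescale by cluster standard deviation
    return labels, regularized, stabilized

# Simulated data (hypothetical): standard deviation grows with the mean.
rng = np.random.default_rng(0)
mu = rng.uniform(1, 10, size=400)
expr = mu[:, None] + mu[:, None] * rng.standard_normal((400, 6))
labels, regularized, stabilized = regularize_and_stabilize(expr)
```

After rescaling, the average per-gene variance within each cluster equals one by construction, which is the sense in which this sketch stabilizes variances; the pooled estimate always lies between the gene's own variance and its cluster average.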


Pages: 2254-2261

Page count: 8

