Independent filtering increases detection power for high-throughput experiments

Cited by: 509
Authors
Bourgon, Richard [2]
Gentleman, Robert [3]
Huber, Wolfgang [1]
Affiliations
[1] European Mol Biol Lab, D-69117 Heidelberg, Germany
[2] European Bioinformat Inst, Cambridge CB10 1SD, England
[3] Genentech Inc, San Francisco, CA 94080 USA
Keywords
gene expression; multiple testing; MICROARRAY; GENES;
DOI
10.1073/pnas.0914005107
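Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code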
07 ; 0710 ; 09 ;
Abstract
With high-dimensional data, variable-by-variable statistical testing is often used to select variables whose behavior differs across conditions. Such an approach requires adjustment for multiple testing, which can result in low statistical power. A two-stage approach that first filters variables by a criterion independent of the test statistic, and then only tests variables which pass the filter, can provide higher power. We show that use of some filter/test statistics pairs presented in the literature may, however, lead to loss of type I error control. We describe other pairs which avoid this problem. In an application to microarray data, we found that gene-by-gene filtering by overall variance followed by a t-test increased the number of discoveries by 50%. We also show that this particular statistic pair induces a lower bound on fold-change among the set of discoveries. Independent filtering, using filter/test pairs that are independent under the null hypothesis but correlated under the alternative, is a general approach that can substantially increase the efficiency of experiments.
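As a rough illustration of the two-stage procedure the abstract describes (filter by overall variance, then t-test only the survivors), here is a minimal standard-library Python sketch. It is not the authors' code. The function names, the `keep_fraction` and `alpha` defaults, the equal-variance two-sample t-test, and the Benjamini-Hochberg adjustment are all assumptions made for illustration; the abstract does not specify them.

```python
import math
from statistics import mean, variance


def t_two_sided_pvalue(t, df, steps=2000):
    """Two-sided Student-t p-value by Simpson's rule (steps must be even).

    Uses the substitution x = sqrt(df) * tan(theta), which turns the tail
    integral into const * integral of cos(theta)**(df - 1) over a bounded
    range. Intended for small to moderate df only.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(math.pi)
    lo = math.atan(abs(t) / math.sqrt(df))
    hi = math.pi / 2
    h = (hi - lo) / steps
    total = math.cos(lo) ** (df - 1) + math.cos(hi) ** (df - 1)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(lo + i * h) ** (df - 1)
    return min(1.0, 2.0 * const * total * h / 3)


def t_test(x, y):
    """Equal-variance two-sample t-test; returns the two-sided p-value."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    if se == 0:
        return 1.0 if mean(x) == mean(y) else 0.0
    return t_two_sided_pvalue((mean(x) - mean(y)) / se, df)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from largest to smallest p-value
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_test(group_a, group_b, keep_fraction=0.5, alpha=0.1):
    """Stage 1: keep the rows with the highest overall variance (labels ignored).
    Stage 2: t-test and adjust for multiplicity among the kept rows only.
    Returns the sorted row indices called significant.
    """
    n = len(group_a)
    overall = [variance(list(a) + list(b)) for a, b in zip(group_a, group_b)]
    n_keep = max(1, math.ceil(keep_fraction * n))
    kept = sorted(range(n), key=lambda i: overall[i], reverse=True)[:n_keep]
    pvals = [t_test(group_a[i], group_b[i]) for i in kept]
    adj = benjamini_hochberg(pvals)
    return sorted(i for i, q in zip(kept, adj) if q <= alpha)


# Toy data (made up): four "genes", three samples per condition.
group_a = [[1.0, 1.1, 0.9], [1.0, 1.1, 0.9], [2.0, 2.05, 1.95], [0.0, 3.0, 6.0]]
group_b = [[5.0, 5.1, 4.9], [1.0, 0.9, 1.1], [2.0, 1.95, 2.05], [6.0, 0.0, 3.0]]
discoveries = filter_then_test(group_a, group_b)
```

The filter statistic here is computed on the pooled samples without reference to the condition labels. The multiplicity adjustment then runs over the kept rows only, so fewer hypotheses are corrected for.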
Pages: 9546-9551
Page count: 6
Related papers
17 in total
[1]  
*AFF INC, 2002, STAT ALG DESCR DOC T
[2]   Gene expression profiles of B-lineage adult acute lymphocytic leukemia reveal genetic patterns that identify lineage derivation and distinct mechanisms of transformation [J].
Chiaretti, S ;
Li, XC ;
Gentleman, R ;
Vitale, A ;
Wang, KS ;
Mandelli, F ;
Foà, R ;
Ritz, J .
CLINICAL CANCER RESEARCH, 2005, 11 (20) :7209-7219
[3]   Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival [J].
Chiaretti, S ;
Li, XC ;
Gentleman, R ;
Vitale, A ;
Vignetti, M ;
Mandelli, F ;
Ritz, J ;
Foa, R .
BLOOD, 2004, 103 (07) :2771-2778
[4]   Multiple hypothesis testing in microarray experiments [J].
Dudoit, S ;
Shaffer, JP ;
Boldrick, JC .
STATISTICAL SCIENCE, 2003, 18 (01) :71-103
[5]   Sure independence screening for ultrahigh dimensional feature space [J].
Fan, Jianqing ;
Lv, Jinchi .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2008, 70 :849-883
[6]   Filtering for increased power for microarray data analysis [J].
Hackstadt, Amber J. ;
Hess, Ann M. .
BMC BIOINFORMATICS, 2009, 10
[7]   Analysis of variance for gene expression microarray data [J].
Kerr, MK ;
Martin, M ;
Churchill, GA .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2000, 7 (06) :819-837
[8]  
Lönnstedt I, 2002, STAT SINICA, V12, P31
[9]   A class comparison method with filtering-enhanced variable selection for high-dimensional data sets [J].
Lusa, Lara ;
Korn, Edward L. ;
McShane, Lisa M. .
STATISTICS IN MEDICINE, 2008, 27 (28) :5834-5849
[10]   Effects of filtering by present call on analysis of microarray experiments [J].
McClintick, JN ;
Edenberg, HJ .
BMC BIOINFORMATICS, 2006, 7 (1)