A class comparison method with filtering-enhanced variable selection for high-dimensional data sets

Cited by: 6
Authors
Lusa, Lara [1]
Korn, Edward L. [2]
McShane, Lisa M. [2]
Affiliations
[1] Univ Ljubljana, Dept Med Informat, Ljubljana 61000, Slovenia
[2] NCI, Biometr Res Branch, Bethesda, MD 20892 USA
Keywords
multiple testing methods; multivariate permutation methods; high-dimensional data; microarrays; variable filtering;
DOI
10.1002/sim.3405
Chinese Library Classification (CLC) number
Q [Biological sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
High-throughput molecular analysis technologies can produce thousands of measurements for each of the assayed samples. A common scientific question is to identify the variables whose distributions differ between some pre-specified classes (i.e. are differentially expressed). The statistical cost of examining thousands of variables is related to the risk of identifying many variables that are not truly differentially expressed, and many different multiple testing strategies have been used for the analysis of high-dimensional data sets to control the number of these false positives. An approach that is often used in practice to reduce the multiple comparisons problem is to lessen the number of comparisons being performed by filtering out variables that are considered non-informative before the analysis. However, deciding which and how many variables should be filtered out can be highly arbitrary, and different filtering strategies can result in different variables being identified as differentially expressed. We propose the filtering-enhanced variable selection (FEVS) method, a new multiple testing strategy for identifying differentially expressed variables. This method identifies differentially expressed variables by combining the results obtained using a variety of filtering methods, instead of using a pre-specified filtering method or trying to identify an optimal filtering of the variables prior to class comparison analysis. We prove that the FEVS method probabilistically controls the number of false discoveries, and we show with a set of simulations and an example from the literature that FEVS can be useful for gaining sensitivity for the detection of truly differentially expressed variables. Published in 2008 by John Wiley & Sons, Ltd.
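The arbitrariness of pre-analysis filtering described in the abstract can be illustrated with a minimal sketch. This is not the FEVS procedure, whose details are not given in this record; it only shows how a Bonferroni threshold changes with the number of retained variables. The function name `select_after_filter`, the variance-based filter, the cutoffs, and the toy p-values are all hypothetical, and the sketch assumes the filter statistic is chosen so that filtering does not invalidate the p-values.

```python
def select_after_filter(p_values, variances, min_variance, alpha=0.05):
    """Filter variables by variance, then apply a Bonferroni threshold
    based only on the number of variables that survive the filter.

    Illustrative assumption: `variances` is a filter statistic that does
    not bias the p-values of the retained variables.
    """
    # Indices of variables that pass the (hypothetical) variance filter.
    keep = [i for i, v in enumerate(variances) if v >= min_variance]
    if not keep:
        return []
    # Fewer retained variables means a less stringent per-test threshold.
    threshold = alpha / len(keep)
    return [i for i in keep if p_values[i] <= threshold]


# Toy data (made up for illustration): six variables.
p_values = [0.004, 0.020, 0.015, 0.600, 0.900, 0.012]
variances = [2.0, 0.1, 1.5, 0.05, 0.8, 0.3]

no_filter = select_after_filter(p_values, variances, 0.0)       # [0]
strict_filter = select_after_filter(p_values, variances, 0.5)   # [0, 2]
loose_filter = select_after_filter(p_values, variances, 0.25)   # [0, 5]
```

With these toy numbers, each filtering cutoff selects a different set of variables, which is the kind of dependence on the filtering choice that the abstract says motivates combining results across several filters.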
Pages: 5834-5849
Number of pages: 16
Related papers
22 in total
[1]   Gene expression profiles in peripheral lymphocytes by arsenic exposure and skin lesion status in a Bangladeshi population [J].
Argos, Maria ;
Kibriya, Muhammad G. ;
Parvez, Faruque ;
Jasmine, Farzana ;
Rakibuz-Zaman, Muhammad ;
Ahsan, Habibul .
CANCER EPIDEMIOLOGY BIOMARKERS & PREVENTION, 2006, 15 (07) :1367-1375
[2]   Expression profiling of serous low malignant potential, low-grade, and high-grade tumors of the ovary. [J].
Bonome, T ;
Lee, JY ;
Park, DC ;
Radonovich, M ;
Pise-Masison, C ;
Brady, J ;
Gardner, GJ ;
Hao, K ;
Wong, WH ;
Barrett, JC ;
Lu, KH ;
Sood, AK ;
Gershenson, DM ;
Mok, SC ;
Birrer, MJ .
CANCER RESEARCH, 2005, 65 (22) :10602-10612
[3]   Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer [J].
Chang, JC ;
Wooten, EC ;
Tsimelzon, A ;
Hilsenbeck, SG ;
Gutierrez, MC ;
Elledge, R ;
Mohsin, S ;
Osborne, CK ;
Chamness, GC ;
Allred, DC ;
O'Connell, P .
LANCET, 2003, 362 (9381) :362-369
[4]   DAVID HA, 1981, ORDER STATISTICS, P15
[5]   Estimating the false discovery rate using nonparametric deconvolution [J].
van de Wiel, Mark A. ;
Kim, Kyung In .
BIOMETRICS, 2007, 63 (03) :806-815
[6]   Genome-wide analysis of acute myeloid leukemia with normal karyotype reveals a unique pattern of homeobox gene expression distinct from those with translocation-mediated fusion events [J].
Debernardi, S ;
Lillington, DM ;
Chaplin, T ;
Tomlinson, S ;
Amess, J ;
Rohatiner, A ;
Lister, TA ;
Young, BD .
GENES CHROMOSOMES & CANCER, 2003, 37 (02) :149-158
[7]   Dudoit S, 2002, STAT SINICA, V12, P111
[8]   Dudoit S, 2008, SPRINGER SER STAT, P1
[9]   Long-term global gene expression patterns in irradiated human lymphocytes [J].
Fält, S ;
Holmberg, K ;
Lambert, B ;
Wennborg, A .
CARCINOGENESIS, 2003, 24 (11) :1837-1845
[10]   FDR- and FWE-controlling methods using data-driven weights [J].
Finos, Livio ;
Salmaso, Luigi .
JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2007, 137 (12) :3859-3870