Adaptive Testing for High-Dimensional Data

Cited by: 0
Authors
Zhang, Yangfan [1]
Wang, Runmin [2]
Shao, Xiaofeng [3]
Affiliations
[1] Two Sigma Investments, New York, NY USA
[2] Texas A&M Univ, Dept Stat, 3143 TAMU, College Stn, TX 77843 USA
[3] Univ Illinois, Dept Stat, Champaign, IL USA
Keywords
Independence testing; Simultaneous testing; Spatial sign; U-statistics; HIGHER CRITICISM; COVARIANCE-MATRIX; 2-SAMPLE TEST; ASYMPTOTIC DISTRIBUTIONS; U-STATISTICS; INDEPENDENCE; COHERENCE; SIGNALS; ANOVA;
DOI
10.1080/01621459.2024.2439617
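Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes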
020208 ; 070103 ; 0714 ;
Abstract
In this article, we propose a class of $L_q$-norm based U-statistics for a family of global testing problems related to high-dimensional data. This includes testing of mean vector and its spatial sign, simultaneous testing of linear model coefficients, and testing of component-wise independence for high-dimensional observations, among others. Under the null hypothesis, we derive asymptotic normality and independence between $L_q$-norm based U-statistics for several values of $q$ under mild moment and cumulant conditions. A simple combination of two studentized $L_q$-based test statistics via their p-values is proposed and is shown to attain great power against alternatives of different sparsity. Our work is a substantial extension of He et al., which is mostly focused on mean and covariance testing, and we manage to provide a general treatment of asymptotic independence of $L_q$-norm based U-statistics for a wide class of kernels. To alleviate the computation burden, we introduce a variant of the proposed U-statistics by using the monotone indices in the summation, resulting in a U-statistic with asymmetric kernel. A dynamic programming method is introduced to reduce the computational cost from $O(n^{qr})$, which is required for the calculation of the full U-statistic, to $O(n^{r})$, where $r$ is the order of the kernel. Numerical results further corroborate the advantage of the proposed adaptive test as compared to some existing competitors. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
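The monotone-index idea in the abstract can be illustrated in the simplest setting, a kernel of order r = 1 (as in mean testing), where the sum of products over increasing indices i1 < ... < iq can be accumulated in a single pass. The sketch below is an illustration under assumptions, not the authors' algorithm: the function names are hypothetical, the statistic is left uncentered and unstudentized, and the min-p rule used to combine two p-values is an assumed choice, since the abstract only says the combination is made "via their p-values".

```python
from itertools import combinations
from math import prod


def monotone_sum_dp(x, q):
    """Sum of x[i1]*...*x[iq] over i1 < ... < iq, in O(n*q) time."""
    # s[c] holds the sum over increasing index tuples of length c
    # drawn from the observations processed so far.
    s = [1.0] + [0.0] * q
    for value in x:
        # Update longer tuples first so each observation enters a tuple once.
        for c in range(q, 0, -1):
            s[c] += s[c - 1] * value
    return s[q]


def monotone_sum_brute(x, q):
    """Same quantity by direct enumeration, O(n^q); for checking only."""
    return sum(prod(x[i] for i in idx)
               for idx in combinations(range(len(x)), q))


def lq_statistic(data, q):
    """Unnormalized statistic: add the monotone sums over the p columns.

    `data` is a list of n observations, each a list of p coordinates.
    """
    return sum(monotone_sum_dp(column, q) for column in zip(*data))


def min_p_combination(p_a, p_b):
    """Assumed min-p rule for two independent, uniform null p-values."""
    # P(min(p_a, p_b) <= t) = 1 - (1 - t)^2 under independence.
    return 1.0 - (1.0 - min(p_a, p_b)) ** 2


if __name__ == "__main__":
    sample = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]  # n = 3, p = 2
    print(lq_statistic(sample, 2))
    print(min_p_combination(0.1, 0.5))
```

For r = 1 the recursion visits each observation once per tuple length, so the cost is O(nq) per coordinate rather than the O(n^q) of direct enumeration. Kernels of higher order need a more elaborate recursion, which this sketch does not attempt.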
Pages: 13
Related papers
50 in total
  • [41] Adaptive Lasso in high-dimensional settings
    Lin, Zhengyan
    Xiang, Yanbiao
    Zhang, Caiya
    JOURNAL OF NONPARAMETRIC STATISTICS, 2009, 21 (06) : 683 - 696
  • [42] Testing covariates in high-dimensional regression
    Lan, Wei
    Wang, Hansheng
    Tsai, Chih-Ling
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2014, 66 (02) : 279 - 301
  • [43] On Criticality in High-Dimensional Data
    Saremi, Saeed
    Sejnowski, Terrence J.
    NEURAL COMPUTATION, 2014, 26 (07) : 1329 - 1339
  • [44] Testing covariates in high-dimensional regression
    Wei Lan
    Hansheng Wang
    Chih-Ling Tsai
    Annals of the Institute of Statistical Mathematics, 2014, 66 : 279 - 301
  • [45] Private High-Dimensional Hypothesis Testing
    Narayanan, Shyam
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [46] High-Dimensional Data Bootstrap
    Chernozhukov, Victor
    Chetverikov, Denis
    Kato, Kengo
    Koike, Yuta
    ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION, 2023, 10 : 427 - 449
  • [47] High-dimensional data clustering
    Bouveyron, C.
    Girard, S.
    Schmid, C.
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 52 (01) : 502 - 519
  • [49] High-dimensional data visualization
    Tang, Lin
    NATURE METHODS, 2020, 17 (02) : 129 - 129
  • [50] Testing for heteroscedasticity in high-dimensional regressions
    Li, Zhaoyuan
    Yao, Jianfeng
    ECONOMETRICS AND STATISTICS, 2019, 9 : 122 - 139