RANK: Large-Scale Inference With Graphical Nonlinear Knockoffs

Cited by: 39
Authors
Fan, Yingying [1]
Demirkaya, Emre [2]
Li, Gaorong [3]
Lv, Jinchi [1]
Affiliations
[1] Univ Southern Calif, Marshall Sch Business, Data Sci & Operat Dept, Los Angeles, CA 90089 USA
[2] Univ Tennessee, Dept Business Analyt & Stat, Haslam Coll Business, Knoxville, TN USA
[3] Beijing Univ Technol, Beijing Inst Sci & Engn Comp, Beijing, Peoples R China
Keywords
Big data; Graphical nonlinear knockoffs; High-dimensional nonlinear models; Large-scale inference and FDR; Power; Reproducibility; Robustness; FALSE DISCOVERY RATE; VARIABLE SELECTION; UNKNOWN SPARSITY; REGRESSION; TESTS; IDENTIFICATION; BOOTSTRAP; RATES;
DOI
10.1080/01621459.2018.1546589
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this article, we provide theoretical foundations on the power and robustness of the model-X knockoffs procedure, introduced recently by Candes, Fan, Janson and Lv, in the high-dimensional setting where the covariate distribution is characterized by a Gaussian graphical model. We establish that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. When moving away from this ideal case, we suggest a modified model-X knockoffs method, called graphical nonlinear knockoffs (RANK), to accommodate an unknown covariate distribution. We provide theoretical justification for the robustness of our modified procedure by showing that, with the estimated covariate distribution, the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results demonstrate that, compared to existing approaches, our method performs competitively in both FDR control and power. A real dataset is analyzed to further assess the performance of the suggested knockoffs procedure. Supplementary materials for this article are available online.
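The abstract refers to FDR control by the knockoffs procedure without spelling out the final selection step. As a hedged illustration only, the sketch below implements the generic knockoff+ threshold used in the model-X knockoffs literature, assuming that feature statistics W_j (large positive values favoring the original variable over its knockoff) have already been computed. It is not the RANK procedure itself, which additionally estimates the covariate distribution, and the function names `knockoff_plus_threshold` and `knockoff_select` are hypothetical.

```python
from typing import List, Sequence


def knockoff_plus_threshold(W: Sequence[float], q: float) -> float:
    """Return the knockoff+ threshold for target FDR level q.

    Smallest t among the nonzero |W_j| such that
        (1 + #{j : W_j <= -t}) / max(1, #{j : W_j >= t}) <= q,
    or infinity if no such t exists (nothing is selected).
    """
    if not 0 < q < 1:
        raise ValueError("q must lie in (0, 1)")
    # Candidate thresholds are the distinct nonzero magnitudes, scanned upward.
    candidates = sorted({abs(w) for w in W if w != 0})
    for t in candidates:
        # Negative statistics act as a conservative estimate of false discoveries.
        num = 1 + sum(1 for w in W if w <= -t)
        den = max(1, sum(1 for w in W if w >= t))
        if num / den <= q:
            return t
    return float("inf")


def knockoff_select(W: Sequence[float], q: float) -> List[int]:
    """Indices j with W_j at or above the knockoff+ threshold."""
    t = knockoff_plus_threshold(W, q)
    return [j for j, w in enumerate(W) if w >= t]


if __name__ == "__main__":
    W = [5, 4, 3, 2, 1, -0.5]
    print(knockoff_plus_threshold(W, 0.2), knockoff_select(W, 0.2))
```

With W = [5, 4, 3, 2, 1, -0.5] and q = 0.2, the threshold t = 0.5 gives an estimated ratio of 2/5 = 0.4, while t = 1 gives 1/5 = 0.2, so the threshold is 1 and the first five variables are selected. The "+1" in the numerator is what makes this the conservative knockoff+ variant rather than the plain knockoff threshold.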
Pages: 362-379
Number of pages: 18