NONPARAMETRIC CLASSIFICATION WITH MISSING DATA

Cited by: 0
Authors
Sell, Torben [1, 2]
Berrett, Thomas B. [3]
Cannings, Timothy I. [1, 2]
Affiliations
[1] Univ Edinburgh, Sch Math, Edinburgh, Scotland
[2] Univ Edinburgh, Maxwell Inst Math Sci, Edinburgh, Scotland
[3] Univ Warwick, Dept Stat, Coventry, England
Source
ANNALS OF STATISTICS | 2024 / Vol. 52 / No. 3
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Missing data; classification; minimax; MINIMAX RATE; DISCRIMINATION;
DOI
10.1214/24-AOS2389
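Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code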
020208 ; 070103 ; 0714 ;
Abstract
We introduce a new nonparametric framework for classification problems in the presence of missing data. The key aspect of our framework is that the regression function decomposes into an anova-type sum of orthogonal functions, of which some (or even many) may be zero. Working under a general missingness setting, which allows features to be missing not at random, our main goal is to derive the minimax rate for the excess risk in this problem. In addition to the decomposition property, the rate depends on parameters that control the tail behaviour of the marginal feature distributions, the smoothness of the regression function and a margin condition. The ambient data dimension does not appear in the minimax rate, which can therefore be faster than in the classical nonparametric setting. We further propose a classifier, the HAM classifier, based on a careful combination of a k-nearest neighbour algorithm and a thresholding step. The HAM classifier attains the minimax rate up to polylogarithmic factors and numerical experiments further illustrate its utility.
Pages: 1178-1200
Number of pages: 23