Adaptive threshold-based classification of sparse high-dimensional data

Cited by: 0
Authors
Pavlenko, Tatjana [1]
Stepanova, Natalia [2]
Thompson, Lee [2]
Affiliations
[1] Uppsala Univ, Dept Stat, Box 513, S-75120 Uppsala, Sweden
[2] Carleton Univ, Sch Math & Stat, 1125 Colonel By Dr, Ottawa, ON K1S 5B6, Canada
Source
ELECTRONIC JOURNAL OF STATISTICS | 2022 / Vol. 16 / No. 01
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
High-dimensional data; sparse vectors; adaptive threshold-based classification; asymptotically optimal classifier; HIGHER CRITICISM; SELECTION;
DOI
10.1214/22-EJS1998
Chinese Library Classification (CLC) numbers
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
We revisit the problem of designing an efficient binary classifier in a challenging high-dimensional framework. The model under study assumes a local dependence structure among the feature variables, represented by a block-diagonal covariance matrix with a growing number of blocks of arbitrary but fixed size. The blocks correspond to non-overlapping, mutually independent groups of strongly correlated features. To assess the relevance of a particular block in predicting the response, we introduce a measure of "signal strength" pertaining to each feature block. This measure is then used to specify the sparse model of interest. We further propose a threshold-based feature selector that operates as a screen-and-clean scheme integrated into a linear classifier: the data are subjected to screening and hard-threshold cleaning to filter out the blocks that contain no signal. Asymptotic properties of the proposed classifiers are studied when the sample size n depends on the number of feature blocks b and tends to infinity with b at a slower rate than b. The new classifiers, which are fully adaptive to the unknown parameters of the model, are shown to perform asymptotically optimally in a large part of the classification region. The numerical study confirms the good analytical properties of the new classifiers, which compare favorably with the existing threshold-based procedure used in a similar context.
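The block-wise screen-and-clean idea described in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' procedure. The block statistic (a scaled squared Mahalanobis norm of the estimated mean difference in each block), the default threshold 2 log b, the assumption of a known block covariance shared by both classes, equal class priors, and the names `split_blocks`, `block_signal_stats` and `ThresholdBlockLDA` are all hypothetical. The screening and cleaning stages are also collapsed into a single hard-threshold step, whereas the classifiers in the paper are fully adaptive to the unknown model parameters.

```python
import numpy as np


def split_blocks(v, sizes):
    """Split the last axis of v into consecutive feature blocks of the given sizes."""
    return np.split(v, np.cumsum(sizes)[:-1], axis=-1)


def block_signal_stats(x1, x2, covs):
    """Hypothetical per-block statistic n1*n2/(n1+n2) * d_j^T Sigma_j^{-1} d_j,
    where d_j is the estimated mean difference in block j. For Gaussian data with
    known Sigma_j and no signal in block j, it is chi-square with p_j d.o.f."""
    n1, n2 = len(x1), len(x2)
    scale = n1 * n2 / (n1 + n2)
    sizes = [c.shape[0] for c in covs]
    diffs = split_blocks(x1.mean(axis=0) - x2.mean(axis=0), sizes)
    return np.array([scale * d @ np.linalg.solve(c, d) for d, c in zip(diffs, covs)])


class ThresholdBlockLDA:
    """Hard-threshold block selection followed by a linear discriminant
    restricted to the kept blocks (a sketch, not the paper's classifier)."""

    def __init__(self, covs, threshold=None):
        self.covs = covs
        # Hypothetical default threshold 2*log(b); NOT the threshold from the paper.
        self.threshold = 2 * np.log(len(covs)) if threshold is None else threshold

    def fit(self, x1, x2):
        self.sizes_ = [c.shape[0] for c in self.covs]
        stats = block_signal_stats(x1, x2, self.covs)
        # Keep only the blocks whose statistic exceeds the threshold.
        self.kept_ = np.flatnonzero(stats > self.threshold)
        m1 = split_blocks(x1.mean(axis=0), self.sizes_)
        m2 = split_blocks(x2.mean(axis=0), self.sizes_)
        # Per kept block: direction Sigma_j^{-1}(m1_j - m2_j) and class midpoint.
        self.dirs_ = {j: np.linalg.solve(self.covs[j], m1[j] - m2[j]) for j in self.kept_}
        self.mids_ = {j: (m1[j] + m2[j]) / 2 for j in self.kept_}
        return self

    def decision_function(self, x):
        blocks = split_blocks(np.atleast_2d(x), self.sizes_)
        score = np.zeros(blocks[0].shape[0])
        for j in self.kept_:
            score += (blocks[j] - self.mids_[j]) @ self.dirs_[j]
        return score

    def predict(self, x):
        # Class 1 if the discriminant score is positive (equal priors assumed), else class 2.
        return np.where(self.decision_function(x) > 0, 1, 2)
```

Usage: with training samples `x1`, `x2` of shape (n_k, total features) and a list `covs` of the b diagonal blocks of the covariance matrix, `ThresholdBlockLDA(covs).fit(x1, x2).predict(x_new)` assigns new observations using only the blocks retained by the threshold. Discarding blocks that contain no signal keeps their estimation noise out of the discriminant score, which is the motivation for thresholding in the sparse regime.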
Pages: 1952-1996
Number of pages: 45
Related papers
50 in total
  • [31] Robust Classification of High-Dimensional Data Using Data-Adaptive Energy Distance
    Choudhury, Jyotishka Ray
    Saha, Aytijhya
    Roy, Sarbojit
    Dutta, Subhajit
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: RESEARCH TRACK, ECML PKDD 2023, PT V, 2023, 14173 : 86 - 101
  • [32] Adaptive Threshold-based Sparse Representation Network for Image Compressive Sensing Reconstruction
    Xuan, Yunyi
    Yang, Chunling
    Yang, Xin
    2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021,
  • [33] Sparse bayesian kernel multinomial probit regression model for high-dimensional data classification
    Yang, Aijun
    Jiang, Xuejun
    Shu, Lianjie
    Liu, Pengfei
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2019, 48 (01) : 165 - 176
  • [34] Hierarchical classification of microorganisms based on high-dimensional phenotypic data
    Tafintseva, Valeria
    Vigneau, Evelyne
    Shapaval, Volha
    Cariou, Veronique
    Qannari, El Mostafa
    Kohler, Achim
    JOURNAL OF BIOPHOTONICS, 2018, 11 (03)
  • [35] Adaptive threshold-based block classification in medical image compression for teleradiology
    Singh, Sukhwinder
    Kumar, Vinod
    Verma, H. K.
    COMPUTERS IN BIOLOGY AND MEDICINE, 2007, 37 (06) : 811 - 819
  • [36] Adaptive Subspace Optimization Ensemble Method for High-Dimensional Imbalanced Data Classification
    Xu, Yuhong
    Yu, Zhiwen
    Chen, C. L. Philip
    Liu, Zhulin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (05) : 2284 - 2297
  • [37] Principal component analysis for sparse high-dimensional data
    Raiko, Tapani
    Ilin, Alexander
    Karhunen, Juha
    NEURAL INFORMATION PROCESSING, PART I, 2008, 4984 : 566 - 575
  • [38] Sparse kernel methods for high-dimensional survival data
    Evers, Ludger
    Messow, Claudia-Martina
    BIOINFORMATICS, 2008, 24 (14) : 1632 - 1638
  • [39] Sparse meta-analysis with high-dimensional data
    He, Qianchuan
    Zhang, Hao Helen
    Avery, Christy L.
    Lin, D. Y.
    BIOSTATISTICS, 2016, 17 (02) : 205 - 220
  • [40] Efficient Sparse Representation for Learning With High-Dimensional Data
    Chen, Jie
    Yang, Shengxiang
    Wang, Zhu
    Mao, Hua
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 4208 - 4222