Adaptive threshold-based classification of sparse high-dimensional data

Cited by: 0

Authors:

Pavlenko, Tatjana [1]

Stepanova, Natalia [2]

Thompson, Lee [2]

Affiliations:

[1] Uppsala Univ, Dept Stat, Box 513, S-75120 Uppsala, Sweden

[2] Carleton Univ, Sch Math & Stat, 1125 Colonel By Dr, Ottawa, ON K1S 5B6, Canada

Source:

ELECTRONIC JOURNAL OF STATISTICS | 2022 / Vol. 16 / No. 1

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC)

Keywords:

High-dimensional data; sparse vectors; adaptive threshold-based classification; asymptotically optimal classifier; HIGHER CRITICISM; SELECTION;

DOI:

10.1214/22-EJS1998

Chinese Library Classification (CLC) number:

O21 [Probability theory and mathematical statistics]; C8 [Statistics]

Discipline classification codes:

020208 ; 070103 ; 0714 ;

Abstract:

We revisit the problem of designing an efficient binary classifier in a challenging high-dimensional framework. The model under study assumes some local dependence structure among feature variables represented by a block-diagonal covariance matrix with a growing number of blocks of an arbitrary, but fixed size. The blocks correspond to non-overlapping independent groups of strongly correlated features. To assess the relevance of a particular block in predicting the response, we introduce a measure of "signal strength" pertaining to each feature block. This measure is then used to specify a sparse model of our interest. We further propose a threshold-based feature selector which operates as a screen-and-clean scheme integrated into a linear classifier: the data is subject to screening and hard threshold cleaning to filter out the blocks that contain no signals. Asymptotic properties of the proposed classifiers are studied when the sample size n depends on the number of feature blocks b, and the sample size goes to infinity with b at a slower rate than b. The new classifiers, which are fully adaptive to unknown parameters of the model, are shown to perform asymptotically optimally in a large part of the classification region. The numerical study confirms good analytical properties of the new classifiers that compare favorably to the existing threshold-based procedure used in a similar context.
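The screen-and-clean idea described in the abstract can be illustrated with a rough sketch. This is not the authors' procedure: the abstract does not state the signal-strength measure or how the threshold is chosen, so the block score used here (a scaled Mahalanobis-type statistic of the block mean difference), the user-supplied `threshold`, and the function name `block_threshold_classifier` are all assumptions made for illustration. The sketch also assumes equal-sized, contiguous feature blocks and enough samples for each block covariance to be invertible.

```python
import numpy as np

def block_threshold_classifier(X0, X1, block_size, threshold):
    """Illustrative block-thresholded linear rule (not the paper's method).

    X0, X1: training samples (rows) from class 0 and class 1.
    block_size: assumed common size of contiguous feature blocks.
    threshold: assumed user-chosen cutoff for the block score.
    """
    n0, p = X0.shape
    n1 = X1.shape[0]
    if p % block_size != 0:
        raise ValueError("number of features must be a multiple of block_size")

    mean0, mean1 = X0.mean(axis=0), X1.mean(axis=0)
    midpoint = (mean0 + mean1) / 2
    diff = mean1 - mean0
    centered = np.vstack([X0 - mean0, X1 - mean1])

    weights = np.zeros(p)
    kept = []
    for start in range(0, p, block_size):
        idx = slice(start, start + block_size)
        # Pooled within-class covariance of this block only.
        cov = centered[:, idx].T @ centered[:, idx] / (n0 + n1 - 2)
        w = np.linalg.solve(cov, diff[idx])
        # Assumed block score: scaled Mahalanobis-type statistic.
        score = (n0 * n1 / (n0 + n1)) * (diff[idx] @ w)
        # Hard thresholding: blocks below the cutoff get zero weight.
        if score > threshold:
            weights[idx] = w
            kept.append(start // block_size)

    def predict(X):
        # Linear rule restricted to the retained blocks.
        return ((X - midpoint) @ weights > 0).astype(int)

    return predict, kept
```

Working block by block means only small `block_size`-by-`block_size` systems are solved, which is what the block-diagonal covariance assumption makes possible. In this sketch the threshold is a fixed input, whereas the abstract describes classifiers that adapt to unknown model parameters.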


Pages: 1952-1996

Page count: 45

