Ultra-high-dimensional feature screening of binary categorical response data based on Jensen-Shannon divergence

Cited: 0
Authors
Jiang, Qingqing [1 ]
Deng, Guangming [1 ,2 ]
Affiliations
[1] Guilin Univ Technol, Sch Math & Stat, Guilin 541000, Guangxi, Peoples R China
[2] Guilin Univ Technol, Appl Stat Inst, Guilin 541000, Guangxi, Peoples R China
Source
AIMS MATHEMATICS | 2024, Vol. 9, No. 2
Funding
National Natural Science Foundation of China
Keywords
ultra-high-dimensional; binary categorical; Jensen-Shannon divergence; model-free; feature screening; SELECTION; MODELS;
DOI
10.3934/math.2024142
Chinese Library Classification
O29 [Applied Mathematics]
Discipline Code
070104
Abstract
Most existing ultra-high-dimensional feature screening methods for categorical data rank covariates by the strength of their marginal association with the response, using some statistic as the screening index. As data types proliferate and model assumptions become more restrictive, this creates a potential problem: a covariate that is itself unimportant may appear highly correlated with the response merely because it is strongly correlated with other, genuinely important covariates. To address this issue, we develop a model-free feature screening procedure for binary categorical response variables from the perspective of each feature's contribution to classification. The idea is to use the Jensen-Shannon divergence to measure the difference between the conditional probability distributions of a covariate given the two values of the response: the larger the Jensen-Shannon divergence, the greater the covariate's contribution to classifying the response, and the more important the covariate is. We propose two model-free ultra-high-dimensional feature screening methods for binary response data, applicable to both continuous and categorical covariates. When the covariates have the same number of categories, screening is based on the classical Jensen-Shannon divergence; when the numbers of categories differ, the divergence is adjusted by a logarithmic factor of the number of categories. We prove theoretically that the proposed methods possess the sure screening and ranking consistency properties, and simulations and real data analysis demonstrate that, compared with an existing method, our approaches are effective, stable, and computationally faster.
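The core screening idea described above can be illustrated with a minimal sketch: for each categorical covariate, estimate the conditional distributions given y = 0 and y = 1, compute their Jensen-Shannon divergence, and rank covariates by that score. This is an illustration of the basic (equal-category-count) case only, not the paper's full procedure; the function and variable names are ours, and the paper's logarithmic adjustment for unequal category counts and its treatment of continuous covariates are omitted.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between two discrete distributions p and q.

    JS(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), where m = (p + q) / 2.
    A small eps guards against log(0) for empty categories.
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def screen_covariates(X, y, top_k=5):
    """Rank categorical covariates by the JS divergence between the
    empirical conditional distributions P(X_j | y=0) and P(X_j | y=1).

    X : (n, p) array of categorical covariates; y : (n,) binary response.
    Returns the indices of the top_k covariates and all p scores.
    """
    scores = []
    for j in range(X.shape[1]):
        cats = np.unique(X[:, j])
        # Empirical conditional distributions of covariate j in each class.
        p0 = np.array([(X[y == 0, j] == c).mean() for c in cats])
        p1 = np.array([(X[y == 1, j] == c).mean() for c in cats])
        scores.append(js_divergence(p0, p1))
    scores = np.array(scores)
    order = np.argsort(scores)[::-1]  # largest divergence = most important
    return order[:top_k], scores
```

A covariate whose distribution is identical in both classes scores near 0, while a perfectly separating covariate approaches the upper bound log 2, so a larger score marks a stronger contribution to classification.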
Pages: 2874-2907 (34 pages)