Testing the statistical significance of an ultra-high-dimensional naive Bayes classifier

Cited by: 0
Authors
An, Baiguo [1]
Wang, Hansheng [1]
Guo, Jianhua [1]
Affiliations
[1] Peking Univ, Guanghua Sch Management, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Binary Predictor; Hypothesis Testing; Naive Bayes; Supervised Learning; Text Classification; Ultra-High-Dimensional Data; SELECTION;
DOI
Not available
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
The naive Bayes approach is one of the most popular methods used for classification. Nevertheless, how to test its statistical significance under an ultra-high-dimensional (UHD) setup is not well understood. To fill this important theoretical gap, we propose a novel testing statistic with a standard normal asymptotic null distribution, even if the predictor dimension is considerably larger than the sample size. This makes the proposed method useful for UHD data analysis. Simulation studies are presented to demonstrate its finite sample performance and a text classification example is described for illustration.
Pages: 223-229
Page count: 7