Testing the statistical significance of an ultra-high-dimensional naive Bayes classifier

Authors
An, Baiguo [1]
Wang, Hansheng [1]
Guo, Jianhua [1]
Affiliations
[1] Peking University, Guanghua School of Management, Beijing, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Binary Predictor; Hypothesis Testing; Naive Bayes; Supervised Learning; Text Classification; Ultra-High-Dimensional Data; SELECTION;
DOI
Not available
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07; 0710; 09;
Abstract
The naive Bayes approach is one of the most popular classification methods. Nevertheless, how to test its statistical significance in an ultra-high-dimensional (UHD) setting is not well understood. To fill this theoretical gap, we propose a novel test statistic whose asymptotic null distribution is standard normal, even when the predictor dimension is considerably larger than the sample size. This makes the proposed method useful for UHD data analysis. Simulation studies demonstrate its finite-sample performance, and a text classification example is provided for illustration.
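Because the abstract states only that the statistic is asymptotically standard normal under the null, a minimal sketch of how such a statistic is typically converted to a p-value may help. The function name `standard_normal_p_values` and the input value 1.96 are hypothetical, and the abstract does not say whether the paper's test is one-sided or two-sided, so both conventions are shown as assumptions. This is not the authors' statistic, only the generic final step.

```python
import math


def standard_normal_p_values(t):
    """Return (upper-tail, two-sided) p-values for a statistic t.

    Assumes t follows N(0, 1) under the null hypothesis. Which tail
    convention applies to the paper's test is not stated in the
    abstract, so both are returned.
    """
    # P(Z > t) for Z ~ N(0, 1), via the complementary error function
    upper = 0.5 * math.erfc(t / math.sqrt(2.0))
    # P(|Z| > |t|)
    two_sided = math.erfc(abs(t) / math.sqrt(2.0))
    return upper, two_sided


# Hypothetical observed statistic, for illustration only
upper, two_sided = standard_normal_p_values(1.96)
print(f"upper-tail p = {upper:.4f}, two-sided p = {two_sided:.4f}")
```

The null would be rejected at level alpha when the relevant p-value falls below alpha; with a standard normal reference distribution no resampling is needed, which matters when the predictor dimension is very large.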
Pages: 223-229
Page count: 7