Compensation strategy of unseen feature words in naive Bayes text classification

Cited by: 0
Authors: (not listed)
Institutions:
[1] School of Management, Harbin Institute of Technology, Harbin 150001, China
[2] (not specified)
Source: Harbin Gongye Daxue Xuebao (Journal of Harbin Institute of Technology), 2008, 6: 956-960
Keywords: Compensation strategy; Data smoothing; Feature words; Maximum entropy modeling; Naive Bayes classification; Smoothing algorithms; Statistical language modeling; Text classification
DOI: not available
Abstract
When applied to text classification, naive Bayes often suffers from the unseen feature words problem. Moreover, this problem can hardly be solved by enlarging the corpora, because the corpora themselves exhibit data sparseness: the distribution of words follows Zipf's law. Inspired by statistical language modeling, a novel approach is proposed that applies smoothing algorithms to naive Bayes for text classification in order to overcome the unseen feature words problem. The experimental corpora come from the data of the National 863 Evaluation on text classification. In the open test with stop words removed, naive Bayes with the Good-Turing algorithm performs 3.05% better than with Laplace smoothing and 1.00% better than with Lidstone smoothing. In the experiment where feature words are extracted by cross entropy, naive Bayes with the Good-Turing algorithm even outperforms the Maximum Entropy model by 1.95%. The smoothing algorithms thus help to solve the unseen feature words problem caused by sparse data.
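The three smoothing schemes compared in the abstract can be sketched roughly as follows. This is a minimal Python illustration, not the authors' implementation; in particular, the basic Good-Turing variant below renormalizes the adjusted counts and omits the frequency-of-frequency fitting step used in Simple Good-Turing.

```python
from collections import Counter

def laplace_prob(count, total, vocab_size):
    # Laplace (add-one) smoothing: every word, seen or unseen,
    # receives one pseudo-count, so P(w|c) is never zero.
    return (count + 1) / (total + vocab_size)

def lidstone_prob(count, total, vocab_size, lam=0.5):
    # Lidstone smoothing generalizes Laplace with a fractional
    # pseudo-count lam (lam = 1 recovers Laplace).
    return (count + lam) / (total + lam * vocab_size)

def good_turing_probs(counts):
    """Basic Good-Turing estimate over one class's word counts.

    Returns (probs, p_unseen): probabilities for seen words, plus the
    total probability mass reserved for words unseen in training.
    """
    total = sum(counts.values())
    n_r = Counter(counts.values())     # N_r: number of words occurring r times
    p_unseen = n_r[1] / total          # mass shifted to unseen words
    # Adjusted counts r* = (r + 1) * N_{r+1} / N_r; fall back to r when
    # N_{r+1} = 0 (a crude fix -- Simple Good-Turing smooths N_r instead).
    adjusted = {
        w: (r + 1) * n_r[r + 1] / n_r[r] if n_r[r + 1] else r
        for w, r in counts.items()
    }
    seen_mass = sum(adjusted.values())
    # Renormalize so that seen-word mass plus p_unseen sums to 1.
    probs = {w: a / seen_mass * (1.0 - p_unseen) for w, a in adjusted.items()}
    return probs, p_unseen
```

In a naive Bayes classifier these estimates replace the raw maximum-likelihood P(word | class); for a word never seen in a class, Good-Turing assigns a share of p_unseen instead of a zero probability that would annihilate the whole product.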
Related Papers (50 results)
  • [1] Feature selection for text classification with Naive Bayes
    Chen, Jingnian
    Huang, Houkuan
    Tian, Shengfeng
    Qu, Youli
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (03) : 5432 - 5435
  • [2] Text Classification Based on Naive Bayes Algorithm with Feature Selection
    Chen, Zhenguo
    Shi, Guang
    Wang, Xiaoju
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2012, 15 (10): 4255 - 4260
  • [3] DEEP FEATURE WEIGHTING IN NAIVE BAYES FOR CHINESE TEXT CLASSIFICATION
    Jiang, Qiaowei
    Wang, Wen
    Han, Xu
    Zhang, Shasha
    Wang, Xinyan
    Wang, Cong
    PROCEEDINGS OF 2016 4TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (IEEE CCIS 2016), 2016, : 160 - 164
  • [4] Feature subset selection using naive Bayes for text classification
    Feng, Guozhong
    Guo, Jianhua
    Jing, Bing-Yi
    Sun, Tieli
    PATTERN RECOGNITION LETTERS, 2015, 65 : 109 - 115
  • [5] Deep feature weighting for naive Bayes and its application to text classification
    Jiang, Liangxiao
    Li, Chaoqun
    Wang, Shasha
    Zhang, Lungan
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2016, 52 : 26 - 39
  • [6] Divergence-Based Feature Selection for Naive Bayes Text Classification
    Wang, Huizhen
    Zhu, Jingbo
    Su, Keh-Yih
    IEEE NLP-KE 2008: PROCEEDINGS OF INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND KNOWLEDGE ENGINEERING, 2008, : 209 - +
  • [7] An Improvement to Naive Bayes for Text Classification
    Zhang, Wei
    Gao, Feng
    CEIS 2011, 2011, 15
  • [8] Discrimination-based feature selection for multinomial naive Bayes text classification
    Zhu, Jingbo
    Wang, Huizhen
    Zhang, Xijuan
    COMPUTER PROCESSING OF ORIENTAL LANGUAGES, PROCEEDINGS: BEYOND THE ORIENT: THE RESEARCH CHALLENGES AHEAD, 2006, 4285 : 149 - +
  • [9] Adapting Hidden Naive Bayes for Text Classification
    Gan, Shengfeng
    Shao, Shiqi
    Chen, Long
    Yu, Liangjun
    Jiang, Liangxiao
    MATHEMATICS, 2021, 9 (19)
  • [10] Adapting naive Bayes tree for text classification
    Wang, Shasha
    Jiang, Liangxiao
    Li, Chaoqun
    KNOWLEDGE AND INFORMATION SYSTEMS, 2015, 44 : 77 - 89