Combining Naive Bayes and Tri-gram Language Model for Spam Filtering

Citations: 0
Authors
Ma, Xi [1 ]
Shen, Yao [1 ]
Chen, Junbo [2 ]
Xue, Guirong [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Alibaba Cloud Comp, Hangzhou 310012, Peoples R China
Source
Keywords
Naive Bayes; tri-gram; email anti-spam; machine learning; statistical approach;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The increasing volume of unsolicited bulk email (also known as spam) causes serious damage to email service providers and inconvenience to individual users. Among the approaches to stopping spam, the Naive Bayes filter is very popular. In this paper, we propose combining the standard Naive Bayes model with a tri-gram language model, which we call the TGNB model, to filter spam emails. The TGNB model addresses the strong independence assumption of the standard Naive Bayes model. Our experimental results on three public datasets indicate that the TGNB model achieves higher spam recall and fewer false positives, and even outperforms the support vector machine method, which is state-of-the-art, on all three datasets.
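The abstract does not spell out how the two models are combined, so the following is only a minimal sketch of one common reading: keep the Naive Bayes decision rule (class prior times class-conditional likelihood), but score each message with a per-class tri-gram language model instead of independent word probabilities. The add-one smoothing, whitespace tokenization, toy training data, and all names (`TrigramClassModel`, `train`, `classify`) are assumptions for illustration, not the authors' TGNB implementation.

```python
import math
from collections import Counter


def trigrams(tokens):
    """Return the tri-grams of a token list, padded with boundary markers."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


class TrigramClassModel:
    """Tri-gram counts for one class (hypothetical helper)."""

    def __init__(self):
        self.tri = Counter()   # counts of (w1, w2, w3)
        self.ctx = Counter()   # counts of the context (w1, w2)
        self.vocab = set()
        self.docs = 0

    def update(self, tokens):
        self.docs += 1
        self.vocab.update(tokens)
        for gram in trigrams(tokens):
            self.tri[gram] += 1
            self.ctx[gram[:2]] += 1

    def log_prob(self, tokens, vocab_size):
        # Add-one smoothing is an assumption; the abstract names no scheme.
        return sum(
            math.log((self.tri[g] + 1) / (self.ctx[g[:2]] + vocab_size))
            for g in trigrams(tokens)
        )


def train(labeled_docs):
    """Build one tri-gram model per class from (label, text) pairs."""
    models = {}
    for label, text in labeled_docs:
        models.setdefault(label, TrigramClassModel()).update(text.lower().split())
    return models


def classify(models, text):
    """Pick the class maximizing log prior + tri-gram log likelihood."""
    tokens = text.lower().split()
    vocab = set().union(*(m.vocab for m in models.values())) | {"</s>"}
    n_docs = sum(m.docs for m in models.values())
    scores = {
        label: math.log(m.docs / n_docs) + m.log_prob(tokens, len(vocab))
        for label, m in models.items()
    }
    return max(scores, key=scores.get)


# Toy data, invented purely for illustration.
training = [
    ("spam", "win cash now"),
    ("spam", "win a free prize now"),
    ("ham", "meeting at noon tomorrow"),
    ("ham", "lunch at noon"),
]
models = train(training)
print(classify(models, "win cash now"))
print(classify(models, "lunch at noon"))
```

Because each word is conditioned on the two preceding words, the likelihood no longer treats words as independent given the class, which is the limitation of standard Naive Bayes that the abstract refers to.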
Pages: 509 / +
Page count: 3
Related papers
22 in total
  • [1] Understanding of the Naive Bayes Classifier in Spam Filtering
    Wei, Qijia
    6TH INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION (CDMMS 2018), 2018, 1967
  • [2] Spam Filtering: Online Naive Bayes Based on TONE
    Guanglu Sun
    Hongyue Sun
    Yingcai Ma
    Yuewu Shen
    ZTECommunications, 2013, 11 (02) : 51 - 54
  • [3] Combining naive Bayes and n-gram language models for text classification
    Peng, FC
    Schuurmans, D
    ADVANCES IN INFORMATION RETRIEVAL, 2003, 2633 : 335 - 350
  • [4] Spam Filtering using Association Rules and Naive Bayes Classifier
    Yang, Tianda
    Qian, Kai
    Lo, Dan Chia-Tien
    Al Nasr, Kamal
    Qian, Ying
    PROCEEDINGS OF 2015 IEEE INTERNATIONAL CONFERENCE ON PROGRESS IN INFORMATICS AND COMPUTING (IEEE PIC), 2015, : 638 - 642
  • [5] Web Service-enabled Spam Filtering with Naive Bayes Classification
    You, Wanqing
    Qian, Kai
    Lo, Dan
    Bhattacharya, Prahir
    Guo, Minzhe
    Qian, Ying
    2015 IEEE FIRST INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS (BIGDATASERVICE 2015), 2015, : 99 - 104
  • [6] Word Embedding based Multinomial Naive Bayes Algorithm for Spam Filtering
    Kadam, Sumedh
    Gala, Aayush
    Gehlot, Pritesh
    Kurup, Aditya
    Ghag, Kranti
    2018 FOURTH INTERNATIONAL CONFERENCE ON COMPUTING COMMUNICATION CONTROL AND AUTOMATION (ICCUBEA), 2018,
  • [8] Label flipping attacks against Naive Bayes on spam filtering systems
    Zhang, Hongpo
    Cheng, Ning
    Zhang, Yang
    Li, Zhanbo
    APPLIED INTELLIGENCE, 2021, 51 (07) : 4503 - 4514
  • [9] A Support Vector Machine based Naive Bayes Algorithm for Spam Filtering
    Feng, Weimiao
    Sun, Jianguo
    Zhang, Liguo
    Cao, Cuiling
    Yang, Qing
    2016 IEEE 35TH INTERNATIONAL PERFORMANCE COMPUTING AND COMMUNICATIONS CONFERENCE (IPCCC), 2016,
  • [10] REVISED NAIVE BAYES CLASSIFIER FOR COMBATING THE FOCUS ATTACK IN SPAM FILTERING
    Peng, Junyan
    Chan, Patrick P. K.
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 610 - 614