A Comparative Approach to Email Classification Using Naive Bayes Classifier and Hidden Markov Model

Cited by: 0
Authors
Gomes, Sebastian Romv [1]
Saroar, Sk Golam [1]
Telot, Md Mosfaiul Alam [1]
Khan, Behroz Newaz [1]
Chakrabarty, Amitabha [1]
Mostakim, Moin [1]
Affiliations
[1] BRAC Univ, Dept Comp Sci & Engn, 66 Bir Uttam AK Khandakar Rd, Dhaka 1212, Bangladesh
Keywords
Email Classification; Hidden Markov Model; Naive Bayes; Natural Language Processing; NLTK; Supervised Learning
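DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract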
This research compares two approaches to classifying emails by category. Naive Bayes and the Hidden Markov Model (HMM), two machine learning algorithms, are both used to detect whether an email is important or spam. The Naive Bayes classifier is based on conditional probabilities. It is fast and works well with small datasets. It treats each word as an independent feature. An HMM is a generative, probabilistic model that provides a distribution over sequences of observations. HMMs can handle inputs of variable length and help programs reach the most likely decision based on both previous decisions and current data. Various combinations of NLP techniques (stopword removal, stemming, and lemmatization) were tried with both algorithms to inspect the differences in accuracy and to find the best method among them.
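To illustrate the Naive Bayes side of the comparison, the following is a minimal sketch, not the authors' implementation. It assumes a bag-of-words model with add-one (Laplace) smoothing. The stopword list, the training emails, and the names `tokenize`, `train`, and `classify` are all hypothetical and chosen only for illustration; the paper's keywords indicate that NLTK was used for preprocessing, which this sketch replaces with plain Python.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "is", "to", "for", "at", "your", "you", "and", "of"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def train(examples):
    """Count words per label; examples is a list of (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total_docs)
        for word in tokenize(text):
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing keeps unseen (word, label) pairs from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training data, not taken from the paper.
TRAINING = [
    ("Win a free prize now", "spam"),
    ("Cheap offer, claim your free money", "spam"),
    ("Meeting agenda for Monday", "important"),
    ("Project report and meeting notes attached", "important"),
]
MODEL = train(TRAINING)
```

Calling `classify("free money prize", MODEL)` scores each label independently per word, which is the independence assumption the abstract describes. An HMM would instead score the word sequence as a whole.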
Pages: 482-487
Number of pages: 6