Efficient email classification approach based on semantic methods

Cited by: 26

Authors:

Bahgat, Eman M. [1]

Rady, Sherine [1]

Gad, Walaa [1]

Moawad, Ibrahim F. [1]

Affiliation:

[1] Ain Shams Univ, Fac Comp & Informat Sci, Cairo, Egypt

Source:

AIN SHAMS ENGINEERING JOURNAL | 2018 / Vol. 9 / Issue 04

Keywords:

Email classification; Spam; WordNet ontology; Semantic similarity; Features reduction;

DOI:

10.1016/j.asej.2018.06.001

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

Emails have become one of the major applications in daily life. The continuous growth in the number of email users has led to a massive increase in unsolicited emails, also known as spam. Managing and classifying this huge number of emails is an important challenge. Most approaches introduced to solve this problem handle the high dimensionality of emails by using syntactic feature selection. This paper presents an efficient email filtering approach based on semantic methods. The proposed approach employs the WordNet ontology and applies different semantic-based methods and similarity measures to reduce the huge number of extracted textual features, thereby reducing space and time complexity. Moreover, to obtain a minimal optimal feature set, feature dimensionality reduction is integrated using feature selection techniques such as Principal Component Analysis (PCA) and Correlation Feature Selection (CFS). Experimental results on the standard benchmark Enron dataset showed that the proposed semantic filtering approach combined with feature selection achieves high computational performance at high space and time reduction rates. A comparative study of several classification algorithms indicated that Logistic Regression achieves the highest accuracy compared to Naive Bayes, Support Vector Machine, J48, Random Forest, and radial basis function networks. With the CFS feature selection technique integrated, the average recorded accuracy for all the algorithms used is above 90%, with more than 90% feature reduction. In addition, the conducted experiments showed that the proposed work outperforms other related works, with higher accuracy and less time. © 2018 Production and hosting by Elsevier B.V. on behalf of Ain Shams University.
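
The abstract describes reducing the number of textual features by grouping semantically related words. The following is a minimal sketch of that general idea, not the authors' implementation: the `TOY_CONCEPTS` table is a hypothetical, hand-written stand-in for WordNet lookups, and the function names `tokenize`, `concept_features`, and `reduction_rate` are illustrative assumptions.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: each word maps to a shared concept label.
# In a real pipeline this mapping would be derived from the WordNet ontology.
TOY_CONCEPTS = {
    "cash": "money",
    "money": "money",
    "funds": "money",
    "prize": "award",
    "award": "award",
    "reward": "award",
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(".,!?;:").lower() for word in text.split())
    return [tok for tok in tokens if tok]


def concept_features(text, concepts=TOY_CONCEPTS):
    """Count features after collapsing related words into one concept."""
    return Counter(concepts.get(tok, tok) for tok in tokenize(text))


def reduction_rate(texts, concepts=TOY_CONCEPTS):
    """Fraction of distinct features removed by the concept mapping."""
    raw = {tok for text in texts for tok in tokenize(text)}
    merged = {concepts.get(tok, tok) for tok in raw}
    return 1 - len(merged) / len(raw)


if __name__ == "__main__":
    emails = [
        "Claim your cash prize now!",
        "Funds and reward await, claim money",
    ]
    for email in emails:
        print(concept_features(email))
    print(f"feature reduction: {reduction_rate(emails):.0%}")
```

In this toy setting, words such as "cash", "funds", and "money" collapse into one feature, so the vocabulary shrinks before any statistical selection step such as PCA or CFS is applied.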

Pages: 3259-3269

Page count: 11

Related records: 50 in total

[21] Research of English text classification methods based on semantic meaning
Lv, L
Liu, YS
ENABLING TECHNOLOGIES FOR THE NEW KNOWLEDGE SOCIETY, 2005, : 689 - 700
[22] Adaptive Machine Learning Approach for Emotional Email Classification
Karthik, K.
Ponnusamy, R.
HUMAN-COMPUTER INTERACTION: TOWARDS MOBILE AND INTELLIGENT INTERACTION ENVIRONMENTS, PT III, 2011, 6763 : 552 - 558
[23] An empirical evaluation for feature selection methods in phishing email classification
2013, CRL Publishing (28):
[24] A mapreduce based parallel SVM for email classification
Xu, Ke
Wen, Cui
Yuan, Qiong
He, Xiangzhu
Tie, Jun
Journal of Networks, 2014, 9 (06) : 1640 - 1647
[25] Content-Based Email Classification at Scale
Early, Kirstin
O'Hare, Neil
LuVogt, Christopher
PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 4559 - 4566
[26] eMailSift: Email classification based on structure and content
Aery, M
Chakravarthy, S
FIFTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2005, : 18 - 25
[27] An Approach of Semantic Web Service Classification Based on Naive Bayes
Liu, Jianxiao
Tian, Zonglin
Liu, Panbiao
Jiang, Jiawei
Li, Zhao
PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON SERVICES COMPUTING (SCC 2016), 2016, : 356 - 362
[28] A semantic-based classification approach for an enhanced spam detection
Saidani, Nadjate
Adi, Kamel
Allili, Mohand Said
COMPUTERS & SECURITY, 2020, 94
[29] A cluster-based classification approach to semantic role labeling
Ozgencil, Necati E.
McCracken, Nancy
Mehrotra, Kishan
NEW FRONTIERS IN APPLIED ARTIFICIAL INTELLIGENCE, 2008, 5027 : 265 - 275
[30] Context-based email classification model
Wasi, Shaukat
Jami, Syed Imran
Shaikh, Zubair Ahmed
EXPERT SYSTEMS, 2016, 33 (02) : 129 - 144