Efficient email classification approach based on semantic methods

Cited by: 26
Authors
Bahgat, Eman M. [1]
Rady, Sherine [1]
Gad, Walaa [1]
Moawad, Ibrahim F. [1]
Affiliation
[1] Ain Shams Univ, Fac Comp & Informat Sci, Cairo, Egypt
Keywords
Email classification; Spam; WordNet ontology; Semantic similarity; Features reduction
DOI
10.1016/j.asej.2018.06.001
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Emails have become one of the major applications in daily life. The continuous growth in the number of email users has led to a massive increase in unsolicited emails, also known as spam emails. Managing and classifying this huge number of emails is an important challenge. Most of the approaches introduced to solve this problem handle the high dimensionality of emails by using syntactic feature selection. In this paper, an efficient email filtering approach based on semantic methods is presented. The proposed approach employs the WordNet ontology and applies different semantic-based methods and similarity measures to reduce the huge number of extracted textual features, and hence the space and time complexities. Moreover, to obtain a minimal optimal feature set, feature dimensionality reduction has been integrated using feature selection techniques such as Principal Component Analysis (PCA) and Correlation Feature Selection (CFS). Experimental results on the standard benchmark Enron Dataset showed that the proposed semantic filtering approach combined with feature selection achieves high computational performance at high space and time reduction rates. A comparative study of several classification algorithms indicated that Logistic Regression achieves the highest accuracy compared to Naive Bayes, Support Vector Machine, J48, Random Forest, and radial basis function networks. By integrating the CFS feature selection technique, the average recorded accuracy across all the algorithms used is above 90%, with more than 90% feature reduction. In addition, the conducted experiments showed that the proposed work achieves higher accuracy in less time than other related works. © 2018 Production and hosting by Elsevier B.V. on behalf of Ain Shams University.
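As a rough illustration of the semantic feature reduction idea described in the abstract, the following minimal sketch merges words that share a concept into a single feature and reports how many features were saved. It is not the authors' implementation: the `TOY_SYNSETS` map is a hypothetical, hand-written stand-in for WordNet lookups, and the names `reduce_features` and `reduction_rate` are invented for this example.

```python
from collections import Counter

# Hypothetical stand-in for WordNet: maps a word to one representative
# concept. A real pipeline would query the WordNet ontology instead.
TOY_SYNSETS = {
    "buy": "buy",
    "purchase": "buy",
    "acquire": "buy",
    "cheap": "cheap",
    "inexpensive": "cheap",
}


def reduce_features(tokens, synsets=TOY_SYNSETS):
    """Count concept-level features; unknown words are kept unchanged."""
    return Counter(synsets.get(tok, tok) for tok in tokens)


def reduction_rate(tokens, synsets=TOY_SYNSETS):
    """Fraction of distinct word features removed by concept merging."""
    original = len(set(tokens))
    if original == 0:
        return 0.0
    reduced = len(reduce_features(tokens, synsets))
    return 1 - reduced / original


email = "buy cheap pills purchase inexpensive pills acquire now".split()
features = reduce_features(email)
rate = reduction_rate(email)
```

Here seven distinct words collapse to four concept features. The reduced counts could then be passed to a feature selector and a classifier, in the spirit of the CFS and Logistic Regression stages the abstract mentions.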
Pages: 3259-3269
Number of pages: 11
Related papers
50 in total
  • [1] Spam email classification and sentiment analysis based on semantic similarity methods
    Srinivasarao, Ulligaddala
    Sharaff, Aakanksha
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2023, 26 (01) : 65 - 77
  • [2] HOLMES: An Efficient and Lightweight Semantic Based Anomalous Email Detector
    Wu, Peilun
    Guo, Hui
    2022 IEEE INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS, TRUSTCOM, 2022, : 1360 - 1367
  • [3] Latent Semantic Indexing Based SVM Model for Email Spam Classification
    Renuka, Karthika D.
    Visalakshi, P.
    JOURNAL OF SCIENTIFIC & INDUSTRIAL RESEARCH, 2014, 73 (07): : 437 - 442
  • [4] Effective Methods for Email Classification: Is it a Business or Personal Email?
    Sosic, Milena
    Graovac, Jelena
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2022, 19 (03) : 1155 - 1175
  • [5] Email classification Using Semantic Feature Space
    Yi, Yun Fei
    Li, Cheng Hua
    Song, Wei
    ALPIT 2008: SEVENTH INTERNATIONAL CONFERENCE ON ADVANCED LANGUAGE PROCESSING AND WEB INFORMATION TECHNOLOGY, PROCEEDINGS, 2008, : 32 - +
  • [6] An Optimized Approach for Detection and Classification of Spam Email's Using Ensemble Methods
    Fatima, Rubab
    Fareed, Mian Muhammad Sadiq
    Ullah, Saleem
    Ahmad, Gulnaz
    Mahmood, Saqib
    WIRELESS PERSONAL COMMUNICATIONS, 2024, 139 (01) : 347 - 373
  • [7] An efficient Wikipedia semantic matching approach to text document classification
    Wu, Zongda
    Zhu, Hui
    Li, Guiling
    Cui, Zongmin
    Huang, Hui
    Li, Jun
    Chen, Enhong
    Xu, Guandong
    INFORMATION SCIENCES, 2017, 393 : 15 - 28
  • [8] An Efficient Approach for Semantic Relatedness Evaluation based on Semantic Neighborhood
    Lopes, Alcides
    Alvarenga, Renata
    Carbonera, Joel
    Abel, Mara
    2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 316 - 323
  • [9] Combining neural networks and semantic feature space for email classification
    Yu, Bo
    Zhu, Dong-hua
    KNOWLEDGE-BASED SYSTEMS, 2009, 22 (05) : 376 - 381
  • [10] An efficient ir approach based semantic segmentation
    Achref Ouni
    Thierry Chateau
    Eric Royer
    Marc Chevaldonné
    Michel Dhome
    Multimedia Tools and Applications, 2023, 82 : 10145 - 10163