Efficient email classification approach based on semantic methods

Cited by: 26
Authors
Bahgat, Eman M. [1]
Rady, Sherine [1]
Gad, Walaa [1]
Moawad, Ibrahim F. [1]
Affiliation
[1] Ain Shams Univ, Fac Comp & Informat Sci, Cairo, Egypt
Keywords
Email classification; Spam; WordNet ontology; Semantic similarity; Features reduction
DOI
10.1016/j.asej.2018.06.001
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Email has become one of the major applications in daily life. The continuous growth in the number of email users has led to a massive increase in unsolicited emails, also known as spam. Managing and classifying this huge volume of email is an important challenge. Most approaches introduced to solve this problem handle the high dimensionality of emails by using syntactic feature selection. In this paper, an efficient email filtering approach based on semantic methods is proposed. The proposed approach employs the WordNet ontology and applies different semantic-based methods and similarity measures to reduce the huge number of extracted textual features, thereby reducing space and time complexity. Moreover, to obtain a minimal optimal feature set, feature dimensionality reduction is integrated using techniques such as Principal Component Analysis (PCA) and Correlation Feature Selection (CFS). Experimental results on the standard benchmark Enron dataset show that the proposed semantic filtering approach, combined with feature selection, achieves high computational performance at high space and time reduction rates. A comparative study of several classification algorithms indicates that Logistic Regression achieves the highest accuracy compared to Naive Bayes, Support Vector Machine, J48, Random Forest, and radial basis function networks. With the CFS feature selection technique integrated, the average recorded accuracy across all the algorithms used is above 90%, with more than 90% feature reduction. In addition, the conducted experiments show that the proposed work outperforms other related works, with higher accuracy and less time. © 2018 Production and hosting by Elsevier B.V. on behalf of Ain Shams University.
Pages: 3259-3269
Number of pages: 11
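
The abstract's core idea, merging semantically related words into one feature to shrink the feature space, can be illustrated with a minimal sketch. This is not the authors' implementation: the `SYNSETS` table is a hypothetical hand-written stand-in for WordNet lookups, and the names `reduce_features` and `reduction_rate` are assumptions introduced here for illustration only.

```python
from collections import Counter

# Hypothetical synonym table standing in for WordNet synsets.
# Toy data for illustration; it does not come from the paper.
SYNSETS = {
    "buy": "buy",
    "purchase": "buy",
    "acquire": "buy",
    "cheap": "cheap",
    "inexpensive": "cheap",
    "free": "free",
    "complimentary": "free",
}


def reduce_features(tokens, concept_of=SYNSETS):
    """Count concepts instead of raw words; unknown words map to themselves."""
    return Counter(concept_of.get(tok, tok) for tok in tokens)


def reduction_rate(tokens, concept_of=SYNSETS):
    """Fraction of distinct word features removed by merging synonyms."""
    original = len(set(tokens))
    if original == 0:
        return 0.0
    reduced = len(reduce_features(tokens, concept_of))
    return 1 - reduced / original


email = "buy cheap pills purchase inexpensive pills acquire free complimentary".split()
features = reduce_features(email)
print(features)
print(f"feature reduction: {reduction_rate(email):.0%}")
```

In this toy email, eight distinct words collapse into four concept features. A fuller pipeline along the lines the abstract describes would replace the table with real WordNet similarity measures and follow the merge with a feature selection step such as CFS before training a classifier.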