Semi-supervised learning using frequent itemset and ensemble learning for SMS classification

Cited by: 37
Authors
Ahmed, Ishtiaq [1]
Ali, Rahman [1]
Guan, Donghai [2]
Lee, Young-Koo [1]
Lee, Sungyoung [1]
Chung, TaeChoong [1]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Seoul, South Korea
[2] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
Funding
National Research Foundation of Singapore;
Keywords
Short Message Service (SMS); Ham; Spam; Frequent itemset; Ensemble learning; Semi-supervised classification;
DOI
10.1016/j.eswa.2014.08.054
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Short Message Service (SMS) has become one of the most important communication media owing to the rapid increase in mobile users and its easy-to-use operating mechanism. This flood of SMS brings with it the problem of spam SMS generated by spurious users. The detection of spam SMS has attracted growing attention from researchers in recent times and has been addressed with a number of different machine learning approaches. The supervised machine learning approaches used so far demand a large amount of labeled data, which is not always available in real applications. Traditional semi-supervised methods can alleviate this problem but may not produce good results if they are provided with only positive and unlabeled data. In this paper, we propose a novel semi-supervised learning method that uses frequent itemsets and ensemble learning (FIEL) to overcome this limitation. In this approach, the Apriori algorithm is used to find the frequent itemsets, while Multinomial Naive Bayes, Random Forest and LibSVM are used as base learners for ensemble learning, which uses a majority voting scheme. The proposed approach works well with a small number of positive examples and different amounts of unlabeled data, with higher accuracy. Extensive experiments have been conducted on the UCI SMS spam collection data set and the SMS spam collection Corpus v.0.1 Small and Big, which show significant improvements in accuracy with a very small amount of positive data. We have compared our proposed FIEL approach with the existing SPY-EM and PEBL approaches, and the results show that our approach is more stable than the compared approaches with minimum support. (C) 2014 Elsevier Ltd. All rights reserved.
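The two building blocks named in the abstract, Apriori-style frequent itemset mining and majority voting over base learners, can be illustrated with a minimal sketch. This is not the authors' FIEL implementation: the function names `frequent_itemsets` and `majority_vote`, the toy messages, and the support threshold are all assumptions made for illustration, and the abstract does not specify how the mined itemsets are used to label the unlabeled data.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search (hypothetical helper).

    Returns a dict mapping each itemset (frozenset of tokens) to its
    support count, keeping only itemsets with count >= min_support.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single tokens.
    counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]) for i, c in counts.items() if c >= min_support}
    result = {s: counts[next(iter(s))] for s in current}
    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                level[c] = support
        result.update(level)
        current = set(level)
        k += 1
    return result


def majority_vote(predictions):
    """Return the label chosen by most base learners (hypothetical helper)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy spam messages, already tokenized (illustrative data only).
spam_messages = [
    {"free", "prize", "call"},
    {"free", "prize", "now"},
    {"free", "call", "now"},
    {"win", "prize", "free"},
]
itemsets = frequent_itemsets(spam_messages, min_support=3)

# Three base learners voting on one message (labels are made up).
label = majority_vote(["spam", "ham", "spam"])
```

With an odd number of base learners and two classes, as in the three-learner ensemble the abstract describes, a majority vote cannot tie.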
Pages: 1065-1073
Page count: 9