Semi-supervised learning using frequent itemset and ensemble learning for SMS classification

Cited by: 37
Authors
Ahmed, Ishtiaq [1 ]
Ali, Rahman [1 ]
Guan, Donghai [2 ]
Lee, Young-Koo [1 ]
Lee, Sungyoung [1 ]
Chung, TaeChoong [1 ]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Seoul, South Korea
[2] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
Funding
National Research Foundation, Singapore;
Keywords
Short Message Service (SMS); Ham; Spam; Frequent itemset; Ensemble learning; Semi-supervised classification;
DOI
10.1016/j.eswa.2014.08.054
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Short Message Service (SMS) has become one of the most important communication media owing to the rapid growth in mobile users and its easy-to-use operating mechanism. This flood of SMS traffic brings with it the problem of spam messages generated by spurious users. Spam SMS detection has attracted increasing attention from researchers in recent years and has been addressed with a number of machine learning approaches. The supervised approaches used so far demand a large amount of labeled data, which is not always available in real applications. Traditional semi-supervised methods can alleviate this problem but may not produce good results when given only positive and unlabeled data. In this paper, we propose a novel semi-supervised learning method that uses frequent itemsets and ensemble learning (FIEL) to overcome this limitation. In this approach, the Apriori algorithm is used to find the frequent itemsets, while Multinomial Naive Bayes, Random Forest and LibSVM serve as base learners for an ensemble that combines their outputs by majority voting. The proposed approach works well, with higher accuracy, given a small number of positive examples and different amounts of unlabeled data. Extensive experiments on the UCI SMS spam collection data set and the SMS spam collection Corpus v.0.1 Small and Big show significant improvements in accuracy with a very small amount of positive data. We compared the proposed FIEL approach with the existing SPY-EM and PEBL approaches, and the results show that our approach is more stable than the compared approaches with minimum support. (C) 2014 Elsevier Ltd. All rights reserved.
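The abstract names two building blocks: Apriori-style frequent itemset mining and majority voting over base learners. The following is only a minimal sketch of those two generic techniques in plain Python, not the authors' FIEL pipeline. The abstract does not say how the itemsets are used to label data, so that step is omitted. The function names, the toy messages and the 0.5 support threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style search (illustrative, hypothetical helper).

    Returns {itemset: support} for every itemset whose support, i.e. the
    fraction of transactions containing it, is at least min_support.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = Counter()
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, pruning any
        # candidate that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return result


def majority_vote(predictions):
    """Return the label chosen by most base learners (hypothetical helper)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy tokenized messages, invented for this example only.
messages = [
    ["free", "prize", "call"],
    ["free", "prize", "now"],
    ["free", "call", "now"],
    ["win", "prize", "call"],
]
itemsets = frequent_itemsets(messages, min_support=0.5)

# Stand-ins for the outputs of three base learners on one message.
label = majority_vote(["spam", "ham", "spam"])
```

With three base learners and two classes, as in the abstract, a majority vote cannot tie. With an even number of learners or more classes, a tie-breaking rule would have to be chosen.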
Pages: 1065 - 1073
Number of pages: 9
Related papers
50 in total
  • [41] Semi-supervised learning for question classification in CQA
    Li, Yiyang
    Su, Lei
    Chen, Jun
    Yuan, Liwei
    NATURAL COMPUTING, 2017, 16 (04) : 567 - 577
  • [42] Integrated Semi-Supervised Model for Learning and Classification
    Bhalla, Vandna
    Chaudhury, Santanu
    PROCEEDINGS OF 3RD INTERNATIONAL CONFERENCE ON COMPUTER VISION AND IMAGE PROCESSING, CVIP 2018, VOL 1, 2020, 1022 : 183 - 195
  • [43] SEMI-SUPERVISED LEARNING FOR MARS IMAGERY CLASSIFICATION
    Wang, Wenjing
    Lin, Lilang
    Fan, Zejia
    Liu, Baying
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 499 - 503
  • [44] Semi-supervised learning for photometric supernova classification
    Richards, Joseph W.
    Homrighausen, Darren
    Freeman, Peter E.
    Schafer, Chad M.
    Poznanski, Dovi
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2012, 419 (02) : 1121 - 1135
  • [45] Multimodal semi-supervised learning for image classification
    Guillaumin, Matthieu
    Verbeek, Jakob
    Schmid, Cordelia
    2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010, : 902 - 909
  • [46] Interactive Image Segmentation by Semi-supervised Learning Ensemble
    Xu, Jiazhen
    Chen, Xinmeng
    Huang, Xuejuan
    KAM: 2008 INTERNATIONAL SYMPOSIUM ON KNOWLEDGE ACQUISITION AND MODELING, PROCEEDINGS, 2008, : 645 - 648
  • [47] A reliable ensemble based approach to semi-supervised learning
    de Vries, Sjoerd
    Thierens, Dirk
    KNOWLEDGE-BASED SYSTEMS, 2021, 215
  • [48] TEXT CLASSIFICATION BASED ON SEMI-SUPERVISED LEARNING
    Vo Duy Thanh
    Vo Trung Hung
    Pham Minh Tuan
    Doan Van Ban
    2013 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2013, : 232 - 236
  • [49] Extreme semi-supervised learning for multiclass classification
    Chen, Chuangquan
    Gan, Yanfen
    Vong, Chi-Man
    NEUROCOMPUTING, 2020, 376 : 103 - 118
  • [50] Semi-Supervised Text Classification With Universum Learning
    Liu, Chien-Liang
    Hsaio, Wen-Hoar
    Lee, Chia-Hoang
    Chang, Tao-Hsing
    Kuo, Tsung-Hsun
    IEEE TRANSACTIONS ON CYBERNETICS, 2016, 46 (02) : 462 - 473