Semi-supervised learning using frequent itemset and ensemble learning for SMS classification

Times cited: 37
Authors
Ahmed, Ishtiaq [1 ]
Ali, Rahman [1 ]
Guan, Donghai [2 ]
Lee, Young-Koo [1 ]
Lee, Sungyoung [1 ]
Chung, TaeChoong [1 ]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Seoul, South Korea
[2] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
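Funding
National Research Foundation of Singapore
Keywords
Short Message Service (SMS); Ham; Spam; Frequent itemset; Ensemble learning; Semi-supervised classification
DOI
10.1016/j.eswa.2014.08.054
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract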
Short Message Service (SMS) has become one of the most important media of communication due to the rapid increase in mobile users and its easy-to-use operating mechanism. This flood of SMS brings with it the problem of spam SMS generated by spurious users. The detection of spam SMS has attracted growing attention from researchers in recent times and has been addressed with a number of different machine learning approaches. The supervised machine learning approaches used so far demand a large amount of labeled data, which is not always available in real applications. Traditional semi-supervised methods can alleviate this problem but may not produce good results if they are provided with only positive and unlabeled data. In this paper, we propose a novel semi-supervised learning method that makes use of frequent itemsets and ensemble learning (FIEL) to overcome this limitation. In this approach, the Apriori algorithm is used to find frequent itemsets, while Multinomial Naive Bayes, Random Forest and LibSVM are used as base learners for an ensemble that combines them with a majority voting scheme. Our proposed approach works well with a small number of positive examples and different amounts of unlabeled data, achieving higher accuracy. Extensive experiments have been conducted on the UCI SMS Spam Collection data set and the SMS Spam Collection Corpus v.0.1 Small and Big, which show significant improvements in accuracy with a very small amount of positive data. We have compared our proposed FIEL approach with the existing SPY-EM and PEBL approaches, and the results show that our approach is more stable than the compared approaches with minimum support. (C) 2014 Elsevier Ltd. All rights reserved.
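The abstract names two generic building blocks of FIEL: Apriori frequent-itemset mining and majority voting over base learners. The sketch below illustrates only those two standard techniques in plain Python. It is not the authors' implementation. The function names `apriori` and `majority_vote`, the toy token lists, and the 0.5 minimum support are illustrative assumptions. The abstract does not say how the mined itemsets are used to select training data from the unlabeled set, so that step is not shown. In the paper the voters are Multinomial Naive Bayes, Random Forest and LibSVM. Here their outputs are replaced by hypothetical stand-in label lists.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Generic Apriori (illustrative, not FIEL itself): return
    {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    if n == 0:
        return {}
    min_count = min_support * n
    # Level 1: count single tokens and keep the frequent ones.
    counts = Counter(item for b in baskets for item in b)
    current = {frozenset([i]): c / n for i, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            # Prune step (Apriori property): every (k-1)-subset must be frequent.
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count step: scan the transactions to get each candidate's support.
        current = {}
        for cand in candidates:
            c = sum(1 for b in baskets if cand <= b)
            if c >= min_count:
                current[cand] = c / n
        frequent.update(current)
        k += 1
    return frequent


def majority_vote(*predictions):
    """Combine per-classifier label lists (one label per message) by majority vote."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical toy data: tokenized spam messages.
spam = [
    ["win", "free", "prize", "now"],
    ["free", "prize", "claim", "now"],
    ["call", "now", "win", "cash"],
    ["free", "cash", "prize"],
]
itemsets = apriori(spam, min_support=0.5)
# {free, prize} occurs in 3 of 4 messages; {free, prize, now} in 2 of 4.
print(itemsets[frozenset({"free", "prize"})])         # 0.75
print(itemsets[frozenset({"free", "prize", "now"})])  # 0.5

# Hypothetical stand-in outputs of three base learners on three messages.
print(majority_vote(["spam", "ham", "ham"],
                    ["spam", "spam", "ham"],
                    ["ham", "spam", "ham"]))          # ['spam', 'spam', 'ham']
```

With three voters and two classes (ham and spam), a tie cannot occur. This is one reason an odd number of base learners suits binary majority voting.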
Pages: 1065-1073
Number of pages: 9