Semi-supervised learning using frequent itemset and ensemble learning for SMS classification

Cited by: 37
Authors
Ahmed, Ishtiaq [1 ]
Ali, Rahman [1 ]
Guan, Donghai [2 ]
Lee, Young-Koo [1 ]
Lee, Sungyoung [1 ]
Chung, TaeChoong [1 ]
Affiliations
[1] Kyung Hee Univ, Dept Comp Engn, Seoul, South Korea
[2] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
Funding
National Research Foundation of Singapore;
Keywords
Short Message Service (SMS); Ham; Spam; Frequent itemset; Ensemble learning; Semi-supervised classification;
DOI
10.1016/j.eswa.2014.08.054
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Short Message Service (SMS) has become one of the most important communication media owing to the rapid increase in mobile users and its easy-to-use operating mechanism. This flood of SMS brings with it the problem of spam SMS generated by spurious users. The detection of spam SMS has received growing attention from researchers in recent times and has been addressed with a number of different machine learning approaches. The supervised machine learning approaches used so far demand a large amount of labeled data, which is not always available in real applications. Traditional semi-supervised methods can alleviate this problem but may not produce good results if they are provided with only positive and unlabeled data. In this paper, we propose a novel semi-supervised learning method that uses frequent itemsets and ensemble learning (FIEL) to overcome this limitation. In this approach, the Apriori algorithm is used to find the frequent itemsets, while Multinomial Naive Bayes, Random Forest and LibSVM are used as base learners for ensemble learning with a majority voting scheme. The proposed approach achieves higher accuracy with a small amount of positive data and different amounts of unlabeled data. Extensive experiments conducted on the UCI SMS spam collection data set and the SMS spam collection Corpus v.0.1 Small and Big show significant improvements in accuracy with a very small amount of positive data. We compared the proposed FIEL approach with the existing SPY-EM and PEBL approaches, and the results show that our approach is more stable than the compared approaches with minimum support. (C) 2014 Elsevier Ltd. All rights reserved.
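The two building blocks named in the abstract, Apriori frequent itemset mining and majority voting over base learners, can be illustrated with a minimal pure-Python sketch. This is not the authors' FIEL implementation: the abstract does not describe how the itemsets feed the ensemble, so the function names `frequent_itemsets` and `majority_vote`, the toy messages, and the support threshold below are all assumptions made for illustration, and the base-learner outputs are hard-coded stand-ins rather than real Multinomial Naive Bayes, Random Forest or LibSVM predictions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for every itemset
    whose support (fraction of transactions containing it) is at least
    min_support. Hypothetical helper, not taken from the paper."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    # Level 1 candidates: every distinct token as a single-item set.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = Counter()
        for b in baskets:
            for c in candidates:
                if c <= b:
                    counts[c] += 1
        level = {c: cnt / n for c, cnt in counts.items()
                 if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates and
        # prune any candidate that has an infrequent k-subset.
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def majority_vote(predictions):
    """Return the label chosen by most base learners
    (on a tie, the label seen first wins)."""
    return Counter(predictions).most_common(1)[0][0]


# Toy positive (spam) messages, already tokenized; purely illustrative.
spam_messages = [
    ["win", "free", "prize"],
    ["free", "prize", "call"],
    ["win", "free", "call"],
    ["free", "prize", "now"],
]
itemsets = frequent_itemsets(spam_messages, min_support=0.75)

# Stand-in outputs of three base learners for a single message.
label = majority_vote(["spam", "ham", "spam"])
```

With these toy inputs, `free` appears in all four messages and `prize` in three, so only `{free}`, `{prize}` and `{free, prize}` reach the 0.75 support threshold, and the vote resolves to `spam`. In practice the base-learner predictions would come from trained classifiers rather than a literal list.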
Pages: 1065-1073
Number of pages: 9