Twitter spam account detection based on clustering and classification methods

Cited by: 0
Authors
Kayode Sakariyah Adewole
Tao Han
Wanqing Wu
Houbing Song
Arun Kumar Sangaiah
Affiliations
[1] University of Ilorin,Faculty of Communication and Information Sciences
[2] Dongguan University of Technology,DGUT
[3] Shenzhen Institutes of Advanced Technology (SIAT),CNAM Institute
[4] Chinese Academy of Sciences (CAS),CAS Key Laboratory of Human
[5] Embry-Riddle Aeronautical University,Machine Intelligence
[6] Vellore Institute of Technology,Synergy Systems
Source
Keywords
Online social network; Spam detection; Fake account; Clustering; Classification
DOI
Not available
Chinese Library Classification (CLC) number
Subject classification code
Abstract
Twitter has gained popularity as the social activity of its registered users has grown. It serves a dual role as an online social network (OSN): a microblogging platform and, at the same time, a news update platform. Recently, the growth in Twitter social interactions has attracted the attention of cybercriminals. Spammers have used Twitter to spread malicious messages, post phishing links, flood the network with fake accounts, and engage in other malicious activities. Detecting the network of spammers who engage in these activities is an important step toward identifying individual spam accounts. Researchers have proposed a number of approaches to identify groups of spammers. However, each of these approaches addresses a specific category of spammer. This paper proposes a different approach to detect spammers on Twitter based on the similarities that exist among spam accounts. A number of features were introduced to improve the performance of the three classification algorithms selected in this study. The proposed approach applied principal component analysis and a tuned K-means algorithm to cluster over 200,000 accounts, randomly selected from more than 2 million tweets, to detect the clusters of spammers. Experimental results show that Random Forest achieved the highest accuracy of 96.30%, followed by multilayer perceptron with 96.00% and support vector machine with 95.60%. The performance of the selected classifiers under class imbalance also revealed that Random Forest achieved the highest accuracy, precision, recall, and F-measure.
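The abstract reports accuracy, precision, recall, and F-measure, and notes that the classifiers were also compared under class imbalance. As a minimal sketch of how these four metrics relate, the following computes them from true and predicted labels. The function name `binary_metrics` and the toy labels are hypothetical illustrations, not the paper's code or data.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure (F1) is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical imbalanced toy data: 2 spam accounts (1), 8 legitimate (0).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # one spam account is missed

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On this toy data the accuracy is 0.9 even though half of the spam accounts are missed (recall 0.5), which is why precision, recall, and F-measure are typically reported alongside accuracy when classes are imbalanced.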
Pages: 4802 - 4837
Page count: 35
Related papers
50 results in total
  • [1] Twitter spam account detection based on clustering and classification methods
    Adewole, Kayode Sakariyah
    Han, Tao
    Wu, Wanqing
    Song, Houbing
    Sangaiah, Arun Kumar
    JOURNAL OF SUPERCOMPUTING, 2020, 76 (07): 4802 - 4837
  • [2] Tweet and Account Based Spam Detection on Twitter
    Gungor, Kubra Nur
    Erdem, O. Ayhan
    Dogru, Ibrahim Alper
    ARTIFICIAL INTELLIGENCE AND APPLIED MATHEMATICS IN ENGINEERING PROBLEMS, 2020, 43 : 898 - 905
  • [3] MACHINE LEARNING BASED TWITTER SPAM ACCOUNT DETECTION: A REVIEW
    Gheewala, Shivangi
    Patel, Rakesh
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTING METHODOLOGIES AND COMMUNICATION (ICCMC 2018), 2018, : 79 - 84
  • [4] Assisted Labeling for Spam Account Detection on Twitter
    Concone, Federico
    Lo Re, Giuseppe
    Morana, Marco
    Ruocco, Claudio
    2019 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING (SMARTCOMP 2019), 2019, : 359 - 366
  • [5] A Novel Stream Clustering Framework for Spam Detection in Twitter
    Tajalizadeh, Hadi
    Boostani, Reza
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2019, 6 (03) : 525 - 534
  • [6] Threshold and Associative Based Classification for Social Spam Profile Detection on Twitter
    Hua, Willian
    Zhang, Yanqing
    2013 NINTH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2013, : 113 - 120
  • [7] Sentiment Based Twitter Spam Detection
    Perveen, Nasira
    Missen, Malik M. Saad
    Rasool, Qaisar
    Akhtar, Nadeem
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2016, 7 (07) : 568 - 573
  • [8] Adaptive Classification for Spam Detection on Twitter with Specific Data
    Dangkesee, Thayakorn
    Puntheeranurak, Sutheera
    2017 21ST INTERNATIONAL COMPUTER SCIENCE AND ENGINEERING CONFERENCE (ICSEC 2017), 2017, : 243 - 246
  • [9] A hybrid classification method for Twitter spam detection based on differential evolution and random forest
    Bazzaz Abkenar, Sepideh
    Mahdipour, Ebrahim
    Jameii, Seyed Mahdi
    Haghi Kashani, Mostafa
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (21):
  • [10] Spam Detection on Twitter : A Survey
    Kaur, Prabhjot
    Singhal, Anubha
    Kaur, Jasleen
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 2570 - 2573