Machine Learning Framework for Detecting Offensive Swahili Messages in Social Networks with Apache Spark Implementation

Cited by: 1
Authors
Jonathan, Francis [1]
Yang, Dong [1]
Gowing, Glyn [2]
Wei, Songjie [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] LeTourneau Univ, Comp Sci Dept, Longview, TX 75602 USA
Funding
National Natural Science Foundation of China
Keywords
Machine learning; offensive language; PySpark; Apache Spark
DOI
10.1109/PIC53636.2021.9687001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The morphological context of a language varies by community. Linguistic analysis is complicated by grammatical variation, cultural and traditional usage, slang, misspellings, and language variance. Many studies in sentiment analysis have focused on natural language processing and people's opinions. Text processing is time-consuming, requires substantial storage, and needs fast machines to run in distributed networks. Many developers choose Hadoop and MapReduce to process big data. This study develops a methodology that employs Apache Spark as the text classification processing engine because it is faster in cluster computing systems. Libraries and packages for lemmatization and stemming of African languages are still lacking. The proposed approach is used to detect offensive Swahili texts in social networks; Swahili is the third most widely spoken language in Africa. Four machine learning techniques were tested as benchmarks, with the multinomial logistic model proving the most effective. The evaluation measures show that the proposed machine learning framework is versatile and suitable for use in both centralized and distributed systems.
Pages: 293-297
Number of pages: 5