Machine Learning Framework for Detecting Offensive Swahili Messages in Social Networks with Apache Spark Implementation

Cited by: 1
Authors
Jonathan, Francis [1]
Yang, Dong [1]
Gowing, Glyn [2]
Wei, Songjie [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] LeTourneau Univ, Comp Sci Dept, Longview, TX 75602 USA
Funding
National Natural Science Foundation of China
Keywords
Machine learning; offensive language; PySpark; Apache Spark
DOI
10.1109/PIC53636.2021.9687001
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A language's morphological context varies by community. Linguistic analysis becomes more complex because of grammatical variation, cultural and traditional influences, slang, misspellings, and language variance. Many studies in sentiment analysis have focused on natural language processing and people's opinions. Text processing takes time, requires a lot of storage space, and needs a fast computer to work in distributed networks. Many developers choose Hadoop and MapReduce to process big data. This study developed a methodology that employs Apache Spark as the text classification processing engine because it is faster in cluster computing systems. Libraries and packages for lemmatizing and stemming African languages are still lacking. The proposed approach was used to detect offensive Swahili texts in social networks. Swahili is the third most widely spoken language in Africa. Four machine learning techniques were tested as benchmarks, and the multinomial logistic model proved the most effective. The evaluation measures show that the proposed machine learning framework is versatile and suitable for use in both centralized and distributed systems.
Pages: 293-297
Page count: 5