Machine Learning Framework for Detecting Offensive Swahili Messages in Social Networks with Apache Spark Implementation

Cited by: 1
Authors
Jonathan, Francis [1]
Yang, Dong [1]
Gowing, Glyn [2]
Wei, Songjie [1]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Nanjing 210094, Peoples R China
[2] LeTourneau Univ, Comp Sci Dept, Longview, TX 75602 USA
Funding
National Natural Science Foundation of China
Keywords
Machine learning; offensive language; PySpark; Apache Spark
DOI
10.1109/PIC53636.2021.9687001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A language's morphological context varies by community. Linguistic analysis becomes more complex because of grammatical variation, cultural and traditional differences, slang, misspellings, and language variance. Many studies in sentiment analysis have focused on natural language processing and people's opinions. Text processing takes time and requires substantial storage space and fast computers to work in distributed networks. Many developers choose Hadoop and MapReduce to process big data. This study developed a methodology that employs Apache Spark as the text classification processing engine because it is faster in cluster computing systems. Libraries and packages for lemmatization and stemming of African languages are still lacking. The proposed approach was used to detect offensive Swahili texts in social networks. Swahili is the third most widely spoken language in Africa. Four machine learning techniques were tested as benchmarks, and the multinomial logistic model proved the most effective. The evaluation measures show that the proposed machine learning framework is versatile and suitable for use in both centralized and distributed systems.
Pages: 293-297
Number of pages: 5