Multi-label Classification for Hate Speech and Abusive Language in Indonesian-Local Languages

Authors
Asti, Ajeng Dwi [1]
Budi, Indra [1]
Ibrohim, Muhammad Okky [1]
Affiliations
[1] Univ Indonesia, Fac Comp Sci, Depok, Indonesia
Keywords
hate speech; multi-label classification; Indonesian local language; Twitter
DOI
10.1109/ICACSIS53237.2021.9631316
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Every instance of hate speech has a target, a category, and a level, which need to be detected to help the authorities prioritize which hate speech cases to handle first. Various studies have been conducted in Indonesia on abusive language and hate speech and their targets, categories, and levels, but only in Indonesian and English. However, the many local languages in Indonesia open up opportunities for hate speech to be expressed in those languages. This study aims to compare some of the best machine learning algorithms, transformation methods, and feature extraction techniques for classifying abusive language and hate speech and their targets, categories, and levels, using Twitter data in Indonesian and local languages. The study covers the five local languages in Indonesia with the most speakers: Javanese, Sundanese, Madurese, Minangkabau, and Musi (Palembang). The algorithms used are Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), and Random Forest Decision Tree (RFDT), with Binary Relevance (BR), Classifier Chains (CC), and Label Powerset (LP) as transformation methods. The term weighting used is TF-IDF with word n-gram and character n-gram features. The results show that the SVM algorithm with the CC transformation method and unigram feature extraction gave the highest F1-scores for Javanese (66.33%) and Sundanese (65.68%). For the Madurese, Minangkabau, and Musi data, the best F1-scores were obtained using the RFDT algorithm with the CC transformation method and unigram feature extraction: 76.37%, 80.75%, and 77.34%, respectively.
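The three transformation methods named in the abstract differ in how they turn one multi-label problem into ordinary classification problems. The sketch below illustrates this on toy data using only the Python standard library. It is not the authors' code: the label names, the label matrix `Y`, the two-column feature rows `X` (standing in for TF-IDF vectors), and all function names are hypothetical choices made for illustration.

```python
# Hypothetical label set; the study's actual labels cover hate speech,
# abusive language, targets, categories, and levels.
LABEL_NAMES = ["hate_speech", "abusive", "targets_individual"]

# Hypothetical label matrix: one row per tweet, one column per label.
Y = [
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
    [1, 0, 1],
]

# Hypothetical feature rows standing in for TF-IDF vectors.
X = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.0, 0.0], [0.5, 0.5]]


def binary_relevance(Y):
    """Binary Relevance: one independent binary target vector per label."""
    return {name: [row[j] for row in Y] for j, name in enumerate(LABEL_NAMES)}


def label_powerset(Y):
    """Label Powerset: each distinct label combination becomes one class id."""
    combo_to_class = {}
    classes = []
    for row in Y:
        combo = tuple(row)
        if combo not in combo_to_class:
            combo_to_class[combo] = len(combo_to_class)
        classes.append(combo_to_class[combo])
    return classes, combo_to_class


def classifier_chain_inputs(X, Y):
    """Classifier Chains: the j-th classifier is trained on the features
    plus the true values of labels 0..j-1 (predictions are used at test time)."""
    chain = []
    for j in range(len(LABEL_NAMES)):
        inputs = [x + y[:j] for x, y in zip(X, Y)]
        targets = [y[j] for y in Y]
        chain.append((inputs, targets))
    return chain


br_targets = binary_relevance(Y)
lp_classes, lp_mapping = label_powerset(Y)
cc_links = classifier_chain_inputs(X, Y)
```

Each transformed target can then be handed to any single-label learner such as SVM, MNB, or a random forest. In practice a library would normally handle this step; scikit-learn, for example, provides `sklearn.multioutput.ClassifierChain` and `TfidfVectorizer` with word or character analyzers, though whether the study used those implementations is not stated in the abstract.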
Pages: 325-330
Number of pages: 6