Multi-label Classification for Hate Speech and Abusive Language in Indonesian-Local Languages

Cited by: 0

Authors:

Asti, Ajeng Dwi [1]

Budi, Indra [1]

Ibrohim, Muhammad Okky [1]

Affiliations:

[1] Univ Indonesia, Fac Comp Sci, Depok, Indonesia

Source:

13TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS 2021) | 2021

Keywords:

hate speech; multi-label classification; Indonesian local language; Twitter;

DOI:

10.1109/ICACSIS53237.2021.9631316

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Every instance of hate speech has a target, a category, and a level, and detecting these helps the authorities prioritize which cases to handle first. Various studies in Indonesia have addressed abusive language and hate speech, together with their targets, categories, and levels, but only for Indonesian and English. Meanwhile, the many local languages of Indonesia create opportunities for hate speech to be expressed in those languages. This study compares some of the best-performing machine learning algorithms, transformation methods, and feature extraction techniques for classifying abusive language and hate speech, along with their targets, categories, and levels, using Twitter data in Indonesian and local languages. It covers the five local languages in Indonesia with the most speakers: Javanese, Sundanese, Madurese, Minangkabau, and Musi (Palembang). The algorithms are Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), and Random Forest Decision Tree (RFDT), combined with Binary Relevance (BR), Classifier Chains (CC), and Label Powerset (LP) as transformation methods. Term weighting uses TF-IDF over word n-gram and character n-gram features. SVM with the CC transformation method and unigram features gave the highest F1-scores for Javanese (66.33%) and Sundanese (65.68%). For Madurese, Minangkabau, and Musi, the best F1-scores were obtained with RFDT, the CC transformation method, and unigram features: 76.37%, 80.75%, and 77.34%, respectively.
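The three transformation methods named in the abstract differ in how they turn a multi-label target matrix into ordinary single-label problems. The following is a minimal pure-Python sketch of that idea, not the authors' implementation: the label names, the toy feature values, and the helper function names are all hypothetical, chosen only for illustration.

```python
# Hypothetical label names, loosely modelled on the abstract (not the paper's label set).
LABELS = ["hate_speech", "abusive", "target_individual"]

# Toy data: one row per tweet. X holds a single made-up feature value per tweet,
# Y holds one 0/1 column per label.
X = [[0.5], [0.1], [0.9], [0.0], [0.7]]
Y = [
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 1],
]


def binary_relevance(Y, labels):
    """BR: one independent binary target vector per label."""
    return {name: [row[j] for row in Y] for j, name in enumerate(labels)}


def classifier_chain_inputs(X, Y):
    """CC: the j-th classifier is trained on the original features
    extended with the true values of labels 0..j-1."""
    n_labels = len(Y[0])
    return [
        [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        for j in range(n_labels)
    ]


def label_powerset(Y):
    """LP: every distinct label combination becomes one class id."""
    class_ids = {}
    targets = []
    for row in Y:
        key = tuple(row)
        if key not in class_ids:
            class_ids[key] = len(class_ids)
        targets.append(class_ids[key])
    return targets, class_ids


br_targets = binary_relevance(Y, LABELS)
cc_inputs = classifier_chain_inputs(X, Y)
lp_targets, lp_classes = label_powerset(Y)
```

In this sketch BR yields three separate binary problems, CC yields three problems whose inputs grow by one label column at each step, and LP yields a single multi-class problem with one class per observed label combination. Any base classifier, such as the SVM, MNB, or RFDT models the abstract mentions, could then be trained on each transformed problem.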


Pages: 325-330

Page count: 6
