Multi-label Classification for Hate Speech and Abusive Language in Indonesian-Local Languages

Authors
Asti, Ajeng Dwi [1]
Budi, Indra [1]
Ibrohim, Muhammad Okky [1]
Affiliations
[1] Univ Indonesia, Fac Comp Sci, Depok, Indonesia
Keywords
hate speech; multi-label classification; Indonesian local language; Twitter
DOI
10.1109/ICACSIS53237.2021.9631316
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Every instance of hate speech has a target, a category, and a level, all of which need to be detected so that the authorities can prioritize the cases to resolve first. Various studies in Indonesia have addressed abusive language and hate speech along with their targets, categories, and levels, but only for Indonesian and English. Indonesia's many local languages, however, leave room for hate speech to be expressed in those languages as well. This study compares some of the best-performing machine learning algorithms, transformation methods, and feature extraction techniques for classifying abusive language and hate speech and their targets, categories, and levels, using Twitter data in Indonesian and local languages. It covers the five local languages in Indonesia with the most speakers: Javanese, Sundanese, Madurese, Minangkabau, and Musi (Palembang). The algorithms are Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), and Random Forest Decision Tree (RFDT), combined with Binary Relevance (BR), Classifier Chains (CC), and Label Powerset (LP) as transformation methods. Term weighting is TF-IDF over word n-gram and character n-gram features. The results show that SVM with the CC transformation method and unigram features gave the highest F1-scores for Javanese (66.33%) and Sundanese (65.68%). For the Madurese, Minangkabau, and Musi data, the best F1-scores were obtained with RFDT, the CC transformation method, and unigram features: 76.37%, 80.75%, and 77.34%, respectively.
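The three transformation methods named in the abstract differ in how they turn a multi-label target into problems a single-label classifier can handle. The following is a minimal sketch of the standard textbook definitions, not the authors' implementation: the label names, label vectors, and feature values are hypothetical and chosen only for illustration.

```python
# Hypothetical label set and toy data (NOT from the paper).
LABELS = ["hate_speech", "abusive", "target_individual"]
Y = [
    (1, 1, 1),
    (0, 1, 0),
    (0, 0, 0),
    (1, 1, 1),
]
X = [[0.2, 0.9], [0.1, 0.4], [0.0, 0.0], [0.3, 0.8]]  # made-up feature vectors


def binary_relevance(Y, labels):
    """Binary Relevance: one independent binary target column per label."""
    return {name: [row[j] for row in Y] for j, name in enumerate(labels)}


def classifier_chain_inputs(X, Y):
    """Classifier Chains (training time): the classifier for label j sees
    the original features plus the true values of labels 0..j-1."""
    n_labels = len(Y[0])
    return [
        [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        for j in range(n_labels)
    ]


def label_powerset(Y):
    """Label Powerset: each distinct label combination becomes one class id."""
    classes = {}
    ids = [classes.setdefault(tuple(row), len(classes)) for row in Y]
    return ids, classes


br_targets = binary_relevance(Y, LABELS)
cc_inputs = classifier_chain_inputs(X, Y)
lp_ids, lp_classes = label_powerset(Y)
```

Binary Relevance ignores dependencies between labels, Classifier Chains passes earlier labels forward as extra features, and Label Powerset models whole label combinations at the cost of one class per combination seen in training.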
Pages: 325-330
Number of pages: 6