Multi-label Hate Speech and Abusive Language Detection in Indonesian Twitter

Cited by: 0
Authors
Ibrohim, Muhammad Okky [1]
Budi, Indra [1]
Affiliations
[1] Univ Indonesia, Fac Comp Sci, Kampus UI, Depok 16424, Indonesia
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hate speech and abusive language spreading on social media need to be detected automatically to avoid conflict between citizens. Moreover, hate speech has a target, category, and level that also need to be detected to help the authorities prioritize which hate speech must be addressed immediately. This research discusses multi-label text classification for abusive language and hate speech detection, including detection of the target, category, and level of hate speech, in Indonesian Twitter. We used machine learning approaches with Support Vector Machine (SVM), Naive Bayes (NB), and Random Forest Decision Tree (RFDT) classifiers, and Binary Relevance (BR), Label Power-set (LP), and Classifier Chains (CC) as the data transformation methods. We used several kinds of features: term frequency, orthography, and lexicon features. Our experimental results show that, in general, the RFDT classifier using LP as the transformation method gives the best accuracy with fast computational time.
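The three data transformation methods named in the abstract (BR, LP, CC) each turn one multi-label problem into one or more single-label problems. The following is a minimal sketch of those transformations in plain Python, not the authors' implementation. The label names, the toy feature vectors, and the function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label names for illustration; not the paper's actual label set.
LABELS = ["hate_speech", "abusive", "target_individual"]


def binary_relevance(Y: Sequence[Sequence[int]],
                     labels: Sequence[str]) -> Dict[str, List[int]]:
    """BR: one independent binary target vector per label."""
    return {name: [row[j] for row in Y] for j, name in enumerate(labels)}


def label_powerset(Y: Sequence[Sequence[int]]
                   ) -> Tuple[List[int], Dict[Tuple[int, ...], int]]:
    """LP: every distinct label combination becomes one multiclass id."""
    class_of: Dict[Tuple[int, ...], int] = {}
    targets: List[int] = []
    for row in Y:
        key = tuple(row)
        if key not in class_of:
            class_of[key] = len(class_of)  # ids assigned in order of first appearance
        targets.append(class_of[key])
    return targets, class_of


def classifier_chain(X: Sequence[Sequence[float]],
                     Y: Sequence[Sequence[int]]
                     ) -> List[Tuple[List[List[float]], List[int]]]:
    """CC: the training set for label j appends labels 0..j-1 to the features.

    True labels are used here, as at training time; at prediction time a
    chain would feed in the earlier classifiers' predictions instead.
    """
    n_labels = len(Y[0])
    chain = []
    for j in range(n_labels):
        X_aug = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        y_j = [y[j] for y in Y]
        chain.append((X_aug, y_j))
    return chain


# Toy data: four "tweets" with two made-up numeric features each.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
Y = [[1, 1, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]

br_targets = binary_relevance(Y, LABELS)
lp_targets, lp_classes = label_powerset(Y)
cc_sets = classifier_chain(X, Y)
```

Under these assumptions, BR yields three separate binary problems, LP yields a single multiclass problem whose classes are the observed label combinations, and CC yields three binary problems whose feature vectors grow by one column per earlier label. Any single-label classifier (for example SVM, NB, or a random forest) could then be trained on each transformed problem.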
Pages: 46-57
Page count: 12