Detection of Hate Speech Texts Using Machine Learning Algorithm

Cited by: 6
Authors
Sanoussi, Mahamat Saleh Adoum [1]
Chen Xiaohua [1]
Agordzo, George K. [2]
Guindo, Mahamed Lamine [3]
Al Omari, Abdullah Mma [1]
Issa, Boukhari Mahamat [4]
Affiliations
[1] Huzhou Univ, Sch Informat Engn, Huzhou, Zhejiang, Peoples R China
[2] Anhui Univ Sci & Technol, Sch Math & Big Data, Hefei, Anhui, Peoples R China
[3] Zhejiang Univ, Coll Biosyst Engn, Hangzhou, Peoples R China
[4] Abeche Inst Sci & Technol, Dept Elect Engn, Abeche, Chad
Keywords
hate speech; natural language processing; social media; text classification; word embedding
DOI
10.1109/CCWC54503.2022.9720792
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Identifying hate speech on social media has become increasingly crucial for society. It has been shown that cyberbullying significantly affects the social tranquillity of the Chadian population, mainly in places of conflict. This article aims to detect hate speech in texts written in "lingua franca", a mix of the local Chadian and French languages. The dataset used for this study consists of 14,000 comments extracted from the most visited Facebook pages and annotated in four categories (hate, offence, insult and neutral). The data were cleaned with Natural Language Processing (NLP) techniques and encoded with three word embedding methods: Word2Vec, Doc2Vec, and FastText. Finally, four Machine Learning methods, namely Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbours (KNN), were trained to classify the different categories. The results showed that FastText feature representations used as input to an SVM classifier performed best, with 95.4% accuracy for predicting comments containing insult statements, followed by 93.9% for hate statements. The results demonstrate that the model could be used to detect hate speech in texts posted by Chadians on social media.
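The clean, embed, classify pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to stay self-contained it swaps in character n-gram counts (loosely in the spirit of FastText's subword units) for learned embeddings and uses only the KNN classifier, with cosine similarity. The function names, the cleaning rules, and the four toy comments are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def clean(text):
    """Lowercase, drop URLs, punctuation and digits, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^\w\s]|\d|_", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def char_ngrams(text, n=3):
    """Count character n-grams per word, using < and > as word boundaries."""
    counts = Counter()
    for word in text.split():
        padded = f"<{word}>"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit(samples):
    """Turn (comment, label) pairs into (feature vector, label) pairs."""
    return [(char_ngrams(clean(text)), label) for text, label in samples]


def knn_predict(train, comment, k=3):
    """Label a comment by majority vote among its k most similar neighbours."""
    query = char_ngrams(clean(comment))
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy comments, not taken from the paper's dataset.
train = fit([
    ("you are an idiot", "insult"),
    ("what a stupid idiot", "insult"),
    ("nice weather today", "neutral"),
    ("the market opens today", "neutral"),
])

print(knn_predict(train, "Such an IDIOT!!"))
print(knn_predict(train, "market today", k=1))
```

A real system along the lines of the abstract would replace `char_ngrams` with trained Word2Vec, Doc2Vec, or FastText vectors and compare all four classifiers on held-out data.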
Pages: 266-273
Page count: 8