Explainability and Hate Speech: Structured Explanations Make Social Media Moderators Faster

Cited by: 0

Authors:

Calabrese, Agostina ^{[1,2]}

Neves, Leonardo ^{[1]}

Shah, Neil ^{[2]}

Bos, Maarten W. ^{[1]}

Ross, Bjorn ^{[2]}

Lapata, Mirella ^{[1]}

Barbieri, Francesco ^{[1]}

Affiliations:

[1] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland

[2] Snap Inc, Santa Monica, CA USA

Source:

PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS | 2024

Funding:

European Research Council; UK Engineering and Physical Sciences Research Council;

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Content moderators play a key role in keeping the conversation on social media healthy. While the high volume of content they need to judge represents a bottleneck to the moderation pipeline, no studies have explored how models could support them to make faster decisions. There is, by now, a vast body of research into detecting hate speech, sometimes explicitly motivated by a desire to help improve content moderation, but published research using real content moderators is scarce. In this work we investigate the effect of explanations on the speed of real-world moderators. Our experiments show that while generic explanations do not affect their speed and are often ignored, structured explanations lower moderators' decision-making time by 7.4%.

Pages: 398 - 408

Page count: 11
