A Dataset of Offensive German Language Tweets Annotated for Speech Acts

Authors
Plakidis, Melina [1, 2]
Rehm, Georg [1, 2]
Affiliations
[1] DFKI GmbH, Alt Moabit 91C, D-10559 Berlin, Germany
[2] Humboldt Univ, Dorotheenstr 24, D-10117 Berlin, Germany
Keywords
Speech acts; hate speech detection; offensive language; annotation; corpus annotation
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
We present a dataset of German offensive and non-offensive tweets annotated for speech acts. The 600 tweets are a subset of the dataset by Struß et al. (2019) and comprise three levels of annotation, i.e., six coarse-grained speech acts, 23 fine-grained speech acts, and 14 different sentence types. Furthermore, we provide an evaluation in both qualitative and quantitative terms. The dataset is publicly available under a CC-BY-4.0 license.
Pages: 4799-4807
Page count: 9