A Dataset of Offensive German Language Tweets Annotated for Speech Acts

Authors
Plakidis, Melina [1, 2]
Rehm, Georg [1, 2]
Affiliations
[1] DFKI GmbH, Alt Moabit 91C, D-10559 Berlin, Germany
[2] Humboldt Univ, Dorotheenstr 24, D-10117 Berlin, Germany
Keywords
Speech acts; hate speech detection; offensive language; annotation; corpus annotation
DOI
Not available
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
We present a dataset consisting of German offensive and non-offensive tweets, annotated for speech acts. The 600 tweets are a subset of the dataset by Struß et al. (2019) and comprise three levels of annotation, i.e., six coarse-grained speech acts, 23 fine-grained speech acts, and 14 different sentence types. Furthermore, we provide an evaluation in both qualitative and quantitative terms. The dataset is made publicly available under a CC-BY-4.0 license.
Pages: 4799-4807
Page count: 9