An Annotated Corpus of Direct Speech

Cited by: 0
Authors
Lee, John [1]
Yeung, Chak Yan [1]
Affiliations
[1] City University of Hong Kong, Halliday Centre for Intelligent Applications of Language Studies, Department of Linguistics and Translation, Hong Kong, People's Republic of China
Keywords
direct speech; coreference; corpus annotation
DOI
Not available
Chinese Library Classification number
H [Language and Writing]
Subject classification code
05
Abstract
We propose a scheme for annotating direct speech in literary texts, based on the Text Encoding Initiative (TEI) and the coreference annotation guidelines from the Message Understanding Conference (MUC). The scheme encodes the speakers and listeners of utterances in a text, as well as the quotative verbs that report the utterances. We measure inter-annotator agreement on this annotation task. We then present statistics on a manually annotated corpus that consists of books from the New Testament. Finally, we visualize the corpus as a conversational network.
Pages: 1059-1063
Page count: 5
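The abstract mentions encoding the speaker and listener of each utterance and viewing the result as a conversational network. The sketch below illustrates one way such a network could be derived; it is not the authors' scheme or code. It assumes utterances are marked with the TEI `<said>` element and its `who` and `toWhom` attributes. The sample text, the identifiers (`alice`, `bob`, `carol`), and the function name `conversation_edges` are invented for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Toy, invented fragment (not taken from the corpus). It assumes each
# utterance is wrapped in a TEI-style <said> element whose who/toWhom
# attributes hold space-separated "#id" pointers.
SAMPLE = """
<text>
  <p>Alice said to Bob, <said who="#alice" toWhom="#bob">Come here.</said></p>
  <p>Bob replied, <said who="#bob" toWhom="#alice">I am coming.</said></p>
  <p>Alice told them, <said who="#alice" toWhom="#bob #carol">Sit down.</said></p>
</text>
""".strip()


def conversation_edges(xml_text):
    """Count directed (speaker, listener) pairs over all <said> elements."""
    root = ET.fromstring(xml_text)
    edges = Counter()
    for said in root.iter("said"):
        speakers = said.get("who", "").split()
        listeners = said.get("toWhom", "").split()
        # One edge per speaker-listener combination in the utterance.
        for speaker in speakers:
            for listener in listeners:
                edges[(speaker.lstrip("#"), listener.lstrip("#"))] += 1
    return edges


edges = conversation_edges(SAMPLE)
for (speaker, listener), count in sorted(edges.items()):
    print(f"{speaker} -> {listener}: {count}")
```

The resulting counter is a weighted, directed edge list, so it can be passed to any graph library for drawing. Utterances with no annotated listener contribute no edge in this sketch.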