An Annotated Corpus of Direct Speech

Authors
Lee, John [1 ]
Yeung, Chak Yan [1 ]
Affiliation
[1] City Univ Hong Kong, Halliday Ctr Intelligent Applicat Language Studie, Dept Linguist & Translat, Hong Kong, Peoples R China
Keywords
direct speech; coreference; corpus annotation
DOI
Not available
Chinese Library Classification number
H [Language, Writing]
Subject classification number
05
Abstract
We propose a scheme for annotating direct speech in literary texts, based on the Text Encoding Initiative (TEI) and the coreference annotation guidelines from the Message Understanding Conference (MUC). The scheme encodes the speakers and listeners of the utterances in a text, as well as the quotative verbs that report the utterances. We measure inter-annotator agreement on this annotation task. We then present statistics on a manually annotated corpus consisting of books from the New Testament. Finally, we visualize the corpus as a conversational network.
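The abstract does not give the concrete markup, so the following is only a minimal sketch of how speaker and listener annotations could be turned into a conversational network. It assumes a TEI-style `<said>` element whose `who` and `toWhom` attributes hold the speaker and listener IDs; the sample text, the IDs, and the function name `conversation_network` are hypothetical and are not taken from the corpus or the authors' scheme.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical TEI-like sample (invented, not corpus data): each <said>
# marks one utterance; "who" is the speaker, "toWhom" lists the listener(s).
SAMPLE = """
<text>
  <p>He said to him, <said who="#jesus" toWhom="#peter">Follow me.</said></p>
  <p>He answered, <said who="#peter" toWhom="#jesus">Where are you going?</said></p>
  <p>He told them, <said who="#jesus" toWhom="#peter #john">Do not be afraid.</said></p>
</text>
"""


def conversation_network(xml_text):
    """Count directed speaker -> listener edges over all <said> elements."""
    root = ET.fromstring(xml_text.strip())
    edges = Counter()
    for said in root.iter("said"):
        speaker = said.get("who")
        if speaker is None:
            continue  # unattributed utterance: no edge can be drawn
        # Assumed convention: several listeners are space-separated pointers.
        for listener in said.get("toWhom", "").split():
            edges[(speaker.lstrip("#"), listener.lstrip("#"))] += 1
    return edges


if __name__ == "__main__":
    for (speaker, listener), n in sorted(conversation_network(SAMPLE).items()):
        print(f"{speaker} -> {listener}: {n}")
```

Each counted pair is a weighted, directed edge from speaker to listener, so the result can be passed to any graph library for drawing. An utterance addressed to several listeners contributes one edge per listener here; that is a design choice of this sketch, and the actual scheme may treat group addressees differently.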
Pages: 1059-1063
Number of pages: 5