EP-Poland: Building A Bilingual Parallel Corpus For Interpreting Research

Cited by: 4
Authors
Bartlomiejczyk, Magdalena [1]
Gumul, Ewa [1]
Korzinek, Danijel [2]
Affiliations
[1] Univ Silesia, Inst Linguist, Katowice, Poland
[2] Polish Japanese Acad Informat Technol, Warsaw, Poland
Keywords
interpreting corpus; parallel corpus; simultaneous interpreting; political discourse; parliamentary interpreting; EUROPEAN PARLIAMENT; LEXICAL PATTERNS; ENGLISH; PREDICTORS
DOI
10.17576/gema-2022-2201-06
Chinese Library Classification
H [Language and Writing]
Discipline classification code
05
Abstract
This paper reports on the process of building the EP-Poland corpus and on its first empirical applications. This extensive bidirectional English-Polish corpus of original parliamentary contributions paired with professional simultaneous interpretations includes 11 European Parliament debates held between January 2016 and February 2020. The main topic of these debates is the rule of law crisis triggered by the Law and Justice government in Poland. The corpus contains over 157,000 tokens and about 20 h 45 min of recordings, counting both source and target texts. The two interpreting directions (English-Polish and Polish-English) are represented almost evenly. The annotation completed so far comprises mark-up information, POS tagging, and the labelling of disfluency phenomena and all forms of explicitating shifts. Manual annotation for personal deixis is in progress. An additional feature is speaker identification performed with the x-vector method, which allowed us to identify 36 interpreters. We begin with an overview of existing interpreting corpora. We then explain the design features of EP-Poland and report on two completed empirical studies analysing idiosyncratic interpreting behaviour. We conclude by outlining future development pathways and offering some remarks on the significance of the corpus and its limitations.
Pages: 110-126
Number of pages: 17