Swedish-Turkish Parallel Treebank

Authors
Megyesi, Beata [1]
Dahlqvist, Bengt [1]
Pettersson, Eva [1]
Nivre, Joakim [1]
Affiliations
[1] Uppsala Univ, Dept Linguist & Philol, Uppsala, Sweden
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
In this paper, we describe our work on building a parallel treebank for a less-studied and typologically dissimilar language pair, namely Swedish and Turkish. The treebank is a balanced, syntactically annotated corpus containing both fiction and technical documents. In total, it consists of approximately 160,000 tokens in Swedish and 145,000 in Turkish. The texts are linguistically annotated in several layers, from part-of-speech tags and morphological features to dependency annotation. Each layer is processed automatically using basic language resources for the languages involved. The sentences and words are aligned, and the alignments are partly manually corrected. We create the treebank by reusing and adjusting existing tools for automatic annotation and alignment, and for their correction and visualization. The treebank was developed within the project Supporting research environment for minor languages, which aims to create representative language resources for language pairs that are dissimilar in language structure. Therefore, effort is put into developing a general method for the formatting and annotation procedure, as well as into using tools that can easily be applied to other language pairs.
Pages: 470 - 473
Page count: 4