Building The Sense-Tagged Multilingual Parallel Corpus

Authors
Wang, Shan [1]
Bond, Francis [1]
Affiliations
[1] Nanyang Technol Univ, Div Linguist & Multilingual Studies, Singapore, Singapore
Source
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014
Keywords
sense-tagging; multilingual corpus; parallel corpus
DOI
Not available
Chinese Library Classification
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
Sense-annotated parallel corpora play a crucial role in natural language processing. This paper reports our progress in creating such a corpus for three Asian languages (Chinese, Japanese and Indonesian), using English as a pivot; it is the first such corpus for these languages. Two sets of tools have been developed, one for sequential tagging and one for targeted tagging, and both are easy to set up for any new language. The paper also briefly presents the general guidelines for the project. The current results of the monolingual sense-tagging and the multilingual linking are illustrated; they show differences among genres and language pairs. All the tools, guidelines and the manually annotated corpus will be freely available at http://compling.ntu.edu.sg/ntumc.
Pages: 2403-2409
Number of pages: 7