Building The Sense-Tagged Multilingual Parallel Corpus

Authors
Wang, Shan [1]
Bond, Francis [1]
Affiliations
[1] Nanyang Technol Univ, Div Linguist & Multilingual Studies, Singapore, Singapore
Source
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2014
Keywords
sense-tagging; multilingual corpus; parallel corpus;
DOI
Not available
Chinese Library Classification
H0 [Linguistics];
Discipline classification codes
030303 ; 0501 ; 050102 ;
Abstract
Sense-annotated parallel corpora play a crucial role in natural language processing. This paper introduces our progress in creating such a corpus for Asian languages using English as a pivot, which is the first such corpus for these languages (Chinese, Japanese and Indonesian). Two sets of tools have been developed for sequential and targeted tagging, which are also easy to set up for any new language. This paper also briefly presents the general guidelines for the project. The current results of the monolingual sense-tagging and multilingual linking are illustrated, and they indicate differences among genres and language pairs. All the tools, guidelines and the manually annotated corpus will be freely available at http://compling.ntu.edu.sg/ntumc.
Pages: 2403 - 2409
Number of pages: 7