Challenges and Solutions for Consistent Annotation of Vietnamese Treebank

Cited by: 0
Authors
Nguyen, Quy T. [1, 2]
Miyao, Yusuke [1, 2]
Le, Ha T. T. [3]
Nguyen, Ngan L. T. [4]
Affiliations
[1] Grad Univ Adv Studies, Hayama, Kanagawa, Japan
[2] Natl Inst Informat, Tokyo, Japan
[3] Univ Social Sci & Humanities, Warsaw, Poland
[4] Univ Informat Technol, Ho Chi Minh City, Vietnam
Keywords
Vietnamese Treebank; Consistent Annotation; Challenges and Solutions
DOI
Not available
Chinese Library Classification number
H [Language, Linguistics]
Discipline classification code
05
Abstract
Treebanks are important resources for research in natural language processing, speech recognition, theoretical linguistics, and related fields. To strengthen the automatic processing of the Vietnamese language, a Vietnamese treebank has been built. However, the quality of this treebank is not satisfactory and is a possible cause of the low performance of Vietnamese language processing. We have been building a new treebank for Vietnamese with about 40,000 sentences annotated with three layers: word segmentation, part-of-speech tagging, and bracketing. In this paper, we describe several challenges posed by the Vietnamese language and how we address them in developing annotation guidelines. We also present our methods for improving the quality of the annotation guidelines and ensuring annotation accuracy and consistency. Experimental results show that inter-annotator agreement ratios and accuracy are higher than 90%, which is satisfactory.
Pages: 1532-1539
Page count: 8
Related papers
50 in total
  • [21] Projection-based Annotation of a Polish Dependency Treebank
    Wroblewska, Alina
    Przepiorkowski, Adam
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014: 2306 - 2312
  • [22] Building Vietnamese Dependency Treebank Based on Chinese-Vietnamese Bilingual Word Alignment
    Li, Ying
    Guo, Jianyi
    Yu, Zhengtao
    Wang, Hongbin
    Wen, Yonghua
    2016 12TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2016: 1330 - 1335
  • [23] Syntactic Annotation Guidelines for the Quranic Arabic Dependency Treebank
    Dukes, Kais
    Atwell, Eric
    Sharaf, Abdul-Baquee M.
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010: 1822 - 1827
  • [24] Prague Dependency Treebank Annotation Errors: A Preliminary Analysis
    Kovar, Vojtech
    Jakubicek, Milos
    RASLAN 2009: RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING, 2009: 101 - 108
  • [25] A dependency-based analysis of treebank annotation errors
    Haverinen, Katri
    Ginter, Filip
    Laippala, Veronika
    Kohonen, Samuel
    Viljanen, Timo
    Nyblom, Jenna
    Salakoski, Tapio
    1600, IOS Press BV (258): : 47 - 61
  • [26] The Procedure of Lexico-Semantic Annotation of Skladnica Treebank
    Hajnicz, Elzbieta
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014: 2290 - 2297
  • [27] Dependency structure annotation in the IULA Spanish LSP Treebank
    Montserrat Marimon
    Núria Bel
    Language Resources and Evaluation, 2015, 49 : 433 - 454
  • [28] Dependency structure annotation in the IULA Spanish LSP Treebank
    Marimon, Montserrat
    Bel, Nuria
    LANGUAGE RESOURCES AND EVALUATION, 2015, 49 (02) : 433 - 454
  • [29] Vietnamese treebank construction and entropy-based error detection
    Phuong-Thai Nguyen
    Anh-Cuong Le
    Tu-Bao Ho
    Van-Hiep Nguyen
    Language Resources and Evaluation, 2015, 49 : 487 - 519
  • [30] Vietnamese treebank construction and entropy-based error detection
    Phuong-Thai Nguyen
    Anh-Cuong Le
    Tu-Bao Ho
    Van-Hiep Nguyen
    LANGUAGE RESOURCES AND EVALUATION, 2015, 49 (03) : 487 - 519