A Neural Joint Model with BERT for Burmese Syllable Segmentation, Word Segmentation, and POS Tagging

Cited by: 4
Authors
Mao, Cunli [1]
Man, Zhibo [1]
Yu, Zhengtao [1]
Gao, Shengxiang [1]
Wang, Zhenhan [1]
Wang, Hongbin [1]
Affiliations
[1] Kunming Univ Sci & Technol, Key Lab Artificial Intelligence Informat Engn & A, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Burmese; word segmentation; POS tagging; joint training; BiLSTM-CRF; BERT
DOI
10.1145/3436818
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The syllable is the smallest semantic unit of the Burmese language. This study proposes the first neural joint learning model for Burmese syllable segmentation, word segmentation, and part-of-speech (POS) tagging with BERT. The proposed model alleviates the error propagation caused by syllable segmentation. More specifically, it extends the neural joint model for Vietnamese word segmentation, POS tagging, and dependency parsing [28] with pre-trained Burmese character, syllable, and word embeddings and BiLSTM-CRF-based neural layers. To evaluate the proposed model, we carry out experiments on Burmese benchmark datasets and fine-tune multilingual BERT. The results show that the proposed joint model achieves excellent performance.
Pages: 23
Related papers
50 in total
  • [21] A Feature-Enriched Neural Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
    Chen, Xinchi
    Qiu, Xipeng
    Huang, Xuanjing
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3960 - 3966
  • [22] Word Segmentation for Burmese (Myanmar)
    Ding, Chenchen
    Thu, Ye Kyaw
    Utiyama, Masao
    Sumita, Eiichiro
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2016, 15 (04)
  • [23] Simple semi-supervised learning for Chinese word segmentation and POS tagging
    Li, Xinxin
    Wang, Xuan
    Waqas, Muhammad
    Harbin, Anwar
    Information Technology Journal, 2013, 12 (20) : 5955 - 5961
  • [24] Tibetan Word Segmentation as Sub-syllable Tagging with Syllable's Part-of-Speech Property
    Liu, Huidan
    Long, Congjun
    Nuo, Minghua
    Wu, Jian
    CHINESE COMPUTATIONAL LINGUISTICS AND NATURAL LANGUAGE PROCESSING BASED ON NATURALLY ANNOTATED BIG DATA (CCL 2015), 2015, 9427 : 189 - 201
  • [25] Joint Segmentation and POS Tagging for Arabic Using a CRF-based Classifier
    Gahbiche-Braham, Souhir
    Bonneau-Maynard, Helene
    Lavergne, Thomas
    Yvon, Francois
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 2107 - 2113
  • [26] Thai Personal Named Entity Extraction without using Word Segmentation or POS Tagging
    Sutheebanjard, P.
    Premchaiswadi, W.
    2009 EIGHTH INTERNATIONAL SYMPOSIUM ON NATURAL LANGUAGE PROCESSING, PROCEEDINGS, 2009, : 221 - 226
  • [27] Research on the Method and System of Word Segmentation and POS Tagging for Ancient Chinese Medicine Literature
    Fu, Xianjun
    Yuan, Ting
    Li, Xuebo
    Wang, Zhenguo
    Zhou, Yang
    Ju, Fangning
    Li, Jintong
    Chen, Xiaokang
    Sang, Xiaoming
    2019 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2019, : 2493 - 2498
  • [28] Supervised Urdu Word Segmentation model Based on POS Information
    Khan, Sadiq Nawaz
    Khan, Khairullah
    Khan, Wahab
    EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2018, 5 (19)
  • [29] Burmese Word Segmentation with Character Clustering and CRFs
    Phyu, Myat Lay
    Hashimoto, Kiyota
    PROCEEDINGS OF 2017 14TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE), 2017
  • [30] Phrase-Based Statistical Model for Korean Morpheme Segmentation and POS Tagging
    Na, Seung-Hoon
    Kim, Young-Kil
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2018, E101D (02) : 512 - 522