A Neural Joint Model with BERT for Burmese Syllable Segmentation, Word Segmentation, and POS Tagging

Cited by: 4
Authors
Mao, Cunli [1]
Man, Zhibo [1]
Yu, Zhengtao [1]
Gao, Shengxiang [1]
Wang, Zhenhan [1]
Wang, Hongbin [1]
Affiliations
[1] Kunming Univ Sci & Technol, Key Lab Artificial Intelligence Informat Engn & A, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Burmese; word segmentation; POS tagging; joint training; BiLSTM-CRF; BERT
DOI
10.1145/3436818
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The smallest semantic unit of the Burmese language is the syllable. This study proposes the first neural joint learning model for Burmese syllable segmentation, word segmentation, and part-of-speech (POS) tagging with BERT. The proposed model alleviates the problem of error propagation from syllable segmentation. More specifically, it extends the neural joint model for Vietnamese word segmentation, POS tagging, and dependency parsing [28] with pre-trained Burmese character, syllable, and word embeddings and BiLSTM-CRF-based neural layers. To evaluate the proposed model, experiments are carried out on Burmese benchmark datasets, and multilingual BERT is fine-tuned. The results show that the proposed joint model achieves excellent performance.
Pages: 23
Related papers
50 in total
  • [41] Overview of the NLPCC 2015 Shared Task: Chinese Word Segmentation and POS Tagging for Micro-blog Texts
    Qiu, Xipeng
    Qian, Peng
    Yin, Liusong
    Wu, Shiyu
    Huang, Xuanjing
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2015, 2015, 9362 : 541 - 549
  • [42] Free as in Free Word Order: An Energy Based Model for Word Segmentation and Morphological Tagging in Sanskrit
    Krishna, Amrith
    Santra, Bishal
    Bandaru, Sasi Prasanth
    Sahu, Gaurav
    Sharma, Vishnu Dutt
    Satuluri, Pavankumar
    Goyal, Pawan
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018: 2550 - 2561
  • [43] Research and Implementation of Tibetan Word Segmentation Based on Syllable Methods
    Jiang, Jing
    Li, Yachao
    Jiang, Tao
    Yu, Hongzhi
    2017 INTERNATIONAL SYMPOSIUM ON APPLICATION OF MATERIALS SCIENCE AND ENERGY MATERIALS (SAMSE 2017), 2018, 322
  • [44] Word Segmentation for Burmese Based on Dual-Layer CRFs
    Zhang, Shaoning
    Mao, Cunli
    Yu, Zhengtao
    Wang, Hongbin
    Li, Zhongwei
    Zhang, Jiafu
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2019, 18 (01)
  • [45] Research on the model of integrating Chinese word segmentation with part-of-speech tagging
    Tong, Xiaojun
    Cui, Minggen
    Song, Guolong
    DCABES 2007 Proceedings, Vols I and II, 2007: 1062 - 1065
  • [46] Character Tagging-Based Word Segmentation for Uyghur
    Yang, Yating
    Mi, Chenggang
    Ma, Bo
    Dong, Rui
    Wang, Lei
    Li, Xiao
    MACHINE TRANSLATION, CWMT 2014, 2014, 493 : 61 - 69
  • [47] Chinese word segmentation as POC-NLW tagging
    Chen, Bo
    He, Hui
    Guo, Jun
    Xu, Weiran
    2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4, 2006, : 1770 - +
  • [48] Text Word Segmentation of Livestock and Poultry Diseases Based on BERT BiLSTM CRF Model
    Yu L.
    Guo X.
    Zhao H.
    Yang C.
    Zhang J.
    Li Q.
    Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2024, 55 (02): 287 - 294
  • [49] GeoBERTSegmenter: Word Segmentation of Chinese Texts in the Geoscience Domain Using the Improved BERT Model
    Wei, Dongqi
    Liu, Zhihao
    Xu, Dexin
    Ma, Kai
    Tao, Liufeng
    Xie, Zhong
    Qiu, Qinjun
    Pan, Shengyong
    EARTH AND SPACE SCIENCE, 2022, 9 (10)
  • [50] Hidden Markov model for POS tagging in Word Sense Disambiguation
    Alva, Pooja
    Hegde, Vinay
    2016 INTERNATIONAL CONFERENCE ON COMPUTATION SYSTEM AND INFORMATION TECHNOLOGY FOR SUSTAINABLE SOLUTIONS (CSITSS), 2016: 279 - 284