Chinese Word Segmentation Method on the Basis of Bidirectional Long-Short Term Memory Model

Cited by: 3
Authors
Zhang H.-G. [1]
Li H. [1]
Affiliations
[1] School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing
Source
Journal of South China University of Technology, 2017, Vol. 45
Keywords
Chinese word segmentation; Deep learning; Long short-term memory; Neural network
DOI
10.3969/j.issn.1000-565X.2017.03.009
Abstract
Chinese word segmentation is one of the fundamental technologies of Chinese natural language processing. At present, most conventional Chinese word segmentation methods rely on feature engineering, which requires intensive labor to verify the effectiveness of the features. With the rapid development of deep learning, it has become feasible to learn features automatically with neural networks. This paper proposes a novel Chinese word segmentation method based on the bidirectional long short-term memory (BLSTM) model. In this method, Chinese characters are represented as embedding vectors learned from a large-scale corpus, and the vectors are then fed to a BLSTM model for segmentation. Experiments show that, without any feature engineering, the proposed method achieves high segmentation performance on the simplified Chinese datasets (PKU, MSRA and CTB) and the traditional Chinese dataset (HKCityU). © 2017, Editorial Department, Journal of South China University of Technology. All rights reserved.
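Character-level models such as the BLSTM described in the abstract are commonly trained to predict one boundary tag per character, after which the words are recovered from the tag sequence. The abstract does not state which tag set the authors use, so the four-tag B/M/E/S scheme below (begin, middle, end, single) is an assumption, and the function names `words_to_tags` and `tags_to_words` are hypothetical. This is a minimal sketch of the data conversion only, not the authors' model.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Map a segmented sentence to one B/M/E/S tag per character.

    Assumed tag set (not confirmed by the abstract):
    B = first character of a multi-character word, M = middle,
    E = last character, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from a character string and its per-character tags."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # flush a trailing word left open by an ill-formed tag sequence
        words.append(current)
    return words
```

In a full pipeline, a tagger would predict the tag sequence from the character embeddings, and `tags_to_words` would turn those predictions back into a segmented sentence.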
Pages: 61-67
Page count: 6
Related papers
23 in total
  • [1] Xue N., Chinese word segmentation as character tagging, Computational Linguistics and Chinese Language Processing, 8, 1, pp. 29-48, (2003)
  • [2] Liu Q., Zhang H.-P., Yu H.-K., et al., Chinese lexical analysis using cascaded hidden Markov model, Journal of Computer Research and Development, 41, 8, pp. 1421-1429, (2004)
  • [3] Peng F., Feng F., McCallum A., Chinese segmentation and new word detection using conditional random fields, Proceedings of the 20th International Conference on Computational Linguistics, pp. 562-568, (2004)
  • [4] Tang B., Wang X., Wang X., Chinese word segmentation based on large margin methods, International Journal on Asian Language Processing, 19, 2, pp. 55-68, (2009)
  • [5] Zhao H., Li M., Lu B., et al., Effective tag set selection in Chinese word segmentation via conditional random field modeling, Proceedings of the 20th Pacific Asia Conference on Language Information and Computation, pp. 87-94, (2006)
  • [6] Zhao H., Integrating unsupervised and supervised word segmentation: the role of goodness measures, Information Sciences, 181, 1, pp. 163-183, (2011)
  • [7] Collobert R., Weston J., A unified architecture for natural language processing: deep neural networks with multitask learning, Proceedings of the 25th International Conference on Machine Learning, pp. 160-167, (2008)
  • [8] Zheng X., Chen H., Xu T., Deep learning for Chinese word segmentation and POS tagging, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 647-657, (2013)
  • [9] Chen X., Qiu X., Zhu C., et al., Gated recursive neural network for Chinese word segmentation, Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pp. 567-572, (2015)
  • [10] Collobert R., Weston J., Bottou L., et al., Natural language processing (almost) from scratch, Journal of Machine Learning Research, 12, 1, pp. 2493-2537, (2011)