A Comparative Study of Hidden Markov Model and Conditional Random Fields on a Yoruba Part-of-Speech Tagging Task

Cited by: 0
Authors
Ayogu, Ikechukwu I. [1 ]
Adetunmbi, Adebayo O. [1 ]
Ojokoh, Bolanle A. [1 ]
Oluwadare, Samuel A. [1 ]
Affiliations
[1] Fed Univ Technol Akure, Dept Comp Sci, Akure, Ondo State, Nigeria
Source
PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTING NETWORKING AND INFORMATICS (ICCNI 2017) | 2017
Keywords
Yoruba language; Part-of-speech tagging; Features; Bigram HMM; Linear-chain CRF
DOI
Not available
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Part-of-speech tagging, the predictive sequential labeling of words in a sentence given a context, is a challenging problem because of both ambiguity and the unbounded nature of natural language vocabulary. Unlike English and most European languages, the Yoruba language has no publicly available part-of-speech tagging tool. In this paper, we compare the performance of variants of a bigram hidden Markov model (HMM) with that of a linear-chain conditional random field (CRF) on a Yoruba part-of-speech tagging task. We investigated the improvements obtainable from smoothing techniques and morphological affixes in the HMM-based models. For the CRF model, we defined feature functions that capture contexts similar to those available to the HMM-based models. Both kinds of models were trained and evaluated on the same data set. Experimental results show that the performance of the two kinds of models is encouraging, with the CRF model recognizing more out-of-vocabulary (OOV) words than the best HMM model by a margin of 3.05%. The overall accuracy of the best HMM-based model is 83.62%, while that of the CRF is 84.66%. Although the CRF model gives marginally superior performance, both HMM and CRF modeling approaches are clearly promising, given their OOV word recognition rates.
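The paper does not publish its implementation, so the sketch below only illustrates the setup the abstract describes: a linear-chain CRF whose feature functions expose roughly the same context a bigram HMM tagger sees (the word itself, morphological prefix and suffix "affixes", and the neighbouring words), trained on pre-tagged sentences. It uses the sklearn-crfsuite library; the feature names, affix lengths, and hyperparameters are illustrative assumptions, not the authors' actual configuration.

```python
# Minimal sketch (not the authors' code): a linear-chain CRF POS tagger whose
# features mirror bigram-HMM-style context plus morphological affixes.
# Requires: pip install sklearn-crfsuite
import sklearn_crfsuite

def word_features(sentence, i):
    """Feature dict for the i-th word; affix length 3 is an assumption."""
    word = sentence[i]
    feats = {
        "bias": 1.0,
        "word": word.lower(),
        "prefix3": word[:3],   # morphological prefix
        "suffix3": word[-3:],  # morphological suffix
        "is_digit": word.isdigit(),
    }
    if i > 0:
        feats["prev_word"] = sentence[i - 1].lower()  # bigram-style left context
    else:
        feats["BOS"] = True
    if i < len(sentence) - 1:
        feats["next_word"] = sentence[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats

def sent2features(sentence):
    return [word_features(sentence, i) for i in range(len(sentence))]

# tagged_sents: list of sentences, each a list of (word, tag) pairs
# read from an annotated Yoruba corpus (hypothetical input format).
def train_crf(tagged_sents):
    X = [sent2features([w for w, _ in s]) for s in tagged_sents]
    y = [[t for _, t in s] for s in tagged_sents]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                               max_iterations=100, all_possible_transitions=True)
    crf.fit(X, y)
    return crf
```

Because the CRF conditions on sub-word features such as affixes rather than on word identity alone, it can assign plausible tags to unseen words, which is consistent with the OOV recognition advantage over the best HMM model reported in the abstract.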
Pages: 6