Urdu part of speech tagging using conditional random fields

Cited by: 17
Authors
Khan, Wahab [1]
Daud, Ali [1, 2]
Nasir, Jamal Abdul [1]
Amjad, Tehmina [1]
Arafat, Sachi [2]
Aljohani, Naif [2]
Alotaibi, Fahd S. [2]
Affiliations
[1] IIU, Dept Comp Sci & Software Engn, Islamabad 44000, Pakistan
[2] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
Keywords
Urdu; Part of speech (POS); Conditional random field (CRF); Support vector machine (SVM);
DOI
10.1007/s10579-018-9439-6
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Part of speech (POS) tagging, the assignment of syntactic categories to words in running text, is significant to natural language processing as a preliminary task in applications such as speech processing and information extraction. Urdu language processing is challenging because many Urdu words take different POS tags in different contexts (morphosyntactic ambiguity). This paper addresses this challenge by developing a novel tagging approach using linear-chain conditional random fields (CRF). Our work is the first instance of a CRF approach for Urdu POS tagging. The proposed model employs a strong, stable and balanced feature set that combines language-independent and language-dependent features. The language-dependent features include the part-of-speech tag of the previous word and the suffix of the current word, while the language-independent features include the 'context words window'. Our approach was evaluated on two benchmark datasets against support vector machine (SVM) techniques for Urdu POS tagging, which are considered the state of the art. The results show that our CRF approach improves on the F-measure of prior attempts by 8.3 to 8.5%.
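The feature set named in the abstract (previous word's POS tag, suffix of the current word, and a window of context words) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the window size of 2, the suffix length of 3, the boundary markers, and the romanized example tokens and tags are all assumptions chosen for illustration. The output is a list of per-token feature dictionaries, a shape that many CRF toolkits accept as input.

```python
from typing import Dict, List, Optional


def token_features(tokens: List[str], i: int, prev_tag: Optional[str],
                   window: int = 2, suffix_len: int = 3) -> Dict[str, str]:
    """Build a feature dict for tokens[i].

    window and suffix_len are assumed values, not taken from the paper.
    """
    word = tokens[i]
    feats = {
        "word": word,
        # Suffix of the current word (language-dependent feature).
        "suffix": word[-suffix_len:],
        # POS tag of the previous word; a marker at the sentence start.
        "prev_tag": prev_tag if prev_tag is not None else "<BOS>",
    }
    # Context words window (language-independent feature).
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"w[{offset:+d}]"
        if j < 0:
            feats[key] = "<BOS>"
        elif j >= len(tokens):
            feats[key] = "<EOS>"
        else:
            feats[key] = tokens[j]
    return feats


def sentence_features(tokens: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Feature dicts for a tagged training sentence (gold tags supply prev_tag)."""
    return [
        token_features(tokens, i, tags[i - 1] if i > 0 else None)
        for i in range(len(tokens))
    ]


# Hypothetical romanized tokens and tag labels, for illustration only.
example_tokens = ["larka", "kitab", "parhta", "hai"]
example_tags = ["NN", "NN", "VB", "AUX"]
example_feats = sentence_features(example_tokens, example_tags)
```

In a linear-chain CRF the dependence on the previous tag is normally modelled by the transition potentials between adjacent labels, so a toolkit may not need `prev_tag` as an explicit observation feature; it is included here only to mirror the feature list in the abstract.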
Pages: 331-362
Number of pages: 32