Urdu part of speech tagging using conditional random fields

Cited by: 17
Authors
Khan, Wahab [1]
Daud, Ali [1, 2]
Nasir, Jamal Abdul [1]
Amjad, Tehmina [1]
Arafat, Sachi [2]
Aljohani, Naif [2]
Alotaibi, Fahd S. [2]
Affiliations
[1] IIU, Dept Comp Sci & Software Engn, Islamabad 44000, Pakistan
[2] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
Keywords
Urdu; Part of speech (POS); Conditional random field (CRF); Support vector machine (SVM)
DOI
10.1007/s10579-018-9439-6
Chinese Library Classification number
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Part of speech (POS) tagging, the assignment of syntactic categories to words in running text, is an important preliminary step in natural language processing applications such as speech processing and information extraction. Urdu language processing is challenging because of the dual behaviour of various Urdu POS tags in different contexts (morphosyntactic ambiguity). This paper addresses this challenge with a novel tagging approach based on linear-chain conditional random fields (CRF). Our work is the first application of a CRF approach to Urdu POS tagging. The proposed model employs a strong, stable and balanced feature set that combines language-dependent and language-independent features. The language-dependent features are the POS tag of the previous word and the suffix of the current word, while the language-independent feature is a context-word window. We evaluated our approach on two benchmark datasets against support vector machine (SVM) techniques for Urdu POS tagging, which are considered the state of the art. The results show that our CRF approach improves upon the F-measure of prior attempts by 8.3–8.5%.
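The abstract describes a linear-chain CRF whose features are a context-word window, the suffix of the current word, and the POS tag of the previous word. Below is a minimal Python sketch of how such a model might decode a sentence. It assumes the weights are already trained and is not the authors' implementation. The function names, the window size of 1, the 2-character suffix, the romanized example tokens, the tag labels and the hand-set weights are all hypothetical. Observation features (context window, suffix) are scored against each candidate tag. The previous-tag dependency is carried by transition weights. Viterbi search then returns the highest-scoring tag sequence. The CRF normalizer Z(x) is omitted because it does not change the argmax. Training the weights, for example by maximizing conditional log-likelihood, is not shown.

```python
def token_features(tokens, i, window=1, suffix_len=2):
    """Observation features at position i: words in a context window plus the
    current word's suffix. window and suffix_len are hypothetical choices."""
    feats = ["bias"]
    for offset in range(-window, window + 1):
        j = i + offset
        word = tokens[j] if 0 <= j < len(tokens) else "<PAD>"
        feats.append(f"w[{offset}]={word}")
    feats.append(f"suffix={tokens[i][-suffix_len:]}")
    return feats


def viterbi(tokens, tags, emit_w, trans_w, start="<S>"):
    """Highest-scoring tag sequence under a linear-chain model.

    emit_w[(feature, tag)] weights observation features for a tag;
    trans_w[(prev_tag, tag)] weights the previous-tag/current-tag pair,
    which is where the "POS tag of the previous word" enters the model.
    Missing keys count as weight 0.
    """
    if not tokens:
        return []

    def emit(i, tag):
        return sum(emit_w.get((f, tag), 0.0) for f in token_features(tokens, i))

    # best[i][tag] = (score of best path ending in tag at i, previous tag)
    best = [{t: (trans_w.get((start, t), 0.0) + emit(0, t), None) for t in tags}]
    for i in range(1, len(tokens)):
        column = {}
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p][0] + trans_w.get((p, t), 0.0)) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + emit(i, t), prev)
        best.append(column)

    # Backtrack from the best-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Hypothetical toy example: romanized "woh ghar gaya" ("he went home").
tokens = ["woh", "ghar", "gaya"]
tags = ["PRP", "NN", "VB"]
emit_w = {("w[0]=woh", "PRP"): 2.0, ("suffix=ar", "NN"): 1.5, ("suffix=ya", "VB"): 1.5}
trans_w = {("<S>", "PRP"): 0.5, ("PRP", "NN"): 1.0, ("NN", "VB"): 1.0}
print(viterbi(tokens, tags, emit_w, trans_w))  # ['PRP', 'NN', 'VB']
```

In this toy run, the suffix weights favour NN for "ghar" and VB for "gaya", and the transition weights reward the PRP→NN→VB order. In a real tagger, these weights would be learned from an annotated Urdu corpus rather than set by hand.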
Pages: 331–362
Number of pages: 32