Urdu part of speech tagging using conditional random fields

Cited by: 17
Authors
Khan, Wahab [1]
Daud, Ali [1, 2]
Nasir, Jamal Abdul [1]
Amjad, Tehmina [1]
Arafat, Sachi [2]
Aljohani, Naif [2]
Alotaibi, Fahd S. [2]
Affiliations
[1] IIU, Dept Comp Sci & Software Engn, Islamabad 44000, Pakistan
[2] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
Keywords
Urdu; Part of speech (POS); Conditional random field (CRF); Support vector machine (SVM)
DOI
10.1007/s10579-018-9439-6
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Part-of-speech (POS) tagging, the assignment of syntactic categories to words in running text, is an important preliminary task in natural language processing applications such as speech processing and information extraction. Urdu language processing is challenging because various Urdu POS tags behave differently in different contexts (morphosyntactic ambiguity). This paper addresses this challenge by developing a novel tagging approach using linear-chain conditional random fields (CRF). Our work is the first instance of a CRF approach for Urdu POS tagging. The proposed model employs a strong, stable and balanced feature set that combines language-independent and language-dependent features. The language-dependent features include the POS tag of the previous word and the suffix of the current word, while the language-independent features include the 'context words window'. Our approach was evaluated on two benchmark datasets against support vector machine (SVM) techniques for Urdu POS tagging, which are considered the state of the art. The results show that our CRF approach improves on the F-measure of prior attempts by 8.3 to 8.5%.
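The feature set named in the abstract (POS tag of the previous word, suffix of the current word, and a context words window) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `token_features`, the window size, the suffix length, the `BOS` marker, the romanized example sentence, and the `NN` tag are all assumptions made for illustration, and the abstract does not state the values the paper actually uses.

```python
from typing import Dict, List, Optional


def token_features(
    words: List[str],
    i: int,
    prev_tag: Optional[str],
    window: int = 2,      # assumed context size; not taken from the paper
    suffix_len: int = 3,  # assumed suffix length; not taken from the paper
) -> Dict[str, str]:
    """Build an illustrative feature dict for the word at position i."""
    word = words[i]
    features = {
        "word": word,
        # Language-dependent feature: suffix of the current word.
        "suffix": word[-suffix_len:],
        # Language-dependent feature: POS tag of the previous word.
        # "BOS" is a hypothetical marker for the start of a sentence.
        "prev_tag": prev_tag if prev_tag is not None else "BOS",
    }
    # Language-independent feature: words in a window around position i.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        if 0 <= j < len(words):
            features[f"word[{offset:+d}]"] = words[j]
    return features


# Hypothetical romanized example; "NN" is a placeholder tag, not the paper's tagset.
words = ["yeh", "kitab", "achhi", "hai"]
feats = token_features(words, 2, prev_tag="NN")
print(feats)
```

Feature dictionaries of this kind, one per token, are the usual input to a linear-chain CRF toolkit; which toolkit the paper uses is not stated in the abstract.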
Pages: 331-362
Page count: 32