Urdu part of speech tagging using conditional random fields

Cited by: 17
Authors
Khan, Wahab [1]
Daud, Ali [1, 2]
Nasir, Jamal Abdul [1]
Amjad, Tehmina [1]
Arafat, Sachi [2]
Aljohani, Naif [2]
Alotaibi, Fahd S. [2]
Affiliations
[1] IIU, Dept Comp Sci & Software Engn, Islamabad 44000, Pakistan
[2] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah, Saudi Arabia
Keywords
Urdu; Part of speech (POS); Conditional random field (CRF); Support vector machine (SVM);
DOI
10.1007/s10579-018-9439-6
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Part of speech (POS) tagging, the assignment of syntactic categories to words in running text, is important to natural language processing as a preliminary task in applications such as speech processing and information extraction. Urdu language processing presents a challenge due to the dual behaviour of various Urdu POS tags in differing situations (morphosyntactic ambiguity). This paper addresses this challenge by developing a novel tagging approach using linear-chain conditional random fields (CRF). Our work is the first instance of a CRF approach for Urdu POS tagging. The proposed model employs a strong, stable and balanced feature set that combines language-independent and language-dependent features. The language-dependent features considered include the part-of-speech tag of the previous word and the suffix of the current word, while the language-independent features include the 'context words window'. Our approach was evaluated against support vector machine techniques for Urdu POS tagging, considered the state of the art, on two benchmark datasets. The results show that our CRF approach improves upon the F-measure of prior attempts by 8.3-8.5%.
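The feature set named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration, not the authors' implementation: the function names, the window size of 2, the suffix length of 3, the `<PAD>` marker, and the romanized example sentence are all hypothetical. The previous word's POS tag does not appear in the per-token dictionary because, in a linear-chain CRF, the dependence on the previous tag is normally captured by the model's transition features between adjacent labels.

```python
from typing import Dict, List


def token_features(words: List[str], i: int,
                   window: int = 2, suffix_len: int = 3) -> Dict[str, str]:
    """Build observation features for the word at position i.

    Hypothetical layout: the current word, its suffix, and the
    surrounding words inside a symmetric context window.
    """
    feats = {
        "word": words[i],
        # Suffix of the current word (length is an assumed setting).
        "suffix": words[i][-suffix_len:],
    }
    # Context words window: neighbours at offsets -window..+window.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(words)
        feats[f"word[{offset:+d}]"] = words[j] if inside else "<PAD>"
    return feats


def sentence_features(words: List[str], window: int = 2,
                      suffix_len: int = 3) -> List[Dict[str, str]]:
    """Feature dictionaries for every token in one sentence."""
    return [token_features(words, i, window, suffix_len)
            for i in range(len(words))]


# Illustrative romanized tokens, not taken from the paper's datasets.
example = ["yeh", "kitab", "achhi", "hai"]
features = sentence_features(example)
```

A list of such dictionaries per sentence, paired with a list of gold tags, is the kind of input that common CRF toolkits accept; which toolkit and which exact features the paper used is not stated in this record.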
Pages: 331-362
Number of pages: 32
Related papers
50 in total
  • [1] Urdu part of speech tagging using conditional random fields
    Wahab Khan
    Ali Daud
    Jamal Abdul Nasir
    Tehmina Amjad
    Sachi Arafat
    Naif Aljohani
    Fahd S. Alotaibi
    Language Resources and Evaluation, 2019, 53 : 331 - 362
  • [2] Part-Of-Speech Tagging And Parsing Of Kannada Text Using Conditional Random Fields (CRFs)
    Suraksha, N. M.
    Reshma, K.
    Kumar, Shiva K. M.
    PROCEEDINGS OF 2017 INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL (I2C2), 2017,
  • [3] Part-of-Speech Tagging for Mizo Language Using Conditional Random Field
    Nunsanga, Morrel V. L.
    Pakray, Partha
    Lallawmsanga, C.
    Singh, L. Lolit Kumar
    COMPUTACION Y SISTEMAS, 2021, 25 (04): : 803 - 812
  • [4] Part-of-Speech Tagging Using Conditional Random Fields and Decision Tree: Amazigh Text Written in Tifinaghe Characters
    Maarouf, Otman
    El Ayachi, Rachid
    Biniz, Mohamed
    ADVANCED INTELLIGENT SYSTEMS FOR SUSTAINABLE DEVELOPMENT (AI2SD'2020), VOL 2, 2022, 1418 : 220 - 232
  • [5] Part-of-Speech Tagging using Conditional Random Fields: Exploiting Sub-Label Dependencies for Improved Accuracy
    Silfverberg, Miikka
    Ruokolainen, Teemu
    Linden, Krister
    Kurimo, Mikko
    PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2, 2014, : 259 - 264
  • [6] Khmer POS Tagging Using Conditional Random Fields
    Sangvat, Sokunsatya
    Pluempitiwiriyawej, Charnyote
    COMPUTATIONAL LINGUISTICS, PACLING 2017, 2018, 781 : 169 - 178
  • [7] Accurate Part-of-Speech Tagging via Conditional Random Field
    Zhang, Jinmei
    Zhang, Yucheng
    INTERNET OF VEHICLES - TECHNOLOGIES AND SERVICES, 2016, 10036 : 217 - 224
  • [8] A Comparative Study of Hidden Markov Model and Conditional Random Fields on a Yoruba Part-of-Speech Tagging Task
    Ayogu, Ikechukwu I.
    Adetunmbi, Adebayo O.
    Ojokoh, Bolanle A.
    Oluwadare, Samuel A.
    PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTING NETWORKING AND INFORMATICS (ICCNI 2017), 2017,
  • [9] Part-of-speech (POS) tagging using conditional random field (CRF) model for Khasi corpora
    Sunita Warjri
    Partha Pakray
    Saralin A. Lyngdoh
    Arnab Kumar Maji
    International Journal of Speech Technology, 2021, 24 : 853 - 864
  • [10] Part-of-speech (POS) tagging using conditional random field (CRF) model for Khasi corpora
    Warjri, Sunita
    Pakray, Partha
    Lyngdoh, Saralin A.
    Maji, Arnab Kumar
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2021, 24 (04) : 853 - 864