Urdu part of speech tagging using conditional random fields

Cited by: 17

Authors:

Khan, Wahab [1]

Daud, Ali [1,2]

Nasir, Jamal Abdul [1]

Amjad, Tehmina [1]

Arafat, Sachi [2]

Aljohani, Naif [2]

Alotaibi, Fahd S. [2]

Affiliations:

[1] IIU, Dept Comp Sci & Software Engn, Islamabad 44000, Pakistan

[2] King Abdulaziz Univ, Fac Comp & Informat Technol, Jeddah, Saudi Arabia

Source:

Language Resources and Evaluation | 2019 / Vol. 53 / Issue 3

Keywords:

Urdu; Part of speech (POS); Conditional random field (CRF); Support vector machine (SVM)

DOI:

10.1007/s10579-018-9439-6

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Subject classification codes:

081203; 0835

Abstract:

Part of speech (POS) tagging, the assignment of syntactic categories to words in running text, is significant to natural language processing as a preliminary task in applications such as speech processing and information extraction. Urdu language processing presents a challenge because various Urdu POS tags behave differently in different situations (morphosyntactic ambiguity). This paper addresses this challenge by developing a novel tagging approach using linear-chain conditional random fields (CRF). Our work is the first instance of a CRF approach for Urdu POS tagging. The proposed model employs a strong, stable, and balanced feature set that combines language-independent and language-dependent features. The language-dependent features include the part-of-speech tag of the previous word and the suffix of the current word, while the language-independent features include the 'context words window'. Our approach was evaluated against support vector machine techniques for Urdu POS tagging, considered the state of the art, on two benchmark datasets. The results show that our CRF approach improves upon the F-measure of prior attempts by 8.3-8.5%.
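The feature types named in the abstract (suffix of the current word, a context words window) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the window size of 2, the maximum suffix length of 3, the `<PAD>` marker, and the placeholder tokens are all assumptions made for illustration. The previous word's tag is not computed here; in a linear-chain CRF that dependency is typically modelled by the transition between adjacent labels during training.

```python
from typing import Dict, List


def token_features(tokens: List[str], i: int,
                   window: int = 2, max_suffix: int = 3) -> Dict[str, str]:
    """Build a feature dict for the token at position i (hypothetical helper)."""
    word = tokens[i]
    feats = {"bias": "1", "word": word}

    # Suffix features of the current word (lengths 1..max_suffix, assumed),
    # emitted only when the suffix is shorter than the whole word.
    for n in range(1, max_suffix + 1):
        if len(word) > n:
            feats[f"suffix{n}"] = word[-n:]

    # Context words window: neighbours up to `window` positions on each side.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j] if inside else "<PAD>"

    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, str]]:
    """Feature dicts for every token of one sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Placeholder tokens, not real Urdu data.
example = sentence_features(["a", "bcd", "efghi"])
```

A list of such per-token dicts, one list per sentence, is the input shape that common CRF toolkits expect alongside the gold tag sequences.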


Pages: 331-362

Page count: 32

