A Deep Learning-Based Approach for Part of Speech (PoS) Tagging in the Pashto Language

Cited by: 1

Authors:

Ullah, Shaheen ^{[1]}

Ahmad, Riaz ^{[1]}

Namoun, Abdallah ^{[2]}

Muhammad, Siraj ^{[1]}

Ullah, Khalil ^{[3]}

Hussain, Ibrar ^{[1,2,3,4]}

Ibrahim, Isa Ali ^{[5]}

Affiliations:

[1] Shaheed Benazir Bhutto Univ SBBU, Dept Comp Sci, Upper Dir 18050, Khyber Pakhtunk, Pakistan

[2] Islamic Univ Madinah, Fac Comp Sci & Informat Syst, Madinah 42351, Saudi Arabia

[3] Univ Malakand UOM, Dept Software Engn, Lower Dir, Khyber Pakhtunk, Pakistan

[4] Univ Malakand UOM, Dept Comp Sci & Informat Technol, Chakdara 18800, Khyber Pakhtunk, Pakistan

[5] Fed Univ Technol Owerri, Sch Informat & Commun Technol, Dept Cybersecur, Owerri 460114, Nigeria

Source:

IEEE ACCESS | 2024 / Vol. 12

Keywords:

Artificial intelligence; document image analysis; handwritten text; natural language processing; optical character recognition; speech recognition; standard dataset; NATURAL-LANGUAGE;

DOI:

10.1109/ACCESS.2024.3412175

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Part-of-speech (PoS) tagging is a fundamental task in natural language processing (NLP). It is crucial to many NLP applications, including question answering, machine translation, syntactic parsing, speech recognition, and semantic parsing. PoS tagging is a sequence labeling task in which a tagger assigns each word its appropriate part-of-speech label. In NLP, PoS tagging is often considered a language-specific task, and Pashto has not yet been explored in this regard. This research therefore focuses on PoS tagging for the Pashto language and provides a baseline accuracy. Its contribution is twofold. First, it introduces a Pashto tagged dataset of 281,205 words, each labeled with one of 17 unique PoS tags. Second, it proposes a deep learning-based model by examining classic Recursive Neural Networks (RNN) and Bidirectional Long Short-Term Memory (BLSTM) networks. The results are promising when the models are combined with word embeddings. Using the BLSTM model with word embeddings, the proposed approach achieved a baseline accuracy of 98.82% on the test dataset.

Pages: 86355-86364

Number of pages: 10
