A Deep Learning-Based Approach for Part of Speech (PoS) Tagging in the Pashto Language

Cited by: 1
Authors
Ullah, Shaheen [1 ]
Ahmad, Riaz [1 ]
Namoun, Abdallah [2 ]
Muhammad, Siraj [1 ]
Ullah, Khalil [3 ]
Hussain, Ibrar [1 ,2 ,3 ,4 ]
Ibrahim, Isa Ali [5 ]
Affiliations
[1] Shaheed Benazir Bhutto Univ SBBU, Dept Comp Sci, Upper Dir 18050, Khyber Pakhtunk, Pakistan
[2] Islamic Univ Madinah, Fac Comp Sci & Informat Syst, Madinah 42351, Saudi Arabia
[3] Univ Malakand UOM, Dept Software Engn, Lower Dir, Khyber Pakhtunk, Pakistan
[4] Univ Malakand UOM, Dept Comp Sci & Informat Technol, Chakdara 18800, Khyber Pakhtunk, Pakistan
[5] Fed Univ Technol Owerri, Sch Informat & Commun Technol, Dept Cybersecur, Owerri 460114, Nigeria
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Artificial intelligence; document image analysis; handwritten text; natural language processing; optical character recognition; speech recognition; standard dataset; NATURAL-LANGUAGE;
DOI
10.1109/ACCESS.2024.3412175
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Part-of-speech (PoS) tagging is a fundamental task in natural language processing (NLP). It is crucial to many NLP applications, including question answering, machine translation, syntactic parsing, speech recognition, and semantic parsing. PoS tagging is a sequence labeling task in which a tagger assigns each word its appropriate part-of-speech label. In NLP, PoS tagging is often considered a language-specific task, and Pashto is a language that has not yet been explored in this regard. Therefore, this research focuses on PoS tagging for the Pashto language and provides a baseline accuracy. The research makes two contributions. First, it introduces a Pashto tag set that contains 281,205 words of the Pashto language, all tagged with 17 unique PoS tags. Second, it proposes a deep learning-based model by examining classic Recursive Neural Networks (RNN) and Bidirectional Long Short-Term Memory networks (BLSTM). The results show promising performance when the models are used with word embeddings. The proposed approach achieved 98.82% accuracy as a baseline on the test dataset by using the BLSTM model along with word embedding.
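The sequence labeling setup and the accuracy metric described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' BLSTM model: it swaps in a simple most-frequent-tag baseline, and the English toy sentences, the tag names, and the function names (`train_most_frequent_tag`, `tag_sentence`, `token_accuracy`) are all hypothetical, chosen only to show how one tag is assigned per word and how token-level accuracy might be computed.

```python
from collections import Counter, defaultdict


def train_most_frequent_tag(tagged_sentences):
    """Learn, for each word, the tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    # Unknown words fall back to the most frequent tag overall.
    default_tag = all_tags.most_common(1)[0][0]
    return lexicon, default_tag


def tag_sentence(words, lexicon, default_tag):
    """Assign exactly one tag to each word (sequence labeling)."""
    return [lexicon.get(word, default_tag) for word in words]


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    total = sum(len(tags) for tags in gold)
    correct = sum(
        g == p
        for gold_tags, pred_tags in zip(gold, predicted)
        for g, p in zip(gold_tags, pred_tags)
    )
    return correct / total if total else 0.0


# Hypothetical toy data (English, not Pashto); tags are illustrative only.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
test = [
    [("a", "DET"), ("cat", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("bird", "NOUN"), ("sleeps", "VERB")],
]

lexicon, default_tag = train_most_frequent_tag(train)
gold = [[tag for _, tag in sentence] for sentence in test]
predicted = [
    tag_sentence([word for word, _ in sentence], lexicon, default_tag)
    for sentence in test
]
accuracy = token_accuracy(gold, predicted)
print(f"token-level accuracy: {accuracy:.2%}")
```

A neural tagger such as a BLSTM would replace the lexicon lookup with a learned model over word embeddings, but the input/output contract (one tag per word) and the accuracy computation would stay the same under these assumptions.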
Pages: 86355 - 86364
Page count: 10
Related papers
50 in total
  • [41] Part-of-speech Tagging Based on Dictionary and Statistical Machine Learning
    Ye Zhonglin
    Jia Zhen
    Huang Junfu
    Yin Hongfeng
    PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 6993 - 6998
  • [42] Voted approach for part of speech tagging in Bengali
    Department of Computational Linguistics, University of Heidelberg, Im Neuenheimer Feld 325, 69120 Heidelberg, Germany
    Unknown
    Unknown
    PACLIC 23 - Proc. 23rd Pacific Asia Conf. Lang. Inf. Comput., 2009, (120-129):
  • [43] Kadazan Part of Speech Tagging using Transformation-Based Approach
    Alex, Marylyn
    Zakaria, Lailatul Qadri
    4TH INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS (ICEEI 2013), 2013, 11 : 621 - 627
  • [44] A Natural Language Processing-Based Multimodal Deep Learning Approach for News Category Tagging
    Kumar, Bagesh
    Singh, Alankar
    Sharma, Vaidik
    Shivam, Yuvraj
    Mohan, Krishna
    Shukla, Prakhar
    Falor, Tanay
    Kumar, Abhishek
    COMPUTER VISION AND IMAGE PROCESSING, CVIP 2023, PT III, 2024, 2011 : 397 - 410
  • [45] Parallel HMM-Based Approach for Arabic Part of Speech Tagging
    Kadim, Ayoub
    Lazrek, Azzeddine
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2018, 15 (02) : 341 - 351
  • [46] A novel approach to part-of-speech tagging based on latent analogy
    Bellegarda, Jerome R.
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 4685 - 4688
  • [47] Part-of-speech tagging of building codes empowered by deep learning and transformational rules
    Xue, Xiaorui
    Zhang, Jiansong
    ADVANCED ENGINEERING INFORMATICS, 2021, 47
  • [48] Deep Learning-based Telephony Speech Recognition in the Wild
    Han, Kyu J.
    Hahm, Seongjun
    Kim, Byung-Hak
    Kim, Jungsuk
    Lane, Ian
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 1323 - 1327
  • [49] Target exaggeration for deep learning-based speech enhancement
    Kim, Hansol
    Shin, Jong Won
    DIGITAL SIGNAL PROCESSING, 2021, 116
  • [50] Deep Learning-Based Amplitude Fusion for Speech Dereverberation
    Liu, Chunlei
    Wang, Longbiao
    Dang, Jianwu
    DISCRETE DYNAMICS IN NATURE AND SOCIETY, 2020, 2020