A Deep Learning-Based Approach for Part of Speech (PoS) Tagging in the Pashto Language

Cited by: 1
Authors
Ullah, Shaheen [1]
Ahmad, Riaz [1]
Namoun, Abdallah [2]
Muhammad, Siraj [1]
Ullah, Khalil [3]
Hussain, Ibrar [1, 2, 3, 4]
Ibrahim, Isa Ali [5]
Affiliations
[1] Shaheed Benazir Bhutto Univ SBBU, Dept Comp Sci, Upper Dir 18050, Khyber Pakhtunk, Pakistan
[2] Islamic Univ Madinah, Fac Comp Sci & Informat Syst, Madinah 42351, Saudi Arabia
[3] Univ Malakand UOM, Dept Software Engn, Lower Dir, Khyber Pakhtunk, Pakistan
[4] Univ Malakand UOM, Dept Comp Sci & Informat Technol, Chakdara 18800, Khyber Pakhtunk, Pakistan
[5] Fed Univ Technol Owerri, Sch Informat & Commun Technol, Dept Cybersecur, Owerri 460114, Nigeria
Source
IEEE ACCESS | 2024 / Volume 12
Keywords
Artificial intelligence; document image analysis; handwritten text; natural language processing; optical character recognition; speech recognition; standard dataset; NATURAL-LANGUAGE;
DOI
10.1109/ACCESS.2024.3412175
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Part-of-speech (PoS) tagging is a fundamental task in natural language processing (NLP) and is crucial to many NLP applications, including question answering, machine translation, syntactic parsing, speech recognition, and semantic parsing. PoS tagging is a sequence labeling task in which a tagger assigns each word its appropriate part-of-speech label. In NLP, PoS tagging is often treated as a language-specific task, and Pashto is a language that has not yet been explored in this regard. This research therefore focuses on PoS tagging for the Pashto language and provides a baseline accuracy. Its contribution is twofold. First, it introduces a PoS-tagged Pashto dataset containing 281,205 words, all tagged with 17 unique PoS tags. Second, it proposes a deep learning-based model by examining classic Recurrent Neural Networks (RNN) and Bidirectional Long Short-Term Memory networks (BLSTM). The results show promising performance when the models are used with word embeddings. The proposed approach achieved 98.82% accuracy as a baseline on the test dataset using the BLSTM model with word embeddings.
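To make the sequence labeling setup and the accuracy figure concrete, the following is a minimal sketch of how per-token tagging accuracy is typically computed. It is not the BLSTM model from the paper: it swaps in a simple most-frequent-tag baseline, and the sentences, tag names, and function names are hypothetical placeholders rather than material from the Pashto dataset.

```python
from collections import Counter, defaultdict

# Toy tagged sentences: hypothetical English placeholders,
# NOT taken from the paper's Pashto dataset or its 17-tag set.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("cats", "NOUN"), ("chase", "VERB"), ("dogs", "NOUN")],
]
test = [
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
]


def fit_most_frequent_tag(sentences):
    """Learn each word's most frequent tag plus a fallback tag for unseen words."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            per_word[word][tag] += 1
            overall[tag] += 1
    lexicon = {word: counts.most_common(1)[0][0] for word, counts in per_word.items()}
    default_tag = overall.most_common(1)[0][0]
    return lexicon, default_tag


def tag_sentence(words, lexicon, default_tag):
    """Assign one tag per word, which is the sequence labeling output format."""
    return [lexicon.get(word, default_tag) for word in words]


def token_accuracy(sentences, lexicon, default_tag):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sentence in sentences:
        words = [word for word, _ in sentence]
        gold = [tag for _, tag in sentence]
        predicted = tag_sentence(words, lexicon, default_tag)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total


lexicon, default_tag = fit_most_frequent_tag(train)
accuracy = token_accuracy(test, lexicon, default_tag)
print(f"token accuracy: {accuracy:.2%}")
```

A neural tagger such as a BLSTM would replace `tag_sentence` with a model that reads the whole sentence in both directions, while the accuracy computation would stay the same in principle. Whether the paper computes its reported accuracy in exactly this per-token way is an assumption here.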
Pages: 86355 - 86364
Page count: 10
Related papers
50 in total
  • [21] Part of speech tagging: a systematic review of deep learning and machine learning approaches
    Chiche, Alebachew
    Yitagesu, Betselot
    Journal of Big Data, 9
  • [22] Part-of-speech tagger for Bodo language using deep learning approach
    Pathak, Dhrubajyoti
    Narzary, Sanjib
    Nandi, Sukumar
    Som, Bidisha
    NATURAL LANGUAGE PROCESSING, 2025, 31 (02): 215 - 229
  • [23] Part of Speech Tagging in Urdu: Comparison of Machine and Deep Learning Approaches
    Khan, Wahab
    Daud, Ali
    Khan, Khairullah
    Nasir, Jamal Abdul
    Basheri, Mohammed
    Aljohani, Naif
    Alotaibi, Fahd Saleh
    IEEE ACCESS, 2019, 7 : 38918 - 38936
  • [24] Deep learning-based recognition system for Pashto handwritten text: benchmark on PHTI
    Hussain, Ibrar
    Ahmad, Riaz
    Ullah, Khalil
    Muhammad, Siraj
    Elhassan, Rasha
    Syed, Ikram
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [25] Deep Neural Network Architecture for Part-of-Speech Tagging for Turkish Language
    Bahcevan, Cenk Anil
    Kutlu, Emirhan
    Yildiz, Tugba
    2018 3RD INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ENGINEERING (UBMK), 2018: 235 - 238
  • [26] Deep Learning based Tamil Parts of Speech (POS) Tagger
    Anbukkarasi, S.
    Varadhaganapathy, S.
    BULLETIN OF THE POLISH ACADEMY OF SCIENCES-TECHNICAL SCIENCES, 2021, 69 (06)
  • [27] Resource Building and Parts-of-Speech (POS) Tagging for the Mizo Language
    Pakray, Partha
    Pal, Arunagshu
    Majumder, Goutam
    Gelbukh, Alexander
    2015 FOURTEENTH MEXICAN INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (MICAI), 2015: 3 - 7
  • [28] Deep Learning-based POS Tagger and Chunker for Odia Language Using Pre-trained Transformers
    Dalai, Tusarkanta
    Mishra, Tapas Kumar
    Sa, Pankaj K.
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2024, 23 (02)
  • [29] Part-of-Speech Tagging for Azerbaijani Language
    Mammadov, Samir
    Rustamov, Samir
    Mustafali, Ali
    Sadigov, Ziyaddin
    Mollayev, Rasim
    Mammadov, Zamir
    2018 IEEE 12TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT), 2018: 40 - 45
  • [30] Using machine learning techniques for part-of-speech tagging in the Greek language
    Petasis, G
    Paliouras, G
    Karkaletsis, V
    Spyropoulos, CD
    Androutsopoulos, I
    ADVANCES IN INFORMATICS, 2000: 273 - 281