A Deep Learning-Based Approach for Part of Speech (PoS) Tagging in the Pashto Language

Cited by: 1
Authors
Ullah, Shaheen [1]
Ahmad, Riaz [1]
Namoun, Abdallah [2]
Muhammad, Siraj [1]
Ullah, Khalil [3]
Hussain, Ibrar [1, 2, 3, 4]
Ibrahim, Isa Ali [5]
Affiliations
[1] Shaheed Benazir Bhutto Univ SBBU, Dept Comp Sci, Upper Dir 18050, Khyber Pakhtunk, Pakistan
[2] Islamic Univ Madinah, Fac Comp Sci & Informat Syst, Madinah 42351, Saudi Arabia
[3] Univ Malakand UOM, Dept Software Engn, Lower Dir, Khyber Pakhtunk, Pakistan
[4] Univ Malakand UOM, Dept Comp Sci & Informat Technol, Chakdara 18800, Khyber Pakhtunk, Pakistan
[5] Fed Univ Technol Owerri, Sch Informat & Commun Technol, Dept Cybersecur, Owerri 460114, Nigeria
Source
IEEE Access, 2024, Vol. 12
Keywords
Artificial intelligence; document image analysis; handwritten text; natural language processing; optical character recognition; speech recognition; standard dataset; NATURAL-LANGUAGE;
DOI
10.1109/ACCESS.2024.3412175
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Part-of-speech (PoS) tagging is a fundamental task in natural language processing (NLP). It is crucial to many NLP applications, including question answering, machine translation, syntactic parsing, speech recognition, and semantic parsing. PoS tagging is a sequence labeling task in which a tagger assigns each word its appropriate part-of-speech label. In NLP, PoS tagging is often treated as a language-specific task, yet Pashto has so far not been explored in this respect. This research therefore addresses PoS tagging for the Pashto language and provides a baseline accuracy. Its contribution is twofold. First, it introduces a Pashto tagged dataset containing 281,205 words, all labeled with 17 unique PoS tags. Second, it proposes a deep learning-based model, examining classic recurrent neural networks (RNNs) and bidirectional long short-term memory networks (BLSTMs). The results are promising when these models are combined with word embeddings. The BLSTM model with word embeddings achieves a baseline accuracy of 98.82% on the test set.
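The abstract describes the tagger only at the architecture level: word embeddings feed a bidirectional LSTM that emits one of 17 PoS tags per word. The following is a minimal, untrained, pure-Python sketch of that generic architecture. It is not the authors' implementation. All of the following are hypothetical choices for illustration:

* the embedding and hidden sizes;
* the random initialization;
* the placeholder tokens (`tok1`, `tok2`, ...);
* the helper names (`LSTMCell`, `BLSTMTagger`, `token_accuracy`).

The sketch also assumes that the reported accuracy is token-level accuracy, which the abstract does not state explicitly.

```python
import math
import random

# Hypothetical sizes for illustration; the abstract only fixes the 17 tags.
NUM_TAGS = 17
EMBED_DIM = 8
HIDDEN_DIM = 6


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class LSTMCell:
    """One LSTM direction: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x; h_prev].
        self.w = {k: rand_matrix(hidden_dim, input_dim + hidden_dim, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        pre = {k: [s + b for s, b in zip(matvec(self.w[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(s) for s in pre["i"]]
        f = [sigmoid(s) for s in pre["f"]]
        o = [sigmoid(s) for s in pre["o"]]
        cand = [math.tanh(s) for s in pre["g"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        states = []
        for x in xs:
            h, c = self.step(x, h, c)
            states.append(h)
        return states


class BLSTMTagger:
    """Embedding lookup -> forward and backward LSTMs -> per-token softmax over tags."""

    def __init__(self, vocab, rng):
        self.index = {w: k for k, w in enumerate(vocab)}
        self.unk = len(vocab)  # extra row for out-of-vocabulary words
        # Random vectors stand in for trained or pretrained word embeddings.
        self.embeddings = rand_matrix(len(vocab) + 1, EMBED_DIM, rng)
        self.fwd = LSTMCell(EMBED_DIM, HIDDEN_DIM, rng)
        self.bwd = LSTMCell(EMBED_DIM, HIDDEN_DIM, rng)
        self.out = rand_matrix(NUM_TAGS, 2 * HIDDEN_DIM, rng)

    def tag_distributions(self, words):
        xs = [self.embeddings[self.index.get(w, self.unk)] for w in words]
        hf = self.fwd.run(xs)
        hb = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with tokens
        # Each token sees its left context (hf) and right context (hb).
        return [softmax(matvec(self.out, f + b)) for f, b in zip(hf, hb)]

    def predict(self, words):
        return [max(range(NUM_TAGS), key=p.__getitem__) for p in self.tag_distributions(words)]


def token_accuracy(gold_sents, pred_sents):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for gold, pred in zip(gold_sents, pred_sents):
        for g, p in zip(gold, pred):
            correct += g == p
            total += 1
    return correct / total


if __name__ == "__main__":
    tagger = BLSTMTagger(["tok1", "tok2", "tok3"], random.Random(0))
    print(tagger.predict(["tok1", "tok2", "unseen", "tok3"]))  # untrained: arbitrary tag ids
```

A practical tagger would differ in three ways. It would use a deep learning framework, learn or initialize the embeddings from Pashto text, and train all weights with a cross-entropy loss over the gold tags. The forward and backward passes are written out separately here only to make the bidirectional structure visible.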
Pages: 86355-86364
Number of pages: 10
Related papers
50 in total
  • [31] Hidden Markov Model Based Part of Speech Tagging for Nepali Language
    Paul, Abhijit
    Purkayastha, Bipul Syam
    Sarkar, Sunita
    2015 INTERNATIONAL SYMPOSIUM ON ADVANCED COMPUTING AND COMMUNICATION (ISACC), 2015, : 149 - 156
  • [32] Transformation-based part-of-speech tagging for Serbian language
    Delic, Vlado
    Secujski, Milan
    Kupusinac, Aleksandar
    PROCEEDINGS OF THE 8TH WSEAS INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, MAN-MACHINE SYSTEMS AND CYBERNETICS (CIMMACS '09), 2009, : 98 - +
  • [33] A sensing data and deep learning-based sign language recognition approach
    Hao, Wei
    Hou, Chen
    Zhang, Zhihao
    Zhai, Xueyu
    Wang, Li
    Lv, Guanghao
    COMPUTERS & ELECTRICAL ENGINEERING, 2024, 118
  • [34] On the Robustness of Deep Learning-Based Speech Enhancement
    Chhetri, Amit S.
    Hilmes, Philip
    Athi, Mrudula
    Shankar, Nikhil
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022, : 1587 - 1594
  • [35] Deep Learning Architecture for Part-of-Speech Tagging with Word and Suffix Embeddings
    Popov, Alexander
    ARTIFICIAL INTELLIGENCE: METHODOLOGY, SYSTEMS, AND APPLICATIONS, AIMSA 2016, 2016, 9883 : 68 - 77
  • [36] A BERT Based Approach for Arabic POS Tagging
    Saidi, Rakia
    Jarray, Fethi
    Mansour, Mahmud
    ADVANCES IN COMPUTATIONAL INTELLIGENCE, IWANN 2021, PT I, 2021, 12861 : 311 - 321
  • [37] Character-based Joint Word Segmentation and Part-of-Speech Tagging for Tibetan Based on Deep Learning
    Li, Yan
    Li, Xiaomin
    Wang, Yiru
    Lv, Hui
    Li, Fenfang
    Duo, La
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (05)
  • [38] Improving part-of-speech tagging in Amharic language using deep neural network
    Hirpassa, Sintayehu
    Lehal, G. S.
    HELIYON, 2023, 9 (07)
  • [39] Robust Multi-task Learning-based Korean POS Tagging to Overcome Word Spacing Errors
    Park, Cheoneum
    Kim, Juae
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [40] A CONNECTIONIST APPROACH TO PART-OF-SPEECH TAGGING
    Zamora-Martinez, F.
    Castro-Bleda, M. J.
    Espana-Boquera, S.
    Tortajada, Salvador
    Aibar, P.
    IJCCI 2009: PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2009, : 421 - +