A Deep Learning-Based Approach for Part of Speech (PoS) Tagging in the Pashto Language

Cited by: 1
Authors
Ullah, Shaheen [1]
Ahmad, Riaz [1]
Namoun, Abdallah [2]
Muhammad, Siraj [1]
Ullah, Khalil [3]
Hussain, Ibrar [1, 2, 3, 4]
Ibrahim, Isa Ali [5]
Affiliations
[1] Shaheed Benazir Bhutto Univ SBBU, Dept Comp Sci, Upper Dir 18050, Khyber Pakhtunk, Pakistan
[2] Islamic Univ Madinah, Fac Comp Sci & Informat Syst, Madinah 42351, Saudi Arabia
[3] Univ Malakand UOM, Dept Software Engn, Lower Dir, Khyber Pakhtunk, Pakistan
[4] Univ Malakand UOM, Dept Comp Sci & Informat Technol, Chakdara 18800, Khyber Pakhtunk, Pakistan
[5] Fed Univ Technol Owerri, Sch Informat & Commun Technol, Dept Cybersecur, Owerri 460114, Nigeria
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Artificial intelligence; document image analysis; handwritten text; natural language processing; optical character recognition; speech recognition; standard dataset; NATURAL-LANGUAGE;
DOI
10.1109/ACCESS.2024.3412175
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Part-of-speech (PoS) tagging is a fundamental task in natural language processing (NLP). It is crucial to many NLP applications, including question answering, machine translation, syntactic parsing, speech recognition, and semantic parsing. PoS tagging is a sequence labeling task in which a tagger assigns each word its appropriate part-of-speech label. In NLP, PoS tagging is often considered a language-specific task, and Pashto is a language that has not yet been explored in this regard. This research therefore focuses on PoS tagging for the Pashto language and provides a baseline accuracy. Its contribution is twofold. First, it introduces a Pashto tag set that contains 281,205 words of the Pashto language, all tagged with 17 unique PoS tags. Second, it proposes a deep learning-based model by examining classic Recursive Neural Networks (RNN) and Bidirectional Long Short-Term Memory Networks (BLSTM). The results show promising performance when the models are used with word embeddings. The proposed approach achieved a baseline accuracy of 98.82% on the test dataset using the BLSTM model with word embeddings.
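The abstract reports accuracy without saying how it is computed. As a rough illustration only, the sketch below assumes token-level accuracy, the share of words whose predicted tag matches the gold tag, which is a common choice for sequence labeling. The function name `token_accuracy`, the toy sentences, and the tag names are hypothetical and are not taken from the paper or its tag set.

```python
from typing import List


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    correct = 0
    total = 0
    for gold_tags, pred_tags in zip(gold, pred):
        # A tagger must emit exactly one tag per word in the sentence.
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each sentence needs one predicted tag per token")
        correct += sum(g == p for g, p in zip(gold_tags, pred_tags))
        total += len(gold_tags)
    # Avoid division by zero when no tokens are given.
    return correct / total if total else 0.0


# Hypothetical toy data: two sentences, five tokens, placeholder tag names.
gold = [["NOUN", "VERB", "NOUN"], ["PRON", "VERB"]]
pred = [["NOUN", "VERB", "ADJ"], ["PRON", "VERB"]]

accuracy = token_accuracy(gold, pred)
print(f"{accuracy:.2%}")  # 4 of 5 tokens match
```

Under this assumption, a reported 98.82% would mean that roughly 99 of every 100 test-set words received the correct tag.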
Pages: 86355-86364
Page count: 10
Related papers
50 in total
  • [1] Part-of-Speech (POS) Tagging Using Deep Learning-Based Approaches on the Designed Khasi POS Corpus
    Warjri, Sunita
    Pakray, Partha
    Lyngdoh, Saralin A.
    Maji, Arnab Kumar
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (03)
  • [2] A Deep Learning Approach for Part-of-Speech Tagging in Nepali Language
    Prabha, Greeshma
    Jyothsna, P. V.
    Shahina, K. K.
    Premjith, B.
    Soman, K. P.
    2018 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2018, : 1132 - 1136
  • [3] Part-of-Speech (POS) Tagging for the Nyishi Language
    Siram, Joyir
    Sambyo, Koj
    Sarkar, Achyuth
    ADVANCES IN INFORMATION COMMUNICATION TECHNOLOGY AND COMPUTING, AICTC 2021, 2022, 392 : 191 - 199
  • [4] Developing a tagset for Pashto part of speech tagging
    Rabbi, Ihsan
    Khan, Mohammad Abid
    Ali, Rahman
    2008 SECOND INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING, 2008, : 111 - 116
  • [5] Part-of-Speech Tagging of Odia Language Using Statistical and Deep Learning Based Approaches
    Dalai, Tusarkanta
    Mishra, Tapas Kumar
    Sa, Pankaj K.
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [6] Comparative Analysis of Deep Learning Models for Part of Speech Tagging in the Malay Language
    Adebayo B.M.
    Anbananthen K.S.M.
    Muthaiyah S.
    Lurudusamy S.N.
    HighTech and Innovation Journal, 2024, 5 (02) : 272 - 281
  • [7] Deep Learning Based Unsupervised POS Tagging for Sanskrit
    Srivastava, Prakhar
    Chauhan, Kushal
    Aggarwal, Deepanshu
    Shukla, Anupam
    Dhar, Joydip
    Jain, Vrashabh Prasad
    2018 INTERNATIONAL CONFERENCE ON ALGORITHMS, COMPUTING AND ARTIFICIAL INTELLIGENCE (ACAI 2018), 2018,
  • [8] A Machine Learning Approach to POS Tagging Case study: Amazighe language
    Samir, Amri
    Rkia, Bani
    Lahbib, Zenkouar
    Zouhair, Guennoun
    2022 2ND INTERNATIONAL CONFERENCE ON INNOVATIVE RESEARCH IN APPLIED SCIENCE, ENGINEERING AND TECHNOLOGY (IRASET'2022), 2022, : 410 - 413
  • [9] Part-of-Speech (POS) Tagging for Standard Brunei Malay: A Probabilistic and Neural-Based Approach
    Mohaimin, Izzati
    Apong, Rosyzie A.
    Damit, Ashrol R.
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2023, 14 (04) : 830 - 837
  • [10] A hybrid statistical and deep learning based technique for Persian part of speech tagging
    Sara Besharati
    Hadi Veisi
    Ali Darzi
    Seyed Habib Hosseini Saravani
    Iran Journal of Computer Science, 2021, 4 (1) : 35 - 43