Improved POS Tagging Model for Malay Twitter Data based on Machine Learning Algorithm

Authors
Ariffin, Siti Noor Allia Noor [1]
Tiun, Sabrina [1]
Affiliation
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi, Selangor, Malaysia
Keywords
Informal Malay; Malay Twitter corpus; Malay POS tagging; Malay POS tagger model; Malay social media texts; Malay POS machine learning; SVM
DOI
10.14569/IJACSA.2022.0130730
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Twitter is a popular social media platform in Malaysia that supports microblogging in posts of up to 280 characters. Users tweet about almost everything that happens in a single day, and because of Twitter's popularity, most Malaysians use it daily, which gives researchers and developers a wealth of data on Malaysian users. This paper explains why and how this study created a new Malay Twitter corpus, a revised set of Malay Part-of-Speech (POS) tags, and a Malay POS tagger model. The goal is twofold: to improve existing Malay POS tags so that they better fit the newly created Malay Twitter corpus, and to build a POS tagging model tailored to Malay Twitter data using several machine learning algorithms, namely Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), and K-Nearest Neighbor (KNN) classifiers. The data were collected with Twitter's Advanced Search function, using relevant keywords associated with informal Malay. After several stages of processing, the data served as the training and testing corpus for the machine learning algorithms. Evaluation and analysis of the resulting Malay POS tagger model show that the SVM classifier, combined with the newly proposed Malay POS tags, performs best on Malay Twitter data. Furthermore, the prediction accuracy and POS tagging results show that this work outperforms a comparable previous study, indicating that both the Malay POS tagger model and its POS tag set were successfully improved.
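The abstract treats POS tagging as a classification task solved with SVM, NB, DT, and KNN classifiers. As a minimal sketch of that framing, assuming each token is classified independently from a small context-window feature set, the code below implements only one of the four, a multinomial Naive Bayes tagger with Laplace smoothing, in plain Python. The toy sentences, the tags (PRON, VERB, NOUN, NEG), the feature names, and the helpers `token_features`, `NaiveBayesTagger`, and `accuracy` are hypothetical illustrations, not the authors' corpus, tag set, or implementation.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: words and tags are illustrative only,
# NOT the paper's Malay Twitter corpus or its proposed tag set.
TRAIN = [
    [("saya", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("air", "NOUN")],
    [("saya", "PRON"), ("x", "NEG"), ("makan", "VERB")],  # "x": informal shorthand for negation
    [("kami", "PRON"), ("makan", "VERB"), ("roti", "NOUN")],
]
HELDOUT = [[("dia", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")]]


def token_features(tokens, i):
    """Hypothetical features for token i: word, 3-char suffix, neighbours, digit flag."""
    word = tokens[i].lower()
    return [
        "w=" + word,
        "suf3=" + word[-3:],
        "prev=" + (tokens[i - 1].lower() if i > 0 else "<s>"),
        "next=" + (tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"),
        "is_digit=" + str(word.isdigit()),
    ]


class NaiveBayesTagger:
    """Multinomial Naive Bayes over per-token features, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, sentences):
        self.tag_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for sent in sentences:
            tokens = [w for w, _ in sent]
            for i, (_, tag) in enumerate(sent):
                self.tag_counts[tag] += 1
                for f in token_features(tokens, i):
                    self.feat_counts[tag][f] += 1
                    self.vocab.add(f)
        self.total = sum(self.tag_counts.values())
        self.feat_totals = {t: sum(c.values()) for t, c in self.feat_counts.items()}
        return self

    def _log_score(self, tag, feats):
        # log P(tag) + sum of log P(feature | tag), smoothed over the feature vocabulary
        score = math.log(self.tag_counts[tag] / self.total)
        denom = self.feat_totals[tag] + self.alpha * len(self.vocab)
        for f in feats:
            score += math.log((self.feat_counts[tag][f] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        # Pick the highest-scoring tag for each token independently
        return [
            max(self.tag_counts, key=lambda t: self._log_score(t, token_features(tokens, i)))
            for i in range(len(tokens))
        ]


def accuracy(tagger, sentences):
    """Token-level accuracy: correctly tagged tokens / all tokens."""
    correct = total = 0
    for sent in sentences:
        tokens = [w for w, _ in sent]
        gold = [t for _, t in sent]
        pred = tagger.predict(tokens)
        correct += sum(p == g for p, g in zip(pred, gold))
        total += len(gold)
    return correct / total


if __name__ == "__main__":
    tagger = NaiveBayesTagger().fit(TRAIN)
    print(tagger.predict(["dia", "makan", "nasi"]))
    print("held-out accuracy:", accuracy(tagger, HELDOUT))
```

On this toy data, tagging `["dia", "makan", "nasi"]` yields `['PRON', 'VERB', 'NOUN']`. To compare classifiers on identical inputs, the same features, converted to dictionaries, could be passed to an SVM, for example through scikit-learn's `DictVectorizer` and `LinearSVC`; that swap is a suggestion, not the paper's stated setup.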
Pages: 229-234
Number of pages: 6