Improved POS Tagging Model for Malay Twitter Data based on Machine Learning Algorithm

Authors
Ariffin, Siti Noor Allia Noor [1 ]
Tiun, Sabrina [1 ]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi, Selangor, Malaysia
Keywords
Informal Malay; Malay Twitter corpus; Malay POS tagging; Malay POS tagger model; Malay social media texts; Malay POS machine learning; SVM
DOI
10.14569/IJACSA.2022.0130730
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Twitter is a popular social media platform in Malaysia that allows for 280-character microblogging. Users tweet about almost everything that happens in a day, and because most Malaysians use Twitter daily, it provides researchers and developers with a wealth of data on Malaysian users. This paper explains why and how this study created a new Malay Twitter corpus, a set of Malay Part-of-Speech (POS) tags, and a Malay POS tagger model. The goal is to improve existing Malay POS tags so that they are more compatible with the newly created Malay Twitter corpus, and to build a POS tagging model specifically tailored for Malay Twitter data using several machine learning algorithms: Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), and K-Nearest Neighbor (KNN) classifiers. The data was gathered using Twitter's Advanced Search function with keywords associated with informal Malay. After several stages of processing, the data was fed into the machine learning algorithms as the training and testing corpus. The evaluation and analysis of the developed Malay POS tagger model show that the SVM classifier, used together with the newly proposed Malay POS tags, is the best of the tested machine learning algorithms for Malay Twitter data. Furthermore, the prediction accuracy and POS tagging results show that this research outperformed a comparable previous study, indicating that the Malay POS tagger model and its POS tags were successfully improved.
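The abstract describes training token-level classifiers on a tagged corpus and scoring their predictions. As a rough illustration only, the sketch below implements one of the four named classifier families (Naive Bayes) in plain Python. The toy sentences, the coarse PRON/VERB/NOUN tags, and the three features (word, two-character suffix, previous word) are all assumptions made for this example; they are not the paper's corpus, tag set, or feature design.

```python
import math
from collections import Counter, defaultdict

def features(tokens, i):
    """Hypothetical feature set: word form, 2-char suffix, previous word."""
    tok = tokens[i].lower()
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return [f"w={tok}", f"suf={tok[-2:]}", f"prev={prev}"]

class NaiveBayesTagger:
    """Multinomial Naive Bayes over string features, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.tag_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, sentences):
        # sentences: list of [(token, tag), ...]
        for sent in sentences:
            tokens = [w for w, _ in sent]
            for i, (_, tag) in enumerate(sent):
                self.tag_counts[tag] += 1
                for f in features(tokens, i):
                    self.feat_counts[tag][f] += 1
                    self.vocab.add(f)
        return self

    def predict_token(self, tokens, i):
        total = sum(self.tag_counts.values())
        best_tag, best_score = None, -math.inf
        for tag, n in self.tag_counts.items():
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            denom = sum(self.feat_counts[tag].values()) + self.alpha * len(self.vocab)
            for f in features(tokens, i):
                score += math.log((self.feat_counts[tag][f] + self.alpha) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag

    def tag(self, tokens):
        return [self.predict_token(tokens, i) for i in range(len(tokens))]

def accuracy(tagger, sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in sentences:
        tokens = [w for w, _ in sent]
        for pred, (_, gold) in zip(tagger.tag(tokens), sent):
            correct += pred == gold
            total += 1
    return correct / total

# Toy, hand-made data for illustration only (not from the paper's corpus).
train = [
    [("aku", "PRON"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("dia", "PRON"), ("minum", "VERB"), ("kopi", "NOUN")],
    [("aku", "PRON"), ("minum", "VERB"), ("teh", "NOUN")],
]
test = [[("dia", "PRON"), ("makan", "VERB"), ("kopi", "NOUN")]]

tagger = NaiveBayesTagger().fit(train)
print(tagger.tag(["dia", "makan", "kopi"]))
print(accuracy(tagger, test))
```

A comparison like the one the abstract reports would train each classifier family on the same training split and compare their accuracy on the same held-out split; this sketch covers only the Naive Bayes case.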
Pages: 229-234
Page count: 6