Improved POS Tagging Model for Malay Twitter Data based on Machine Learning Algorithm

Authors
Ariffin, Siti Noor Allia Noor [1]
Tiun, Sabrina [1]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi, Selangor, Malaysia
Keywords
Informal Malay; Malay Twitter corpus; Malay POS tagging; Malay POS tagger model; Malay social media texts; Malay POS machine learning; SVM
DOI
10.14569/IJACSA.2022.0130730
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Twitter is a popular social media platform in Malaysia that allows for 280-character microblogging. Users tweet about almost everything that happens in a single day. Because most Malaysians use Twitter daily, it provides researchers and developers with a wealth of data on Malaysian users. This paper explains why and how this study created a new Malay Twitter corpus, a set of Malay Part-of-Speech (POS) tags, and a Malay POS tagger model. The goal of this paper is to improve existing Malay POS tags so that they are more compatible with the newly created Malay Twitter corpus, and to build a POS tagging model specifically tailored for Malay Twitter data using various machine learning algorithms, such as Support Vector Machine (SVM), Naive Bayes (NB), Decision Tree (DT), and K-Nearest Neighbor (KNN) classifiers. The data was gathered using Twitter's Advanced Search function with keywords associated with informal Malay. After several stages of processing, the data served as the training and testing corpus for the machine learning algorithms. The evaluation and analysis of the developed Malay POS tagger model show that the SVM classifier, combined with the newly proposed Malay POS tags, performs best on Malay Twitter data. Furthermore, the prediction accuracy and POS tagging results show that this research outperformed a comparable previous study, indicating that the Malay POS tagger model and its POS tags were successfully improved.
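The train-and-evaluate workflow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a small pure-Python Naive Bayes classifier (one of the four algorithm families the abstract names, chosen here only to avoid external dependencies, whereas the paper reports SVM as the best performer). The feature set, the tag names (PRON, VERB, NOUN, MENTION, HASHTAG), and the toy sentences are all hypothetical and do not come from the paper's corpus or tag set.

```python
import math
from collections import Counter, defaultdict


def token_features(tokens, i):
    """Return simple categorical features for the token at position i (assumed feature set)."""
    word = tokens[i].lower()
    return {
        "word": word,
        "suffix2": word[-2:],
        "is_mention": str(word.startswith("@")),
        "is_hashtag": str(word.startswith("#")),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
    }


class NaiveBayesTagger:
    """Token-level Naive Bayes POS tagger with add-one smoothing."""

    def fit(self, sentences):
        self.tag_counts = Counter()
        self.feat_counts = defaultdict(Counter)  # (tag, feature name) -> value counts
        self.values = defaultdict(set)           # feature name -> values seen in training
        for sent in sentences:
            tokens = [word for word, _ in sent]
            for i, (_, tag) in enumerate(sent):
                self.tag_counts[tag] += 1
                for name, value in token_features(tokens, i).items():
                    self.feat_counts[(tag, name)][value] += 1
                    self.values[name].add(value)
        return self

    def predict(self, tokens):
        total = sum(self.tag_counts.values())
        tags = []
        for i in range(len(tokens)):
            feats = token_features(tokens, i)
            best_tag, best_score = None, -math.inf
            for tag in sorted(self.tag_counts):
                # log prior plus smoothed log likelihood of each feature value
                score = math.log(self.tag_counts[tag] / total)
                for name, value in feats.items():
                    count = self.feat_counts[(tag, name)][value]
                    vocab = len(self.values[name]) + 1  # +1 slot for unseen values
                    score += math.log((count + 1) / (self.tag_counts[tag] + vocab))
                if score > best_score:
                    best_tag, best_score = tag, score
            tags.append(best_tag)
        return tags


def accuracy(tagger, sentences):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = total = 0
    for sent in sentences:
        predicted = tagger.predict([word for word, _ in sent])
        correct += sum(p == gold for p, (_, gold) in zip(predicted, sent))
        total += len(sent)
    return correct / total


# Hypothetical toy data in informal Malay; tags are illustrative only.
TRAIN = [
    [("aku", "PRON"), ("nak", "VERB"), ("makan", "VERB"), ("nasi", "NOUN")],
    [("kau", "PRON"), ("nak", "VERB"), ("minum", "VERB"), ("air", "NOUN")],
    [("@ali", "MENTION"), ("makan", "VERB"), ("nasi", "NOUN"), ("#lapar", "HASHTAG")],
]
TEST = [
    [("aku", "PRON"), ("nak", "VERB"), ("minum", "VERB"), ("nasi", "NOUN")],
]

tagger = NaiveBayesTagger().fit(TRAIN)
print(tagger.predict(["aku", "nak", "minum", "nasi"]))
print(accuracy(tagger, TEST))
```

On a corpus this small the accuracy figure carries no meaning; the sketch only shows the shape of the pipeline (feature extraction, training, held-out evaluation). Comparing several classifiers, as the study does, would amount to swapping the model class while keeping the same features and train/test split.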
Pages: 229-234
Page count: 6