The impact of preprocessing steps on the accuracy of machine learning algorithms in sentiment analysis

Authors
Saqib Alam
Nianmin Yao
Affiliation
[1] Dalian University of Technology, Department of Electronic Information and Electrical Engineering
Keywords
Preprocessing; Machine learning; Sentiment analysis; Word2Vec
DOI
Not available
Abstract
Big data and its related technologies have recently become active areas of research. A huge amount of data is generated every minute, including unstructured data, which is now a topic of particular interest to researchers. Considerable research is under way in text analytics and text preprocessing. In this paper, we study the impact of different preprocessing steps on the accuracy of three well-known machine learning classifiers for sentiment classification: Naïve Bayes (NB), maximum entropy (MaxE), and support vector machines (SVM). We calculated the accuracy of each classifier before and after applying the preprocessing steps. The results showed that the accuracy of the NB algorithm improved significantly after preprocessing and that the accuracy of the SVM algorithm improved slightly. Interestingly, the accuracy of the MaxE algorithm did not improve. In this comparative study, NB was also significantly more accurate than the other algorithms after preprocessing, followed by MaxE and SVM. This work shows that text preprocessing affects the accuracy of machine learning algorithms, and that the effect is most pronounced for the NB algorithm.
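The before-and-after comparison described in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' pipeline: the stop-word list, the cleaning steps in `preprocess`, the from-scratch `NaiveBayes` class, and the toy `TRAIN` and `TEST` sentences are all assumptions made for this example, and only the NB classifier is shown.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, chosen for this illustration only.
STOP_WORDS = {"the", "a", "an", "is", "was", "it", "this", "and", "i", "to", "of"}


def raw_tokens(text):
    """Baseline: split on whitespace, no cleaning at all."""
    return text.split()


def preprocess(text):
    """Assumed cleaning steps: lowercase, strip URLs and punctuation, drop stop words."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.class_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                if tok in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def accuracy(tokenize, train, test):
    """Train on `train` and return the fraction of `test` classified correctly."""
    model = NaiveBayes().fit([tokenize(t) for t, _ in train], [y for _, y in train])
    hits = sum(model.predict(tokenize(t)) == y for t, y in test)
    return hits / len(test)


# Toy data invented for this sketch; not taken from the paper.
TRAIN = [
    ("I love this movie", "pos"),
    ("Great acting and a great story", "pos"),
    ("I hate this movie", "neg"),
    ("Terrible plot and awful acting", "neg"),
]
TEST = [
    ("LOVE it!!! Great!", "pos"),
    ("Awful... I HATE it", "neg"),
]

print("accuracy without preprocessing:", accuracy(raw_tokens, TRAIN, TEST))
print("accuracy with preprocessing:   ", accuracy(preprocess, TRAIN, TEST))
```

On this toy data the raw baseline misses the first test sentence because `LOVE` and `Great!` never match the training vocabulary, while the cleaned variant classifies both sentences correctly. The numbers illustrate the mechanism only and say nothing about the figures reported in the paper.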
Pages: 319-335
Number of pages: 16