Dimensionality Reduction for Sentiment Analysis using Pre-processing Techniques

Cited by: 0
Authors
Mhatre, Mayuri [1 ]
Phondekar, Dakshata [1 ]
Kadam, Pranali [1 ]
Chawathe, Anushka [1 ]
Ghag, Kranti [1 ]
Affiliations
[1] SAKEC, Information Technology Department, Bombay, Maharashtra, India
Keywords
Sentiment Analysis; Pre-processing; Slangs Handling; Stopwords Removal; Lemmatization
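DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification codes
0808; 0809
Abstract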
Sentiment analysis is the study of people's opinions, sentiments, attitudes, and emotions as expressed in written language. In a business context, however, this process is time-consuming, inconsistent, and costly. Pre-processing, the cleaning and preparation of text before analysis, helps ease this difficulty. Existing pre-processing techniques include expressive lengthening handling, emoticon handling, HTML tag removal, punctuation handling, slang handling, stopword removal, stemming, and lemmatization. This paper analyzes the effect of these techniques and their combinations on the Kaggle dataset "Bag of Words Meets Bags of Popcorn". Every possible combination of pre-processing techniques was evaluated to find the one giving the highest accuracy. A Random Forest classifier, which is known to give good accuracy, was used to predict sentiment, and the results were evaluated using 10-fold cross-validation. Accuracy increased from unprocessed to pre-processed data, leading to the conclusion that pre-processing yields higher accuracy than the traditional approach of no pre-processing.
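The combination search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes five of the eight techniques (emoticon handling, stemming, and lemmatization are omitted), tiny hand-written stopword and slang lexicons, and a fixed application order. It also uses vocabulary size as a rough stand-in for feature dimensionality, in place of the Random Forest accuracy that the paper reports. All function and variable names are hypothetical.

```python
import re
from itertools import combinations

# Tiny illustrative lexicons (assumptions, not the paper's resources).
STOPWORDS = {"the", "a", "is", "was", "and", "it", "this"}
SLANG = {"gr8": "great", "luv": "love", "u": "you"}


def remove_html_tags(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def handle_lengthening(text):
    """Collapse runs of three or more identical characters down to two."""
    return re.sub(r"(.)\1{2,}", r"\1\1", text)


def handle_punctuation(text):
    """Replace characters that are neither word characters nor whitespace."""
    return re.sub(r"[^\w\s]", " ", text)


def handle_slang(text):
    """Swap known slang tokens for their standard form."""
    return " ".join(SLANG.get(tok.lower(), tok) for tok in text.split())


def remove_stopwords(text):
    """Drop tokens that appear in the stopword list."""
    return " ".join(tok for tok in text.split() if tok.lower() not in STOPWORDS)


# Fixed application order (an assumption): punctuation is stripped before
# the slang lookup so that a token such as "gr8," still matches "gr8".
TECHNIQUES = [
    ("html", remove_html_tags),
    ("lengthening", handle_lengthening),
    ("punctuation", handle_punctuation),
    ("slang", handle_slang),
    ("stopwords", remove_stopwords),
]


def preprocess(text, enabled):
    """Apply only the techniques whose names are in `enabled`."""
    for name, func in TECHNIQUES:
        if name in enabled:
            text = func(text)
    return text


def all_combinations():
    """Yield every subset of technique names, including the empty subset."""
    names = [name for name, _ in TECHNIQUES]
    for size in range(len(names) + 1):
        yield from combinations(names, size)


def vocabulary_size(reviews, enabled):
    """Count distinct lowercase whitespace-separated tokens after pre-processing."""
    vocab = set()
    for review in reviews:
        vocab.update(preprocess(review, enabled).lower().split())
    return len(vocab)


# Two made-up reviews, for illustration only.
reviews = [
    "<br />This movie was soooo gr8, I luv it!!!",
    "The plot is boring and the acting is bad.",
]
combos = list(all_combinations())
sizes = {combo: vocabulary_size(reviews, combo) for combo in combos}
```

To reproduce the kind of comparison the abstract describes, each entry of `combos` would be passed to a vectorizer and classifier and scored with cross-validation. That step is left out here to keep the sketch free of third-party dependencies.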
Pages: 16-21
Page count: 6