A Comparative Study on TF-IDF Feature Weighting Method and its Analysis using Unstructured Dataset

Citations: 0
Authors
Das, Mamata [1]
Kamalanathan, Selvakumar [1]
Alphonse, P. J. A. [1]
Affiliations
[1] NIT Trichy, Trichy 620015, Tamil Nadu, India
Keywords
TF-IDF; N-Gram; Text classification; Feature weighting; Information retrieval; SENTIMENT; REVIEWS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text classification is the process of assigning text to relevant categories, and its algorithms are at the core of many Natural Language Processing (NLP) applications. Term Frequency-Inverse Document Frequency (TF-IDF) and NLP are among the most widely used information retrieval methods in text classification. We investigated and analyzed the feature weighting method for text classification on unstructured data. The proposed model considered two features, N-Grams and TF-IDF, on the IMDB movie reviews and Amazon Alexa reviews datasets for sentiment analysis. We then validated the method with state-of-the-art classifiers, i.e., Support Vector Machine (SVM), Logistic Regression, Multinomial Naive Bayes (Multinomial NB), Random Forest, Decision Tree, and k-nearest neighbors (KNN). Of the two feature extraction approaches, TF-IDF features gave a significant improvement over N-Gram features. TF-IDF achieved the maximum accuracy (93.81%), precision (94.20%), recall (93.81%), and F1-score (91.99%) with the Random Forest classifier.
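The two feature types named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `ngrams` and `tfidf`, the three toy reviews, and the unsmoothed weighting tf = count / document length, idf = ln(N / df) are all assumptions chosen for illustration, and library implementations often use smoothed variants.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute one TF-IDF weight dictionary per document over n-gram terms.

    Assumed (textbook) variant: tf = count / terms in document,
    idf = ln(number of documents / document frequency).
    """
    term_lists = [ngrams(doc.lower().split(), n) for doc in docs]
    num_docs = len(docs)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append(
            {t: (c / total) * math.log(num_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical toy reviews, not taken from the IMDB or Amazon Alexa datasets.
reviews = [
    "great movie great cast",
    "boring movie",
    "great sound quality",
]

unigram_weights = tfidf(reviews, n=1)  # TF-IDF over single words
bigram_weights = tfidf(reviews, n=2)   # TF-IDF over word pairs (2-grams)
```

In this sketch, a term that occurs in every document would receive weight zero, while a term confined to one document is weighted most heavily. The resulting dictionaries would still need to be converted to fixed-length vectors before being passed to a classifier such as those listed in the abstract.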
Pages: 10
Related papers
50 in total
  • [1] Sentiment analysis using TF-IDF weighting of UK MPs' tweets on Brexit
    Mee, Alexander
    Homapour, Elmina
    Chiclana, Francisco
    Engel, Ofer
    KNOWLEDGE-BASED SYSTEMS, 2021, 228
  • [2] Emotion Analysis in Text using TF-IDF
    Sundaram, Varun
    Ahmed, Saad
    Muqtadeer, Shaik Abdul
    Reddy, R. Ravinder
    2021 11TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING, DATA SCIENCE & ENGINEERING (CONFLUENCE 2021), 2021, : 292 - 297
  • [3] Comparative analysis of TF-IDF and loglikelihood method for keywords extraction of twitter data
    Abid, Muhammad Adeel
    Mushtaq, Muhammad Faheem
    Akram, Urooj
    Abbasi, Mateen Ahmed
    Rustam, Furqan
    MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY, 2023, 42 (01) : 88 - 94
  • [4] A Novel Approach for Feature Selection Method TF-IDF in Document Clustering
    Patil, Leena. H.
    Atique, Mohammed
    PROCEEDINGS OF THE 2013 3RD IEEE INTERNATIONAL ADVANCE COMPUTING CONFERENCE (IACC), 2013, : 858 - 862
  • [5] Implementation of Information Retrieval Using Tf-Idf Weighting Method On Detik.Com's Website
    Khusna, Arfiani Nur
    Agustina, Indri
    2018 12TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATION SYSTEMS, SERVICES, AND APPLICATIONS (TSSA), 2018,
  • [6] Analysis of TF-IDF Model and its Variant for Document Retrieval
    Mishra, Apra
    Vishwakarma, Santosh
    2015 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS (CICN), 2015, : 772 - 776
  • [7] A Scalable Method for Detecting Multiple Loci Associated with Traits using TF-IDF Weighting and Association Rule Mining
    Lee, Sunwon
    Kang, Jaewoo
    Oh, Junho
    2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW), 2010, : 318 - 323
  • [8] An Approach to Modelling User Interests Using TF-IDF and Fuzzy Sets Qualitative Comparative Analysis
    Kardaras, Dimitris K.
    Kaperonis, Stavros
    Barbounaki, Stavroula
    Petrounias, Ilias
    Bithas, Kostas
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2018, 2018, 519 : 606 - 615
  • [9] Improved Bayes Method Based on TF-IDF Feature and Grade Factor Feature for Chinese Information Classification
    Qu, Zhaowei
    Song, Xiaomin
    Zheng, Shuqiang
    Wang, Xiaoru
    Song, Xiaohui
    Li, Zuquan
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP), 2018, : 677 - 680
  • [10] FEATURE SELECTION OPTIMIZATION METHOD OF TRAFFIC CONGESTION CASE DATABASE BASED ON TF-IDF ALGORITHM
    Zhang, Hao
    UPB Scientific Bulletin, Series C: Electrical Engineering and Computer Science, 2022, 84 (04) : 235 - 248