A Comparative Study on TF-IDF Feature Weighting Method and its Analysis using Unstructured Dataset

Cited by: 0
Authors
Das, Mamata [1]
Kamalanathan, Selvakumar [1]
Alphonse, P. J. A. [1]
Affiliations
[1] NIT Trichy, Trichy 620015, Tamil Nadu, India
Keywords
TF-IDF; N-Gram; Text classification; Feature weighting; Information retrieval; SENTIMENT; REVIEWS;
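DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract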
Text classification is the process of assigning text to relevant categories, and its algorithms are at the core of many Natural Language Processing (NLP) applications. Term Frequency-Inverse Document Frequency (TF-IDF) and NLP are among the most widely used information retrieval methods in text classification. We investigated and analyzed the feature weighting method for text classification on unstructured data. The proposed model considers two features, N-Grams and TF-IDF, on the IMDB movie reviews and Amazon Alexa reviews datasets for sentiment analysis. We then used state-of-the-art classifiers to validate the method, i.e., Support Vector Machine (SVM), Logistic Regression, Multinomial Naive Bayes (Multinomial NB), Random Forest, Decision Tree, and k-nearest neighbors (KNN). Comparing the two feature extraction methods, TF-IDF features gave a significant improvement over N-Gram features. TF-IDF achieved the maximum accuracy (93.81%), precision (94.20%), recall (93.81%), and F1-score (91.99%) with the Random Forest classifier.
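As a rough illustration of the two feature types compared in the abstract, the sketch below extracts word N-Grams and computes TF-IDF weights in plain Python. It is not the authors' implementation: the abstract does not state which TF-IDF variant, tokenizer, or N-Gram range was used, so the formula here (term frequency normalized by document length, multiplied by log(N / df)), the whitespace tokenizer, the function names `word_ngrams` and `tfidf`, and the three toy reviews are all assumptions made for the example.

```python
import math
from collections import Counter


def word_ngrams(text, n=1):
    """Return the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / terms in document, idf = log(N / df).
    Other variants (smoothing, normalization) would give different numbers.
    """
    term_lists = [word_ngrams(doc, n) for doc in docs]

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    num_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # guard against empty documents
        weights.append({
            term: (count / total) * math.log(num_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy reviews, not taken from the IMDB or Alexa datasets.
reviews = ["great movie great cast", "boring movie", "great sound quality"]
unigram_weights = tfidf(reviews, n=1)
bigram_weights = tfidf(reviews, n=2)
```

Terms that appear in every document receive a weight of zero under this variant, which is the intuition behind TF-IDF as a feature weighting scheme: it favors terms that are frequent in one review but rare across the collection. The resulting dictionaries could be turned into feature vectors for any of the classifiers named in the abstract.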
Pages: 10
Related papers
50 in total
  • [31] BERT- and TF-IDF-based feature extraction for long-lived bug prediction in FLOSS: A comparative study
    Gomes, Luiz
    Torres, Ricardo da Silva
    Cortes, Mario Lucio
    INFORMATION AND SOFTWARE TECHNOLOGY, 2023, 160
  • [32] Investigating response behavior through TF-IDF and Word2vec text analysis: A case study of PISA 2012 problem-solving process data
    Zhou, Jing
    Ye, Zhanliang
    Zhang, Sheng
    Geng, Zhao
    Han, Ning
    Yang, Tao
    HELIYON, 2024, 10 (16)
  • [33] Comparative Study of Induction Motor Fault Analysis Using Feature Extraction
    Thakur, Arunava Kabiraj
    Kundu, Palash Kumar
    Das, Arabinda
    2017 IEEE CALCUTTA CONFERENCE (CALCON), 2017, : 150 - 154
  • [34] Performance Analysis of Intrusion Detection Systems Using a Feature Selection Method on the UNSW-NB15 Dataset
    Kasongo, Sydney M.
    Sun, Yanxia
    JOURNAL OF BIG DATA, 2020, 7 (01)
  • [36] A multiparametric analysis to study the lymphocyte subsets by using a comparative method
    Ortolani, R.
    Vella, A.
    Bellavite, P.
    Paiola, F.
    Martini, M.
    Marchesini, M.
    Tridente, G.
    CYTOMETRY PART A, 2008, 73A (01) : 95 - 96
  • [37] NIR Spectral Feature Selection Using Lasso Method and Its Application in the Classification Analysis
    Li Yu-qiang
    Pan Tian-hong
    Li Hao-ran
    Zou Xiao-bo
    SPECTROSCOPY AND SPECTRAL ANALYSIS, 2019, 39 (12) : 3809 - 3815
  • [38] A COMPARATIVE STUDY ON THE EXERGOECONOMIC ANALYSIS OF TEXTILE DRYERS USING SPECO METHOD
    Cay, Ahmet
    Tarakcioglu, Isik
    Hepbasli, Arif
    TEKSTIL VE KONFEKSIYON, 2012, 22 (02) : 125 - 131
  • [39] Prediction of Skin Disease Using Ensemble Data Mining Techniques and Feature Selection Method—a Comparative Study
    Anurag Kumar Verma
    Saurabh Pal
    Surjeet Kumar
    Applied Biochemistry and Biotechnology, 2020, 190 : 341 - 359
  • [40] Structural and vibrational spectroscopic analysis of anticancer drug mitotane using DFT method; a comparative study of its parent structure
    Mariappan, G.
    Sundaraganesan, N.
    JOURNAL OF MOLECULAR STRUCTURE, 2015, 1086 : 73 - 85