A Comparative Study on TF-IDF Feature Weighting Method and its Analysis using Unstructured Dataset

Authors
Das, Mamata [1]
Kamalanathan, Selvakumar [1]
Alphonse, P. J. A. [1]
Affiliations
[1] NIT Trichy, Trichy 620015, Tamil Nadu, India
Keywords
TF-IDF; N-Gram; Text classification; Feature weighting; Information retrieval; SENTIMENT; REVIEWS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Text classification is the process of assigning text to relevant categories, and its algorithms are at the core of many Natural Language Processing (NLP) applications. Term Frequency-Inverse Document Frequency (TF-IDF) and NLP are the most widely used information retrieval methods in text classification. We investigated and analyzed the feature weighting method for text classification on unstructured data. The proposed model considered two features, N-Grams and TF-IDF, on the IMDB movie reviews and Amazon Alexa reviews datasets for sentiment analysis. We then validated the method using state-of-the-art classifiers, i.e., Support Vector Machine (SVM), Logistic Regression, Multinomial Naive Bayes (Multinomial NB), Random Forest, Decision Tree, and k-nearest neighbors (KNN). Of the two feature extraction approaches, TF-IDF features yielded a significant improvement over N-Gram features. TF-IDF achieved its maximum accuracy (93.81%), precision (94.20%), recall (93.81%), and F1-score (91.99%) with the Random Forest classifier.
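As a rough illustration of the two feature types named in the abstract, the sketch below computes N-Gram terms and TF-IDF weights in plain Python. It is not the authors' pipeline: the abstract does not state which TF-IDF variant, tokenizer, or N-Gram range was used, so the formula (relative term frequency times log(N / df)), the whitespace tokenizer, the function names `ngrams` and `tfidf`, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=1):
    """Compute one {term: weight} dict per document over n-gram terms.

    Assumed variant (not taken from the paper):
      tf(t, d) = count of t in d / number of n-gram terms in d
      idf(t)   = log(N / df(t)), N = number of documents,
                 df(t) = number of documents containing t
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    term_lists = [ngrams(doc.lower().split(), n) for doc in docs]

    # Document frequency: count each term once per document.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    n_docs = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        weights.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


# Hypothetical toy reviews, not drawn from the IMDB or Amazon Alexa datasets.
reviews = ["great movie great cast", "boring movie", "great sound quality"]
unigram_weights = tfidf(reviews, n=1)
bigram_weights = tfidf(reviews, n=2)
```

In this variant, a term that occurs in every document gets weight zero, while a term confined to one document is weighted most heavily. The resulting per-document weight dictionaries would still need to be converted to fixed-length vectors before being passed to a classifier such as those listed in the abstract.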
Pages: 10