Supervised term-category feature weighting for improved text classification

Cited by: 12
Authors
Attieh, Joseph [1]
Tekli, Joe [1, 2]
Affiliations
[1] Lebanese Amer Univ LAU, Elect & Comp Engn Dept, Byblos 36, Lebanon
[2] Univ Pau & Pays Adour UPPA, LIUPPA Lab, SPIDER Res Team, F-64600 Anglet, Aquitaine, France
Keywords
Text classification; Document and text processing; Feature Engineering; Supervised term weighting; Inverse Category Frequency; TF-IDF; Text representation; SCHEMES; MODEL
DOI
10.1016/j.knosys.2022.110215
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification is a central task in Natural Language Processing (NLP) that aims at categorizing text documents into predefined classes or categories. It requires appropriate features to describe the contents and meaning of text documents, and to map them to their target categories. Existing text feature representations rely on a weighted representation of the document terms. Hence, choosing a suitable term weighting method is of major importance and can help increase the effectiveness of the classification task. In this study, we provide a novel text classification framework for Category-based Feature Engineering titled CFE. It consists of a supervised weighting scheme defined based on a variant of the TF-ICF (Term Frequency-Inverse Category Frequency) model, embedded into three new lean classification approaches: (i) IterativeAdditive (flat), (ii) GradientDescentANN (1-layered), and (iii) FeedForwardANN (2-layered). The IterativeAdditive approach augments each document representation with a set of synthetic features inferred from TF-ICF category representations. It builds a term-category TF-ICF matrix using an iterative and additive algorithm that produces category vector representations and updates them until reaching convergence. GradientDescentANN replaces this iterative additive process by computing the term-category matrix using a gradient descent ANN model. Training the ANN using the gradient descent algorithm allows updating the term-category matrix until reaching convergence. FeedForwardANN uses a feed-forward ANN model to transform document representations into the category vector space. The transformed document vectors are then compared with the target category vectors, and are associated with the most similar categories. We have implemented CFE including its three classification approaches, and we have conducted a large battery of tests to evaluate their performance. Experimental results on five benchmark datasets show that our lean approaches mostly improve text classification accuracy while requiring significantly less computation time compared with their deep model alternatives. © 2022 Elsevier B.V. All rights reserved.
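To make the term-category idea in the abstract concrete, here is a minimal Python sketch of plain TF-ICF weighting followed by a nearest-category lookup. It is an illustration under assumptions, not the CFE framework: the abstract says CFE uses a variant of TF-ICF without stating its formula, so the textbook form tf(t, c) * log(|C| / cf(t)) is used here, and the function names (tf_icf, cosine, classify) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_icf(docs, labels):
    """Build {category: {term: weight}} with plain TF-ICF (assumed form, not the paper's variant)."""
    # Term frequency per category: pool the tokens of all documents in that category.
    cat_tf = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        cat_tf[label].update(tokens)
    n_cats = len(cat_tf)
    # Category frequency: number of categories in which each term occurs.
    cf = Counter()
    for counts in cat_tf.values():
        cf.update(counts.keys())
    # A term present in every category gets log(1) = 0, i.e. no discriminative weight.
    return {
        cat: {t: tf * math.log(n_cats / cf[t]) for t, tf in counts.items()}
        for cat, counts in cat_tf.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, cat_vectors):
    """Assign the category whose TF-ICF vector is most similar to the raw term counts."""
    doc_vec = Counter(tokens)
    return max(cat_vectors, key=lambda c: cosine(doc_vec, cat_vectors[c]))


# Toy corpus (hypothetical data, for illustration only).
docs = [
    ["goal", "match", "team"],
    ["team", "vote", "election"],
    ["match", "goal", "score"],
]
labels = ["sports", "politics", "sports"]

weights = tf_icf(docs, labels)
print(classify(["goal", "team"], weights))  # prints "sports"
```

In this toy corpus "team" occurs in both categories, so its weight is zero and the decision rests on "goal" alone. The iterative, gradient-descent, and feed-forward procedures named in the abstract refine or replace this one-shot computation and are not reproduced here.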
Pages: 17
Related papers
50 in total
  • [11] Improved inverse gravity moment term weighting for text classification
    Dogan, Turgut
    Uysal, Alper Kursat
    EXPERT SYSTEMS WITH APPLICATIONS, 2019, 130 : 45 - 59
  • [12] Supervised Graph-Based Term Weighting Scheme for Effective Text Classification
    Shanavas, Niloofer
    Wang, Hui
    Lin, Zhiwei
    Hawe, Glenn
    ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, 285 : 1710 - 1711
  • [13] Supervised term weighting for automated text categorization
    Debole, F
    Sebastiani, F
    TEXT MINING AND ITS APPLICATIONS, 2004, 138 : 81 - 97
  • [14] Combining supervised term-weighting metrics for SVM text classification with extended term representation
    Mounia Haddoud
    Aïcha Mokhtari
    Thierry Lecroq
    Saïd Abdeddaïm
    Knowledge and Information Systems, 2016, 49 : 909 - 931
  • [15] Combining supervised term-weighting metrics for SVM text classification with extended term representation
    Haddoud, Mounia
    Mokhtari, Aicha
    Lecroq, Thierry
    Abdeddaim, Said
    KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 49 (03) : 909 - 931
  • [16] An improved term weighting method based on relevance frequency for text classification
    Li, Chuanxiao
    Li, Wenqiang
    Tang, Zhong
    Li, Song
    Xiang, Hai
    SOFT COMPUTING, 2023, 27 (07) : 3563 - 3579
  • [17] An improved term weighting method based on relevance frequency for text classification
    Chuanxiao Li
    Wenqiang Li
    Zhong Tang
    Song Li
    Hai Xiang
    Soft Computing, 2023, 27 : 3563 - 3579
  • [18] A Text Classification Algorithm based on Feature Weighting
    Yang, Han
    Cui, Honggang
    Tang, Hao
    GREEN ENERGY AND SUSTAINABLE DEVELOPMENT I, 2017, 1864
  • [19] Effective Text Classification Through Supervised Rough Set-Based Term Weighting
    Cekik, Rasim
    SYMMETRY-BASEL, 2025, 17 (01):
  • [20] Medical query generation by term-category correlation
    Liu, Rey-Long
    Huang, Yi-Chih
    INFORMATION PROCESSING & MANAGEMENT, 2011, 47 (01) : 68 - 79