Text classification using embeddings: a survey

Cited by: 7
Authors
da Costa, Liliane Soares [1]
Oliveira, Italo L. [1]
Fileto, Renato [1]
Affiliations
[1] Federal University of Santa Catarina (UFSC), Department of Informatics and Statistics (INE), Campus Reitor Joao David Ferreira Lima, BR-88040900 Florianopolis, SC, Brazil
Keywords
Text classification; Feature representation; Embeddings; LABEL; DOCUMENT;
DOI
10.1007/s10115-023-01856-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification results can be hindered when just the bag-of-words model is used for representing features, because it ignores word order and senses, which can vary with the context. Embeddings have recently emerged as a means to circumvent these limitations, allowing considerable performance gains. However, determining the best combinations of classification techniques and embeddings for classifying particular corpora can be challenging. This survey provides a comprehensive review of text classification approaches that employ embeddings. First, it analyzes past and recent advancements in feature representation for text classification. Then, it identifies the combinations of embedding-based feature representations and classification techniques that have provided the best performances for classifying text from distinct corpora, also providing links to the original articles, source code (when available) and data sets used in the performance evaluation. Finally, it discusses current challenges and promising directions for text classification research, such as cost-effectiveness, multi-label classification, and the potential of knowledge graphs and knowledge embeddings to enhance text classification.
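The abstract's point that the bag-of-words model ignores word order can be illustrated with a minimal sketch. This is not code from the survey: the `bag_of_words` helper and the two example sentences are assumptions made here for illustration, and the tokenization (lowercasing plus whitespace splitting) is deliberately simplistic.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Hypothetical helper: count lowercase whitespace-separated tokens.

    Only token frequencies are kept, so the order of the words is lost.
    """
    return Counter(text.lower().split())


# Two sentences with opposite meanings but identical word counts.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

# The two representations are indistinguishable to a classifier.
same = a == b
print(same)
```

Because both sentences map to the same counts, any classifier built only on these features cannot tell them apart, which is the limitation that order- and context-aware embeddings are meant to address.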
Pages: 2761 - 2803
Page count: 43
Related papers
50 items in total
  • [21] Knowledge-enhanced document embeddings for text classification
    Sinoara, Roberta A.
    Camacho-Collados, Jose
    Rossi, Rafael G.
    Navigli, Roberto
    Rezende, Solange O.
    KNOWLEDGE-BASED SYSTEMS, 2019, 163 : 955 - 971
  • [22] Scholarly Text Classification with Sentence BERT and Entity Embeddings
    Piao, Guangyuan
    TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2021, 2021, 12705 : 79 - 87
  • [23] An Approach Based on Semantic Relationship Embeddings for Text Classification
    Laura Lezama-Sanchez, Ana
    Tovar Vidal, Mireya
    Reyes-Ortiz, Jose A.
    MATHEMATICS, 2022, 10 (21)
  • [24] Arabic Text Classification Based on Word and Document Embeddings
    El Mahdaouy, Abdelkader
    Gaussier, Eric
    El Alaoui, Said Ouatik
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENT SYSTEMS AND INFORMATICS 2016, 2017, 533 : 32 - 41
  • [25] Text sentiment classification of Amazon reviews using word embeddings and convolutional neural networks
    Mohammed Qorich
    Rajae El Ouazzani
    The Journal of Supercomputing, 2023, 79 : 11029 - 11054
  • [26] Text sentiment classification of Amazon reviews using word embeddings and convolutional neural networks
    Qorich, Mohammed
    El Ouazzani, Rajae
    JOURNAL OF SUPERCOMPUTING, 2023, 79 (10) : 11029 - 11054
  • [27] Job Demand Estimation Using Text Embeddings of Patent Classification Codes and Occupational Data
    Ha, Taehyun
    Moon, Ahram
    IEEE ACCESS, 2025, 13 : 34854 - 34864
  • [28] Text Classification Algorithms: A Survey
    Kowsari, Kamran
    Meimandi, Kiana Jafari
    Heidarysafa, Mojtaba
    Mendu, Sanjana
    Barnes, Laura
    Brown, Donald
    INFORMATION, 2019, 10 (04)
  • [29] Text Classification Using Machine Learning Methods-A Survey
    Agarwal, Basant
    Mittal, Namita
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING (SOCPROS 2012), 2014, 236 : 701 - 709
  • [30] Task-Optimized Word Embeddings for Text Classification Representations
    Gupta, Sukrat
    Kanchinadam, Teja
    Conathan, Devin
    Fung, Glenn
    FRONTIERS IN APPLIED MATHEMATICS AND STATISTICS, 2020, 5