Text classification using embeddings: a survey

Cited by: 7
Authors
da Costa, Liliane Soares [1]
Oliveira, Italo L. [1]
Fileto, Renato [1]
Affiliation
[1] Fed Univ Santa Catarina UFSC, Dept Informat & Stat INE, Campus Reitor Joao David Ferreira Lima, BR-88040900 Florianopolis, SC, Brazil
Keywords
Text classification; Feature representation; Embeddings; LABEL; DOCUMENT;
DOI
10.1007/s10115-023-01856-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification results can be hindered when just the bag-of-words model is used for representing features, because it ignores word order and senses, which can vary with the context. Embeddings have recently emerged as a means to circumvent these limitations, allowing considerable performance gains. However, determining the best combinations of classification techniques and embeddings for classifying particular corpora can be challenging. This survey provides a comprehensive review of text classification approaches that employ embeddings. First, it analyzes past and recent advancements in feature representation for text classification. Then, it identifies the combinations of embedding-based feature representations and classification techniques that have provided the best performances for classifying text from distinct corpora, also providing links to the original articles, source code (when available) and data sets used in the performance evaluation. Finally, it discusses current challenges and promising directions for text classification research, such as cost-effectiveness, multi-label classification, and the potential of knowledge graphs and knowledge embeddings to enhance text classification.
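The abstract's point that the bag-of-words model ignores word order can be illustrated with a minimal sketch. This is not taken from the surveyed article: the helper `bag_of_words` and the two example sentences are hypothetical, and whitespace tokenization is an assumption made only to keep the example short.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase whitespace-separated tokens, discarding their order.

    Hypothetical helper for illustration; real pipelines usually apply
    a proper tokenizer and a fixed vocabulary.
    """
    return Counter(text.lower().split())


# Two sentences with opposite meanings but exactly the same words.
a = "the dog bit the man"
b = "the man bit the dog"

# Both sentences map to the same bag, so a classifier that sees only
# these counts cannot tell them apart.
same_bag = bag_of_words(a) == bag_of_words(b)
print(same_bag)
```

Because the two bags are identical, any model fed only these counts receives the same input for both sentences. Representations that take context into account, such as the embeddings the survey reviews, are one way to address this limitation.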
Pages: 2761-2803
Page count: 43
Related papers
50 in total
  • [41] Multilabeled Emotions Classification in Software Engineering Text Using Convolutional Neural Networks and Word Embeddings
    Wagan, Atif Ali
    Li, Shuaiyong
    JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2025, 37 (03)
  • [42] Evaluating the construct validity of text embeddings with application to survey questions
    Fang, Qixiang
    Nguyen, Dong
    Oberski, Daniel L.
    EPJ DATA SCIENCE, 2022, 11 (01)
  • [43] Evaluating the construct validity of text embeddings with application to survey questions
    Qixiang Fang
    Dong Nguyen
    Daniel L. Oberski
    EPJ Data Science, 11
  • [44] A Survey on Text Classification Algorithms: From Text to Predictions
    Gasparetto, Andrea
    Marcuzzo, Matteo
    Zangari, Alessandro
    Albarelli, Andrea
    INFORMATION, 2022, 13 (02)
  • [45] Decoding Emotions in Text Using GloVe Embeddings
    Gupta, Piyush
    Roy, Inika
    Batra, Gunnika
    Dubey, Arun Kumar
    2021 IEEE INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, AND INTELLIGENT SYSTEMS (ICCCIS), 2021 : 36 - 40
  • [46] Automatic Text Summarization using Word Embeddings
    Easwar, Arjun
    Uthra, Annie
    PROCEEDINGS OF THE 2021 FIFTH INTERNATIONAL CONFERENCE ON I-SMAC (IOT IN SOCIAL, MOBILE, ANALYTICS AND CLOUD) (I-SMAC 2021), 2021 : 1065 - 1079
  • [47] A survey of Arabic text classification approaches
    Sayed, Mostafa
    Salem, Rashed K.
    Khder, Ayman E.
    INTERNATIONAL JOURNAL OF COMPUTER APPLICATIONS IN TECHNOLOGY, 2019, 59 (03) : 236 - 251
  • [48] Nationality Classification Using Name Embeddings
    Ye, Junting
    Han, Shuchu
    Hu, Yifan
    Coskun, Baris
    Liu, Meizhu
    Qin, Hong
    Skiena, Steven
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017 : 1897 - 1906
  • [49] A survey on text classification and its applications
    Zhou, Xujuan
    Gururajan, Raj
    Li, Yuefeng
    Venkataraman, Revathi
    Tao, Xiaohui
    Bargshady, Ghazal
    Barua, Prabal D.
    Kondalsamy-Chennakesavan, Srinivas
    WEB INTELLIGENCE, 2020, 18 (03) : 205 - 216
  • [50] A SURVEY ON CLASSIFICATION TECHNIQUES FOR TEXT MINING
    Brindha, S.
    Sukumaran, S.
    Prabha, K.
    2016 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2016