Text classification using embeddings: a survey

Cited by: 7
Authors
da Costa, Liliane Soares [1]
Oliveira, Italo L. [1]
Fileto, Renato [1]
Affiliations
[1] Fed Univ Santa Catarina UFSC, Dept Informat & Stat INE, Campus Reitor Joao David Ferreira Lima, BR-88040900 Florianopolis, SC, Brazil
Keywords
Text classification; Feature representation; Embeddings; LABEL; DOCUMENT;
DOI
10.1007/s10115-023-01856-z
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text classification results can be hindered when just the bag-of-words model is used for representing features, because it ignores word order and senses, which can vary with the context. Embeddings have recently emerged as a means to circumvent these limitations, allowing considerable performance gains. However, determining the best combinations of classification techniques and embeddings for classifying particular corpora can be challenging. This survey provides a comprehensive review of text classification approaches that employ embeddings. First, it analyzes past and recent advancements in feature representation for text classification. Then, it identifies the combinations of embedding-based feature representations and classification techniques that have provided the best performances for classifying text from distinct corpora, also providing links to the original articles, source code (when available) and data sets used in the performance evaluation. Finally, it discusses current challenges and promising directions for text classification research, such as cost-effectiveness, multi-label classification, and the potential of knowledge graphs and knowledge embeddings to enhance text classification.
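The abstract's point that the bag-of-words model ignores word order can be illustrated with a minimal sketch. This is not taken from the survey: the `bag_of_words` helper and the two example sentences are assumptions chosen for illustration, and the tokenization is deliberately naive (lowercasing and whitespace splitting).

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase, whitespace-separated tokens, discarding their order."""
    return Counter(text.lower().split())


# Two sentences with opposite meanings but identical word counts.
a = "the dog bit the man"
b = "the man bit the dog"

same_bag = bag_of_words(a) == bag_of_words(b)  # identical feature vectors
same_order = a.split() == b.split()            # different token sequences

print(same_bag, same_order)  # prints: True False
```

Because both sentences map to the same counts, a classifier that sees only these features cannot tell them apart, which is the limitation that embedding-based representations aim to address.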
Pages: 2761 - 2803
Page count: 43
Related papers
50 in total
  • [31] The more "similar" the happier: Augmenting text using similarity scoring with neural embeddings for happiness classification
    Yong, Kuan Shyang
    Liew, Jasy Suet Yan
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2023, 60 (03) : 631 - 653
  • [32] Sequence-Based Word Embeddings for Effective Text Classification
    Gomes, Bruno Guilherme
    Murai, Fabricio
    Goussevskaia, Olga
    Couto da Silva, Ana Paula
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2021), 2021, 12801 : 135 - 146
  • [33] Fast and Efficient Text Classification with Class-based Embeddings
    Wehrmann, Jonatas
    Kolling, Camila
    Barros, Rodrigo C.
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [34] A Comparative Study on Word Embeddings in Deep Learning for Text Classification
    Wang, Congcong
    Nulty, Paul
    Lillis, David
    2020 4TH INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, NLPIR 2020, 2020, : 37 - 46
  • [35] Towards Unsupervised Text Classification Leveraging Experts and Word Embeddings
    Haj-Yahia, Zied
    Sieg, Adrien
    Deleris, Lea A.
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 371 - 379
  • [36] A Heterogeneous Directed Graph Attention Network for inductive text classification using multilevel semantic embeddings
    Lin, Mu
    Wang, Tao
    Zhu, Yifan
    Li, Xiaobo
    Zhou, Xin
    Wang, Weiping
    KNOWLEDGE-BASED SYSTEMS, 2024, 295
  • [37] The more "similar" the happier: Augmenting text using similarity scoring with neural embeddings for happiness classification
    Kuan Shyang Yong
    Jasy Suet Yan Liew
    Journal of Intelligent Information Systems, 2023, 60 : 631 - 653
  • [38] Heterogeneous Document Embeddings for Cross-Lingual Text Classification
    Moreo, Alejandro
    Pedrotti, Andrea
    Sebastiani, Fabrizio
    36TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2021, 2021, : 685 - 688
  • [39] MULTIMODAL DEPRESSION CLASSIFICATION USING ARTICULATORY COORDINATION FEATURES AND HIERARCHICAL ATTENTION BASED TEXT EMBEDDINGS
    Seneviratne, Nadee
    Espy-Wilson, Carol
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6252 - 6256
  • [40] A Neural Network Approach for Text Classification Using Low Dimensional Joint Embeddings of Words and Knowledge
    da Costa, Liliane Soares
    Oliveira, Italo Lopes
    Fileto, Renato
    INFORMATION INTEGRATION AND WEB INTELLIGENCE, IIWAS 2022, 2022, 13635 : 181 - 194