A Self-Training Approach for Short Text Clustering

Cited by: 0
Authors
Hadifar, Amir [1]
Sterckx, Lucas [1]
Demeester, Thomas [1]
Develder, Chris [1]
Affiliation
[1] Univ Ghent, IMEC, IDLab, Dept Informat Technol, Ghent, Belgium
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Short text clustering is challenging with traditional bag-of-words or TF-IDF representations, because these yield sparse vector representations for short texts. Low-dimensional continuous representations, or embeddings, can counter that sparseness problem, and deep clustering algorithms exploit their high representational power. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The method we propose learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update the weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of our method.
Pages: 194-199
Page count: 6
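
The abstract describes using cluster assignments as supervision for the encoder but does not give the exact formulation. The sketch below illustrates one common way such a self-training signal is built in deep clustering: soft assignments from a Student's t kernel, sharpened into a target distribution, with a KL divergence as the training loss. This choice, the function names, and the synthetic embeddings are all assumptions for illustration, not the authors' confirmed implementation; the encoder itself is omitted, and in a full system the loss would be backpropagated into its weights.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Soft assignment q[i, j] of embedding i to cluster j (Student's t kernel).

    Assumed formulation; the abstract does not specify the kernel.
    """
    # Squared Euclidean distances, shape (n_samples, n_clusters).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpen q: square it and normalize by soft cluster frequency."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) summed over all samples and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Synthetic stand-in for encoder outputs: two well-separated groups.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 0.5, (20, 8)), rng.normal(3.0, 0.5, (20, 8))])
# Hypothetical initial centroids (a clustering algorithm would supply these).
centroids = np.stack([z[:20].mean(axis=0), z[20:].mean(axis=0)])

q = soft_assign(z, centroids)   # current soft cluster assignments
p = target_distribution(q)      # sharpened targets used as supervision
loss = kl_divergence(p, q)      # signal that would update the encoder
```

Squaring the assignments pushes confident points closer to their cluster while the frequency normalization keeps large clusters from dominating, which is why this kind of target is a plausible fit for the "assignments as supervision" idea in the abstract.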