A Self-Training Approach for Short Text Clustering

Cited by: 0
Authors
Hadifar, Amir [1]
Sterckx, Lucas [1]
Demeester, Thomas [1]
Develder, Chris [1]
Affiliations
[1] Univ Ghent, IMEC, IDLab, Dept Informat Technol, Ghent, Belgium
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Subject classification codes
030303; 0501; 050102;
Abstract
Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations for short texts. Low-dimensional continuous representations, or embeddings, can counter that sparseness problem: their high representational power is exploited in deep clustering algorithms. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The method we propose learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update the weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of our method.
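The abstract does not spell out how cluster assignments become a training signal. One common choice in deep clustering, assumed here purely for illustration and not confirmed by this record, is the DEC-style scheme: soft assignments from a Student's t kernel are sharpened into a target distribution, and the KL divergence between the two serves as the loss for the encoder. The sketch below uses hypothetical names (`soft_assign`, `target_distribution`, `kl_divergence`) and toy 2-D points standing in for encoder outputs.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Soft cluster assignments q[i, j] from a Student's t kernel (assumed, DEC-style)."""
    # Squared Euclidean distance between every point and every centroid: shape (n, k).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize each row so it is a probability distribution over clusters.
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets p: square q, divide by soft cluster size, renormalize rows."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters; the self-training loss."""
    return float(np.sum(p * np.log(p / q)))

# Toy stand-in for encoder outputs: two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 0.3, (5, 2)), rng.normal(3.0, 0.3, (5, 2))])
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical initial centroids

q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full training loop, the gradient of `loss` with respect to the encoder weights and the centroids would be used for the update, with `p` recomputed periodically. That part is omitted here because it depends on the encoder architecture.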
Pages: 194-199
Page count: 6