A Self-Training Approach for Short Text Clustering

Authors
Hadifar, Amir [1]
Sterckx, Lucas [1]
Demeester, Thomas [1]
Develder, Chris [1]
Affiliations
[1] Univ Ghent, IMEC, IDLab, Dept Informat Technol, Ghent, Belgium
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Subject classification codes
030303 ; 0501 ; 050102 ;
Abstract
Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations for short texts. Low-dimensional continuous representations, or embeddings, can counter that sparseness problem: their high representational power is exploited in deep clustering algorithms. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The proposed method learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update the weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of the method.
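The self-training step mentioned in the abstract (cluster assignments used as supervision for the encoder) can be illustrated with a small NumPy sketch. The abstract does not state the exact objective, so this sketch assumes a common deep-clustering formulation: soft assignments from a Student's t kernel, a sharpened target distribution, and a KL-divergence loss. The function names, the toy 2-D "encoded" points, and the fixed centroids are all hypothetical and serve only as illustration.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Soft cluster assignments q[i, j] from a Student's t kernel (assumed form)."""
    # Squared Euclidean distance between every point and every centroid: shape (n, k).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened targets p[i, j]: square q and normalise by soft cluster frequency."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) summed over all points; the loss an encoder would minimise."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Toy stand-in for encoder outputs: two well-separated groups in 2-D.
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 0.3, (5, 2)), rng.normal(3.0, 0.3, (5, 2))])
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical cluster centres

q = soft_assign(z, centroids)   # current soft assignments
p = target_distribution(q)      # pseudo-labels used as supervision
loss = kl_divergence(p, q)      # would be back-propagated into the encoder
```

In a full model, `z` would come from the encoder and the gradient of `loss` would update the encoder weights and centroids; here the arrays are fixed, so the sketch only shows how the supervision signal is derived from the assignments.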
Pages: 194-199
Number of pages: 6