A Self-Training Approach for Short Text Clustering

Cited by: 0
Authors
Hadifar, Amir [1]
Sterckx, Lucas [1]
Demeester, Thomas [1]
Develder, Chris [1]
Affiliations
[1] Ghent University, imec, IDLab, Department of Information Technology, Ghent, Belgium
DOI: not available
Chinese Library Classification
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations for short texts. Low-dimensional continuous representations, or embeddings, can counter that sparseness problem: their high representational power is exploited in deep clustering algorithms. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The method we propose learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update the weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of our method.
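The self-training step mentioned in the abstract (cluster assignments used as supervision for the encoder) can be illustrated with a minimal sketch. The abstract does not say how assignments are turned into training targets, so everything below is an assumption: the sketch uses a Student's t soft assignment and a sharpened target distribution, a scheme common in deep clustering, and the names `soft_assign`, `target_distribution`, `z`, and `centroids` are hypothetical. The toy 2-D points stand in for encoder outputs; no autoencoder or sentence embedding is trained here.

```python
import numpy as np


def soft_assign(z, centroids, alpha=1.0):
    """Soft cluster assignments q[i, j] from a Student's t kernel (assumed scheme)."""
    # Squared Euclidean distance between every point and every centroid: shape (n, k).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    # Normalize each row so it is a probability distribution over clusters.
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened targets p[i, j]: square q and divide by the soft cluster size."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)


# Toy stand-in for encoder outputs: two well-separated groups of 2-D points.
rng = np.random.default_rng(0)
z = np.vstack([
    rng.normal(0.0, 0.1, size=(5, 2)),
    rng.normal(3.0, 0.1, size=(5, 2)),
])
# Hypothetical centroids, as if produced by a clustering algorithm such as k-means.
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])

q = soft_assign(z, centroids)   # current soft assignments
p = target_distribution(q)      # self-training targets derived from q
```

In a full training loop, the encoder weights would be updated to pull `q` toward `p` (for example by minimizing a KL divergence), and `p` would be recomputed periodically. That loop is omitted here because the abstract gives no details about it.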
Pages: 194-199
Number of pages: 6