A Self-Training Approach for Short Text Clustering

Times cited: 0

Authors:

Hadifar, Amir [1]

Sterckx, Lucas [1]

Demeester, Thomas [1]

Develder, Chris [1]

Affiliation:

[1] Univ Ghent, IMEC, IDLab, Dept Informat Technol, Ghent, Belgium

Source:

4TH WORKSHOP ON REPRESENTATION LEARNING FOR NLP (REPL4NLP-2019) | 2019

DOI:

Not available

Chinese Library Classification (CLC) number:

H0 [Linguistics]

Subject classification codes:

030303; 0501; 050102

Abstract:

Short text clustering is a challenging problem when adopting traditional bag-of-words or TF-IDF representations, since these lead to sparse vector representations for short texts. Low-dimensional continuous representations or embeddings can counter that sparseness problem: their high representational power is exploited in deep clustering algorithms. While deep clustering has been studied extensively in computer vision, relatively little work has focused on NLP. The method we propose learns discriminative features from both an autoencoder and a sentence embedding, then uses assignments from a clustering algorithm as supervision to update the weights of the encoder network. Experiments on three short text datasets empirically validate the effectiveness of our method.
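The abstract describes the training signal only at a high level. As a rough illustration of using clustering assignments as supervision for an encoder, the NumPy sketch below assumes a DEC-style formulation (Deep Embedded Clustering, Xie et al., 2016): Student's t soft assignments Q, a sharpened target distribution P derived from them, and gradient descent on KL(P || Q) with respect to the encoder weights and the centroids. The input `x` stands in for the concatenated autoencoder and sentence-embedding features, the identity-initialised linear encoder stands in for a pretrained network, and all function names and hyperparameters are hypothetical; the paper's actual architecture and loss may differ.

```python
import numpy as np


def soft_assign(z, mu):
    """Student's t soft assignment (alpha = 1): q[i, j] ~ 1 / (1 + ||z_i - mu_j||^2)."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (n, k) squared distances
    num = 1.0 / (1.0 + d2)
    return num / num.sum(axis=1, keepdims=True), num


def target_distribution(q):
    """Sharpen q: square it, divide by soft cluster size, renormalise each row."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)


def init_centroids(z, k, iters=20):
    """Farthest-point seeding followed by plain Lloyd (k-means) iterations."""
    mu = [z[0]]
    for _ in range(1, k):
        d2 = np.min([((z - m) ** 2).sum(axis=1) for m in mu], axis=0)
        mu.append(z[d2.argmax()])  # next seed: point farthest from existing seeds
    mu = np.array(mu)
    for _ in range(iters):
        labels = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        mu = np.array([z[labels == j].mean(axis=0) if np.any(labels == j) else mu[j]
                       for j in range(k)])
    return mu


def self_train(x, k, steps=100, lr=0.1, refresh_every=10):
    """Refine a linear encoder W and centroids mu by gradient descent on KL(P || Q).

    Assumption: x holds precomputed (hypothetical) autoencoder + sentence-embedding features.
    """
    w = np.eye(x.shape[1])  # hypothetical stand-in for a pretrained encoder
    mu = init_centroids(x @ w, k)
    for step in range(steps):
        z = x @ w
        q, num = soft_assign(z, mu)
        if step % refresh_every == 0:
            p = target_distribution(q)  # cluster-derived targets, held fixed between refreshes
        coef = 2.0 * (p - q) * num  # (n, k) per-pair gradient weights of KL(P || Q)
        diff = z[:, None, :] - mu[None, :, :]  # (n, k, dim)
        grad_z = (coef[:, :, None] * diff).sum(axis=1) / len(x)
        grad_mu = -(coef[:, :, None] * diff).sum(axis=0) / len(x)
        w -= lr * x.T @ grad_z  # chain rule through z = x @ w
        mu -= lr * grad_mu
    q, _ = soft_assign(x @ w, mu)
    return q.argmax(axis=1), q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centers = 10.0 * np.eye(10)[:3]  # three well-separated synthetic clusters
    x = np.vstack([c + 0.5 * rng.standard_normal((30, 10)) for c in centers])
    labels, q = self_train(x, k=3)
    print(np.bincount(labels))
```

Refreshing P only every few steps keeps the targets stable while the encoder moves, which is a common design choice in self-training loops of this kind.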

Pages: 194-199

Number of pages: 6

Total: 50 records

[41] Doubly Robust Self-Training
Zhu, Banghua
Ding, Mingyu
Jacobson, Philip
Wu, Ming
Zhan, Wei
Jordan, Michael I.
Jiao, Jiantao
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
[42] Deep Bayesian Self-Training
Ribeiro, Fabio De Sousa
Caliva, Francesco
Swainson, Mark
Gudmundsson, Kjartan
Leontidis, Georgios
Kollias, Stefanos
NEURAL COMPUTING & APPLICATIONS, 2020, 32 (09): 4275-4291
[43] RECURSIVE SELF-TRAINING ALGORITHMS
TSYPKIN, YZ
KELMANS, GK
ENGINEERING CYBERNETICS, 1967, (05): 70 ff.
[44] KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation
Feng, Yuxi
Yi, Xiaoyuan
Lakshmanan, Laks V. S.
Xie, Xing
PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023: 5049-5057
[45] Rethinking Pre-training and Self-training
Zoph, Barret
Ghiasi, Golnaz
Lin, Tsung-Yi
Cui, Yin
Liu, Hanxiao
Cubuk, Ekin D.
Le, Quoc V.
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
[46] A self-training subspace clustering algorithm based on adaptive confidence for gene expression data
Li, Dan
Liang, Hongnan
Qin, Pan
Wang, Jia
FRONTIERS IN GENETICS, 2023, 14
[47] An Approach for Self-Training Audio Event Detectors Using Web Data
Elizalde, Benjamin
Shah, Ankit
Dalmia, Siddharth
Lee, Min Hun
Badlani, Rohan
Kumar, Anurag
Raj, Bhiksha
Lane, Ian
2017 25TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2017: 1863-1867
[48] A Self-training Approach for Few-Shot Named Entity Recognition
Qian, Yudong
Zheng, Weiguo
WEB AND BIG DATA, PT II, APWEB-WAIM 2022, 2023, 13422: 183-191
[49] Transductive Zero-Shot Learning With a Self-Training Dictionary Approach
Yu, Yunlong
Ji, Zhong
Li, Xi
Guo, Jichang
Zhang, Zhongfei
Ling, Haibin
Wu, Fei
IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (10): 2908-2919
[50] A Novel Self-training Approach for Low-resource Speech Recognition
Singh, Satwinder
Hou, Feng
Wang, Ruili
INTERSPEECH 2023, 2023: 1588-1592
