Cformer: Semi-Supervised Text Clustering Based on Pseudo Labeling

Times cited: 3
Authors
Hatefi, Arezoo [1]
Vu, Xuan-Son [1]
Bhuyan, Monowar [1]
Drewes, Frank [1]
Affiliation
[1] Umea Univ, Dept Comp Sci, Umea, Sweden
Keywords
meta pseudo clustering; semi-supervised learning; pseudo labeling
DOI
10.1145/3459637.3482073
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
We propose Cformer, a semi-supervised learning method for automatically clustering text documents when the clusters are described by a small number of labeled examples and most training examples are unlabeled. We motivate this setting with an application in contextual programmatic advertising, a form of content placement on news pages that does not exploit personal information about visitors but instead relies on a high-quality clustering computed from a small number of labeled samples. To enable text clustering with little training data, Cformer leverages the teacher-student architecture of Meta Pseudo Labels. In addition to unlabeled data, Cformer uses a small amount of labeled data to describe the target clusters. Our experimental results show that the proposed model improves on the state of the art when a reasonable amount of labeled data is available. The models are comparatively small and suitable for deployment in constrained environments with limited computing resources. The source code is available at https://github.com/Aha6988/Cformer.
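The teacher-student loop of Meta Pseudo Labels, which the abstract names as the basis of Cformer, can be illustrated with a minimal sketch. This is not Cformer's implementation (the repository above has that); it assumes linear softmax classifiers over precomputed feature vectors in place of real text encoders, synthetic 2-D points standing in for document embeddings, hard (argmax) pseudo labels, full-batch gradient steps, and the first-order approximation in which the teacher's feedback is the change in the student's loss on the labeled data. All names (`mpl_step`, `ce_grad`, `sample`, the learning rates, the cluster centers) are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(W, X, y):
    """Mean cross-entropy of the linear softmax model X @ W against integer labels y."""
    p = softmax(X @ W)
    return float(-np.mean(np.log(p[np.arange(len(y)), y] + 1e-12)))


def ce_grad(W, X, y):
    """Gradient of cross_entropy(W, X, y) with respect to W."""
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0  # softmax output minus one-hot target
    return X.T @ p / len(y)


def mpl_step(W_t, W_s, X_l, y_l, X_u, lr_t=0.5, lr_s=0.5):
    """One first-order Meta Pseudo Labels update (hypothetical helper)."""
    # 1. The teacher assigns hard pseudo labels to the unlabeled batch.
    pseudo = np.argmax(X_u @ W_t, axis=1)
    # 2. Student's loss on the labeled data before it sees the pseudo labels.
    loss_old = cross_entropy(W_s, X_l, y_l)
    # 3. The student takes one gradient step on the pseudo-labeled batch.
    W_s = W_s - lr_s * ce_grad(W_s, X_u, pseudo)
    # 4. Feedback h > 0 means the pseudo labels lowered the student's labeled loss.
    h = loss_old - cross_entropy(W_s, X_l, y_l)
    # 5. The teacher reinforces its pseudo labels when h > 0 and moves away from
    #    them when h < 0, plus an ordinary supervised term on the labeled data.
    g_t = h * ce_grad(W_t, X_u, pseudo) + ce_grad(W_t, X_l, y_l)
    return W_t - lr_t * g_t, W_s


# Synthetic stand-in for document embeddings: three well-separated 2-D clusters.
rng = np.random.default_rng(0)
centers = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0]])


def sample(n_per_cluster):
    """Draw points around each center and append a constant bias feature."""
    X = np.vstack([c + 0.5 * rng.normal(size=(n_per_cluster, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_cluster)
    return np.hstack([X, np.ones((len(X), 1))]), y


X_l, y_l = sample(2)      # very few labeled examples describe the clusters
X_u, _ = sample(200)      # labels of the unlabeled pool are discarded
X_test, y_test = sample(100)

W_t = np.zeros((3, 3))    # teacher weights: 3 features (x, y, bias) x 3 clusters
W_s = np.zeros((3, 3))    # student weights
for _ in range(200):
    W_t, W_s = mpl_step(W_t, W_s, X_l, y_l, X_u)

acc = float(np.mean(np.argmax(X_test @ W_s, axis=1) == y_test))
print(f"student test accuracy: {acc:.2f}")
```

The step that separates this from plain self-training with a fixed teacher is step 4: the teacher is updated according to whether its pseudo labels actually helped the student on the few labeled examples, so the labeled data steer both models.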
Pages: 3078-3082
Number of pages: 5
Related papers (50 in total)
  • [31] GENERALIZED PSEUDO-LABELING IN CONSISTENCY REGULARIZATION FOR SEMI-SUPERVISED LEARNING
    Karaliolios, Nikolaos
    Chabot, Florian
    Dupont, Camille
    Le Borgne, Herve
    Quoc-Cuong Pham
    Audigier, Romaric
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, pp. 525-529
  • [32] Multiview Pseudo-Labeling for Semi-supervised Learning from Video
    Xiong, Bo
    Fan, Haoqi
    Grauman, Kristen
    Feichtenhofer, Christoph
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, pp. 7189-7199
  • [33] PseCo: Pseudo Labeling and Consistency Training for Semi-Supervised Object Detection
    Li, Gang
    Li, Xiang
    Wang, Yujie
    Wu, Yichao
    Liang, Ding
    Zhang, Shanshan
    COMPUTER VISION, ECCV 2022, PT IX, 2022, vol. 13669, pp. 457-472
  • [34] Pseudo-Labeling and Confirmation Bias in Deep Semi-Supervised Learning
    Arazo, Eric
    Ortego, Diego
    Albert, Paul
    O'Connor, Noel E.
    McGuinness, Kevin
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [35] Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition
    Zhu H.
    Gao D.
    Cheng G.
    Povey D.
    Zhang P.
    Yan Y.
    IEEE/ACM Transactions on Audio Speech and Language Processing, 2023, vol. 31, pp. 3320-3330
  • [36] Class Balanced Adaptive Pseudo Labeling for Federated Semi-Supervised Learning
    Li, Ming
    Li, Qingli
    Wang, Yan
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, pp. 16292-16301
  • [37] DYMatch: Semi-Supervised Learning with Dynamic Pseudo Labeling and Feature Consistency
    Mao, Zhongjie
    Pan, Feng
    Sun, Jun
    APPLIED SCIENCES-BASEL, 2024, vol. 14, no. 1
  • [38] TEXT CLASSIFICATION BASED ON SEMI-SUPERVISED LEARNING
    Vo Duy Thanh
    Vo Trung Hung
    Pham Minh Tuan
    Doan Van Ban
    2013 INTERNATIONAL CONFERENCE OF SOFT COMPUTING AND PATTERN RECOGNITION (SOCPAR), 2013, pp. 232-236
  • [39] MVS-based Semi-Supervised Clustering
    Yan, Yang
    Chen, Lihui
    Chan, Chee Keong
    2013 9TH INTERNATIONAL CONFERENCE ON INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING (ICICS), 2013
  • [40] Semi-Supervised Density-Based Clustering
    Lelis, Levi
    Sander, Joerg
    2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, pp. 842-847