Cformer: Semi-Supervised Text Clustering Based on Pseudo Labeling

Cited by: 3
Authors
Hatefi, Arezoo [1 ]
Vu, Xuan-Son [1 ]
Bhuyan, Monowar [1 ]
Drewes, Frank [1 ]
Affiliations
[1] Umea Univ, Dept Comp Sci, Umea, Sweden
Keywords
meta pseudo clustering; semi-supervised learning; pseudo labeling
DOI
10.1145/3459637.3482073
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
We propose a semi-supervised learning method called Cformer for automatic clustering of text documents in cases where clusters are described by a small number of labeled examples, while the majority of training examples are unlabeled. We motivate this setting with an application in contextual programmatic advertising, a type of content placement on news pages that does not exploit personal information about visitors but relies on the availability of a high-quality clustering computed on the basis of a small number of labeled samples. To enable text clustering with little training data, Cformer leverages the teacher-student architecture of Meta Pseudo Labels. In addition to unlabeled data, Cformer uses a small amount of labeled data to describe the target clusters. Our experimental results confirm that the proposed model improves on the state of the art when a reasonable amount of labeled data is available. The models are comparatively small and suitable for deployment in constrained environments with limited computing resources. The source code is available at https://github.com/Aha6988/Cformer.
Pages: 3078-3082
Number of pages: 5