Cformer: Semi-Supervised Text Clustering Based on Pseudo Labeling

Cited by: 3
Authors
Hatefi, Arezoo [1]
Vu, Xuan-Son [1]
Bhuyan, Monowar [1]
Drewes, Frank [1]
Affiliations
[1] Umea Univ, Dept Comp Sci, Umea, Sweden
Keywords
meta pseudo clustering; semi-supervised learning; pseudo labeling
DOI
10.1145/3459637.3482073
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We propose a semi-supervised learning method called Cformer for automatic clustering of text documents in cases where clusters are described by a small number of labeled examples, while the majority of training examples are unlabeled. We motivate this setting with an application in contextual programmatic advertising, a type of content placement on news pages that does not exploit personal information about visitors but relies on the availability of a high-quality clustering computed on the basis of a small number of labeled samples. To enable text clustering with little training data, Cformer leverages the teacher-student architecture of Meta Pseudo Labels. In addition to unlabeled data, Cformer uses a small amount of labeled data to describe the clusters aimed at. Our experimental results confirm that the performance of the proposed model improves the state-of-the-art if a reasonable amount of labeled data is available. The models are comparatively small and suitable for deployment in constrained environments with limited computing resources. The source code is available at https://github.com/Aha6988/Cformer.
Pages: 3078-3082
Number of pages: 5