Multi-label dataless text classification with topic modeling

Cited by: 0
Authors
Daochen Zha
Chenliang Li
Affiliations
[1] Wuhan University, School of Computer Science
[2] Wuhan University, School of Cyber Science and Engineering
Source
Knowledge and Information Systems, 2019, 61 (01)
Keywords
Dataless text classification; Topic model; Multi-label text classification; Spike and slab prior
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Manually labeling documents is tedious and expensive, yet it is essential for training a traditional text classifier. In recent years, a few dataless text classification techniques have been proposed to address this problem. However, existing work centers mainly on single-label classification, in which each document is restricted to a single category. In this paper, we propose a novel Seed-guided Multi-label Topic Model, named SMTM. Given a few seed words relevant to each category, SMTM performs multi-label classification on a collection of documents without any labeled documents. In SMTM, each category is associated with a single category-topic that covers the meaning of the category. To accommodate multi-label documents, we explicitly model category sparsity in SMTM by using a spike and slab prior together with a weak smoothing prior. As a result, SMTM automatically selects the relevant categories for each document without any threshold tuning. To incorporate the supervision of the seed words, we propose a seed-guided biased GPU (i.e., generalized Pólya urn) sampling procedure to guide the topic inference of SMTM. Experiments on two public datasets show that SMTM achieves better classification accuracy than state-of-the-art alternatives and even outperforms supervised solutions in some scenarios.
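The category-sparsity idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' model or inference procedure: it only draws from one common spike-and-slab construction, in which a binary selector per category switches a Dirichlet "slab" concentration on or off and a small smoothing concentration keeps the distribution well defined. The function name `sample_sparse_category_mixture` and the parameters `select_prob`, `slab_alpha`, and `smoothing_alpha` are hypothetical and chosen for illustration only.

```python
import random


def sample_sparse_category_mixture(num_categories, select_prob=0.3,
                                   slab_alpha=1.0, smoothing_alpha=0.01,
                                   rng=None):
    """Draw category selectors and a category distribution for one document.

    Illustrative assumption, not taken from the paper: selectors are
    Bernoulli(select_prob), and the category distribution is Dirichlet with
    concentration slab_alpha * selector + smoothing_alpha.
    """
    rng = rng or random.Random()
    # Spike-and-slab switch: 1 keeps a category "on" for this document.
    selectors = [1 if rng.random() < select_prob else 0
                 for _ in range(num_categories)]
    # Selected categories get the slab mass; every category keeps a small
    # smoothing mass so the Dirichlet stays well defined.
    concentrations = [slab_alpha * b + smoothing_alpha for b in selectors]
    # Dirichlet draw expressed as normalized Gamma variates.
    draws = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow
        theta = [1.0 / num_categories] * num_categories
    else:
        theta = [g / total for g in draws]
    # Labels are read from the selectors, so no threshold on theta is needed.
    labels = [c for c, b in enumerate(selectors) if b == 1]
    return selectors, theta, labels


selectors, theta, labels = sample_sparse_category_mixture(5, rng=random.Random(0))
```

In this sketch the predicted label set comes directly from the binary selectors rather than from thresholding `theta`, which mirrors the abstract's claim that no threshold tuning is required. In the actual model the selectors would be inferred from the documents and seed words rather than drawn from the prior.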
Pages: 137 - 160
Page count: 23
Related papers
50 in total
  • [1] Multi-label dataless text classification with topic modeling
    Zha, Daochen
    Li, Chenliang
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 61 (01) : 137 - 160
  • [2] Online multi-label dependency topic models for text classification
    Sophie Burkhardt
    Stefan Kramer
    Machine Learning, 2018, 107 : 859 - 886
  • [3] An Efficient Framework by Topic Model for Multi-label Text Classification
    Sun, Wei
    Ran, Xiangying
    Luo, Xiangyang
    Wang, Chongjun
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [4] Online multi-label dependency topic models for text classification
    Burkhardt, Sophie
    Kramer, Stefan
    MACHINE LEARNING, 2018, 107 (05) : 859 - 886
  • [5] Dataless Text Classification: A Topic Modeling Approach with Document Manifold
    Li, Ximing
    Li, Changchun
    Chi, Jinjin
    Ouyang, Jihong
    Li, Chenliang
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 973 - 982
  • [6] New Lifelong Topic Modeling Method and Its Application to Vietnamese Text Multi-label Classification
    Quang-Thuy Ha
    Thi-Ngan Pham
    Van-Quang Nguyen
    Thi-Cham Nguyen
    Thi-Hong Vuong
    Minh-Tuoi Tran
    Tri-Thanh Nguyen
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2018, PT I, 2018, 10751 : 200 - 210
  • [7] Feature Extraction of Deep Topic Model for Multi-label Text Classification
    Chen W.
    Liu X.
    Lu M.
    Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2019, 32 (09) : 785 - 792
  • [8] Label prompt for multi-label text classification
    Song, Rui
    Liu, Zelong
    Chen, Xingbing
    An, Haining
    Zhang, Zhiqi
    Wang, Xiaoguang
    Xu, Hao
    APPLIED INTELLIGENCE, 2023, 53 (08) : 8761 - 8775
  • [9] Label prompt for multi-label text classification
    Rui Song
    Zelong Liu
    Xingbing Chen
    Haining An
    Zhiqi Zhang
    Xiaoguang Wang
    Hao Xu
    Applied Intelligence, 2023, 53 : 8761 - 8775
  • [10] A Label Distribution Topic Model for Multi-label Classification
    Liu, Lin
    Tang, Lin
    2019 4TH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION PROCESSING (ICIIP 2019), 2019, : 52 - 57