Multi-label dataless text classification with topic modeling

Authors
Daochen Zha
Chenliang Li
Affiliations
[1] Wuhan University, School of Computer Science
[2] Wuhan University, School of Cyber Science and Engineering
Keywords
Dataless text classification; Topic model; Multi-label text classification; Spike and slab prior
Abstract
Manually labeling documents is tedious and expensive, but it is essential for training a traditional text classifier. In recent years, a few dataless text classification techniques have been proposed to address this problem. However, existing work focuses mainly on single-label classification, in which each document is restricted to a single category. In this paper, we propose a novel Seed-guided Multi-label Topic Model, named SMTM. Given a few seed words relevant to each category, SMTM performs multi-label classification on a collection of documents without any labeled documents. In SMTM, each category is associated with a single category-topic that covers the meaning of the category. To accommodate multi-label documents, we explicitly model category sparsity in SMTM using a spike and slab prior and a weak smoothing prior. As a result, SMTM automatically selects the relevant categories for each document without any threshold tuning. To incorporate the supervision of the seed words, we propose a seed-guided biased GPU (i.e., generalized Pólya urn) sampling procedure to guide the topic inference of SMTM. Experiments on two public datasets show that SMTM achieves better classification accuracy than state-of-the-art alternatives and even outperforms supervised solutions in some scenarios.
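The category-sparsity idea in the abstract can be illustrated with a generic spike and slab construction. The sketch below is not SMTM's actual generative process or inference procedure, which the abstract does not specify. It only shows, under assumed hyperparameters, how binary selectors plus a weak smoothing prior can concentrate a document's category proportions on a few categories without a threshold. The function name and the parameters select_prob, slab_alpha, and smoothing_alpha are hypothetical.

```python
import random


def sample_category_proportions(num_categories, select_prob=0.3,
                                slab_alpha=1.0, smoothing_alpha=0.01,
                                rng=None):
    """Draw sparse category proportions for one document (illustrative only).

    All hyperparameter names and default values are assumptions made for
    this sketch; they are not taken from the paper.
    """
    rng = rng or random.Random()

    # Spike: a Bernoulli selector decides whether each category is "on".
    selectors = [1 if rng.random() < select_prob else 0
                 for _ in range(num_categories)]

    # Slab plus weak smoothing: selected categories get a normal
    # concentration, unselected ones keep only a tiny smoothing mass.
    concentrations = [slab_alpha * b + smoothing_alpha for b in selectors]

    # A Dirichlet draw is a vector of independent Gamma draws, normalized.
    draws = [rng.gammavariate(a, 1.0) for a in concentrations]
    total = sum(draws)
    if total == 0.0:
        # Very small shapes can underflow to zero; fall back to uniform.
        return selectors, [1.0 / num_categories] * num_categories
    return selectors, [d / total for d in draws]


if __name__ == "__main__":
    selectors, theta = sample_category_proportions(5, rng=random.Random(0))
    print(selectors)
    print([round(p, 4) for p in theta])
```

In a construction like this, the selectors themselves indicate which categories a document belongs to, which is why no cut-off on the proportions is needed.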
Pages: 137-160
Page count: 23