Supervised topic models with weighted words: multi-label document classification

Authors
Yue-peng Zou
Ji-hong Ouyang
Xi-ming Li
Affiliations
[1] Jilin University, College of Computer Science and Technology
[2] Jilin University, MOE Key Laboratory of Symbolic Computation and Knowledge Engineering
Keywords
Supervised topic model; Multi-label classification; Class frequency; Labeled latent Dirichlet allocation (L-LDA); Dependency-LDA
DOI
Not available
CLC number
TP391
Abstract
Supervised topic modeling algorithms have been successfully applied to multi-label document classification tasks. Representative models include labeled latent Dirichlet allocation (L-LDA) and dependency-LDA. However, these models neglect the class frequency information of words (i.e., the number of classes where a word has occurred in the training data), which is significant for classification. To address this, we propose a method, namely the class frequency weight (CF-weight), to weight words by considering the class frequency knowledge. This CF-weight is based on the intuition that a word with higher (lower) class frequency will be less (more) discriminative. In this study, the CF-weight is used to improve L-LDA and dependency-LDA. A number of experiments have been conducted on real-world multi-label datasets. Experimental results demonstrate that CF-weight based algorithms are competitive with the existing supervised topic models.
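The class frequency described in the abstract can be illustrated with a minimal sketch. The counting step follows the abstract's definition (the number of classes in which a word occurs in the training data). The weighting formula is not given in this record, so the `log(C / cf) + 1` form below is a hypothetical stand-in chosen only because it decreases as class frequency grows; it should not be read as the authors' CF-weight. The function names, the toy data, and the choice to credit a word to every label of a multi-label document are likewise assumptions.

```python
import math
from collections import defaultdict


def class_frequency(docs):
    """Map each word to the number of distinct classes it occurs in.

    docs: list of (tokens, labels) pairs, one per training document.
    Assumption: a word in a multi-label document counts toward every
    label attached to that document.
    """
    classes_of_word = defaultdict(set)
    for tokens, labels in docs:
        for word in set(tokens):
            classes_of_word[word].update(labels)
    return {word: len(classes) for word, classes in classes_of_word.items()}


def cf_weights(docs):
    """Hypothetical weight log(C / cf) + 1 (NOT the paper's formula).

    C is the number of distinct classes in the training data; a word
    seen in fewer classes receives a larger weight.
    """
    num_classes = len({label for _, labels in docs for label in labels})
    cf = class_frequency(docs)
    return {word: math.log(num_classes / n) + 1.0 for word, n in cf.items()}


# Toy multi-label training set (illustrative data only).
train = [
    (["goal", "match", "team"], ["sports"]),
    (["election", "vote", "team"], ["politics"]),
    (["market", "vote", "stock"], ["finance", "politics"]),
]

cf = class_frequency(train)
weights = cf_weights(train)
```

In this toy set, `goal` appears under one class and `team` under two, so `goal` receives the larger weight, matching the intuition that a word with lower class frequency is more discriminative.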
Pages: 513-523
Number of pages: 10