Bonsai: diverse and shallow trees for extreme multi-label classification

Cited by: 0
Authors
Sujay Khandagale
Han Xiao
Rohit Babbar
Affiliation
[1] Aalto University
Source
Machine Learning | 2020 / Vol. 109
Keywords
Large-scale multi-label classification; Extreme multi-label classification; Large label space
DOI
Not available
Abstract
Extreme multi-label classification (XMC) refers to supervised multi-label learning involving hundreds of thousands or even millions of labels. In this paper, we develop a suite of algorithms, called Bonsai, which generalizes the notion of label representation in XMC, and partitions the labels in the representation space to learn shallow trees. We show three concrete realizations of this label representation space including: (i) the input space which is spanned by the input features, (ii) the output space spanned by label vectors based on their co-occurrence with other labels, and (iii) the joint space by combining the input and output representations. Furthermore, the constraint-free multi-way partitions learnt iteratively in these spaces lead to shallow trees. By combining the effect of shallow trees and generalized label representation, Bonsai achieves the best of both worlds—fast training which is comparable to state-of-the-art tree-based methods in XMC, and much better prediction accuracy, particularly on tail-labels. On a benchmark Amazon-3M dataset with 3 million labels, Bonsai outperforms a state-of-the-art one-vs-rest method in terms of prediction accuracy, while being approximately 200 times faster to train. The code for Bonsai is available at https://github.com/xmc-aalto/bonsai.
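The three label representation spaces and the unconstrained multi-way partitioning described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (that lives in the repository linked above); it is a minimal pure-Python reading of the abstract under stated assumptions: a label's representation is taken to be the normalized sum of the feature vectors, label vectors, or their concatenation over the instances that carry the label, and the partitioning is plain k-means on cosine similarity with a deterministic initialization. The names `label_representations`, `kmeans_partition`, `build_tree`, and `max_leaf` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v] if norm > 0 else list(v)

def label_representations(X, Y, space="input"):
    """Represent each label by summing rows of the instances that carry it.

    X: n x d feature matrix, Y: n x L binary label matrix (lists of lists).
    space: "input" (features), "output" (label co-occurrence), or "joint".
    """
    if space == "input":
        rows = X
    elif space == "output":
        rows = Y
    elif space == "joint":
        rows = [x + y for x, y in zip(X, Y)]  # concatenate features and labels
    else:
        raise ValueError(f"unknown space: {space}")
    reps = []
    for l in range(len(Y[0])):
        acc = [0.0] * len(rows[0])
        for row, y in zip(rows, Y):
            if y[l]:
                acc = [a + b for a, b in zip(acc, row)]
        reps.append(normalize(acc))
    return reps

def kmeans_partition(reps, k, iters=10):
    """Split indices 0..len(reps)-1 into k clusters, with no balance constraint."""
    n = len(reps)
    # Assumption: deterministic, evenly spaced initial centers.
    centers = [reps[(c * n) // k] for c in range(k)]
    assign = [0] * n
    for _ in range(iters):
        assign = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(r, centers[c])))
            for r in reps
        ]
        for c in range(k):
            members = [reps[i] for i, a in enumerate(assign) if a == c]
            if members:
                centers[c] = normalize([sum(col) for col in zip(*members)])
    return [[i for i, a in enumerate(assign) if a == c] for c in range(k)]

def build_tree(reps, labels, k, max_leaf):
    """Recursively partition labels; a large k gives a shallow tree."""
    if len(labels) <= max_leaf:
        return labels
    parts = kmeans_partition([reps[i] for i in labels], k)
    parts = [[labels[j] for j in p] for p in parts if p]
    if len(parts) == 1:  # the cluster could not be split further
        return labels
    return [build_tree(reps, p, k, max_leaf) for p in parts]

# Toy data: 4 instances, 2 features, 4 labels.
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
Y = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
tree = build_tree(label_representations(X, Y, "joint"), [0, 1, 2, 3], k=2, max_leaf=2)
print(tree)
```

On this toy data, labels 0 and 1 occur only with the first feature and labels 2 and 3 only with the second, so all three spaces group them the same way. On real data the spaces would generally produce different partitions, which is the source of the diversity the title refers to.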
Pages: 2099 - 2119
Page count: 20
Related papers
50 in total
  • [1] Bonsai: diverse and shallow trees for extreme multi-label classification
    Khandagale, Sujay
    Xiao, Han
    Babbar, Rohit
    MACHINE LEARNING, 2020, 109 (11) : 2099 - 2119
  • [2] Decision trees for hierarchical multi-label classification
    Vens, Celine
    Struyf, Jan
    Schietgat, Leander
    Dzeroski, Saso
    Blockeel, Hendrik
    MACHINE LEARNING, 2008, 73 (02) : 185 - 214
  • [3] Decision trees for hierarchical multi-label classification
    Celine Vens
    Jan Struyf
    Leander Schietgat
    Sašo Džeroski
    Hendrik Blockeel
    Machine Learning, 2008, 73 : 185 - 214
  • [4] Extreme Learning Machine for Multi-Label Classification
    Sun, Xia
    Xu, Jingting
    Jiang, Changmeng
    Feng, Jun
    Chen, Su-Shing
    He, Feijuan
    ENTROPY, 2016, 18 (06)
  • [5] Extreme Multi-label Classification for Information Retrieval
    Dembczynski, Krzysztof
    Babbar, Rohit
    ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018), 2018, 10772 : 839 - 840
  • [6] Multi-Label Classification with Extreme Learning Machine
    Kongsorot, Yanika
    Horata, Punyaphol
    2014 6TH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SMART TECHNOLOGY (KST), 2014, : 81 - 86
  • [7] Evaluating Extreme Hierarchical Multi-label Classification
    Amigo, Enrique
    Delgado, Agustin D.
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 5809 - 5819
  • [8] Reweighting Forest for Extreme Multi-label Classification
    Lin, Zhun-Zheng
    Dai, Bi-Ru
    BIG DATA ANALYTICS AND KNOWLEDGE DISCOVERY, DAWAK 2017, 2017, 10440 : 286 - 299
  • [9] Consistent, Balanced, and Overlapping Label Trees for Extreme Multi-label Learning
    Ge, Zhiqi
    Guan, Yuanyuan
    Li, Ximing
    Fu, Bo
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022, 2022, : 551 - 560
  • [10] Fuzzy Rough Decision Trees for Multi-label Classification
    Wang, Xiaoxue
    An, Shuang
    Shi, Hong
    Hu, Qinghua
    ROUGH SETS, FUZZY SETS, DATA MINING, AND GRANULAR COMPUTING, RSFDGRC 2015, 2015, 9437 : 207 - 217