Open source software classification using cost-sensitive multi-label learning

Authors
Han, Le [1]
Li, Ming [1]
Affiliations
[1] National Key Laboratory for Novel Software Technology (Nanjing University), Nanjing, 210023, China
Source
Ruan Jian Xue Bao/Journal of Software | 2014 / Vol. 25 / No. 09
Keywords
Learning systems - Problem solving - Learning algorithms - Open systems - User interfaces - Costs
DOI
10.13328/j.cnki.jos.004639
Abstract
With the explosive growth of open source software, retrieving the desired software in open source communities has become a great challenge. Tagging open source software is usually a manual process in which a project is assigned several tags describing its functions and characteristics; users then search for software by matching keywords against these tags. Because it is simple and convenient, tag-based software retrieval has been widely used. However, since manual tagging is expensive and time-consuming, developers are often unwilling to tag their projects sufficiently when uploading them. Automatic software tagging, which derives tags describing functions and characteristics from the text descriptions that users provide for their projects, therefore becomes key to effective software retrieval. This article formalizes the task as a multi-label learning problem and proposes a new multi-label learning method, ML-CKNN, which remains effective when the number of distinct tags is extremely large. For each tag, the instances associated with it are far fewer than those not associated with it; by incorporating misclassification costs into multi-label learning, ML-CKNN handles this class imbalance effectively. Experiments on three open source software community datasets show that ML-CKNN provides high-quality tags for newly uploaded open source software and significantly outperforms existing methods. © Copyright 2014, Institute of Software, the Chinese Academy of Sciences. All Rights Reserved.
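The abstract does not give the details of ML-CKNN, so the following is only a minimal sketch of the general idea it describes: a k-nearest-neighbor multi-label tagger whose per-tag decision weighs misclassification costs. The function names, the Jaccard similarity on word sets, the toy data, and the cost values `cost_fn` and `cost_fp` are all assumptions made for illustration, not the authors' method.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a description and return its set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_tags(train, description, k=3, cost_fn=3.0, cost_fp=1.0):
    """Predict tags for a description with a cost-sensitive k-NN vote.

    train:   list of (description, set_of_tags) pairs.
    cost_fn: hypothetical cost of missing a relevant tag.
    cost_fp: hypothetical cost of assigning an irrelevant tag.
    """
    query = tokenize(description)
    # Rank training projects by similarity to the query; keep the k nearest.
    ranked = sorted(
        train,
        key=lambda item: jaccard(query, tokenize(item[0])),
        reverse=True,
    )
    neighbors = ranked[:k]
    votes = Counter(tag for _, tags in neighbors for tag in tags)

    predicted = set()
    for tag, count in votes.items():
        # Fraction of neighbors carrying the tag: a crude relevance estimate.
        p = count / len(neighbors)
        # Assign the tag when the expected cost of omitting it exceeds
        # the expected cost of assigning it.
        if cost_fn * p > cost_fp * (1.0 - p):
            predicted.add(tag)
    return predicted


# Toy, made-up training data: (project description, tags).
TRAIN = [
    ("fast http web server written in c", {"web", "server"}),
    ("lightweight web framework for python", {"web", "python"}),
    ("python library for numerical arrays", {"python", "math"}),
    ("command line tool for file backup", {"cli"}),
]

balanced = predict_tags(TRAIN, "python web server", cost_fn=1.0, cost_fp=1.0)
weighted = predict_tags(TRAIN, "python web server", cost_fn=3.0, cost_fp=1.0)
```

With equal costs the rule reduces to a majority vote among the neighbors. Raising `cost_fn` relative to `cost_fp` lowers the vote fraction a tag needs, so tags that few neighbors carry can still be assigned, which is one simple way to counter the imbalance between tagged and untagged instances that the abstract mentions.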
Pages: 1982-1991