A robust multi-class AdaBoost algorithm for mislabeled noisy data

Cited by: 61
Authors:
Sun, Bo [1 ]
Chen, Songcan [1 ]
Wang, Jiandong [1 ]
Chen, Haiyan [1 ]
Affiliations:
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, 29 Yudao St, Nanjing 210016, Jiangsu, Peoples R China
Funding:
National Natural Science Foundation of China
Keywords:
Ensemble learning; AdaBoost; Robustness; Multi-class classification; Mislabeled noise; CLASSIFICATION; SETS;
DOI:
10.1016/j.knosys.2016.03.024
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory]
Subject classification codes:
081104; 0812; 0835; 1405
Abstract:
AdaBoost has been theoretically and empirically shown to be a very successful ensemble learning algorithm: it iteratively generates a set of diverse weak learners and combines their outputs by the weighted majority voting rule as the final decision. However, in some cases AdaBoost overfits, especially on mislabeled noisy training examples, which degrades both its generalization performance and its robustness. Recently, a representative approach named noise-detection based AdaBoost (ND_AdaBoost) was proposed to improve the robustness of AdaBoost in the two-class classification scenario; in the multi-class scenario, however, this approach can hardly achieve satisfactory performance, for the following three reasons. (1) If we decompose a multi-class classification problem using strategies such as one-versus-all or one-versus-one, the resulting two-class problems usually have imbalanced training sets, which negatively influences the performance of ND_AdaBoost. (2) If we directly apply ND_AdaBoost to the multi-class scenario, its two-class loss function is no longer applicable, and its accuracy requirement for the (weak) base classifiers, i.e., greater than 0.5, is too strong to be satisfied in practice. (3) ND_AdaBoost still tends to overfit because it increases the weights of correctly classified noisy examples, which can make it focus on learning these noisy examples in subsequent iterations. To resolve this dilemma, in this paper we propose a robust multi-class AdaBoost algorithm (Rob_MulAda) whose key ingredients are a noise-detection based multi-class loss function and a new weight updating scheme. Experimental study indicates that our newly proposed weight updating scheme is indeed more robust to mislabeled noise than that of ND_AdaBoost in both two-class and multi-class scenarios. In addition, through comparison experiments, we also verify the effectiveness of Rob_MulAda and provide a suggestion for choosing the most appropriate noise-alleviating approach according to the concrete noise level in practical applications. Crown Copyright (C) 2016 Published by Elsevier B.V. All rights reserved.
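For readers unfamiliar with the multi-class boosting setup the abstract refers to, the sketch below shows a standard SAMME-style multi-class AdaBoost loop with a simplified, illustrative noise-aware weight update. It is not the Rob_MulAda algorithm itself (whose loss function and weight updating scheme are defined in the paper); the noise_flags input is a hypothetical placeholder for the output of some noise-detection step.

# Illustrative sketch only (not Rob_MulAda): a SAMME-style multi-class AdaBoost
# loop with a simplified noise-aware weight update. `noise_flags` is a
# hypothetical boolean mask assumed to come from a separate noise-detection step.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def samme_boost(X, y, n_rounds=50, noise_flags=None):
    classes = np.unique(y)
    n, K = len(y), len(classes)
    w = np.full(n, 1.0 / n)                    # uniform initial example weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y
        err = np.dot(w, miss) / w.sum()        # weighted training error
        # SAMME only needs accuracy better than random guessing (err < (K-1)/K),
        # relaxing the two-class ">0.5" requirement discussed in the abstract.
        if err >= (K - 1) / K:
            break
        alpha = np.log((1.0 - err) / max(err, 1e-12)) + np.log(K - 1)
        upd = np.where(miss, alpha, 0.0)
        if noise_flags is not None:
            # Simplified noise handling (NOT the paper's exact scheme):
            # do not up-weight examples flagged as mislabeled.
            upd = np.where(noise_flags, 0.0, upd)
        w = w * np.exp(upd)
        w /= w.sum()                           # renormalize example weights
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas, classes

def samme_predict(X, learners, alphas, classes):
    votes = np.zeros((len(X), len(classes)))   # weighted majority voting
    for stump, alpha in zip(learners, alphas):
        pred = stump.predict(X)
        for k, c in enumerate(classes):
            votes[:, k] += alpha * (pred == c)
    return classes[np.argmax(votes, axis=1)]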
Pages: 87-102
Page count: 16
Related papers (showing records 41-50 of 50)
  • [41] Head Pose Classification by Multi-Class AdaBoost with Fusion of RGB and Depth Images
    Yun, Yixiao
    Changrampadi, Mohamed H.
    Gu, Irene Y. H.
2014 INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING AND INTEGRATED NETWORKS (SPIN), 2014: 174+
  • [42] Feature Selection for Multi-Class Imbalanced Data Sets Based on Genetic Algorithm
    Du L.-M.
    Xu Y.
    Zhu H.
Ann. Data Sci., 3: 293-300
  • [43] Multi-Class Learning: From Theory to Algorithm
    Li, Jian
    Liu, Yong
    Yin, Rong
    Zhang, Hua
    Ding, Lizhong
    Wang, Weiping
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [44] ERM learning algorithm for multi-class classification
    Wang, Cheng
    Guo, Zheng-Chu
APPLICABLE ANALYSIS, 2012, 91(7): 1339-1349
  • [45] A Novel Double Ensemble Algorithm for the Classification of Multi-Class Imbalanced Hyperspectral Data
    Quan, Daying
    Feng, Wei
    Dauphin, Gabriel
    Wang, Xiaofeng
    Huang, Wenjiang
    Xing, Mengdao
    REMOTE SENSING, 2022, 14 (15)
  • [46] A supervised feature extraction algorithm for multi-class
    Ding, Shifei
    Jin, Fengxiang
    Lei, Xiaofeng
    Shi, Zhongzhi
FRONTIERS IN ALGORITHMICS, 2008, 5059: 323+
  • [47] An active learning algorithm for multi-class classification
    Dongjiang Liu
    Yanbi Liu
Pattern Analysis and Applications, 2019, 22: 1051-1063
  • [48] Combined Cleaning and Resampling algorithm for multi-class imbalanced data with label noise
    Koziarski, Michal
    Wozniak, Michal
    Krawczyk, Bartosz
    KNOWLEDGE-BASED SYSTEMS, 2020, 204 (204)
  • [49] A Robust and Accurate Method for Feature Selection and Prioritization from Multi-Class OMICs Data
    Fortino, Vittorio
    Kinaret, Pia
    Fyhrquist, Nanna
    Alenius, Harri
    Greco, Dario
PLOS ONE, 2014, 9(9)
  • [50] An algorithm for correcting mislabeled data
    Zeng, Xinchuan
    Martinez, Tony R.
Intelligent Data Analysis, 2001, 5(6): 491-502