Learning from Combination of Data Chunks for Multi-class Imbalanced Data

Cited by: 0
Authors
Liu, Xu-Ying [1, 2]
Li, Qian-Qian [1]
Affiliations
[1] Southeast Univ, Sch Comp Sci & Engn, MOE, Key Lab Comp Network & Informat Integrat, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210008, Jiangsu, Peoples R China
Keywords
NEURAL-NETWORKS; ROC CURVE; AREA;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Class imbalance is very common in real-world applications. Previous studies have focused on the binary-class imbalance problem, whereas the multi-class imbalance problem is more general and more challenging. Under-sampling is an effective and efficient method for binary-class imbalanced data. When it is applied to multi-class imbalanced data, however, many more majority class examples are ignored, because there are often multiple majority classes and the minority class often has few examples. To utilize the information contained in the majority class examples that under-sampling ignores, this paper proposes a method called ChunkCombine. For each majority class, it performs under-sampling multiple times to obtain non-overlapping data chunks, so that together they contain the most information that a data sample of the same size can contain. Each data chunk has the same size as the minority class to achieve balance. Every possible combination of the minority class with a data chunk from every majority class then forms a balanced training set. ChunkCombine uses ensemble techniques to learn from the different training sets derived from all the possible combinations. Experimental results show that it outperforms many other popular methods for multi-class imbalanced data when average accuracy, G-mean, and MAUC are used as evaluation measures. In addition, we discuss different evaluation measures and suggest that a multi-class F-measure, the Mean F-Measure (MFM), is unsuitable for multi-class imbalanced data in many situations, because it is not consistent with the standard F-measure in the binary-class case and it is close to accuracy.
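The construction of balanced training sets described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the smallest class is the single minority class, that each combination takes exactly one chunk from every majority class, and that leftover examples which do not fill a whole chunk are dropped. The function names `make_chunks` and `balanced_training_sets` are hypothetical.

```python
import itertools
import random
from collections import defaultdict


def make_chunks(indices, chunk_size, rng):
    """Split shuffled indices into non-overlapping chunks of chunk_size.

    Assumption: indices that do not fill a whole chunk are dropped.
    """
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_chunks = len(shuffled) // chunk_size
    return [shuffled[i * chunk_size:(i + 1) * chunk_size] for i in range(n_chunks)]


def balanced_training_sets(labels, seed=0):
    """Return one sorted list of example indices per chunk combination."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Assumption: the smallest class is the single minority class.
    minority = min(by_class, key=lambda c: len(by_class[c]))
    chunk_size = len(by_class[minority])

    # Non-overlapping chunks for every majority class.
    chunks_per_class = [
        make_chunks(by_class[c], chunk_size, rng)
        for c in sorted(by_class)
        if c != minority
    ]

    training_sets = []
    # One chunk from each majority class, combined with the full minority class.
    for combo in itertools.product(*chunks_per_class):
        selected = list(by_class[minority])
        for chunk in combo:
            selected.extend(chunk)
        training_sets.append(sorted(selected))
    return training_sets


# Toy labels: class "a" has 2 examples, "b" has 4, "c" has 6.
labels = ["a"] * 2 + ["b"] * 4 + ["c"] * 6
sets = balanced_training_sets(labels)
print(len(sets))  # 2 chunks of "b" times 3 chunks of "c" gives 6
```

In an ensemble setting, one base classifier would be trained on each index list and the predictions aggregated. Note that the number of combinations grows multiplicatively with the number of majority classes, so a practical version would likely need to cap or sample the combinations.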
Pages: 1680-1687
Number of pages: 8
Related papers
50 in total
  • [1] A Combination Method for Multi-Class Imbalanced Data Classification
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    2013 10TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA 2013), 2013, : 365 - 368
  • [2] OAHO: an effective algorithm for multi-class learning from imbalanced data
    Murphey, Yi L.
    Wang, Haoxing
    Ou, Guobin
    Feldkamp, Lee A.
    2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 406 - +
  • [3] Multi-class Boosting for Imbalanced Data
    Fernandez-Baldera, Antonio
    Buenaposada, Jose M.
    Baumela, Luis
    PATTERN RECOGNITION AND IMAGE ANALYSIS (IBPRIA 2015), 2015, 9117 : 57 - 64
  • [4] Learning Imbalanced Multi-class Data with Optimal Dichotomy Weights
    Liu, Xu-Ying
    Li, Qian-Qian
    Zhou, Zhi-Hua
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2013, : 478 - 487
  • [5] Multi-class Ensemble Learning of Imbalanced Bidding Fraud Data
    Anowar, Farzana
    Sadaoui, Samira
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11489 : 352 - 358
  • [6] Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data
    Zhao, Jiakun
    Jin, Ju
    Zhang, Yibo
    Zhang, Ruifeng
    Chen, Si
    INTELLIGENT DATA ANALYSIS, 2022, 26 (03) : 599 - 614
  • [7] Evaluating Difficulty of Multi-class Imbalanced Data
    Lango, Mateusz
    Napierala, Krystyna
    Stefanowski, Jerzy
    FOUNDATIONS OF INTELLIGENT SYSTEMS, ISMIS 2017, 2017, 10352 : 312 - 322
  • [8] Survey on Highly Imbalanced Multi-class Data
    Hamid, Hakim Abdul
    Yusoff, Marina
    Mohamed, Azlinah
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (06) : 211 - 229
  • [9] Online active learning method for multi-class imbalanced data stream
    Li, Ang
    Han, Meng
    Mu, Dongliang
    Gao, Zhihui
    Liu, Shujuan
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (04) : 2355 - 2391