On the Dynamics of Classification Measures for Imbalanced and Streaming Data

Cited by: 31
Authors
Brzezinski, Dariusz [1, 2]
Stefanowski, Jerzy [1, 2]
Susmaga, Robert [1, 2]
Szczech, Izabela [1, 2]
Affiliations
[1] Poznan Univ Tech, CAMIL, PL-60965 Poznan, Poland
[2] Poznan Univ Tech, Inst Comp Sci, PL-60965 Poznan, Poland
Keywords
Data visualization; Atmospheric measurements; Particle measurements; Histograms; Task analysis; Size measurement; Sensitivity; Class imbalance; classification measures; concept drift; data streams; measure gradients; measure histograms
DOI
10.1109/TNNLS.2019.2899061
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As each imbalanced classification problem comes with its own set of challenges, the measure used to evaluate classifiers must be individually selected. To help researchers make this decision in an informed manner, experimental and theoretical investigations compare general properties of measures. However, existing studies do not analyze changes in measure behavior imposed by different imbalance ratios. Moreover, several characteristics of imbalanced data streams, such as the effect of dynamically changing class proportions, have not been thoroughly investigated from the perspective of different metrics. In this paper, we study measure dynamics by analyzing changes of measure values, distributions, and gradients with diverging class proportions. For this purpose, we visualize measure probability mass functions and gradients. In addition, we put forward a histogram-based normalization method that provides a unified, probabilistic interpretation of any measure over data sets with different class distributions. The results of analyzing eight popular classification measures show that the effect class proportions have on each measure is different and should be taken into account when evaluating classifiers. Apart from highlighting imbalance-related properties of each measure, our study shows a direct connection between class ratio changes and certain types of concept drift, which could be influential in designing new types of classifiers and drift detectors for imbalanced data streams.
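The idea of a measure's probability mass function under a given class ratio can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes that every confusion matrix with fixed class sizes is equally likely, uses G-mean purely as an example measure, and all function names (`g_mean`, `measure_values`, `pmf`, `percentile`) are hypothetical. It enumerates all confusion matrices for a balanced and an imbalanced data set of 100 examples, bins the resulting measure values into a histogram, and reads a measure value as the fraction of matrices scoring at most that value, which is one possible reading of a histogram-based, probabilistic interpretation.

```python
from bisect import bisect_right
from collections import Counter
from math import sqrt


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity and specificity (example measure)."""
    return sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


def measure_values(n_pos, n_neg, measure=g_mean):
    """Sorted measure values over every confusion matrix with fixed class sizes.

    Assumption: each (TP, TN) combination is treated as equally likely.
    """
    return sorted(
        measure(tp, n_pos - tp, tn, n_neg - tn)
        for tp in range(n_pos + 1)
        for tn in range(n_neg + 1)
    )


def pmf(values, bins=10):
    """Histogram of values in [0, 1], normalised so the bins sum to 1."""
    counts = Counter(min(int(v * bins), bins - 1) for v in values)
    return [counts.get(b, 0) / len(values) for b in range(bins)]


def percentile(values, v):
    """Fraction of enumerated confusion matrices whose measure is at most v."""
    return bisect_right(values, v) / len(values)


balanced = measure_values(50, 50)    # 50 positives, 50 negatives
imbalanced = measure_values(5, 95)   # 5 positives, 95 negatives

print(pmf(balanced))
print(pmf(imbalanced))
# The same raw score can occupy different positions in the two distributions.
print(percentile(balanced, 0.8), percentile(imbalanced, 0.8))
```

Comparing the two printed histograms shows how the distribution of attainable values shifts with the class ratio, so the same raw score may correspond to a different percentile on each data set.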
Pages: 2868-2878
Number of pages: 11
Related papers
  • [1] Data Complexity Measures for Imbalanced Classification Tasks
    Barella, Victor H.
    Garcia, Luis P. F.
    de Souto, Marcilio P.
    Lorena, Ana C.
    de Carvalho, Andre
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [2] Preliminary Evaluation of Classification Complexity Measures on Imbalanced Data
    Xing, Yan
    Cai, Hao
    Cai, Yanguang
    Hejlesen, Ole
    Toft, Egon
    PROCEEDINGS OF 2013 CHINESE INTELLIGENT AUTOMATION CONFERENCE: INTELLIGENT INFORMATION PROCESSING, 2013, 256 : 189 - 196
  • [3] Evaluation Measures of the Classification Performance of Imbalanced Data Sets
    Gu, Qiong
    Zhu, Li
    Cai, Zhihua
    COMPUTATIONAL INTELLIGENCE AND INTELLIGENT SYSTEMS, 2009, 51 : 461 - +
  • [4] A Study of Interestingness Measures for Associative Classification on Imbalanced Data
    Yang, Guangfei
    Cui, Xuejiao
    TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2015, 2015, 9441 : 141 - 151
  • [5] Multi-window based ensemble learning for classification of imbalanced streaming data
    Hu Li
    Ye Wang
    Hua Wang
    Bin Zhou
    World Wide Web, 2017, 20 : 1507 - 1525
  • [6] Multi-window based ensemble learning for classification of imbalanced streaming data
    Li, Hu
    Wang, Ye
    Wang, Hua
    Zhou, Bin
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2017, 20 (06): 1507 - 1525
  • [7] Empirical Assessment of Performance Measures for Preprocessing Moments in Imbalanced Data Classification Problem
    Szeszko, Pawel
    Topczewska, Magdalena
    COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL MANAGEMENT, CISIM 2016, 2016, 9842 : 183 - 194
  • [8] A Framework of Online Learning with Imbalanced Streaming Data
    Yan, Yan
    Yang, Tianbao
    Yang, Yi
    Chen, Jianhui
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017: 2817 - 2823
  • [9] Binary Classification with Imbalanced Data
    Chiang, Jyun-You
    Lio, Yuhlong
    Hsu, Chien-Ya
    Ho, Chia-Ling
    Tsai, Tzong-Ru
    ENTROPY, 2024, 26 (01)
  • [10] Framework for imbalanced data classification
    Blaszczyk, Mikolaj
    Jedrzejowicz, Joanna
    KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS (KSE 2021), 2021, 192 : 3477 - 3486