On the Dynamics of Classification Measures for Imbalanced and Streaming Data

Cited by: 31
Authors
Brzezinski, Dariusz [1, 2]
Stefanowski, Jerzy [1, 2]
Susmaga, Robert [1, 2]
Szczech, Izabela [1, 2]
Affiliations
[1] Poznan Univ Tech, CAMIL, PL-60965 Poznan, Poland
[2] Poznan Univ Tech, Inst Comp Sci, PL-60965 Poznan, Poland
Keywords
Data visualization; Atmospheric measurements; Particle measurements; Histograms; Task analysis; Size measurement; Sensitivity; Class imbalance; classification measures; concept drift; data streams; measure gradients; measure histograms
DOI
10.1109/TNNLS.2019.2899061
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As each imbalanced classification problem comes with its own set of challenges, the measure used to evaluate classifiers must be individually selected. To help researchers make this decision in an informed manner, experimental and theoretical investigations compare general properties of measures. However, existing studies do not analyze changes in measure behavior imposed by different imbalance ratios. Moreover, several characteristics of imbalanced data streams, such as the effect of dynamically changing class proportions, have not been thoroughly investigated from the perspective of different metrics. In this paper, we study measure dynamics by analyzing changes of measure values, distributions, and gradients with diverging class proportions. For this purpose, we visualize measure probability mass functions and gradients. In addition, we put forward a histogram-based normalization method that provides a unified, probabilistic interpretation of any measure over data sets with different class distributions. The results of analyzing eight popular classification measures show that the effect class proportions have on each measure is different and should be taken into account when evaluating classifiers. Apart from highlighting imbalance-related properties of each measure, our study shows a direct connection between class ratio changes and certain types of concept drift, which could be influential in designing new types of classifiers and drift detectors for imbalanced data streams.
Pages: 2868-2878
Page count: 11
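
The abstract's central claim, that class proportions affect each measure differently, can be illustrated with a small sketch. The code below is not the authors' method and does not reproduce their histogram-based normalization; it is a hypothetical example that fixes a classifier's true positive rate and true negative rate (the values 0.8 and 0.9 are assumptions chosen for illustration) and recomputes a few common measures from the expected confusion matrix as the share of positive examples shrinks. The function name `measures` and its parameters are introduced here only for this example.

```python
from math import sqrt


def measures(tpr, tnr, pos_fraction, n=10_000):
    """Return several measures computed from an expected confusion matrix.

    tpr, tnr      -- per-class recognition rates of a hypothetical classifier
    pos_fraction  -- share of positive (minority) examples in the data
    n             -- total number of examples (only scales the counts)
    """
    pos = n * pos_fraction
    neg = n - pos

    # Expected confusion-matrix cells for the assumed rates.
    tp, fn = tpr * pos, (1 - tpr) * pos
    tn, fp = tnr * neg, (1 - tnr) * neg

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) > 0 else 0.0)
    g_mean = sqrt(tpr * tnr)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


if __name__ == "__main__":
    # Same hypothetical classifier, increasingly imbalanced data.
    for frac in (0.5, 0.2, 0.05, 0.01):
        m = measures(tpr=0.8, tnr=0.9, pos_fraction=frac)
        row = "  ".join(f"{name}={value:.3f}" for name, value in m.items())
        print(f"pos_fraction={frac:<5} {row}")
```

With the rates held fixed, recall and G-mean do not change with the class ratio, accuracy drifts toward the true negative rate, and precision and F1 fall as positives become rarer. This is only the simplest view of the effect; the paper itself analyzes measure distributions and gradients rather than a single fixed classifier.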