On the Dynamics of Classification Measures for Imbalanced and Streaming Data

Cited by: 31
Authors
Brzezinski, Dariusz [1, 2]
Stefanowski, Jerzy [1, 2]
Susmaga, Robert [1, 2]
Szczech, Izabela [1, 2]
Affiliations
[1] Poznan Univ Tech, CAMIL, PL-60965 Poznan, Poland
[2] Poznan Univ Tech, Inst Comp Sci, PL-60965 Poznan, Poland
Keywords
Data visualization; Atmospheric measurements; Particle measurements; Histograms; Task analysis; Size measurement; Sensitivity; Class imbalance; classification measures; concept drift; data streams; measure gradients; measure histograms;
DOI
10.1109/TNNLS.2019.2899061
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
As each imbalanced classification problem comes with its own set of challenges, the measure used to evaluate classifiers must be individually selected. To help researchers make this decision in an informed manner, experimental and theoretical investigations compare general properties of measures. However, existing studies do not analyze changes in measure behavior imposed by different imbalance ratios. Moreover, several characteristics of imbalanced data streams, such as the effect of dynamically changing class proportions, have not been thoroughly investigated from the perspective of different metrics. In this paper, we study measure dynamics by analyzing changes of measure values, distributions, and gradients with diverging class proportions. For this purpose, we visualize measure probability mass functions and gradients. In addition, we put forward a histogram-based normalization method that provides a unified, probabilistic interpretation of any measure over data sets with different class distributions. The results of analyzing eight popular classification measures show that the effect class proportions have on each measure is different and should be taken into account when evaluating classifiers. Apart from highlighting imbalance-related properties of each measure, our study shows a direct connection between class ratio changes and certain types of concept drift, which could be influential in designing new types of classifiers and drift detectors for imbalanced data streams.
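The idea of a measure's probability mass function under a fixed class ratio, and of reading a measure value against that distribution, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that every confusion matrix with the given class sizes is equally likely, and the function names (`measure_values`, `pmf`, `percentile_rank`) are hypothetical. The percentile rank stands in for a histogram-based, probabilistic reading of a measure value and may differ from the normalization defined in the paper.

```python
import math
from collections import Counter


def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def g_mean(tp, fn, fp, tn):
    # Geometric mean of sensitivity and specificity (needs both classes present).
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


def measure_values(measure, n_pos, n_neg):
    """Measure value for every confusion matrix with fixed class sizes."""
    return [
        measure(tp, n_pos - tp, n_neg - tn, tn)
        for tp in range(n_pos + 1)
        for tn in range(n_neg + 1)
    ]


def pmf(values, digits=6):
    """Empirical probability mass function, assuming equally likely matrices."""
    counts = Counter(round(v, digits) for v in values)
    return {v: c / len(values) for v, c in sorted(counts.items())}


def percentile_rank(measure, tp, tn, n_pos, n_neg):
    """Share of possible confusion matrices scoring no higher than the observed one."""
    observed = measure(tp, n_pos - tp, n_neg - tn, tn)
    values = measure_values(measure, n_pos, n_neg)
    return sum(v <= observed + 1e-12 for v in values) / len(values)


if __name__ == "__main__":
    # 1 positive and 9 negatives; a classifier that predicts "negative" for everything.
    n_pos, n_neg, tp, tn = 1, 9, 0, 9
    for name, measure in [("accuracy", accuracy), ("G-mean", g_mean)]:
        raw = measure(tp, n_pos - tp, n_neg - tn, tn)
        rank = percentile_rank(measure, tp, tn, n_pos, n_neg)
        print(f"{name}: raw={raw:.2f}, percentile rank={rank:.2f}")
```

With 1 positive and 9 negative examples, the all-negative classifier reaches an accuracy of 0.9 but a G-mean of 0, and the two values also rank very differently among the possible confusion matrices, which is the kind of class-ratio effect the abstract refers to.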
Pages: 2868-2878
Number of pages: 11
Related papers
50 in total
  • [31] Sparse Stochastic Online AUC Optimization for Imbalanced Streaming Data
    Yang, Min
    Cai, Xufen
    Hu, Ruimin
    Ye, Long
    Zhu, Rong
    ADVANCES IN MULTIMEDIA INFORMATION PROCESSING - PCM 2017, PT II, 2018, 10736 : 960 - 969
  • [32] IEBench: Benchmarking Streaming Learners on Imbalanced Evolving Data Streams
    Bernardo, Alessio
    Ziffer, Giacomo
    Della Valle, Emanuele
    21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021, : 331 - 340
  • [33] Weighted Data Gravitation Classification for Standard and Imbalanced Data
    Cano, Alberto
    Zafra, Amelia
    Ventura, Sebastian
    IEEE TRANSACTIONS ON CYBERNETICS, 2013, 43 (06) : 1672 - 1687
  • [34] An automated approach for binary classification on imbalanced data
    Vieira, Pedro Marques
    Rodrigues, Fatima
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (05) : 2747 - 2767
  • [35] MaMiPot: a paradigm shift for the classification of imbalanced data
    Zefrehi, Hossein Ghaderi
    Altincay, Hakan
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2023, 61 (01) : 299 - 324
  • [36] Discriminative feature generation for classification of imbalanced data
    Suh, Sungho
    Lukowicz, Paul
    Lee, Yong Oh
    PATTERN RECOGNITION, 2022, 122
  • [37] Imbalanced Data Stream Classification: Analysis and Solution
    Anjana, Koringa
    Radhika, Kotecha
    Darshana, Patel
    INFORMATION AND COMMUNICATION TECHNOLOGY FOR INTELLIGENT SYSTEMS (ICTIS 2017) - VOL 2, 2018, 84 : 316 - 324
  • [38] Classification of imbalanced data with a geometric digraph family
    Manukyan, Artür
    Ceyhan, Elvan
    Journal of Machine Learning Research, 2016, 17
  • [39] Imbalanced Data Classification Based on Hybrid Methods
    Zhang, Nai-Nan
    Ye, Shao-Zhen
    Chien, Ting-Ying
    PROCEEDINGS OF THE 2018 2ND INTERNATIONAL CONFERENCE ON BIG DATA RESEARCH (ICBDR 2018), 2018, : 16 - 20
  • [40] MaMiPot: a paradigm shift for the classification of imbalanced data
    Hossein Ghaderi Zefrehi
    Hakan Altınçay
    Journal of Intelligent Information Systems, 2023, 61 : 299 - 324