On the Dynamics of Classification Measures for Imbalanced and Streaming Data

Cited by: 31
Authors
Brzezinski, Dariusz [1, 2]
Stefanowski, Jerzy [1, 2]
Susmaga, Robert [1, 2]
Szczech, Izabela [1, 2]
Affiliations
[1] Poznan Univ Tech, CAMIL, PL-60965 Poznan, Poland
[2] Poznan Univ Tech, Inst Comp Sci, PL-60965 Poznan, Poland
Keywords
Data visualization; Atmospheric measurements; Particle measurements; Histograms; Task analysis; Size measurement; Sensitivity; Class imbalance; classification measures; concept drift; data streams; measure gradients; measure histograms
DOI
10.1109/TNNLS.2019.2899061
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
As each imbalanced classification problem comes with its own set of challenges, the measure used to evaluate classifiers must be individually selected. To help researchers make this decision in an informed manner, experimental and theoretical investigations compare general properties of measures. However, existing studies do not analyze changes in measure behavior imposed by different imbalance ratios. Moreover, several characteristics of imbalanced data streams, such as the effect of dynamically changing class proportions, have not been thoroughly investigated from the perspective of different metrics. In this paper, we study measure dynamics by analyzing changes of measure values, distributions, and gradients with diverging class proportions. For this purpose, we visualize measure probability mass functions and gradients. In addition, we put forward a histogram-based normalization method that provides a unified, probabilistic interpretation of any measure over data sets with different class distributions. The results of analyzing eight popular classification measures show that the effect class proportions have on each measure is different and should be taken into account when evaluating classifiers. Apart from highlighting imbalance-related properties of each measure, our study shows a direct connection between class ratio changes and certain types of concept drift, which could be influential in designing new types of classifiers and drift detectors for imbalanced data streams.
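As a rough illustration of the idea of reading a measure through its distribution, the sketch below enumerates every confusion matrix possible for fixed class counts and reports the fraction of them whose measure value does not exceed a given value. This is only a minimal sketch, not the authors' normalization method: treating every confusion matrix as equally likely is an assumption made here, and the function names (confusion_matrices, percentile_rank) are hypothetical.

```python
from math import sqrt


def confusion_matrices(n_pos, n_neg):
    """Yield every (tp, fn, fp, tn) possible for fixed class counts."""
    for tp in range(n_pos + 1):
        for tn in range(n_neg + 1):
            yield tp, n_pos - tp, n_neg - tn, tn


def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)


def g_mean(tp, fn, fp, tn):
    # Geometric mean of sensitivity and specificity;
    # requires at least one example in each class.
    return sqrt(tp / (tp + fn) * tn / (tn + fp))


def percentile_rank(measure, value, n_pos, n_neg):
    """Fraction of all confusion matrices with measure <= value.

    Assumption (illustrative only): every confusion matrix is equally likely.
    """
    values = [measure(*cm) for cm in confusion_matrices(n_pos, n_neg)]
    return sum(v <= value for v in values) / len(values)


if __name__ == "__main__":
    # A classifier that always predicts the majority class,
    # on 1 positive and 9 negative examples: tp=0, fn=1, fp=0, tn=9.
    cm = (0, 1, 0, 9)
    for measure in (accuracy, g_mean):
        raw = measure(*cm)
        rank = percentile_rank(measure, raw, n_pos=1, n_neg=9)
        print(measure.__name__, raw, rank)
```

Changing n_pos and n_neg while keeping their sum fixed shows how the same raw value can rank differently under different class proportions, which is the kind of effect the abstract describes.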
Pages: 2868-2878
Number of pages: 11