Linear Approximation of F-Measure for the Performance Evaluation of Classification Algorithms on Imbalanced Data Sets

Cited by: 7
Author
Wong, Tzu-Tsung [1]
Affiliation
[1] Natl Cheng Kung Univ, Inst Informat Management, 1 Ta Sheuh Rd, Tainan 701, Taiwan
Keywords
Classification; cross validation; F-measure; imbalanced data set; sampling distribution; STATISTICAL COMPARISONS; K-FOLD; CLASSIFIERS; RECALL;
DOI
10.1109/TKDE.2020.2986749
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Accuracy is a popular measure for evaluating the performance of classification algorithms tested on ordinary data sets. When a data set is imbalanced, F-measure will be a better choice than accuracy for this purpose. Since F-measure is calculated as the harmonic mean of recall and precision, it is difficult to find the sampling distribution of F-measure for evaluating classification algorithms. Since the values of recall and precision are dependent, their joint distribution is assumed to follow a bivariate normal distribution in this study. When the evaluation method is k-fold cross validation, a linear approximation approach is proposed to derive the sampling distribution of F-measure. This approach is used to design methods for comparing the performance of two classification algorithms tested on single or multiple imbalanced data sets. The methods are tested on ten imbalanced data sets to demonstrate their effectiveness. The weight of recall provided by this linear approximation approach can help us to analyze the characteristics of classification algorithms.
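As a rough illustration of the idea in the abstract, the sketch below linearizes the F-measure (taken here as the harmonic mean 2RP/(R+P)) with a first-order Taylor expansion around the mean recall and precision over the folds. This is a generic delta-method sketch, not the paper's actual derivation: the function names, the per-fold values, and the interpretation of the resulting coefficient as a "weight of recall" are all assumptions made for illustration. If recall and precision are jointly normal, the linearized F-measure is a linear combination of them and is therefore normal as well, with the mean and variance computed below.

```python
from statistics import mean


def f_measure(recall, precision):
    """Harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def linear_weights(r0, p0):
    """Partial derivatives of F = 2RP/(R+P) at (r0, p0); requires r0 + p0 > 0.

    The first-order expansion at (r0, p0) simplifies to w_r * R + w_p * P,
    because w_r * r0 + w_p * p0 equals F(r0, p0) exactly.
    """
    s = r0 + p0
    w_r = 2 * p0 ** 2 / s ** 2  # dF/dR
    w_p = 2 * r0 ** 2 / s ** 2  # dF/dP
    return w_r, w_p


def sample_cov(xs, ys):
    """Unbiased sample covariance (sample variance when xs is ys)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def approx_f_distribution(recalls, precisions):
    """Mean and variance of the linearized per-fold F-measure, plus w_r."""
    r0, p0 = mean(recalls), mean(precisions)
    w_r, w_p = linear_weights(r0, p0)
    approx_mean = w_r * r0 + w_p * p0
    approx_var = (
        w_r ** 2 * sample_cov(recalls, recalls)
        + w_p ** 2 * sample_cov(precisions, precisions)
        + 2 * w_r * w_p * sample_cov(recalls, precisions)
    )
    return approx_mean, approx_var, w_r


# Hypothetical per-fold results from 5-fold cross validation (not from the paper).
recalls = [0.60, 0.70, 0.65, 0.55, 0.75]
precisions = [0.80, 0.70, 0.85, 0.75, 0.90]

m, v, w_r = approx_f_distribution(recalls, precisions)
print(f"approx. mean F = {m:.4f}, approx. variance = {v:.6f}, dF/dR = {w_r:.4f}")
```

In this sketch the coefficient on recall grows as precision exceeds recall, which is one way to read how much a change in recall moves the F-measure for a given classifier.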
Pages: 753 - 763
Number of pages: 11
Related papers
50 in total
  • [31] An Improved Algorithm for SVMs Classification of Imbalanced Data Sets
    Castro, Cristiano Leite
    Carvalho, Mateus Araujo
    Braga, Antonio Padua
    ENGINEERING APPLICATIONS OF NEURAL NETWORKS, PROCEEDINGS, 2009, 43 : 108 - 118
  • [32] Classification of imbalanced marketing data with balanced random sets
    Nikulin, Vladimir
    McLachlan, Geoffrey J.
    Journal of Machine Learning Research, 2009, 7 : 89 - 100
  • [33] An evaluation of progressive sampling for imbalanced data sets
    Ng, Willie
    Dash, Manoranjan
    ICDM 2006: SIXTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, WORKSHOPS, 2006, : 657 - +
  • [34] Improving the classification performance on imbalanced data sets via new hybrid parameterisation model
    Mohamad, Masurah
    Selamat, Ali
    Subroto, Imam Much
    Krejcar, Ondrej
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2021, 33 (07) : 787 - 797
  • [35] Evaluation of the Classifiers in Multiparameter and Imbalanced Data Sets
    Piotrowska, Ewelina
    INFORMATION SYSTEMS ARCHITECTURE AND TECHNOLOGY, ISAT 2019, PT II, 2020, 1051 : 263 - 273
  • [36] APPROXIMATION OF GDELTA-SETS IN MEASURE BY F SIGMA-SETS
    LARMAN, DG
    PROCEEDINGS OF THE CAMBRIDGE PHILOSOPHICAL SOCIETY-MATHEMATICAL AND PHYSICAL SCIENCES, 1965, 61 : 105 - &
  • [37] Class Confidence Weighted kNN Algorithms for Imbalanced Data Sets
    Liu, Wei
    Chawla, Sanjay
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT II: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6635 : 345 - 356
  • [38] Comparative Study on Defect Prediction Algorithms of Supervised Learning Software Based on Imbalanced Classification Data Sets
    Ge, Jianxin
    Liu, Jiaomin
    Liu, Wenyuan
    2018 19TH IEEE/ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING (SNPD), 2018, : 399 - 406
  • [39] A Classification Performance Evaluation Measure Considering Data Separability
    Xue, Lingyan
    Zhang, Xinyu
    Jiang, Weidong
    Huo, Kai
    Shen, Qinmu
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT I, 2023, 14254 : 1 - 13
  • [40] Probabilistic mapping of imbalanced data for groundwater contamination using classification algorithms: Performance and reliability
    Qiu, Yang
    Zhou, Aiguo
    Xiong, Hanxiang
    Zhang, Defang
    Su, Cheng
    Zhou, Shizheng
    Go, Lin
    Yang, Chi
    Cui, Hao
    Fan, Wei
    Yu, Yao
    Zhang, Fawang
    Ma, Chuanming
    GROUNDWATER FOR SUSTAINABLE DEVELOPMENT, 2025, 28