Linear Approximation of F-Measure for the Performance Evaluation of Classification Algorithms on Imbalanced Data Sets

Times cited: 7
Author
Wong, Tzu-Tsung [1]
Affiliation
[1] Natl Cheng Kung Univ, Inst Informat Management, 1 Ta Sheuh Rd, Tainan 701, Taiwan
Keywords
Classification; cross validation; F-measure; imbalanced data set; sampling distribution; STATISTICAL COMPARISONS; K-FOLD; CLASSIFIERS; RECALL
DOI
10.1109/TKDE.2020.2986749
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Accuracy is a popular measure for evaluating the performance of classification algorithms tested on ordinary data sets. When a data set is imbalanced, F-measure is a better choice than accuracy for this purpose. Because F-measure is the harmonic mean of recall and precision, its sampling distribution is difficult to find, which complicates the evaluation of classification algorithms. Since recall and precision are dependent, this study assumes that their joint distribution is bivariate normal. When the evaluation method is k-fold cross validation, a linear approximation approach is proposed to derive the sampling distribution of F-measure. This approach is used to design methods for comparing the performance of two classification algorithms tested on a single imbalanced data set or on multiple imbalanced data sets. The methods are tested on ten imbalanced data sets to demonstrate their effectiveness. The weight of recall provided by this linear approximation can help analyze the characteristics of classification algorithms.
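The abstract does not spell out the derivation. As a rough illustration only, the sketch below applies a standard first-order Taylor (delta-method) expansion of F = 2RP/(R + P) around the mean recall and mean precision of the k folds, then combines the per-fold sample variances and covariance into an approximate variance of F, which under the bivariate normal assumption makes F approximately normal. This is an assumption about the general technique, not the paper's exact linearization, and the function names `f_measure` and `linear_f_approx` and the example fold values are hypothetical.

```python
import math
from statistics import mean


def f_measure(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision (F1); defined as 0 when both are 0."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def linear_f_approx(recalls: list[float], precisions: list[float]):
    """Hypothetical helper: first-order Taylor expansion of F around the fold means.

    Assumption (not taken from the paper): F(R, P) ~ F(r_bar, p_bar)
    + a * (R - r_bar) + b * (P - p_bar), with a, b the partial derivatives
    at the means. If (R, P) is bivariate normal, this linear form is normal.

    Returns (F at the means, coefficient of recall a,
             coefficient of precision b, approximate standard deviation of F).
    """
    k = len(recalls)
    if k != len(precisions) or k < 2:
        raise ValueError("need at least two paired (recall, precision) folds")

    r_bar, p_bar = mean(recalls), mean(precisions)
    s = r_bar + p_bar
    if s == 0:
        raise ValueError("mean recall and mean precision are both zero")

    # Partial derivatives of 2RP/(R+P), evaluated at the fold means.
    a = 2 * p_bar ** 2 / s ** 2  # dF/dR
    b = 2 * r_bar ** 2 / s ** 2  # dF/dP

    # Sample variances and covariance of the per-fold values (denominator k - 1).
    var_r = sum((r - r_bar) ** 2 for r in recalls) / (k - 1)
    var_p = sum((p - p_bar) ** 2 for p in precisions) / (k - 1)
    cov_rp = sum(
        (r - r_bar) * (p - p_bar) for r, p in zip(recalls, precisions)
    ) / (k - 1)

    # Variance of the linear combination a*R + b*P.
    var_f = a ** 2 * var_r + b ** 2 * var_p + 2 * a * b * cov_rp
    return f_measure(r_bar, p_bar), a, b, math.sqrt(max(var_f, 0.0))


if __name__ == "__main__":
    # Made-up recall/precision values from a hypothetical 5-fold cross validation.
    recalls = [0.62, 0.58, 0.65, 0.60, 0.55]
    precisions = [0.48, 0.52, 0.45, 0.50, 0.53]
    f0, a, b, sd = linear_f_approx(recalls, precisions)
    print(f"F at means={f0:.4f}  a(recall)={a:.4f}  b(precision)={b:.4f}  sd={sd:.4f}")
```

Because F is homogeneous of degree one, the coefficients satisfy a·r̄ + b·p̄ = F(r̄, p̄), so the expansion reduces to F ≈ aR + bP. Comparing a with b is one plausible way to read how strongly recall drives F for a given classifier, though the paper's own "weight of recall" may be defined differently.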
Pages: 753-763
Number of pages: 11