Linear Approximation of F-Measure for the Performance Evaluation of Classification Algorithms on Imbalanced Data Sets

Cited by: 7
Authors
Wong, Tzu-Tsung [1]
Affiliation
[1] National Cheng Kung University, Institute of Information Management, 1 Ta Sheuh Rd, Tainan 701, Taiwan
Keywords
Classification; cross validation; F-measure; imbalanced data set; sampling distribution; statistical comparisons; k-fold; classifiers; recall
DOI
10.1109/TKDE.2020.2986749
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Accuracy is a popular measure for evaluating the performance of classification algorithms tested on ordinary data sets. When a data set is imbalanced, F-measure will be a better choice than accuracy for this purpose. Since F-measure is calculated as the harmonic mean of recall and precision, it is difficult to find the sampling distribution of F-measure for evaluating classification algorithms. Since the values of recall and precision are dependent, their joint distribution is assumed to follow a bivariate normal distribution in this study. When the evaluation method is k-fold cross validation, a linear approximation approach is proposed to derive the sampling distribution of F-measure. This approach is used to design methods for comparing the performance of two classification algorithms tested on single or multiple imbalanced data sets. The methods are tested on ten imbalanced data sets to demonstrate their effectiveness. The weight of recall provided by this linear approximation approach can help us to analyze the characteristics of classification algorithms.
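The idea of linearizing F-measure around the mean recall and precision over k folds can be illustrated with a generic first-order (delta-method) sketch. This is not the paper's exact derivation: the function names `f_measure` and `linearized_f`, the use of unweighted fold averages, and the sample covariance estimator are all assumptions made for illustration only.

```python
import math
from statistics import mean


def f_measure(recall, precision):
    """F-measure as the harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def linearized_f(recalls, precisions):
    """First-order approximation of F around the fold means.

    Hypothetical helper: takes per-fold recall and precision values from
    k-fold cross validation and returns
    (F at the means, dF/dR, dF/dP, approximate standard error of F).
    """
    k = len(recalls)
    r0, p0 = mean(recalls), mean(precisions)
    s = r0 + p0

    # Partial derivatives of F = 2RP / (R + P) evaluated at (r0, p0).
    d_r = 2 * p0 ** 2 / s ** 2
    d_p = 2 * r0 ** 2 / s ** 2

    # Sample variances and covariance; recall and precision are dependent,
    # so the covariance term is kept.
    var_r = sum((r - r0) ** 2 for r in recalls) / (k - 1)
    var_p = sum((p - p0) ** 2 for p in precisions) / (k - 1)
    cov = sum((r - r0) * (p - p0)
              for r, p in zip(recalls, precisions)) / (k - 1)

    # Variance of the linearized F evaluated at the fold means.
    var_f = (d_r ** 2 * var_r + d_p ** 2 * var_p + 2 * d_r * d_p * cov) / k
    return f_measure(r0, p0), d_r, d_p, math.sqrt(var_f)
```

Under these assumptions, the relative size of the two partial derivatives indicates how strongly the approximated F-measure responds to recall versus precision for a given algorithm; the standard error could then feed a normal-based comparison of two algorithms.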
Pages: 753-763
Page count: 11