The area under the ROC curve as a measure of clustering quality

Times cited: 17
Authors
Jaskowiak, Pablo A. [1]
Costa, Ivan G. [2]
Campello, Ricardo J. G. B. [3]
Affiliations
[1] Federal University of Santa Catarina (UFSC), Joinville, SC, Brazil
[2] RWTH Aachen University, Institute for Computational Genomics, Medical Faculty, Aachen, Germany
[3] University of Newcastle, School of Mathematical and Physical Sciences, Callaghan, NSW, Australia
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
Clustering validation; Area under the curve; Receiver operating characteristics; AUC/ROC; Area under the curve for clustering; Qualitative/visual clustering evaluation; R package; Validation; Indexes; Number
DOI
10.1007/s10618-022-00829-0
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The area under the receiver operating characteristic (ROC) curve, referred to as AUC, is a well-known performance measure in supervised learning. Owing to its compelling properties, it has been used in many studies to evaluate and compare the performance of different classifiers. In this work, we explore AUC as a performance measure in unsupervised learning, more specifically in cluster analysis. In particular, we elaborate on the use of AUC as an internal/relative measure of clustering quality, which we refer to as Area Under the Curve for Clustering (AUCC). We show that, under a null model of random clustering solutions, the expected AUCC of a candidate clustering solution does not depend on the size of the dataset and, more importantly, does not depend on the number or the (im)balance of the clusters under evaluation. In addition, we elaborate on the fact that, in the internal/relative clustering validation setting we consider, AUCC is in fact a linear transformation of the Gamma criterion of Baker and Hubert (1975), for which we also formally derive a theoretical expected value for chance clusterings. We also discuss the computational complexity of these criteria and show that, while a straightforward implementation of Gamma can be computationally prohibitive for most real applications of cluster analysis, its equivalence with AUCC reveals a much more efficient algorithmic procedure. Our theoretical findings are supported by experimental results, which show that, in addition to the effective and robust quantitative evaluation provided by AUCC, visual inspection of the ROC curves themselves can help assess a candidate clustering solution from a broader, qualitative perspective.
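The core idea of AUCC can be illustrated with a small sketch. The following are assumptions made for illustration, not details taken from the paper: Euclidean distance is the dissimilarity, each of the n(n-1)/2 point pairs is treated as one instance whose positive class means "both points share a cluster", and a pair's score is its negated distance. Under these assumptions AUCC is an ordinary AUC, computable from one sort with the Mann-Whitney rank-sum formula in O(m log m) time for m pairs, whereas the naive Gamma compares every within-cluster distance with every between-cluster distance, which is quadratic in m. With Baker and Hubert's Gamma, which discards ties, the identity AUCC = (Gamma + 1) / 2 holds exactly when no within-cluster distance equals a between-cluster distance; how the paper treats ties is not reproduced here. The last helper checks the null-model claim for one specific null model, uniform random relabeling with fixed cluster sizes, under which the expected AUCC works out to 0.5. All function names are hypothetical.

```python
from itertools import combinations
from math import dist


def pair_distances(points, labels):
    """Split all n(n-1)/2 pairwise Euclidean distances into within-cluster
    and between-cluster lists (assumed dissimilarity: Euclidean)."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        (within if labels[i] == labels[j] else between).append(d)
    return within, between


def aucc(points, labels):
    """Hypothetical AUCC sketch: AUC with within-cluster pairs as positives,
    between-cluster pairs as negatives, score = negated distance.
    Rank-sum (Mann-Whitney) form: a single O(m log m) sort over the m pairs;
    tied scores get their average rank, so a tie counts as 1/2."""
    within, between = pair_distances(points, labels)
    n_pos, n_neg = len(within), len(between)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one within- and one between-cluster pair")
    # Ascending by score (= descending by distance); True marks a positive.
    scored = sorted([(-d, True) for d in within] + [(-d, False) for d in between])
    rank_sum_pos = 0.0
    i = 0
    while i < len(scored):
        j = i
        while j + 1 < len(scored) and scored[j + 1][0] == scored[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        rank_sum_pos += avg_rank * sum(1 for _, pos in scored[i:j + 1] if pos)
        i = j + 1
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2  # Mann-Whitney U for positives
    return u / (n_pos * n_neg)


def baker_hubert_gamma(points, labels):
    """Naive Gamma: compare every within distance with every between distance,
    O(n_w * n_b) comparisons; ties are ignored, as in Baker and Hubert."""
    within, between = pair_distances(points, labels)
    s_plus = sum(1 for dw in within for db in between if dw < db)
    s_minus = sum(1 for dw in within for db in between if dw > db)
    if s_plus + s_minus == 0:
        raise ValueError("Gamma is undefined when every comparison is tied")
    return (s_plus - s_minus) / (s_plus + s_minus)


def mean_aucc_over_relabelings(points, k0):
    """Average AUCC over every way of giving label 0 to k0 points and label 1
    to the rest (an assumed null model: uniform relabeling, fixed sizes)."""
    n = len(points)
    values = []
    for chosen in combinations(range(n), k0):
        labels = [0 if i in chosen else 1 for i in range(n)]
        values.append(aucc(points, labels))
    return sum(values) / len(values)


# Toy 1-D data chosen so that no two pairwise distances are equal.
pts = [(0.0,), (1.0,), (3.0,), (10.0,), (11.5,), (14.0,)]
good = [0, 0, 0, 1, 1, 1]
bad = [0, 1, 0, 1, 0, 1]
print(aucc(pts, good), baker_hubert_gamma(pts, good))  # 1.0 1.0
print(aucc(pts, bad), baker_hubert_gamma(pts, bad))    # ≈ 0.370 and ≈ -0.259
print(mean_aucc_over_relabelings(pts, 3))              # ≈ 0.5
```

On this toy data the well-separated labeling scores 1.0 on both criteria. The interleaved labeling gives AUCC = 20/54 and Gamma = -14/54, consistent with 2 · AUCC - 1 = Gamma, and averaging AUCC over all 20 balanced relabelings gives 0.5.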
Pages: 1219-1245
Number of pages: 27