Finding the Best Classification Threshold in Imbalanced Classification

Times cited: 166
Authors
Zou, Quan [1,2]
Xie, Sifa [2]
Lin, Ziyu [2]
Wu, Meihong [2]
Ju, Ying [2]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Xiamen Univ, Dept Comp Sci, Xiamen, Peoples R China
Keywords
Receiver Operating Characteristic (ROC); Protein remote homology detection; Imbalance data; F-score; JOINT VIBROARTHROGRAPHIC SIGNALS; REMOTE HOMOLOGY DETECTION; AMINO-ACID-COMPOSITION; MICRORNA PRECURSOR; NEURAL-NETWORK; PROTEIN; IDENTIFICATION; EVOLUTIONARY; SOFTWARE
DOI
10.1016/j.bdr.2015.12.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification with imbalanced class distributions is a major problem in machine learning and has received considerable attention because of its many real-world applications. Although several works have used the area under the receiver operating characteristic (ROC) curve to select potentially optimal classifiers for imbalanced classification, few studies have addressed how to choose the classification threshold for test or unknown datasets. In general, the classification threshold is simply set to 0.5, which is usually unsuitable for imbalanced classification. In this study, we analyze the drawbacks of using ROC as the sole evaluation measure in imbalanced data classification and propose a novel framework for finding the best classification threshold. Experiments on SCOP v1.53 data show that the proposed framework improves the F-score by 20.63% over commonly used methods that keep the default threshold of 0.5. The findings suggest that the proposed framework is both effective and efficient. A web server and software tools are available at http://datamining.xmu.edu.cn/prht/ or http://prht.sinaapp.com/. (C) 2016 Elsevier Inc. All rights reserved.
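The abstract contrasts a threshold-free measure (ROC AUC) with the F-score, which depends on where the decision cut-off is placed. The following is a minimal, hypothetical sketch of that contrast, not the authors' framework: it assumes a classifier that outputs scores in [0, 1], uses F1 (beta = 1) as the F-score, and picks the threshold by exhaustively trying every distinct score on labelled data. The helper names `f_score`, `roc_auc`, and `best_threshold`, and the toy labels and scores, are illustrative assumptions.

```python
from typing import Sequence, Tuple


def f_score(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1 score when every sample with score >= threshold is predicted positive."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
    if tp == 0:  # no true positives: precision/recall are 0 or undefined
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-free ROC AUC: the fraction of (positive, negative) pairs in
    which the positive gets the higher score (ties count as 1/2).
    Assumes at least one positive and one negative sample."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_threshold(y_true: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Try the default 0.5 and every distinct score as a cut-off;
    return the (threshold, F1) pair with the highest F1."""
    best_t, best_f = 0.5, f_score(y_true, scores, 0.5)
    for t in sorted(set(scores)):
        f = f_score(y_true, scores, t)
        if f > best_f:  # strict '>' keeps the earlier candidate on ties
            best_t, best_f = t, f
    return best_t, best_f


# Toy imbalanced data (2 positives, 8 negatives); values are made up for illustration.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.02]

print("AUC:", roc_auc(labels, scores))
print("F1 at 0.5:", f_score(labels, scores, 0.5))
print("best (threshold, F1):", best_threshold(labels, scores))
```

On this toy data the ranking is perfect (AUC = 1.0), yet the default cut-off of 0.5 predicts no positives and gives F1 = 0; lowering the threshold to 0.35 recovers F1 = 1.0. In practice the threshold would be tuned on training or validation data and then applied unchanged to the test or unknown set.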
Pages: 2-8
Number of pages: 7
Related papers (50 in total)
  • [22] Influence of Resampling on Accuracy of Imbalanced Classification
    Burnaev, E.
    Erofeev, P.
    Papanov, A.
    EIGHTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2015), 2015, 9875
  • [23] Multimedia Traffic Classification for Imbalanced Environment
    Wu, Zheng
    Dong, Yu-ning
    Jin, Jiong
    Wei, Hua-Liang
    Xie, Gaogang
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2022, 9 (03): 1838-1852
  • [24] A Nearest Neighbor Algorithm for Imbalanced Classification
    Viola, Remi
    Emonet, Remi
    Habrard, Amaury
    Metzler, Guillaume
    Riou, Sebastien
    Sebban, Marc
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2021, 30 (03)
  • [25] Discrimination Aware Classification for Imbalanced Datasets
    Ristanoski, Goce
    Liu, Wei
    Bailey, James
    PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13), 2013: 1529-1532
  • [26] SVMs Modeling for Highly Imbalanced Classification
    Tang, Yuchun
    Zhang, Yan-Qing
    Chawla, Nitesh V.
    Krasser, Sven
    IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2009, 39 (01): 281-288
  • [27] Classification of Imbalanced Auction Fraud Data
    Ganguly, Swati
    Sadaoui, Samira
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2017, 2017, 10233: 84-89
  • [28] An Analysis of Performance Metrics for Imbalanced Classification
    Gaudreault, Jean-Gabriel
    Branco, Paula
    Gama, Joao
    DISCOVERY SCIENCE (DS 2021), 2021, 12986: 67-77
  • [29] Classification of Antimicrobial Peptides with Imbalanced Datasets
    Camacho, Francy L.
    Torres, Rodrigo
    Ramos Pollan, Raul
    11TH INTERNATIONAL SYMPOSIUM ON MEDICAL INFORMATION PROCESSING AND ANALYSIS, 2015, 9681
  • [30] Learning Deep Representation for Imbalanced Classification
    Huang, Chen
    Li, Yining
    Loy, Chen Change
    Tang, Xiaoou
    2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 5375-5384