Finding the Best Classification Threshold in Imbalanced Classification

Cited by: 166
Authors
Zou, Quan [1, 2]
Xie, Sifa [2]
Lin, Ziyu [2]
Wu, Meihong [2]
Ju, Ying [2]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Xiamen Univ, Dept Comp Sci, Xiamen, Peoples R China
Keywords
Receiver Operating Characteristic (ROC); Protein remote homology detection; Imbalance data; F-score; JOINT VIBROARTHROGRAPHIC SIGNALS; REMOTE HOMOLOGY DETECTION; AMINO-ACID-COMPOSITION; MICRORNA PRECURSOR; NEURAL-NETWORK; PROTEIN; IDENTIFICATION; EVOLUTIONARY; SOFTWARE;
DOI
10.1016/j.bdr.2015.12.001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification with imbalanced class distributions is a major problem in machine learning. Researchers have given considerable attention to the applications in many real-world scenarios. Although several works have utilized the area under the receiver operating characteristic (ROC) curve to select potentially optimal classifiers in imbalanced classifications, limited studies have been devoted to finding the classification threshold for testing or unknown datasets. In general, the classification threshold is simply set to 0.5, which is usually unsuitable for an imbalanced classification. In this study, we analyze the drawbacks of using ROC as the sole measure of imbalance in data classification problems. In addition, a novel framework for finding the best classification threshold is proposed. Experiments with SCOP v.1.53 data reveal that, with the default threshold set to 0.5, our proposed framework demonstrated a 20.63% improvement in terms of F-score compared with that of more commonly used methods. The findings suggest that the proposed framework is both effective and efficient. A web server and software tools are available via http://datamining.xmu.edu.cn/prht/ or http://prht.sinaapp.com/. (C) 2016 Elsevier Inc. All rights reserved.
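To illustrate why a fixed 0.5 cutoff can be a poor fit for imbalanced data, here is a minimal sketch of the generic idea of tuning the threshold for F-score. It is not the framework proposed in the paper, which this record does not describe; it is a plain sweep over candidate thresholds on a held-out validation set. The function names and the toy labels and scores are hypothetical and exist only for this example.

```python
def f1_at_threshold(y_true, scores, threshold):
    """F1 of the positive class when predicting 1 for scores >= threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) maximizing F1 over the observed score values."""
    candidates = sorted(set(scores))
    best = max(candidates, key=lambda t: f1_at_threshold(y_true, scores, t))
    return best, f1_at_threshold(y_true, scores, best)


# Hypothetical validation data: 3 positives among 10 examples.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.41, 0.35, 0.45, 0.60]

f1_default = f1_at_threshold(y_true, scores, 0.5)
threshold, f1_tuned = best_f1_threshold(y_true, scores)
print(f"F1 at 0.5: {f1_default:.3f}")
print(f"best threshold: {threshold:.2f}, F1: {f1_tuned:.3f}")
```

On this toy data the classifier scores most positives below 0.5, so the default cutoff misses two of the three positives, while a lower cutoff recovers them at the cost of one false positive. A threshold chosen this way should be selected on validation data and then applied unchanged to the test set, otherwise the reported F-score is optimistic.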
Pages: 2-8
Number of pages: 7