Finding the Best Classification Threshold in Imbalanced Classification

Cited by: 166
Authors
Zou, Quan [1, 2]
Xie, Sifa [2]
Lin, Ziyu [2]
Wu, Meihong [2]
Ju, Ying [2]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin, Peoples R China
[2] Xiamen Univ, Dept Comp Sci, Xiamen, Peoples R China
Keywords
Receiver Operating Characteristic (ROC); Protein remote homology detection; Imbalance data; F-score; JOINT VIBROARTHROGRAPHIC SIGNALS; REMOTE HOMOLOGY DETECTION; AMINO-ACID-COMPOSITION; MICRORNA PRECURSOR; NEURAL-NETWORK; PROTEIN; IDENTIFICATION; EVOLUTIONARY; SOFTWARE;
DOI
10.1016/j.bdr.2015.12.001
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification with imbalanced class distributions is a major problem in machine learning. Researchers have given considerable attention to the applications in many real-world scenarios. Although several works have utilized the area under the receiver operating characteristic (ROC) curve to select potentially optimal classifiers in imbalanced classifications, limited studies have been devoted to finding the classification threshold for testing or unknown datasets. In general, the classification threshold is simply set to 0.5, which is usually unsuitable for an imbalanced classification. In this study, we analyze the drawbacks of using ROC as the sole measure of imbalance in data classification problems. In addition, a novel framework for finding the best classification threshold is proposed. Experiments with SCOP v.1.53 data reveal that, with the default threshold set to 0.5, our proposed framework demonstrated a 20.63% improvement in terms of F-score compared with that of more commonly used methods. The findings suggest that the proposed framework is both effective and efficient. A web server and software tools are available via http://datamining.xmu.edu.cn/prht/ or http://prht.sinaapp.com/. (C) 2016 Elsevier Inc. All rights reserved.
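The abstract's point that a fixed 0.5 cutoff is often unsuitable for imbalanced data can be illustrated with a generic threshold sweep. The sketch below is not the authors' framework, which the abstract does not describe in detail. It simply assumes a classifier that outputs a score per sample and a labeled validation set, and it picks the cutoff that maximizes the F-score. The function names `f_score` and `best_threshold` and the toy data are hypothetical.

```python
def f_score(labels, preds):
    """F1 score for binary labels and predictions (1 = positive class)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores, labels, default=0.5):
    """Return (threshold, F-score) maximizing F1 on a validation set.

    Every distinct score is tried as a cutoff; a sample is predicted
    positive when its score is >= the cutoff. The default cutoff is the
    baseline and is kept unless a candidate is strictly better.
    """
    best_t = default
    best_f = f_score(labels, [int(s >= default) for s in scores])
    for t in sorted(set(scores)):
        f = f_score(labels, [int(s >= t) for s in scores])
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Toy, made-up validation data: 3 positives among 10 samples, with
# scores skewed low, as is common when the positive class is rare.
scores = [0.05, 0.10, 0.15, 0.20, 0.22, 0.30, 0.35, 0.40, 0.45, 0.60]
labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

baseline_f = f_score(labels, [int(s >= 0.5) for s in scores])
best_t, best_f = best_threshold(scores, labels)
print(baseline_f, best_t, best_f)
```

On this toy data the sweep moves the cutoff below 0.5 because two of the three positives score under the default threshold. In practice the cutoff would be chosen on held-out data and then applied unchanged to the test set, so that the reported F-score is not tuned on the data it is measured on.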
Pages: 2 - 8
Number of pages: 7
Related papers
50 in total
  • [31] The effect of rebalancing on LDA in imbalanced classification
    Kim, Arlene K. H.
    Chung, Hyunwoo
    STAT, 2021, 10 (01):
  • [32] Finding the best not the most: regularized loss minimization subgraph selection for graph classification
    Pan, Shirui
    Wu, Jia
    Zhu, Xingquan
    Long, Guodong
    Zhang, Chengqi
    PATTERN RECOGNITION, 2015, 48 (11) : 3783 - 3796
  • [33] Deep reinforcement learning for imbalanced classification
    Lin, Enlu
    Chen, Qiong
    Qi, Xiaoming
    APPLIED INTELLIGENCE, 2020, 50 (08) : 2488 - 2502
  • [34] Imbalanced Classification: Challenges and Approaches to Handle
    Maheshwari, Divya
    Smart Innovation, Systems and Technologies, 2023, 363 : 533 - 543
  • [35] FINDING THE BEST ALGORITHMS AND EFFECTIVE FACTORS IN CLASSIFICATION OF TURKISH SCIENCE STUDENT SUCCESS
    Filiz, Enes
    Oz, Ersoy
    JOURNAL OF BALTIC SCIENCE EDUCATION, 2019, 18 (02) : 239 - 253
  • [36] Imbalanced Data Classification Based on Clustering
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION III, 2014, 443 : 741 - 745
  • [37] The Imbalanced Problem in Morphological Galaxy Classification
    de la Calleja, Jorge
    Huerta, Gladis
    Fuentes, Olac
    Benitez, Antonio
    Lopez Dominguez, Eduardo
    Auxilio Medina, Ma.
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, 2010, 6419 : 533 - +
  • [38] BSMBoost for Imbalanced Pattern Classification Problems
    Ng, Wing W. Y.
    Zhang, Yuda
    Zhang, Jianjun
    2017 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2017, : 930 - 935
  • [39] Learning Deep Landmarks for Imbalanced Classification
    Bao, Feng
    Deng, Yue
    Kong, Youyong
    Ren, Zhiquan
    Suo, Jinli
    Dai, Qionghai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (08) : 2691 - 2704
  • [40] Adaptive Oversampling for Imbalanced Data Classification
    Ertekin, Seyda
    INFORMATION SCIENCES AND SYSTEMS 2013, 2013, 264 : 261 - 269