Classification for Imbalanced and Overlapping Classes Using Outlier Detection and Sampling Techniques

Cited by: 8
Authors
Yang, Zeping [1]
Gao, Daqi [1]
Affiliations
[1] E China Univ Sci & Technol, Sch Informat Sci & Engn, Shanghai 200237, Peoples R China
Keywords
Under-sampling; outlier detection; overlapping; imbalanced data; artificial neural network (ANN)
DOI
10.12785/amis/071L50
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
In many real-world applications, the example data from different pattern classes are imbalanced and overlapping, which hinders the classification performance of many learning algorithms. In this paper, a data cleaning technique based on the borderline noise factor (BNF) is proposed to remove borderline noise, and three under-sampling methods are studied to select representative majority-class examples and remove distant samples that do not help form the decision boundary. The BNF measures the degree to which an example is borderline noise, and the outlier detection algorithm is improved to clean the whole dataset. The G-mean (geometric mean) is used to define the threshold, which can improve the classification accuracy of the minority classes while achieving better overall classification performance. The experimental results demonstrate the effectiveness of the sampling method combined with the BNF-based data cleaning technique.
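For readers unfamiliar with the G-mean metric named in the abstract, the following is a minimal sketch of its common binary-class definition: the square root of the product of the minority-class recall (sensitivity) and the majority-class recall (specificity). The abstract does not state the paper's exact formula or how the threshold is derived from it, so the function name `g_mean`, the `positive` parameter, and the zero-division handling here are illustrative assumptions, not the authors' implementation.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (binary case).

    Assumption: `positive` marks the minority class; every other label
    is treated as the majority (negative) class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Recall on each class; defined as 0.0 when a class has no examples.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)
```

On a toy set with 2 minority and 8 majority examples, a classifier that always predicts the majority class reaches 80% accuracy but a G-mean of 0, which is why the metric is a natural fit for imbalanced data: it rewards performance on both classes at once.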
Pages: 375-381
Page count: 7