Over-sampling algorithm for imbalanced data classification

Authors
XU Xiaolong [1]
CHEN Wen [2]
SUN Yanfei [3]
Affiliations
[1] Jiangsu Key Laboratory of Big Data Security & Intelligent Processing, Nanjing University of Posts and Telecommunications
[2] Institute of Big Data Research at Yancheng, Nanjing University of Posts and Telecommunications
[3] Office of Scientific R&D, Nanjing University of Posts and Telecommunications
Keywords
imbalanced data; density-based spatial clustering of applications with noise (DBSCAN); synthetic minority oversampling technique (SMOTE); over-sampling
DOI
Not available
Chinese Library Classification (CLC) number
TP311.13
Discipline classification code
1201
Abstract
For imbalanced datasets, the focus of classification is to identify samples of the minority class, and current data mining algorithms do not perform well enough on such datasets. The synthetic minority over-sampling technique (SMOTE) is specifically designed for learning from imbalanced datasets: it generates synthetic minority class examples by interpolating between nearby minority class examples. However, SMOTE suffers from the overgeneralization problem. Density-based spatial clustering of applications with noise (DBSCAN) is not rigorous when dealing with samples near the borderline. We optimize the DBSCAN algorithm for this problem to make the clustering more reasonable. This paper integrates the optimized DBSCAN with SMOTE and proposes a density-based synthetic minority over-sampling technique (DSMOTE). First, the optimized DBSCAN is used to divide the samples of the minority class into three groups: core samples, borderline samples and noise samples. The noise samples of the minority class are then removed so that more effective samples can be synthesized. To make full use of the information in core samples and borderline samples, different strategies are used to over-sample each group. Experiments show that DSMOTE achieves better results than SMOTE and Borderline-SMOTE in terms of precision, recall and F-value.
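The pipeline described in the abstract (label minority samples as core, borderline or noise, drop the noise, then interpolate between the remaining neighbours) can be illustrated with a minimal sketch. It is not the authors' DSMOTE: it uses the standard DBSCAN definitions rather than the paper's optimized variant, and it applies a single plain SMOTE-style interpolation to every kept sample because the abstract does not state the separate strategies for core and borderline samples. The function names and the parameters `eps`, `min_pts`, `k` and `n_new` are hypothetical choices made for this illustration.

```python
import math
import random


def label_minority(points, eps, min_pts):
    """Label each minority point 'core', 'border' or 'noise'.

    Uses the standard DBSCAN definitions (an assumption; the paper's
    optimized DBSCAN is not specified in the abstract).
    """
    # eps-neighbourhood of every point; a point counts as its own neighbour.
    hoods = [
        [j for j, q in enumerate(points) if math.dist(p, q) <= eps]
        for p in points
    ]
    core = {i for i, hood in enumerate(hoods) if len(hood) >= min_pts}
    labels = []
    for i, hood in enumerate(hoods):
        if i in core:
            labels.append("core")
        elif any(j in core for j in hood):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels


def smote_interpolate(x, neighbor, rng):
    """Return a point on the segment between x and neighbor (SMOTE step)."""
    gap = rng.random()  # uniform in [0, 1)
    return tuple(a + gap * (b - a) for a, b in zip(x, neighbor))


def oversample(points, eps, min_pts, n_new, k=3, seed=0):
    """Drop noise points, then create n_new synthetic minority samples.

    Assumption: one interpolation strategy for all kept samples.
    """
    rng = random.Random(seed)
    labels = label_minority(points, eps, min_pts)
    kept = [p for p, lab in zip(points, labels) if lab != "noise"]
    synthetic = []
    if len(kept) < 2:
        return synthetic  # nothing to interpolate between
    for _ in range(n_new):
        i = rng.randrange(len(kept))
        others = [kept[j] for j in range(len(kept)) if j != i]
        # k nearest kept neighbours of the chosen sample
        nearest = sorted(others, key=lambda p: math.dist(kept[i], p))[:k]
        synthetic.append(smote_interpolate(kept[i], rng.choice(nearest), rng))
    return synthetic


minority = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (2, 1), (10, 10)]
labels = label_minority(minority, eps=1.1, min_pts=4)
new_samples = oversample(minority, eps=1.1, min_pts=4, n_new=10)
```

In this toy data the isolated point `(10, 10)` is labelled noise and never takes part in interpolation, so every synthetic sample stays inside the region spanned by the core and borderline points.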
Pages: 1182-1191
Page count: 10
Related papers
50 in total
  • [21] Over-Sampling Method on Imbalanced Data Based on WKMeans and SMOTE
    Chen, Junfeng
    Zheng, Zhongtuan
    Computer Engineering and Applications, 2024, 57 (23) : 106 - 112
  • [22] Imbalanced Data Over-Sampling Method Based on ISODATA Clustering
    Lv, Zhenzhe
    Liu, Qicheng
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2023, E106D (09) : 1528 - 1536
  • [23] BCGAN-based Over-sampling Scheme for Imbalanced Data
    Son, Minjae
    Jung, Seungwon
    Moon, Jihoon
    Hwang, Eenjun
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP 2020), 2020, : 155 - 160
  • [24] A self-adaptive synthetic over-sampling technique for imbalanced classification
    Gu, Xiaowei
    Angelov, Plamen P.
    Soares, Eduardo A.
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2020, 35 (06) : 923 - 943
  • [25] Online Sequential Extreme Learning Machine with Under-Sampling and Over-Sampling for Imbalanced Big Data Classification
    Du, Jie
    Vong, Chi-Man
    Chang, Yajie
    Jiao, Yang
    PROCEEDINGS OF ELM-2016, 2018, 9 : 229 - 239
  • [26] A New Over-sampling Technique Based on SVM for Imbalanced Diseases Data
    Wang, Jinjin
    Yao, Yukai
    Zhou, Hanhai
    Leng, Mingwei
    Chen, Xiaoyun
    PROCEEDINGS 2013 INTERNATIONAL CONFERENCE ON MECHATRONIC SCIENCES, ELECTRIC ENGINEERING AND COMPUTER (MEC), 2013, : 1224 - 1228
  • [27] Preprocessing of Imbalanced Breast Cancer Data using Feature Selection Combined with Over-Sampling Technique for classification
    Jojan, Janjira
    Srivihok, Anongnart
    2013 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS), 2013, : 407 - 412
  • [28] Dynamic weighted majority based on over-sampling for imbalanced data streams
    Du, Hongle
    Thelma, Palaoag
    2021 THE 4TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INTELLIGENT SYSTEMS, CIIS 2021, 2021, : 87 - 95
  • [29] An over-sampling expert system for learning from imbalanced data sets
    He, GX
    Han, H
    Wang, WY
    PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND BRAIN, VOLS 1-3, 2005, : 537 - 541
  • [30] A Learning Approach with Under-and Over-sampling for Imbalanced Data Sets
    Yeh, Chun-Wu
    Li, Der-Chiang
    Lin, Liang-Sian
    Tsai, Tung-I
    PROCEEDINGS 2016 5TH IIAI INTERNATIONAL CONGRESS ON ADVANCED APPLIED INFORMATICS IIAI-AAI 2016, 2016, : 725 - 729