Joint Sample Position Based Noise Filtering and Mean Shift Clustering for Imbalanced Classification Learning

Authors
Duan, Lilong [1 ,2 ]
Xue, Wei [1 ,2 ]
Huang, Jun [1 ,2 ]
Zheng, Xiao [1 ,2 ]
Affiliations
[1] Anhui Univ Technol, Sch Comp Sci & Technol, Maanshan 243032, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei 230088, Peoples R China
Source
TSINGHUA SCIENCE AND TECHNOLOGY | 2024, Vol. 29, No. 1
Keywords
Clustering algorithms; Filtering algorithms; Benchmark testing; Sampling methods; Information filters; Cleaning; Classification algorithms; imbalanced data classification; oversampling; noise filtering; clustering; OVERSAMPLING TECHNIQUE; SMOTE; PREDICTION;
DOI
10.26599/TST.2023.9010006
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The problem of imbalanced data classification learning has received much attention. Conventional classification algorithms are susceptible to data skew, favoring majority samples and ignoring minority samples. The majority weighted minority oversampling technique (MWMOTE) is an effective approach to this problem; however, it can suffer from inadequate noise filtering and from synthesizing samples identical to the original minority data. To address these shortcomings, we propose an improved MWMOTE method named joint sample position based noise filtering and mean shift clustering (SPMSC). First, to eliminate the effect of noisy samples, SPMSC uses a new noise filtering mechanism that determines whether a minority sample is noisy based on its position and distribution relative to the majority samples. Second, because MWMOTE may generate duplicate samples, we employ the mean shift algorithm to cluster minority samples and thereby reduce duplicate synthetic samples. Finally, data cleaning is performed on the processed data to further eliminate class overlap. Experiments on extensive benchmark datasets demonstrate the effectiveness of SPMSC compared with other sampling methods.
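The abstract does not give the details of the clustering step, so the following is only a rough illustration of the general idea: group minority samples with mean shift, then synthesize new samples by interpolating between two distinct members of the same cluster so that no synthetic point coincides with an original one. Everything here is an assumption made for illustration, not the SPMSC procedure: the flat kernel, the `bandwidth` value, the half-bandwidth mode-merging rule, the uniform sample selection (MWMOTE itself weights samples), and the function names `mean_shift` and `oversample_within_clusters`.

```python
import math
import random


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift (illustrative); returns one cluster label per point."""
    modes = []
    for p in points:
        x = list(p)
        for _ in range(max_iter):
            # Samples within the bandwidth of the current estimate.
            near = [q for q in points if math.dist(x, q) <= bandwidth]
            new_x = [sum(c) / len(near) for c in zip(*near)]
            shift = math.dist(x, new_x)
            x = new_x
            if shift < tol:
                break
        modes.append(x)
    # Assumed merging rule: modes closer than half the bandwidth share a cluster.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


def oversample_within_clusters(points, labels, n_new, rng):
    """Interpolate between two distinct samples drawn from the same cluster."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    usable = [c for c in clusters.values() if len(c) >= 2]
    if not usable:
        raise ValueError("no cluster has two or more samples to interpolate")
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(rng.choice(usable), 2)
        # Keep t away from 0 and 1 so the new point never equals a or b.
        t = rng.uniform(0.1, 0.9)
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


# Hypothetical minority-class samples forming two well-separated groups.
minority = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
            (5.0, 5.0), (5.2, 5.1), (4.9, 5.3)]
labels = mean_shift(minority, bandwidth=1.0)
synthetic = oversample_within_clusters(minority, labels, n_new=4,
                                       rng=random.Random(0))
print(labels)
print(synthetic)
```

Restricting interpolation to pairs inside one cluster keeps synthetic points from landing in the empty region between the two groups, which is one plausible motivation for clustering before oversampling.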
Pages: 216-231
Page count: 16
Related papers
50 in total
  • [41] INTEGRATING DIMENSION REDUCTION WITH MEAN-SHIFT CLUSTERING FOR BIOLOGICAL SHAPE CLASSIFICATION
    Lee, Hao-Chih
    Yang, Ge
    2014 IEEE 11TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI), 2014, : 254 - 257
  • [42] Imbalanced Data Classification Method Based on Ensemble Learning
    Xiang, Yu
    Xie, Yongping
    COMMUNICATIONS, SIGNAL PROCESSING, AND SYSTEMS, CSPS 2018, VOL III: SYSTEMS, 2020, 517 : 18 - 24
  • [43] Intrusion detection method based on imbalanced learning classification
    Li, Xiangjun
    Kong, Ke
    Shen, Hua
    Wei, Zhixiang
    Liao, Xiaofeng
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2024, 36 (05) : 657 - 677
  • [44] Linguistic Steganalysis Based on Clustering and Ensemble Learning in Imbalanced Scenario
    Guo, Shengnan
    Chen, Xuekai
    Wang, Zhuang
    Yang, Zhongliang
    Zhou, Linna
    DIGITAL FORENSICS AND WATERMARKING, IWDW 2023, 2024, 14511 : 304 - 318
  • [45] A Method of Imbalanced Traffic Classification Based on Ensemble Learning
    Ding, Yaojun
    2015 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (ICSPCC), 2015, : 265 - 268
  • [46] Imbalanced Learning with Oversampling based on Classification Contribution Degree
    Jiang, Zhenhao
    Yang, Jie
    Liu, Yan
    ADVANCED THEORY AND SIMULATIONS, 2021, 4 (05)
  • [47] Consensus Clustering-Based Undersampling Approach to Imbalanced Learning
    Onan, Aytug
    SCIENTIFIC PROGRAMMING, 2019, 2019
  • [48] A fresh look at mean-shift based modal clustering
    Ameijeiras-Alonso, Jose
    Einbeck, Jochen
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2024, 18 (04) : 1067 - 1095
  • [49] Mean shift-based clustering of remotely sensed data
    Friedman, L
    Netanyahu, NS
    Shoshany, M
    IGARSS 2003: IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, VOLS I - VII, PROCEEDINGS: LEARNING FROM EARTH'S SHAPES AND SIZES, 2003, : 3432 - 3434
  • [50] Mean shift-based clustering for misaligned functional data
    Welbaum, Andrew
    Qiao, Wanli
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2025, 206