Incremental SMOTE with Control Coefficient for Classifiers in Data Starved Medical Applications

Cited by: 0
Authors
Bae, Wan D. [1]
Alkobaisi, Shayma [2]
Bankar, Siddheshwari [1]
Bhuvaji, Sartaj [1]
Singhvi, Jay [1]
Irukulla, Madhuroopa [1]
McDonnell, William [1]
Affiliations
[1] Seattle Univ, Comp Sci, Seattle, WA 98122 USA
[2] United Arab Emirates Univ, Coll Informat Technol, Al Ain, U Arab Emirates
Keywords
class imbalance problem; synthetic minority oversampling technique; rare event prediction; data starved contexts; control coefficient
DOI
10.1007/978-3-031-68323-7_9
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Prediction models for data-starved medical applications lag behind general machine learning solutions, despite their potential to improve early interventions. This is largely because optimization approaches assume a balanced distribution of events, whereas medical data are often imbalanced across classes. The curse of dimensionality is further exacerbated by small samples and a high number of features in individual-based risk prediction models. In this paper, we propose a data augmentation system that gradually creates synthetic minority samples with a control coefficient, which improves the quality of generated data over time and consequently boosts prediction model performance. This system incrementally adjusts to the data distribution, avoiding overfitting. We evaluate our approach using four synthetic oversampling techniques on real asthma patient data. Our results show that this system enhances classifiers' overall performance across all four techniques. Specifically, applying the incremental data augmentation approach to three oversampling methods led to an increase in sensitivity of 4.01% to 7.79% in deep transfer learning-based classifiers.
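The abstract does not define the control coefficient or the update rule, so the following is only an illustrative sketch of what incremental oversampling could look like, not the authors' method. It assumes a hypothetical coefficient `alpha` in (0, 1] that limits each round to closing a fraction `alpha` of the remaining gap between majority and minority counts, and it uses plain SMOTE interpolation between a minority sample and one of its `k` nearest minority neighbors. The names `smote_batch`, `incremental_smote`, `alpha`, and `rounds` are all assumptions introduced for this example.

```python
import numpy as np


def smote_batch(minority, n_new, k, rng):
    """Plain SMOTE step: new = x + u * (neighbor - x), with u ~ U(0, 1)."""
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)
    # Pairwise Euclidean distances inside the minority class.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is never its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # which sample to start from
    pick = neighbors[base, rng.integers(0, k, size=n_new)]  # which neighbor
    u = rng.random((n_new, 1))
    return minority[base] + u * (minority[pick] - minority[base])


def incremental_smote(X, y, minority_label=1, alpha=0.5, rounds=4, k=5, seed=0):
    """Oversample in rounds; each round closes a fraction `alpha` of the gap.

    ASSUMPTION: `alpha` stands in for the paper's control coefficient, whose
    actual definition is not given in the abstract. Later rounds interpolate
    over the already augmented minority set, which is also an assumption.
    """
    rng = np.random.default_rng(seed)
    X_aug, y_aug = X.copy(), y.copy()
    for _ in range(rounds):
        n_min = int(np.sum(y_aug == minority_label))
        gap = (len(y_aug) - n_min) - n_min  # majority count minus minority count
        n_new = int(np.floor(alpha * gap))
        if n_new <= 0:
            break  # classes are (nearly) balanced, stop early
        synth = smote_batch(X_aug[y_aug == minority_label], n_new, k, rng)
        X_aug = np.vstack([X_aug, synth])
        y_aug = np.concatenate([y_aug, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_aug, y_aug


# Toy data (synthetic, not the asthma data from the paper): 40 vs 8 samples.
demo_rng = np.random.default_rng(42)
X_demo = np.vstack([demo_rng.normal(0.0, 1.0, (40, 3)),
                    demo_rng.normal(3.0, 1.0, (8, 3))])
y_demo = np.array([0] * 40 + [1] * 8)
X_out, y_out = incremental_smote(X_demo, y_demo, alpha=0.5, rounds=3)
print("minority before:", int((y_demo == 1).sum()),
      "after:", int((y_out == 1).sum()))
```

In this sketch a smaller `alpha` adds fewer synthetic samples per round, which leaves room to retrain and evaluate a classifier between rounds. That evaluation loop is omitted here because the abstract does not describe it.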
Pages: 112-119
Page count: 8