Imbalanced generative sampling of training data for improving quality of machine learning model

Authors
Coskun, Umut Can [1]
Dogan, Kemal Mert [2]
Gunpinar, Erkan [3]
Affiliations
[1] Numedyne Informat & Engn Inc, Istanbul, Turkiye
[2] Yildiz Tech Univ, TR-34210 Istanbul, Turkiye
[3] Istanbul Tech Univ, Istanbul, Turkiye
Keywords
Imbalanced sampling; Machine learning; Computer-aided design; Design exploration; Training data; Computational fluid dynamics; DESIGN; OPTIMIZATION; PERFORMANCE; UNCERTAINTY; ALGORITHM; SYSTEM
DOI
10.1016/j.aei.2024.102631
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Design exploration in engineering applications often requires a meticulous experimental or numerical study to evaluate the performance (Y) of each design, which may demand considerable effort, time, or resources. Reducing the number of such tests needed to find a good design is of paramount importance in all engineering fields. This study aims to build a machine learning (ML) model using fewer designs as training data. Uniform sampling (US) of the design space (based on predefined design parameters) to obtain training data is a promising approach. We extend this sampling concept by also employing the ML model to obtain designs in the design space. The designs are selected via two non-uniform (imbalanced) sampling methods, height-based sampling (HBS) and gradient-based sampling (GBS), while considering their Y values and gradient (dY) values. These values are divided into uniform intervals, and we aim to equalize, as far as possible, the number of training designs in each interval. This pushes the sampling toward designs with minimum or maximum Y or dY values, which generally lie in a small portion of the design space, so that designs from all portions of the design space can be captured. Results of the proposed methods are compared against US and two well-studied non-uniform sampling strategies, Stratified Over Sampling (SOS) and Gaussian-Process Based Sampling (GPBS). To reliably assess the quality of ML models obtained using designs sampled via US, SOS, GPBS, HBS, and GBS, we use standard test functions with known forms (such as Easom and Beale) as substitutes for engineering problems. According to the results presented, ML models using HBS and GBS have either better prediction accuracy or wider applicability than all other tested sampling methods.
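The interval-equalizing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' HBS algorithm: it assumes a fixed pool of random candidate designs, reads Y directly from the Easom function rather than from an ML model, and uses a hypothetical helper, `height_balanced_sample`, that fills uniform-width Y intervals in round-robin order. Passing gradient magnitudes in place of Y values would give a rough analogue of the gradient-based variant.

```python
import numpy as np


def easom(x, y):
    """Easom test function: nearly flat, with a narrow dip to -1 at (pi, pi)."""
    return -np.cos(x) * np.cos(y) * np.exp(-((x - np.pi) ** 2 + (y - np.pi) ** 2))


def height_balanced_sample(values, n_samples, n_bins, rng):
    """Hypothetical helper (an assumption, not from the paper).

    Chooses candidate indices so that uniform-width intervals of `values`
    receive as equal a share of the sample as the candidate pool allows.
    """
    values = np.asarray(values, dtype=float)
    # Uniform intervals between the smallest and largest observed value.
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Digitizing against the interior edges yields bin ids 0 .. n_bins - 1.
    bin_ids = np.digitize(values, edges[1:-1])
    # One shuffled pool of candidate indices per interval.
    pools = [list(rng.permutation(np.flatnonzero(bin_ids == b))) for b in range(n_bins)]
    chosen = []
    # Round-robin over the intervals until the budget is spent or pools run dry.
    while len(chosen) < n_samples and any(pools):
        for pool in pools:
            if pool and len(chosen) < n_samples:
                chosen.append(int(pool.pop()))
    return np.array(chosen, dtype=int), bin_ids


# Illustrative settings only; the pool size, budget, and bin count are assumptions.
rng = np.random.default_rng(0)
candidates = rng.uniform(-10.0, 10.0, size=(20000, 2))
y = easom(candidates[:, 0], candidates[:, 1])

picked, bin_ids = height_balanced_sample(y, n_samples=30, n_bins=5, rng=rng)
uniform = rng.choice(len(y), size=30, replace=False)

# Per-interval counts for the balanced pick versus a plain uniform pick.
print("balanced:", np.bincount(bin_ids[picked], minlength=5))
print("uniform: ", np.bincount(bin_ids[uniform], minlength=5))
```

Because Easom is almost flat away from its dip, a plain uniform pick tends to land in the interval containing the flat region, while the round-robin pick draws from every interval that contains at least one candidate. Intervals with few candidates are simply exhausted early, which is why the counts are equalized only as far as the pool permits.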
Pages: 14