Fast and simple dataset selection for machine learning

Times cited: 5
Authors
Peter, Timm J. [1]
Nelles, Oliver [1]
Affiliation
[1] Univ Siegen, Inst Mechan & Regelungstech Mechatron, Dept Maschinenbau, Paul Bonatz Str 9-11, D-57068 Siegen, Germany
Keywords
machine learning; dataset selection; design of experiments; space-filling design; domain adaptation
DOI
10.1515/auto-2019-0010
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The task of data reduction is discussed, and a novel selection approach is proposed that allows the point distribution of the selected data subset to be controlled. The approach relies on the estimation of probability density functions (pdfs). Owing to its structure, the new method can select a subset either by approximating the pdf of the original dataset or by approximating an arbitrary desired target pdf. The estimated pdfs are evaluated only at the selected data points, which yields a simple, efficient algorithm with low computational and memory demands. The performance of the new approach is investigated in two scenarios. For selecting a representative subset of a dataset, the new approach is compared with a recently proposed, more complex method and achieves comparable results. To demonstrate its ability to match a target pdf, a uniform distribution is chosen as an example. Here the new method is compared with strategies for space-filling design of experiments and shows convincing results.
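The idea of selecting points so that the density of the subset tracks a target pdf can be illustrated with a generic greedy kernel-density sketch. This is not the authors' algorithm. The Gaussian kernel, the bandwidth `h`, the greedy ratio criterion and the function names `kernel_sum` and `greedy_pdf_selection` are all hypothetical choices made for illustration. The method summarized above evaluates the pdfs only at the selected points. For simplicity, this sketch instead evaluates them at every candidate, and it builds the full-data KDE with an O(n^2) pairwise computation. It is therefore far less memory-efficient than the abstract describes.

```python
import numpy as np


def kernel_sum(points, centers, h):
    """Unnormalized sum of isotropic Gaussian kernels (bandwidth h) centred at
    `centers`, evaluated at each row of `points`. The normalization constant is
    omitted because only ratios across candidates matter below."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / h**2).sum(axis=1)


def greedy_pdf_selection(X, m, h=0.1, target=None, eps=1e-12):
    """Hypothetical greedy subset selection (illustrative sketch only).

    target=None      -> approximate the pdf of X (KDE of the full dataset)
    target="uniform" -> approximate a uniform (constant) target density
    Returns the indices of the m selected rows of X.
    """
    n = len(X)
    if target is None:
        p_target = kernel_sum(X, X, h)          # O(n^2) KDE of the full dataset
    elif target == "uniform":
        p_target = np.ones(n)                   # constant target density
    else:
        raise ValueError("target must be None or 'uniform'")

    selected = []
    p_sel = np.zeros(n)                         # running KDE of the subset at each candidate
    available = np.ones(n, dtype=bool)
    for _ in range(m):
        # Pick the candidate where the subset is most "under-represented"
        # relative to the target density.
        score = np.where(available, p_target / (p_sel + eps), -np.inf)
        i = int(np.argmax(score))
        selected.append(i)
        available[i] = False
        # Incrementally add the kernel of the newly selected point.
        p_sel += np.exp(-0.5 * ((X - X[i]) ** 2).sum(axis=1) / h**2)
    return np.array(selected)


rng = np.random.default_rng(0)
X = rng.random((300, 2))
idx_uniform = greedy_pdf_selection(X, 20, h=0.1, target="uniform")
idx_repr = greedy_pdf_selection(X, 20, h=0.1)
```

With `target="uniform"` the ratio reduces to picking the candidate with the lowest subset density, which tends to spread points out in a space-filling way. With `target=None` the selection favours regions where the data itself is dense.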
Pages: 833-842
Number of pages: 10