Fast and simple dataset selection for machine learning

Cited by: 5
Authors
Peter, Timm J. [1]
Nelles, Oliver [1]
Affiliations
[1] Univ Siegen, Inst Mechan & Regelungstech Mechatron, Dept Maschinenbau, Paul Bonatz Str 9-11, D-57068 Siegen, Germany
Keywords
machine learning; dataset selection; design of experiments; space-filling design; domain adaptation
DOI
10.1515/auto-2019-0010
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The task of data reduction is discussed, and a novel selection approach is proposed that allows the optimal point distribution of the selected data subset to be controlled. The proposed approach relies on estimating probability density functions (pdfs). Owing to its structure, the new method can select a subset either by approximating the pdf of the original dataset or by approximating an arbitrary, desired target pdf. The new strategy evaluates the estimated pdfs solely on the selected data points, resulting in a simple and efficient algorithm with low computational and memory demands. The performance of the new approach is investigated in two scenarios. For representative subset selection from a dataset, the new approach is compared to a recently proposed, more complex method and shows comparable results. To demonstrate the capability of matching a target pdf, a uniform distribution is chosen as an example. Here the new method is compared to strategies for space-filling design of experiments and shows convincing results.
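The idea of steering a subset toward a target pdf can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify; it is a generic greedy heuristic written under stated assumptions. The function name `greedy_pdf_subset`, the Gaussian kernel, the fixed `bandwidth`, and the scoring rule are all hypothetical choices made for illustration. Unlike the method described above, this sketch evaluates the subset's density estimate at every candidate point rather than only at the selected points.

```python
import numpy as np


def greedy_pdf_subset(X, m, target_pdf=None, bandwidth=0.1, seed=0):
    """Greedily pick m rows of X so that a kernel density estimate of the
    picked subset roughly follows target_pdf (uniform if None).

    Illustrative heuristic only; not the method from the paper.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n = X.shape[0]
    if not 0 < m <= n:
        raise ValueError("m must satisfy 0 < m <= number of rows in X")

    # Target density at every candidate; a constant stands for "uniform".
    if target_pdf is None:
        target = np.ones(n)
    else:
        target = np.asarray(target_pdf(X), dtype=float)
    target = np.maximum(target, 1e-12)  # avoid division by zero

    rng = np.random.default_rng(seed)
    kernel_sum = np.zeros(n)            # unnormalized KDE of the subset
    available = np.ones(n, dtype=bool)  # candidates not yet picked
    chosen = []

    idx = int(rng.integers(n))          # random starting point
    for _ in range(m):
        chosen.append(idx)
        available[idx] = False
        # Add the new point's Gaussian kernel contribution at all candidates.
        d2 = np.sum((X - X[idx]) ** 2, axis=1)
        kernel_sum += np.exp(-0.5 * d2 / bandwidth**2)
        # Next pick: the candidate where the subset is most under-represented
        # relative to the target density.
        score = np.where(available, kernel_sum / target, np.inf)
        idx = int(np.argmin(score))
    return np.array(chosen)
```

With the default uniform target, the heuristic tends to spread the selected points across the occupied region even when the original data are strongly clustered, which is the space-filling behavior the abstract refers to. Passing a callable as `target_pdf` (assumed to return one density value per row) biases the selection toward regions where that density is high.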
Pages: 833-842
Page count: 10
Related papers
50 in total
  • [41] A hybrid machine learning approach to identify coronary diseases using feature selection mechanism on heart disease dataset
    Doppala, Bhanu Prakash
    Bhattacharyya, Debnath
    Chakkravarthy, Midhun
    Kim, Tai-hoon
    DISTRIBUTED AND PARALLEL DATABASES, 2023, 41 (1-2) : 1 - 20
  • [42] Evaluation of Feature Selection Methods for Preserving Machine Learning Performance in the Presence of Temporal Dataset Shift in Clinical Medicine
    Lemmon, Joshua
    Guo, Lin Lawrence
    Posada, Jose
    Pfohl, Stephen R.
    Fries, Jason
    Fleming, Scott Lanyon
    Aftandilian, Catherine
    Shah, Nigam
    Sung, Lillian
    METHODS OF INFORMATION IN MEDICINE, 2023, 62 (01/02) : 60 - 69
  • [43] Machine Learning Techniques for Intrusion Detection on Public Dataset
    Thanthrige, Udaya Sampath K. Perera Miriya
    Samarabandu, Jagath
    Wang, Xianbin
    2016 IEEE CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (CCECE), 2016,
  • [44] The AmTriangle Meta-Dataset for Playing with Machine Learning
    Marques, Artur
    de Amorim Silva, Rafael
    Madeira, Filipe
    PERSPECTIVES AND TRENDS IN EDUCATION AND TECHNOLOGY, ICITED 2022, 2023, 320 : 243 - 252
  • [45] AESA Antennas using Machine Learning with Reduced Dataset
    Zaib, Alam
    Masood, Abdur Rehman
    Abdullah, Muhammad Asad
    Khattak, Shahid
    Bin Saleem, Aasim
    Ullah, Irfan
    RADIOENGINEERING, 2024, 33 (03) : 397 - 405
  • [46] Comparative evaluation of machine learning classifiers with Obesity dataset
    Ramya, A.
    Rohini, K.
    2021 INTERNATIONAL CONFERENCE ON COMPUTING SCIENCES (ICCS 2021), 2021, : 38 - 41
  • [47] An Indoor Sound Source Localization Dataset for Machine Learning
    Wu, Tao
    Jiang, Yong
    Li, Nan
    Feng, Tao
    PROCEEDINGS OF 2018 THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND ARTIFICIAL INTELLIGENCE (CSAI 2018) / 2018 THE 10TH INTERNATIONAL CONFERENCE ON INFORMATION AND MULTIMEDIA TECHNOLOGY (ICIMT 2018), 2018, : 28 - 32
  • [48] Effectiveness of dataset reduction in testing machine learning algorithms
    Chandrasekaran, Jaganmohan
    Feng, Huadong
    Lei, Yu
    Kacker, Raghu
    Kuhn, D. Richard
    2020 IEEE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE TESTING (AITEST), 2020, : 133 - 140
  • [49] A dataset of oracle characters for benchmarking machine learning algorithms
    Wang, Mei
    Deng, Weihong
    SCIENTIFIC DATA, 2024, 11 (01)
  • [50] HARTH: A Human Activity Recognition Dataset for Machine Learning
    Logacjov, Aleksej
    Bach, Kerstin
    Kongsvold, Atle
    Bardstu, Hilde Bremseth
    Mork, Paul Jarle
    SENSORS, 2021, 21 (23)