Measuring the prediction difficulty of individual cases in a dataset using machine learning

Authors
Kwon, Hyunjin [1, 2]
Greenberg, Matthew [3]
Josephson, Colin Bruce [4, 6]
Lee, Joon [2, 5, 6, 7]
Affiliations
[1] Univ Calgary, Schulich Sch Engn, Dept Biomed Engn, Calgary, AB, Canada
[2] Univ Calgary, Cumming Sch Med, Data Intelligence Hlth Lab, Calgary, AB, Canada
[3] Univ Calgary, Dept Math & Stat, Fac Sci, Calgary, AB, Canada
[4] Univ Calgary, Cumming Sch Med, Dept Clin Neurosci, Calgary, AB, Canada
[5] Univ Calgary, Cumming Sch Med, Dept Cardiac Sci, Calgary, AB, Canada
[6] Univ Calgary, Cumming Sch Med, Dept Community Hlth Sci, Calgary, AB, Canada
[7] Kyung Hee Univ, Sch Med, Dept Prevent Med, Seoul, South Korea
Source
SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01
Funding
Natural Sciences and Engineering Research Council of Canada
DOI
10.1038/s41598-024-61284-z
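Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification code
07; 0710; 09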
Abstract
Varying levels of prediction difficulty are among the key factors that researchers encounter when applying machine learning to data. Although previous studies have introduced various metrics for assessing the prediction difficulty of individual cases, these metrics require specific dataset preconditions. In this paper, we propose three novel metrics for measuring the prediction difficulty of individual cases using fully connected feedforward neural networks. The first metric is based on the complexity of the neural network needed to make a correct prediction. The second metric employs a pair of neural networks: one makes a prediction for a given case, and the other predicts whether the first model's prediction is likely to be correct. The third metric assesses the variability of the neural network's predictions. We investigated these metrics using a variety of datasets, visualized their values, and compared them to fifteen existing metrics from the literature. The results demonstrate that the proposed case difficulty metrics differentiated levels of difficulty better than most of the existing metrics and were consistently effective across diverse datasets. We expect our metrics to provide researchers with a new perspective on understanding their datasets and applying machine learning in various fields.
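
The three ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation or their exact formulas, which the abstract does not give. It assumes the neural networks have already been trained elsewhere and that only their per-case outputs are available. All function names and input values below are hypothetical.

```python
from statistics import pstdev
from typing import Sequence


def complexity_difficulty(correct_by_size: Sequence[bool]) -> int:
    """Illustrative stand-in for the first idea (model complexity).

    correct_by_size[i] says whether the i-th network, ordered from the
    simplest to the most complex, predicted this case correctly.
    Returns the 1-based rank of the first correct network, or
    len(correct_by_size) + 1 if no network was correct.
    """
    for rank, is_correct in enumerate(correct_by_size, start=1):
        if is_correct:
            return rank
    return len(correct_by_size) + 1


def paired_model_difficulty(p_first_model_correct: float) -> float:
    """Illustrative stand-in for the second idea (a pair of networks).

    p_first_model_correct is the second network's estimated probability
    that the first network's prediction for this case is correct.
    Lower confidence is read here as higher difficulty (an assumption).
    """
    return 1.0 - p_first_model_correct


def variability_difficulty(true_class_probs: Sequence[float]) -> float:
    """Illustrative stand-in for the third idea (prediction variability).

    true_class_probs holds the probability assigned to the true class by
    several independently trained networks. The population standard
    deviation is one possible variability measure (an assumption).
    """
    return pstdev(true_class_probs)


# Hypothetical outputs for a single case.
print(complexity_difficulty([False, False, True, True]))
print(paired_model_difficulty(0.75))
print(variability_difficulty([0.0, 1.0]))
```

In this sketch each function scores one case at a time, so a dataset would be scored by applying the functions to every case and comparing the resulting values.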
Pages: 15