Measuring the prediction difficulty of individual cases in a dataset using machine learning

Authors
Kwon, Hyunjin [1 ,2 ]
Greenberg, Matthew [3 ]
Josephson, Colin Bruce [4 ,6 ]
Lee, Joon [2 ,5 ,6 ,7 ]
Affiliations
[1] Univ Calgary, Schulich Sch Engn, Dept Biomed Engn, Calgary, AB, Canada
[2] Univ Calgary, Cumming Sch Med, Data Intelligence Hlth Lab, Calgary, AB, Canada
[3] Univ Calgary, Dept Math & Stat, Fac Sci, Calgary, AB, Canada
[4] Univ Calgary, Cumming Sch Med, Dept Clin Neurosci, Calgary, AB, Canada
[5] Univ Calgary, Cumming Sch Med, Dept Cardiac Sci, Calgary, AB, Canada
[6] Univ Calgary, Cumming Sch Med, Dept Community Hlth Sci, Calgary, AB, Canada
[7] Kyung Hee Univ, Sch Med, Dept Prevent Med, Seoul, South Korea
Source
SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01
Funding
Natural Sciences and Engineering Research Council of Canada;
DOI
10.1038/s41598-024-61284-z
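Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code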
07 ; 0710 ; 09 ;
Abstract
Varying levels of prediction difficulty are one of the key factors that researchers encounter when applying machine learning to data. Although previous studies have introduced various metrics for assessing the prediction difficulty of individual cases, these metrics require specific dataset preconditions. In this paper, we propose three novel metrics for measuring the prediction difficulty of individual cases using fully-connected feedforward neural networks. The first metric is based on the complexity of the neural network needed to make a correct prediction. The second metric employs a pair of neural networks: one makes a prediction for a given case, and the other predicts whether the prediction made by the first model is likely to be correct. The third metric assesses the variability of the neural network's predictions. We investigated these metrics using a variety of datasets, visualized their values, and compared them to fifteen existing metrics from the literature. The results demonstrate that the proposed case difficulty metrics were better able to differentiate various levels of difficulty than most of the existing metrics and showed consistent effectiveness across diverse datasets. We expect our metrics will provide researchers with a new perspective on understanding their datasets and applying machine learning in various fields.
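The abstract does not give formulas for the three metrics. As a rough illustration of the third idea only (variability of predictions), the sketch below summarises, per case, how much the probability assigned to the correct label varies across several independently trained models. The function name, the input layout, and the toy numbers are all assumptions made for illustration; this is not the paper's definition of the metric.

```python
from statistics import mean, pstdev


def variability_difficulty(true_class_probs):
    """Summarise per-case variability of predictions across models.

    true_class_probs[m][i] is the probability that model m assigns to the
    correct label of case i. This input layout is hypothetical.
    """
    n_cases = len(true_class_probs[0])
    scores = []
    for i in range(n_cases):
        # Collect what every model said about case i.
        per_model = [probs[i] for probs in true_class_probs]
        scores.append({
            "mean_confidence": mean(per_model),  # average true-class probability
            "spread": pstdev(per_model),         # disagreement between models
        })
    return scores


# Toy probabilities, invented for illustration: 3 models x 3 cases.
probs = [
    [0.9, 0.5, 0.2],
    [0.9, 0.1, 0.2],
    [0.9, 0.9, 0.2],
]
scores = variability_difficulty(probs)
for case_index, score in enumerate(scores):
    print(case_index, score)
```

In this toy input the first case is predicted confidently and consistently, the second case splits the models, and the third is consistently given a low probability. The third case shows why spread alone may not be enough: a case that every model gets wrong in the same way has zero spread, so a practical score would probably combine spread with the average confidence.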
Pages: 15