Measuring the prediction difficulty of individual cases in a dataset using machine learning

Times cited: 0
Authors
Kwon, Hyunjin [1 ,2 ]
Greenberg, Matthew [3 ]
Josephson, Colin Bruce [4 ,6 ]
Lee, Joon [2 ,5 ,6 ,7 ]
Affiliations
[1] Univ Calgary, Schulich Sch Engn, Dept Biomed Engn, Calgary, AB, Canada
[2] Univ Calgary, Cumming Sch Med, Data Intelligence Hlth Lab, Calgary, AB, Canada
[3] Univ Calgary, Dept Math & Stat, Fac Sci, Calgary, AB, Canada
[4] Univ Calgary, Cumming Sch Med, Dept Clin Neurosci, Calgary, AB, Canada
[5] Univ Calgary, Cumming Sch Med, Dept Cardiac Sci, Calgary, AB, Canada
[6] Univ Calgary, Cumming Sch Med, Dept Community Hlth Sci, Calgary, AB, Canada
[7] Kyung Hee Univ, Sch Med, Dept Prevent Med, Seoul, South Korea
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Funding
Natural Sciences and Engineering Research Council of Canada
DOI
10.1038/s41598-024-61284-z
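Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09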
Abstract
Varying levels of prediction difficulty are one of the key factors that researchers encounter when applying machine learning to data. Although previous studies have introduced various metrics for assessing the prediction difficulty of individual cases, these metrics require specific dataset preconditions. In this paper, we propose three novel metrics for measuring the prediction difficulty of individual cases using fully-connected feedforward neural networks. The first metric is based on the complexity of the neural network needed to make a correct prediction. The second metric employs a pair of neural networks: one makes a prediction for a given case, and the other predicts whether the prediction made by the first model is likely to be correct. The third metric assesses the variability of the neural network's predictions. We investigated these metrics using a variety of datasets, visualized their values, and compared them to fifteen existing metrics from the literature. The results demonstrate that the proposed case difficulty metrics were better able to differentiate various levels of difficulty than most of the existing metrics and were consistently effective across diverse datasets. We expect that our metrics will provide researchers with a new perspective on understanding their datasets and applying machine learning in various fields.
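The abstract describes the three metrics only at a high level. The following Python block is a minimal sketch, not the authors' implementation. It shows one simplified per-case quantity that each idea could reduce to, assuming the neural networks have already been trained and only their per-case outcomes are passed in. The function names, the notion of an ordered "complexity level", and the use of repeated runs with different random seeds are hypothetical choices made for illustration.

```python
from statistics import pstdev
from typing import List, Sequence


def complexity_difficulty(correct_by_level: Sequence[bool]) -> float:
    """Sketch of the first idea: how complex must a network be to get this case right?

    correct_by_level[k] says whether a network at hypothetical complexity level k
    (e.g. k + 1 hidden units, an assumption) classified the case correctly.
    Returns the first successful level divided by the number of levels,
    or 1.0 when no level succeeds (hardest possible score).
    """
    n_levels = len(correct_by_level)
    for level, correct in enumerate(correct_by_level):
        if correct:
            return level / n_levels
    return 1.0


def evaluator_targets(predictions: Sequence[int], labels: Sequence[int]) -> List[int]:
    """Sketch of the second idea: training targets for the second (evaluator) network.

    The target is 1 where the first network's prediction matched the true label and
    0 otherwise. An evaluator trained on these targets outputs a probability of
    correctness, and 1 minus that probability can be read as a difficulty score.
    """
    return [int(p == y) for p, y in zip(predictions, labels)]


def variability_difficulty(true_class_probs: Sequence[float]) -> float:
    """Sketch of the third idea: population standard deviation of the predicted
    probability for the case's true class across repeated training runs
    (assumed here to differ only in random seed)."""
    return pstdev(true_class_probs)
```

For example, `complexity_difficulty([False, False, True, True])` returns 0.5, because the case is first classified correctly at the third of four levels. `variability_difficulty([0.9, 0.9, 0.9])` returns 0.0 because repeated runs agree perfectly. How the published metrics define complexity levels, train the evaluator, and normalize scores should be taken from the paper itself.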
Pages: 15