Measuring the prediction difficulty of individual cases in a dataset using machine learning

Cited by: 0

Authors:

Kwon, Hyunjin [1,2]

Greenberg, Matthew [3]

Josephson, Colin Bruce [4,6]

Lee, Joon [2,5,6,7]

Affiliations:

[1] Univ Calgary, Schulich Sch Engn, Dept Biomed Engn, Calgary, AB, Canada

[2] Univ Calgary, Cumming Sch Med, Data Intelligence Hlth Lab, Calgary, AB, Canada

[3] Univ Calgary, Dept Math & Stat, Fac Sci, Calgary, AB, Canada

[4] Univ Calgary, Cumming Sch Med, Dept Clin Neurosci, Calgary, AB, Canada

[5] Univ Calgary, Cumming Sch Med, Dept Cardiac Sci, Calgary, AB, Canada

[6] Univ Calgary, Cumming Sch Med, Dept Community Hlth Sci, Calgary, AB, Canada

[7] Kyung Hee Univ, Sch Med, Dept Prevent Med, Seoul, South Korea

Source:

SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01

Funding:

Natural Sciences and Engineering Research Council of Canada

DOI:

10.1038/s41598-024-61284-z

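Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]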

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Varying prediction difficulty across cases is one of the key factors that researchers encounter when applying machine learning to data. Although previous studies have introduced various metrics for assessing the prediction difficulty of individual cases, these metrics require specific dataset preconditions. In this paper, we propose three novel metrics for measuring the prediction difficulty of individual cases using fully-connected feedforward neural networks. The first metric is based on the complexity of the neural network needed to make a correct prediction. The second metric employs a pair of neural networks: one makes a prediction for a given case, and the other predicts whether the prediction made by the first model is likely to be correct. The third metric assesses the variability of the neural network's predictions. We investigated these metrics using a variety of datasets, visualized their values, and compared them to fifteen existing metrics from the literature. The results demonstrate that the proposed case difficulty metrics were better able to differentiate various levels of difficulty than most of the existing metrics and showed consistent effectiveness across diverse datasets. We expect our metrics will provide researchers with a new perspective on understanding their datasets and applying machine learning in various fields.
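
The three ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact formulas, so the function names, the use of hidden-neuron count as the complexity measure, and the use of a population standard deviation as the variability measure are all assumptions made for illustration. The functions take precomputed model outputs, so no neural network is trained here.

```python
from statistics import pstdev


def complexity_difficulty(correct_by_size):
    """Metric-1 style score (illustrative assumption).

    correct_by_size maps a hypothetical complexity measure (for example,
    the number of hidden neurons) to whether a network of that size
    predicted the case correctly. Returns the smallest size that succeeded,
    or None if no tested size was correct.
    """
    sizes = [size for size, ok in correct_by_size.items() if ok]
    return min(sizes) if sizes else None


def second_model_difficulty(p_first_model_correct):
    """Metric-2 style score (illustrative assumption).

    p_first_model_correct is the second network's estimated probability
    that the first network's prediction is correct. A lower probability
    means a harder case, so the score is one minus that probability.
    """
    return 1.0 - p_first_model_correct


def variability_difficulty(true_class_probs):
    """Metric-3 style score (illustrative assumption).

    true_class_probs holds the probability assigned to the true class by
    several independently trained networks. The spread of these values is
    measured here with the population standard deviation.
    """
    return pstdev(true_class_probs)


# Hypothetical outputs for two cases, used only to show the intended ordering.
easy_case = [0.95, 0.97, 0.96, 0.94]
hard_case = [0.20, 0.80, 0.50, 0.90]

print(complexity_difficulty({1: False, 4: True, 16: True}))
print(second_model_difficulty(0.9))
print(variability_difficulty(easy_case) < variability_difficulty(hard_case))
```

Under these assumptions, a case that only larger networks classify correctly, that the second model flags as likely wrong, or on which repeated networks disagree would receive a higher difficulty score.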


Pages: 15
