Balancing Fined-Tuned Machine Learning Models Between Continuous and Discrete Variables - A Comprehensive Analysis Using Educational Data

Cited by: 3

Authors:

Drousiotis, Efthyvoulos [1]

Pentaliotis, Panagiotis [1]

Shi, Lei [2]

Cristea, Alexandra I. [2]

Affiliations:

[1] Univ Liverpool, Dept Elect Engn & Elect, Liverpool, Merseyside, England

[2] Univ Durham, Dept Comp Sci, Durham, England

Source:

ARTIFICIAL INTELLIGENCE IN EDUCATION, PT I | 2022 / Vol. 13355

Keywords:

Neural networks; Tree-based algorithms; Educational data mining; Feature engineering; MOOCs

DOI:

10.1007/978-3-031-11644-5_21

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Along with the exponential increase of students enrolling in MOOCs [26] arises the problem of a high student dropout rate. Researchers worldwide are interested in predicting whether students will drop out of MOOCs in order to prevent it. This study explores and improves ways of handling notoriously challenging continuous-variable datasets to predict dropout. Importantly, we propose a fair comparison methodology: unlike prior studies and for the first time, when comparing various models we use each algorithm with the kind of dataset it is intended for, thus comparing 'like for like'. We use a time-series dataset with algorithms suited to time series, and a discrete-variables dataset, converted through feature engineering, with algorithms known to handle discrete variables well. Moreover, in terms of predictive ability, we examine the importance of finding the optimal hyperparameters for our algorithms, in combination with the most effective pre-processing techniques for the data. We show that the much lighter discrete models outperform the time-series models while enabling faster training and testing. This result also holds across fine-tuning of pre-processing and hyperparameter optimisation.


Pages: 256-268

Page count: 13
