A machine learning approximation of the 2015 Portuguese high school student grades: A hybrid approach

Authors
Ricardo Costa-Mendes
Tiago Oliveira
Mauro Castelli
Frederico Cruz-Jesus
Affiliation
[1] Universidade Nova de Lisboa,NOVA Information Management School (NOVA IMS)
Keywords
Machine learning; Stacking; Random forest; Support vector regression; Academic achievement; High school grades
DOI
Not available
Abstract
This article uses an anonymous dataset for the 2014–15 school year from the Directorate-General for Statistics of Education and Science (DGEEC) of the Portuguese Ministry of Education to compare the predictive power of the classic multilinear regression model with that of a chosen set of machine learning algorithms. A multilinear regression model is used in parallel with random forest, support vector machine, artificial neural network and extreme gradient boosting machine stacking ensemble implementations. The intent is a hybrid analysis in which classical statistical analysis and artificial intelligence algorithms are blended to strengthen the ability to draw valuable conclusions and well-supported results. The machine learning algorithms attain a higher level of predictive ability. In addition, the appropriateness of stacking increases as the determinant of the base learner output correlation matrix increases, and the empirical distributions of random forest feature importance are correlated with the structure of p-values and the statistical significance test findings of the multiple linear model. An information system that supports the nationwide education system should be designed and further structured to collect meaningful and precise data about the full range of academic achievement antecedents. The article concludes that no evidence is found in favour of smaller classes.
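The abstract's remark about the determinant of the base learner output correlation matrix can be illustrated with a small sketch. It is not the authors' code and does not use the DGEEC data: the targets and the learner outputs are synthetic, and the helper names (`pearson`, `correlation_matrix`, `determinant`, `simulated_outputs`) are hypothetical. The sketch relies only on the general fact that the determinant of a correlation matrix lies between 0 and 1, approaching 0 when the columns are nearly collinear and 1 when they are uncorrelated.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Correlation matrix of a list of prediction columns (one per learner)."""
    k = len(columns)
    return [
        [1.0 if i == j else pearson(columns[i], columns[j]) for j in range(k)]
        for i in range(k)
    ]


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-15:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def simulated_outputs(target, noise_sd, n_learners, rng):
    """Synthetic stand-in for base learner predictions: target plus noise."""
    return [
        [t + rng.gauss(0.0, noise_sd) for t in target]
        for _ in range(n_learners)
    ]


rng = random.Random(0)
target = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic "grades"

# Three near-identical learners versus three learners with independent errors.
similar = simulated_outputs(target, 0.05, 3, rng)
diverse = simulated_outputs(target, 1.0, 3, rng)

det_similar = determinant(correlation_matrix(similar))
det_diverse = determinant(correlation_matrix(diverse))

print("determinant, near-duplicate learners:", det_similar)
print("determinant, diverse learners:", det_diverse)
```

Under these assumptions, the near-duplicate learners give a determinant close to 0 and the diverse learners a clearly larger one. This matches the direction of the abstract's claim: a meta-learner has more to gain when the base learners' outputs are less redundant.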
Pages: 1527-1547
Page count: 20