Hyperparameter tuning and performance assessment of statistical and machine-learning algorithms using spatial data

Cited by: 264

Authors:

Schratz, Patrick [1]

Muenchow, Jannes [1]

Iturritxa, Eugenia [2]

Richter, Jakob [3]

Brenning, Alexander [1]

Affiliations:

[1] GISci Grp, Dept Geog, Grietgasse 6, D-07743 Jena, Germany

[2] NEIKER, Apdo 46, Vitoria 01080, Arab, Spain

[3] TU Dortmund Univ, Dept Stat, Dortmund, Germany

Source:

ECOLOGICAL MODELLING | 2019 / Vol. 406

Keywords:

Spatial modeling; Machine-learning; Spatial autocorrelation; Hyperparameter tuning; Spatial cross-validation; MODEL-SELECTION; LANDSLIDE SUSCEPTIBILITY; SPECIES DISTRIBUTION; CROSS-VALIDATION; PREDICTION; AUTOCORRELATION; CLASSIFICATION; OPTIMIZATION; CLASSIFIERS; CLIMATE;

DOI:

10.1016/j.ecolmodel.2019.06.002

Chinese Library Classification (CLC) number:

Q14 [Ecology (bioecology)];

Discipline classification code:

071012 ; 0713 ;

Abstract:

While the application of machine-learning algorithms has been highly simplified in recent years due to their well-documented integration in commonly used statistical programming languages (such as R or Python), there are several practical challenges in the field of ecological modeling related to unbiased performance estimation. One is the influence of spatial autocorrelation on both hyperparameter tuning and performance estimation. Grouped cross-validation strategies have been proposed in recent years in environmental as well as medical contexts to reduce bias in predictive performance. In this study we show the effects of spatial autocorrelation on hyperparameter tuning and performance estimation by comparing several widely used machine-learning algorithms such as boosted regression trees (BRT), k-nearest neighbor (KNN), random forest (RF) and support vector machine (SVM) with traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones like generalized additive models (GAM) in terms of predictive performance. Spatial and non-spatial cross-validation methods were used to evaluate model performance, with the aim of obtaining bias-reduced performance estimates. A detailed analysis of the sensitivity of hyperparameter tuning to the resampling method (spatial/non-spatial) was performed. As a case study, the spatial distribution of forest disease (Diplodia sapinea) in the Basque Country (Spain) was investigated using common environmental variables such as temperature, precipitation, soil and lithology as predictors. Random forest (mean Brier score estimate of 0.166) outperformed all other methods with regard to predictive accuracy. Though the sensitivity to hyperparameter tuning differed between the ML algorithms, there were in most cases no substantial differences between spatial and non-spatial partitioning for hyperparameter tuning. However, spatial hyperparameter tuning maintains consistency with spatial estimation of classifier performance and should be favored over non-spatial hyperparameter optimization. High performance differences (up to 47%) between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) settings showed the high need to account for the influence of spatial autocorrelation. Overoptimistic performance estimates may lead to false actions in ecological decision making based on biased model predictions.
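
The contrast the abstract draws between non-spatial and spatial partitioning, and the Brier score it reports, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how the spatial partitions were built, so the square grid blocks, the function names (`brier_score`, `random_folds`, `spatial_block_folds`) and the toy coordinates below are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def random_folds(n_points, k, seed=0):
    """Non-spatial partitioning: shuffle the indices and deal them into k folds."""
    idx = list(range(n_points))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def spatial_block_folds(coords, k, block_size):
    """Spatial partitioning (assumed grid-block scheme): points in the same
    square block always end up in the same fold, so test points are not
    surrounded by training points from their immediate neighbourhood."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[(int(x // block_size), int(y // block_size))].append(i)
    folds = [[] for _ in range(k)]
    # Largest blocks first, each into the currently smallest fold.
    for key in sorted(blocks, key=lambda b: (-len(blocks[b]), b)):
        min(folds, key=len).extend(blocks[key])
    return folds


# Hypothetical toy data: a 6 x 6 grid of sample locations.
coords = [(x + 0.5, y + 0.5) for x in range(6) for y in range(6)]
nonspatial = random_folds(len(coords), k=3)
spatial = spatial_block_folds(coords, k=3, block_size=2.0)
```

To keep tuning consistent with spatial performance estimation, as the abstract recommends, the same spatial partitioning would be applied again inside each training set when selecting hyperparameters (nested resampling), and the outer test folds would be scored with `brier_score`.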

Pages: 109-120

Page count: 12
