Prediction of hepatitis E using machine learning models

Cited by: 25

Authors:

Guo, Yanhui [1]

Feng, Yi [2, 3]

Qu, Fuli [1]

Zhang, Li [2, 3]

Yan, Bingyu [2, 3]

Lv, Jingjing [2, 3]

Affiliations:

[1] Shandong Women's University, School of Data and Computer Science, Jinan, Shandong, People's Republic of China

[2] Shandong Center for Disease Control and Prevention, Shandong Provincial Key Laboratory of Infectious Disease Control and Prevention, Jinan, Shandong, People's Republic of China

[3] Shandong University, Academy of Preventive Medicine, Jinan, Shandong, People's Republic of China

Source:

PLOS ONE | 2020 / Vol. 15 / Issue 09

Keywords:

DOI:

10.1371/journal.pone.0237750

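Chinese Library Classification codes:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]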

Discipline classification codes:

07; 0710; 09

Abstract:

Background: Accurate and reliable predictions of infectious disease can be valuable to public health organizations that plan interventions to decrease or prevent disease transmission. A great variety of models have been developed for this task; however, their performance varies from one data series to another. Hepatitis E, an acute liver disease, is a major public health problem. Which model is most appropriate for predicting the incidence of hepatitis E? In this paper, three different methods are applied and their performance is compared.

Methods: Autoregressive integrated moving average (ARIMA), support vector machine (SVM) and long short-term memory (LSTM) recurrent neural network models were adopted and compared. ARIMA was implemented in Python with statsmodels, SVM in MATLAB with the LIBSVM library, and we designed the LSTM ourselves with Keras, a deep learning library. To tackle the overfitting caused by the limited number of training samples, we adopted dropout and regularization strategies in the LSTM model. The experimental data were the monthly incidence and number of cases of hepatitis E in Shandong province, China, from January 2005 to December 2017. Data from July 2015 to December 2017 were used to validate the models, and the remaining data were used as the training set. Three metrics were used to compare model performance: root mean square error (RMSE), mean absolute percentage error (MAPE) and mean absolute error (MAE).

Results: Based on analysis of the data, we chose ARIMA(1, 1, 1) and ARIMA(3, 1, 2) as the monthly incidence and case-number prediction models, respectively. Cross-validation and grid search were used to optimize the SVM parameters: the penalty coefficient C and kernel function parameter g were set to 8 and 0.125 for incidence prediction, and to 22 and 0.01 for case-number prediction. The LSTM has 4 nodes, with the dropout and L2 regularization parameters set to 0.15 and 0.001, respectively.
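As a reading aid for the ARIMA orders named in the abstract, the two selected models can be written in backshift notation. Here (p, d, q) are the autoregressive order, the differencing order and the moving-average order; B is the lag operator (B y_t = y_{t-1}); the phi_i are autoregressive coefficients, the theta_j moving-average coefficients, and epsilon_t is white noise. This sketch assumes the convention with plus signs on the moving-average terms and omits any constant term; the record does not say whether the fitted models included one, and it does not report the fitted coefficient values.

$$
\begin{aligned}
\text{ARIMA}(1,1,1):&\quad (1-\phi_1 B)(1-B)\,y_t = (1+\theta_1 B)\,\varepsilon_t \\
&\iff \Delta y_t = \phi_1\,\Delta y_{t-1} + \varepsilon_t + \theta_1\,\varepsilon_{t-1}, \qquad \Delta y_t = y_t - y_{t-1} \\
\text{ARIMA}(3,1,2):&\quad (1-\phi_1 B-\phi_2 B^2-\phi_3 B^3)(1-B)\,y_t = (1+\theta_1 B+\theta_2 B^2)\,\varepsilon_t
\end{aligned}
$$

The single difference (d = 1) in both models means each one describes month-to-month changes in the series rather than its level.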
In terms of RMSE, ARIMA, SVM and LSTM achieved 0.022, 0.0204 and 0.01 for incidence prediction, and 22.25, 20.0368 and 11.75 for case-number prediction. In terms of MAPE, the results were 23.5%, 21.7% and 15.08% for incidence prediction, and 23.6%, 21.44% and 13.6% for case-number prediction. In terms of MAE, the results were 0.018, 0.0167 and 0.011 for incidence prediction, and 18.003, 16.5815 and 9.984 for case-number prediction.

Conclusions: Comparing ARIMA, SVM and LSTM, we found that the nonlinear models (SVM, LSTM) outperform the linear model (ARIMA). LSTM achieved the best performance on all three metrics (RMSE, MAPE and MAE). Hence, of the three models, LSTM is the most suitable for predicting the monthly incidence and number of cases of hepatitis E.
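The abstract scores every model on a held-out final stretch of the series with RMSE, MAPE and MAE, but this record does not give the formulas. Below is a minimal Python sketch of that evaluation. It assumes the standard textbook definitions (MAPE in percent) and a series indexed monthly from January 2005. Cutting at July 2015 leaves 126 training months and 30 validation months, and the sketch scores a forecast against the held-out values. The helper names (`month_index`, `chronological_split`, `rmse`, `mae`, `mape`) and the toy numbers are hypothetical illustrations, not the authors' code or data.

```python
import math
from typing import List, Sequence, Tuple


def month_index(year: int, month: int, start_year: int = 2005, start_month: int = 1) -> int:
    """Zero-based position of (year, month) in a monthly series beginning at start_year/start_month."""
    return (year - start_year) * 12 + (month - start_month)


def chronological_split(
    series: Sequence[float], first_valid_year: int = 2015, first_valid_month: int = 7
) -> Tuple[List[float], List[float]]:
    """Preserve time order: months before the cut form the training set, the rest the validation set."""
    cut = month_index(first_valid_year, first_valid_month)
    return list(series[:cut]), list(series[cut:])


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Reject empty or mismatched inputs so the means below are well defined."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((a - p)^2))."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error: mean(|a - p|)."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent: 100 * mean(|(a - p) / a|)."""
    _check(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical placeholder series: 156 months, January 2005 to December 2017.
    series = [float(m) for m in range(1, 157)]
    train, valid = chronological_split(series)
    print(len(train), len(valid))  # 126 30

    # Hypothetical toy values, not the paper's data.
    actual, predicted = [100.0, 200.0, 400.0], [110.0, 190.0, 400.0]
    print(rmse(actual, predicted), mae(actual, predicted), mape(actual, predicted))
```

The split is chronological rather than random because these are forecasting models. Shuffling months would let a model train on observations that come after the months it is asked to predict.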


Pages: 12
