A Novel Method for Disease Prediction: Hybrid of Random Forest and Multivariate Adaptive Regression Splines

Cited by: 26
Authors
Yao, Dengju [1]
Yang, Jing [1]
Zhan, Xiaojuan [2]
Affiliations
[1] Harbin Engn Univ, Coll Comp Sci & Technol, Harbin, Heilongjiang, Peoples R China
[2] Heilongjiang Inst Technol, Dept Comp Sci & Technol, Harbin, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
data mining; medical data; random forest; multivariate adaptive regression splines
DOI
10.4304/jcp.8.1.170-177
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Using data mining technology for disease prediction and diagnosis has become a focus of attention. Data mining provides an important means of extracting valuable medical rules hidden in medical data and plays an important role in disease prediction and clinical diagnosis. This paper surveys several popular data mining techniques for disease prediction and diagnosis, such as decision trees, association rule analysis, and cluster analysis. A novel hybrid method combining random forest and multivariate adaptive regression splines (MARS) is then proposed for building disease prediction models. First, the random forest algorithm performs a preliminary screening of the variables and produces an importance ranking. Then, the new dataset formed from the top-k most important predictors is passed to the MARS procedure, which builds interpretable models for predicting disease survivability. The combined method is evaluated using basic performance measures (e.g., accuracy, sensitivity, and specificity) with 10-fold cross-validation. Experimental results show that the proposed method provides higher accuracy and a relatively simple model.
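The evaluation protocol named in the abstract (accuracy, sensitivity, and specificity under 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' code: the function names (`confusion_counts`, `basic_measures`, `k_fold_indices`), the 0/1 label encoding with 1 as the positive (disease) class, and the contiguous, unshuffled fold assignment are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels; 1 is assumed to be the positive class."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            raise ValueError("labels must be 0 or 1")
    return tp, tn, fp, fn


def basic_measures(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds and return (train, test) index pairs.

    Assumption: no shuffling or stratification; a real study would usually do both.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds receive one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds
```

In use, a model would be fitted on each `train` index set, its predictions on the matching `test` set passed to `basic_measures`, and the three measures averaged over the ten folds.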
Pages: 170-177
Page count: 8