Bayesian and non-Bayesian regression analysis applied on wind speed data

Cited by: 7
Authors
Tanoe, Vincent [1]
Henderson, Saul [2]
Shahirinia, Amir [3]
Bina, Mohammad Tavakoli [4]
Affiliations
[1] Univ Dist Columbia, Comp Sci & Engn Dept, 4200 Connecticut Ave NW, Washington, DC 20008 USA
[2] Univ Dist Columbia, Elect & Comp Engn Dept, 4200 Connecticut Ave NW, Washington, DC 20008 USA
[3] Univ Dist Columbia, Elect & Engn Dept, 4200 Connecticut Ave NW, Washington, DC 20008 USA
[4] KN Toosi Univ Technol, Tehran, Iran
Funding
U.S. National Science Foundation
DOI
10.1063/5.0056237
Chinese Library Classification
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
Statistical methods are widely used to analyze the relationship between several independent variables (predictors) and a dependent variable. As wind energy rapidly becomes an important source of renewable energy, it is prudent to evaluate closely any relationships that may exist in the data. This paper applies both the frequentist (non-Bayesian) approach and the Bayesian approach to multiple linear regression on wind speed data in order to investigate the differences between the two methodologies. The study uses NREL wind speed data from fifteen different wind farms. A correlation matrix is first used to identify the significantly correlated variables and to select the dependent variable from among them. A Random Forest machine learning technique is then used for feature selection, retaining the most important features for the Bayesian and non-Bayesian regression models. We first fit a multiple linear regression (the non-Bayesian model), applying the variance inflation factor to detect any multicollinearity problem and obtain the fitted model. We then apply the Bayesian approach to the fitted model to analyze the relationship between the dependent and independent variables. The non-Bayesian and Bayesian approaches yield close coefficient and parameter estimates. Moreover, comparing hourly, daily, and weekly wind speed data samples, we found that the daily data provide strong coefficient estimates and the highest R-squared of the three.
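The regression steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the data are synthetic stand-ins rather than the NREL data, the Random Forest feature-selection step is omitted, and the Bayesian step assumes a zero-mean Gaussian prior with a plug-in noise variance, which may differ from the prior used in the paper. All function and variable names (`vif`, `ols`, `bayes_posterior`, `prior_var`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the predictors and response (NOT the NREL data).
n = 500
X = rng.normal(size=(n, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = 1.0 + X @ true_beta + rng.normal(scale=0.5, size=n)


def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2_j), where R^2_j
    comes from regressing column j on the remaining columns."""
    out = []
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


def ols(X, y):
    """Frequentist multiple linear regression (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def bayes_posterior(X, y, prior_var=100.0):
    """Posterior mean and covariance of the coefficients under an assumed
    N(0, prior_var * I) prior, with the noise variance fixed at its OLS
    estimate (a simplification made for this sketch)."""
    A = np.column_stack([np.ones(len(X)), X])
    resid = y - A @ ols(X, y)
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    precision = A.T @ A / sigma2 + np.eye(A.shape[1]) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ (A.T @ y) / sigma2
    return mean, cov


vif_values = vif(X)
beta_ols = ols(X, y)
beta_bayes, cov_bayes = bayes_posterior(X, y)

print("VIF:", vif_values)
print("OLS coefficients:", beta_ols)
print("Bayesian posterior mean:", beta_bayes)
```

With a weak prior and a reasonably large sample, the posterior mean sits very near the least-squares estimate, which is consistent with the abstract's observation that the two approaches give close coefficients. A tighter prior (smaller `prior_var`) would pull the Bayesian estimates toward zero and widen the gap.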
Pages: 12