Cross-validation Revisited

Cited by: 20
Author
Dutta, Santanu [1]
Affiliation
[1] Tezpur Univ, Dept Math Sci, Tezpur, Assam, India
Keywords
Density estimation; Least-squares cross-validation; Pseudo-likelihood; 62G07; KERNEL DENSITY-ESTIMATION; BANDWIDTH SELECTION; CONVERGENCE
DOI
10.1080/03610918.2013.862275
Chinese Library Classification
O21 [Probability and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
Data-based choice of the bandwidth is an important problem in kernel density estimation. The pseudo-likelihood and least-squares cross-validation bandwidth selectors are well known but widely criticized in the literature. For heavy-tailed distributions, the L1 distance between the pseudo-likelihood-based estimator and the density does not seem to converge in probability to zero with increasing sample size. Even for normal-tailed densities, the rate of L1 convergence is disappointingly slow. In this article, we report an interesting finding: with minor modifications, both cross-validation methods can be implemented effectively, even for heavy-tailed densities. For both of these estimators, the L1 distance from the density is shown to converge completely to zero irrespective of the tail of the density. The expected L1 distance also goes to zero. These results hold even in the presence of a strong-mixing-type dependence. Monte Carlo simulations and analysis of the Old Faithful geyser data suggest that, if implemented appropriately and contrary to the traditional belief, the cross-validation estimators compare well with the sophisticated plug-in and bootstrap-based estimators.
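For orientation, the following is a minimal sketch of the textbook least-squares cross-validation criterion that the abstract refers to, assuming a Gaussian kernel and a simple grid search. It is not the modified selector proposed in the article, whose details the abstract does not give. The function names `lscv_score` and `lscv_bandwidth`, the simulated sample, and the bandwidth grid are all illustrative assumptions.

```python
import math
import random


def lscv_score(h, x):
    """Textbook LSCV criterion for a Gaussian-kernel density estimate.

    LSCV(h) = integral of fhat^2  -  (2/n) * sum_i fhat_{-i}(x_i),
    where fhat_{-i} is the leave-one-out estimate. This is the standard
    criterion, NOT the modified version studied in the article.
    """
    n = len(x)
    int_f2 = 0.0  # accumulates the closed-form integral of fhat^2
    loo = 0.0     # accumulates the leave-one-out kernel sums
    for i in range(n):
        for j in range(n):
            u = (x[i] - x[j]) / h
            # Convolution of two standard normal kernels is the N(0, 2) density.
            int_f2 += math.exp(-0.25 * u * u) / math.sqrt(4.0 * math.pi)
            if i != j:
                loo += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    int_f2 /= n * n * h
    loo /= n * (n - 1) * h
    return int_f2 - 2.0 * loo


def lscv_bandwidth(x, grid):
    """Return the grid bandwidth that minimizes the LSCV score."""
    return min(grid, key=lambda h: lscv_score(h, x))


# Illustrative data: a simulated standard normal sample (an assumption,
# not the Old Faithful data analyzed in the article).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(100)]
grid = [0.05 + 0.05 * k for k in range(30)]
h_lscv = lscv_bandwidth(sample, grid)
```

The double loop costs O(n^2) per candidate bandwidth, which is acceptable for a sketch at this sample size. A grid search is used here because the LSCV score can have several local minima, which is one of the commonly cited practical difficulties with this selector.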
Pages: 472-490
Page count: 19