A reservoir sampling algorithm with adaptive estimation of conditional expectation

Cited by: 4
Authors
Malbasa, Vuk [1]
Vucetic, Slobodan [1]
Affiliation
[1] Temple Univ, Ctr Informat Sci & Technol, Dept Comp & Informat Sci, Philadelphia, PA 19122 USA
DOI
10.1109/IJCNN.2007.4371299
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Resource-constrained data mining imposes many constraints on learning from large datasets. It is often impractical or impossible to keep the entire dataset in main memory, and the data can often be observed only in a single pass, in the order in which they are presented. Traditional reservoir-based approaches perform well in this setting. One drawback of these approaches is that examples not included in the final reservoir are often ignored. To remedy this, we propose a modification to the baseline reservoir algorithm. Instead of keeping the actual target values of reservoir examples, an estimate of their conditional expectation is kept and updated online as new data are observed from the stream. The estimate is obtained by averaging the target values of similar examples. The proposed algorithm uses a paired t-test to determine the similarity threshold. Thorough evaluation on generated two-dimensional data shows that the proposed algorithm produces reservoirs with considerably reduced target noise. This property allows training of significantly improved prediction models compared with the baseline reservoir-based approach.
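The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `AveragingReservoir`, the `radius` parameter, and the Euclidean similarity measure are assumptions made for illustration, and a fixed similarity radius stands in for the paired t-test threshold selection, which the abstract does not describe in enough detail to reproduce. The reservoir itself is maintained with standard uniform reservoir sampling.

```python
import math
import random


class AveragingReservoir:
    """Reservoir whose stored targets are running means of similar stream examples.

    Hypothetical illustration: `radius` is a fixed similarity threshold
    (an assumption; the paper selects the threshold with a paired t-test).
    """

    def __init__(self, capacity, radius, seed=None):
        self.capacity = capacity
        self.radius = radius
        self.rng = random.Random(seed)
        self.items = []  # each entry: [x, sum_of_targets, count]
        self.seen = 0    # number of stream examples observed so far

    def observe(self, x, y):
        """Process one stream example with input tuple x and target y."""
        self.seen += 1

        # Fold the new target into the estimate of every similar reservoir
        # example, so data that never enters the reservoir still contributes.
        for item in self.items:
            if math.dist(item[0], x) <= self.radius:
                item[1] += y
                item[2] += 1

        # Standard uniform reservoir sampling: fill first, then replace a
        # random slot with probability capacity / seen.
        if len(self.items) < self.capacity:
            self.items.append([x, y, 1])
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = [x, y, 1]

    def examples(self):
        """Return (x, estimated conditional expectation of y) pairs."""
        return [(x, total / count) for x, total, count in self.items]
```

A model trained on `examples()` would see averaged targets in place of the raw noisy ones, which is the property the abstract credits for the improved predictions.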
Pages: 2200-2204
Number of pages: 5