Study and implementation of frequent sequences mining based prefetching algorithm

Authors
Wang F. [1, 2]
Wang P. [2]
Zhu C. [1]
Affiliations
[1] Wuhan National Laboratory for Optoelectronics (Huazhong University of Science and Technology), Wuhan
[2] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan
Keywords
Frequent sequences mining; Multistep matching; Prefetching algorithm; Subtree partitioning; Trie tree
DOI
10.7544/issn1000-1239.2016.20148040
Abstract
Prefetching is widely used as an efficient means of improving the performance of storage systems. However, traditional prefetching algorithms mostly rely on detecting sequential access patterns, so they work poorly in environments with few or no such patterns. Worse, when prefetching accuracy is poor, the storage system may even suffer negative effects. The proposed prefetching algorithm, based on frequent sequences mining, can benefit the storage system in such environments by analyzing data access behavior to find latent rules. In application scenarios where cache capacity may be limited, such as embedded systems, the proposed algorithm also improves prefetching accuracy and thereby avoids some of the adverse effects that prefetching can cause. The prefetching rules derived from the mined frequent sequences are organized in a Trie tree. To improve prefetching accuracy, multistep matching and subtree partitioning are introduced to finely control how the prefetching rules are used, so that the algorithm, with its relatively high prefetching accuracy, can efficiently improve the performance of the storage system. © 2016, Science Press. All rights reserved.
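The abstract gives no implementation details, so the following is only a minimal sketch of how prefetching rules mined from frequent sequences might be stored in a Trie tree and looked up. Everything in it is an assumption made for illustration: the names `TrieNode`, `PrefetchTrie`, `add_sequence`, `predict`, and `min_match` are hypothetical, and "multistep matching" is interpreted here as requiring at least `min_match` consecutive recent accesses to match a rule before a prefetch is issued. Subtree partitioning is not modeled.

```python
class TrieNode:
    """One block ID on a path of a mined frequent sequence."""

    def __init__(self):
        self.children = {}  # block ID -> TrieNode
        self.support = 0    # how often sequences passing through here were seen


class PrefetchTrie:
    """Hypothetical rule store: each root-to-node path is a frequent sequence."""

    def __init__(self, min_match=2):
        # Assumption: a rule fires only after min_match accesses have matched.
        self.root = TrieNode()
        self.min_match = min_match

    def add_sequence(self, seq, support=1):
        """Insert one mined frequent sequence with its support count."""
        node = self.root
        for block in seq:
            node = node.children.setdefault(block, TrieNode())
            node.support += support

    def _walk(self, path):
        """Follow path from the root; return the final node or None."""
        node = self.root
        for block in path:
            node = node.children.get(block)
            if node is None:
                return None
        return node

    def predict(self, recent):
        """Return a list with the block to prefetch, or [] if no rule applies.

        recent holds the latest accesses, oldest first. The longest suffix
        of recent that matches a rule prefix is tried first.
        """
        for start in range(len(recent)):
            suffix = recent[start:]
            if len(suffix) < self.min_match:
                break  # too few matched steps to trust a prediction
            node = self._walk(suffix)
            if node is not None and node.children:
                block, _ = max(node.children.items(),
                               key=lambda kv: kv[1].support)
                return [block]
        return []


trie = PrefetchTrie(min_match=2)
trie.add_sequence([10, 42, 7, 99], support=5)
trie.add_sequence([10, 42, 8], support=2)
print(trie.predict([3, 10, 42]))  # -> [7]
print(trie.predict([42]))         # -> []
```

In this sketch, raising `min_match` trades coverage for accuracy: fewer prefetches are issued, but each one is backed by a longer matched history, which is the kind of trade-off the abstract describes for cache-constrained systems.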
Pages: 443-448
Page count: 5