Investigating Bloom Filters for Web Archives' Holdings

Cited by: 2
Authors
Klein, Martin [1 ]
Balakireva, Lyudmila [1 ]
Holub, Karolina [2 ]
Celjak, Drazenko [3 ]
Rudomino, Ingeborg [2 ]
Affiliations
[1] Los Alamos Natl Lab, Los Alamos, NM 87545 USA
[2] Natl & Univ Lib Zagreb, Zagreb, Croatia
[3] Univ Zagreb, Univ Comp Ctr, Zagreb, Croatia
Keywords
bloom filters; web archives; web archive profiling; index sharing
DOI
10.1145/3529372.3530934
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
What web archives hold is often opaque to the public and even experts in the domain struggle to provide precise assessments. Given the increasing need for and use of crawled and archived web resources, discovery of individual records as well as sharing of entire holdings are pressing use cases. We investigate Bloom Filters (BFs) and their applicability to address these use cases. We experiment with and analyze parameters for their creation, measure their performance, outline an approach for scalability, and describe various pilot implementations that showcase their potential to meet our needs. BFs come with beneficial characteristics and hence have enjoyed popularity in various domains. We highlight their suitability for web archiving use cases and how they can contribute to very fast and accurate search services.
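As a rough illustration of the data structure the abstract refers to, the following is a minimal sketch of a Bloom filter used as a membership test over archived URLs. It is not the authors' implementation: the class name `BloomFilter`, the SHA-256 double-hashing scheme, and the example URLs are assumptions made here for illustration. The sizing uses the standard textbook formulas for bit-array size and hash count given an expected item count and a target false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical; not the paper's implementation)."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the stride is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly held"; False means "definitely not held".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


# Hypothetical holdings: URLs an archive has captured.
holdings = BloomFilter(expected_items=1000, false_positive_rate=0.01)
holdings.add("https://example.org/page-1")
print("https://example.org/page-1" in holdings)
```

A lookup that returns `False` guarantees the URL was never added, while `True` may occasionally be a false positive at roughly the configured rate. That one-sided error is what makes such a filter a compact summary of holdings that can be shared.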
Pages: 10