Efficient Top-K Retrieval with Signatures

Cited by: 5
Authors
Chappell, Timothy [1]
Geva, Shlomo [1]
Nguyen, Anthony [2]
Zuccon, Guido [2]
Affiliations
[1] Queensland Univ Technol, Fac Sci & Technol, Brisbane, Qld 4001, Australia
[2] CSIRO, Australian E Hlth Res Ctr, Brisbane, Qld, Australia
Keywords
Document Signatures; Near-Duplicate Detection; Hamming Distance; Locality-Sensitive Hashing; Nearest Neighbour; Top-K;
DOI
10.1145/2537734.2537742
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with an acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This cost quickly becomes excessive for large collections, and the resulting scalability issues limit the applicability of signatures. Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.
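For illustration only, the exhaustive search that the abstract identifies as the scalability bottleneck can be sketched as follows. This is a minimal Python sketch of the brute-force baseline, not the indexing method the paper proposes. The function names `hamming` and `top_k_exhaustive`, and the choice to represent each signature as a Python integer, are assumptions made for this example.

```python
import heapq
from typing import List, Sequence, Tuple


def hamming(a: int, b: int) -> int:
    """Hamming distance between two signatures stored as integers."""
    # XOR leaves a 1 in every bit position where the signatures differ.
    return bin(a ^ b).count("1")


def top_k_exhaustive(
    query: int, signatures: Sequence[int], k: int
) -> List[Tuple[int, int]]:
    """Return the k nearest signatures as (distance, index) pairs.

    Every signature in the collection is compared with the query,
    so the cost grows linearly with the collection size.
    """
    scored = ((hamming(query, sig), idx) for idx, sig in enumerate(signatures))
    # nsmallest sorts by distance first, then by index to break ties.
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    collection = [0b1010, 0b1111, 0b0000, 0b1011]
    print(top_k_exhaustive(0b1010, collection, k=2))
```

Because each query touches the whole collection, this baseline is what an index over the signatures would aim to avoid, at the cost of extra memory and some loss of precision as the abstract notes.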
Pages: 10-17
Number of pages: 8