Efficient Top-K Retrieval with Signatures

Cited by: 5
Authors
Chappell, Timothy [1]
Geva, Shlomo [1]
Nguyen, Anthony [2]
Zuccon, Guido [2]
Affiliations
[1] Queensland Univ Technol, Fac Sci & Technol, Brisbane, Qld 4001, Australia
[2] CSIRO, Australian E Hlth Res Ctr, Brisbane, Qld, Australia
Keywords
Document Signatures; Near-Duplicate Detection; Hamming Distance; Locality-Sensitive Hashing; Nearest Neighbour; Top-K
DOI
10.1145/2537734.2537742
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with an acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This quickly becomes excessive when dealing with large collections, presenting issues of scalability that limit the applicability of signatures. Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.
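
To make the scalability problem concrete, the following is a minimal sketch of the exhaustive baseline the abstract describes: one Hamming distance calculation per signature in the collection. It is not the indexing method proposed in the paper. The function names `hamming` and `top_k_exhaustive`, the use of Python integers as signatures, and the 8-bit sample values are all assumptions made for illustration.

```python
import heapq


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def top_k_exhaustive(query: int, signatures: list, k: int) -> list:
    """Return the k closest signatures as (distance, index) pairs.

    Every signature is compared against the query, so the cost grows
    linearly with the size of the collection.
    """
    scored = ((hamming(query, s), i) for i, s in enumerate(signatures))
    return heapq.nsmallest(k, scored)


# Hypothetical 8-bit signatures; real document signatures are much wider.
signatures = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
result = top_k_exhaustive(0b10110010, signatures, k=2)
print(result)
```

Because the scan touches every signature, query time is proportional to collection size, which is the cost the abstract says an index should avoid in exchange for extra memory and some loss of precision.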
Pages: 10-17
Number of pages: 8