Efficient Top-K Retrieval with Signatures

Cited by: 5
Authors
Chappell, Timothy [1]
Geva, Shlomo [1]
Nguyen, Anthony [2]
Zuccon, Guido [2]
Affiliations
[1] Queensland Univ Technol, Fac Sci & Technol, Brisbane, Qld 4001, Australia
[2] CSIRO, Australian E Hlth Res Ctr, Brisbane, Qld, Australia
Keywords
Document Signatures; Near-Duplicate Detection; Hamming Distance; Locality-Sensitive Hashing; Nearest Neighbour; Top-K
DOI
10.1145/2537734.2537742
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper describes a new method of indexing and searching large binary signature collections to efficiently find similar signatures, addressing the scalability problem in signature search. Signatures offer efficient computation with an acceptable measure of similarity in numerous applications. However, performing a complete search with a given search argument (a signature) requires a Hamming distance calculation against every signature in the collection. This quickly becomes excessive for large collections, and the resulting scalability issues limit the applicability of signatures. Our method efficiently finds similar signatures in very large collections, trading memory use and precision for greatly improved search speed. Experimental results demonstrate that our approach is capable of finding a set of nearest signatures to a given search argument with a high degree of speed and fidelity.
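The complete search that the abstract describes as the scalability bottleneck can be sketched as follows. This is only an illustration of the exhaustive baseline (one Hamming distance calculation per stored signature), not the indexing method proposed in the paper. The function names, the use of Python integers as bit signatures, and the sample values are assumptions made for this example.

```python
import heapq
from typing import List, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def top_k_exhaustive(query: int, signatures: List[int], k: int) -> List[Tuple[int, int]]:
    """Return the k closest signatures as (distance, index) pairs.

    Every signature in the collection is compared with the query,
    so the cost grows linearly with the collection size.
    """
    scored = ((hamming_distance(query, s), i) for i, s in enumerate(signatures))
    return heapq.nsmallest(k, scored)


# Hypothetical 8-bit signatures, used only for illustration.
signatures = [0b10110010, 0b10110011, 0b01001101, 0b10100010]
print(top_k_exhaustive(0b10110010, signatures, k=2))  # prints [(0, 0), (1, 1)]
```

Ties in distance are broken by the lower index, because the pairs are compared as tuples.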
Pages: 10-17
Page count: 8
Related papers
50 in total
  • [41] Efficient evaluation of Top-k Skyline queries
    Departamento de Computación, Universidad Simón Bolívar, Sartenejas-Baruta, Venezuela
    Revista Tecnica de la Facultad de Ingenieria Universidad del Zulia, 2009, 2 (170-179):
  • [42] Efficient Top-k Skyline Computation in MapReduce
    Song, Baoyan
    Liu, Aili
    Ding, Linlin
    2015 12TH WEB INFORMATION SYSTEM AND APPLICATION CONFERENCE (WISA), 2015, : 67 - 70
  • [43] Efficient evaluation of Top-k Skyline queries
    Goncalves, Marlene
    Vidal, Maria-Esther
    REVISTA TECNICA DE LA FACULTAD DE INGENIERIA UNIVERSIDAD DEL ZULIA, 2009, 32 (02) : 170 - 179
  • [44] Efficient Retrieval of Matrix Factorization-Based Top-k Recommendations: A Survey of Recent Approaches
    Le, Dung D.
    Lauw, Hady W.
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2021, 70 : 1441 - 1479
  • [45] Efficient retrieval of matrix factorization-based top-k recommendations: A survey of recent approaches
    Le, Dung D.
    Lauw, Hady W.
    Journal of Artificial Intelligence Research, 2021, 70 : 1441 - 1479
  • [46] Efficient Retrieval of Top-K Most Similar Users from Travel Smart Card Data
    Zheng, Bolong
    Zheng, Kai
    Sharaf, Mohamed A.
    Zhou, Xiaofang
    Sadiq, Shazia
    2014 IEEE 15TH INTERNATIONAL CONFERENCE ON MOBILE DATA MANAGEMENT (MDM), VOL 1, 2014, : 259 - 268
  • [47] In Good Company: Efficient Retrieval of the Top-k Most Relevant Event-Partner Pairs
    Wu, Dingming
    Zhu, Yi
    Jensen, Christian S.
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2019), PT II, 2019, 11447 : 519 - 535
  • [48] Optimizing Top-k Retrieval: Submodularity Analysis and Search Strategies
    Sha, Chaofeng
    Wang, Keqiang
    Zhang, Dell
    Wang, Xiaoling
    Zhou, Aoying
    WEB-AGE INFORMATION MANAGEMENT, WAIM 2014, 2014, 8485 : 18 - 29
  • [49] Optimizing top-k retrieval: submodularity analysis and search strategies
    Chaofeng SHA
    Keqiang WANG
    Dell ZHANG
    Xiaoling WANG
    Aoying ZHOU
    Frontiers of Computer Science, 2016, 10 (03) : 477 - 487
  • [50] A Top-K Retrieval algorithm based on a decomposition of ranking functions
    Madrid, Nicolas
    Rusnok, Pavel
    INFORMATION SCIENCES, 2019, 474 : 136 - 153