StdSort: Efficient Pre-Processing for Faster Vector Similarity Join Using Standard Deviation

Cited by: 0
Authors
Kim, Hyun Joon [1]
Lee, Sang-goo [1]
Affiliation
[1] Seoul Natl Univ, Sch Comp Sci & Engn, Seoul, South Korea
Source
ACM IMCOM 2015, Proceedings | 2015
Keywords
Vector Similarity Join; Prefix Filtering; Length Filtering; All-Pair Similarity Search; Vector Pre-Processing; Similarity Join Pre-Processing;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Vector similarity join is an important operation used in duplicate detection, entity resolution, and other data analysis tasks. Because it is essential in many fields, it has been researched extensively. In this paper, we propose an efficient data pre-processing technique called StdSort. It exploits the fact that the dimensions of vectors have different standard deviation values. Applied to the prefix and length filtering techniques, StdSort can expedite the vector similarity join process. It requires O(n) pre-processing time, which equals that of the existing pre-processing method. Through experiments, we showed that StdSort reduces both the overall time taken for the similarity join operation and the number of candidate similar pairs compared with the existing pre-processing method.
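The abstract does not spell out the procedure, so the following is only a minimal sketch of what a standard-deviation-based dimension reordering could look like. The function names (`dimension_stddevs`, `stdsort_order`, `reorder`) are hypothetical, and the choice of descending order is an assumption, not something the record confirms. The statistics are gathered in a single pass over the data, which is consistent with the linear pre-processing cost the abstract mentions.

```python
import math


def dimension_stddevs(vectors):
    """Population standard deviation of each dimension, one pass over the data."""
    n = len(vectors)
    d = len(vectors[0])
    sums = [0.0] * d
    sq_sums = [0.0] * d
    for v in vectors:
        for j, x in enumerate(v):
            sums[j] += x
            sq_sums[j] += x * x
    # Var = E[x^2] - (E[x])^2; clamp at 0 to absorb floating-point round-off.
    return [
        math.sqrt(max(sq_sums[j] / n - (sums[j] / n) ** 2, 0.0))
        for j in range(d)
    ]


def stdsort_order(vectors, descending=True):
    """Dimension indices ordered by standard deviation.

    ASSUMPTION: descending order is chosen for illustration only; the
    abstract does not state which direction the paper uses.
    """
    stds = dimension_stddevs(vectors)
    return sorted(range(len(stds)), key=lambda j: stds[j], reverse=descending)


def reorder(vectors, order):
    """Apply the same dimension permutation to every vector."""
    return [[v[j] for j in order] for v in vectors]


if __name__ == "__main__":
    data = [[1, 0, 5], [1, 10, 6], [1, 20, 7]]
    order = stdsort_order(data)
    print(order)
    print(reorder(data, order))
```

Because every vector is permuted identically, dot products and cosine similarities between vectors are unchanged; only the order in which a prefix or length filter would scan the dimensions differs.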
Pages: 5