StdSort: Efficient Pre-Processing for Faster Vector Similarity Join Using Standard Deviation

Authors
Kim, Hyun Joon [1]
Lee, Sang-goo [1]
Affiliation
[1] Seoul Natl Univ, Sch Comp Sci & Engn, Seoul, South Korea
Source
ACM IMCOM 2015, Proceedings | 2015
Keywords
Vector Similarity Join; Prefix Filtering; Length Filtering; All-Pair Similarity Search; Vector Pre-Processing; Similarity Join Pre-Processing
DOI
Not available
Chinese Library Classification
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Vector similarity join is an important operation used in duplicate detection, entity resolution, and other data analysis tasks. Because it is essential in many fields, it has been researched extensively. In this paper we propose an efficient data pre-processing technique called StdSort, which exploits the fact that the dimensions of vectors have different standard deviation values. Applied to the prefix and length filtering technique, StdSort can expedite the vector similarity join process. It requires O(n) pre-processing time, which is equal to that of the existing pre-processing method. Through experiments, we show that StdSort reduces both the overall time taken for the similarity join operation and the number of candidate similar pairs compared with the existing pre-processing method.
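The abstract does not spell out how the per-dimension standard deviations are used. Purely as an illustration, the sketch below assumes that the pre-processing step reorders the dimensions of every vector by standard deviation before prefix filtering is applied. The descending direction, the dense list-of-lists representation, and the names `dimension_stddevs`, `std_order`, and `reorder` are assumptions made for this example and are not taken from the paper.

```python
import math


def dimension_stddevs(vectors):
    """Return the population standard deviation of each dimension.

    Assumes `vectors` is a non-empty list of equal-length lists of numbers.
    """
    n = len(vectors)
    dims = len(vectors[0])
    sums = [0.0] * dims
    sq_sums = [0.0] * dims
    # Single pass over the data, accumulating sum and sum of squares.
    for v in vectors:
        for j, x in enumerate(v):
            sums[j] += x
            sq_sums[j] += x * x
    # Var = E[x^2] - E[x]^2, clamped at 0 to absorb rounding error.
    return [
        math.sqrt(max(sq_sums[j] / n - (sums[j] / n) ** 2, 0.0))
        for j in range(dims)
    ]


def std_order(vectors, descending=True):
    """Return dimension indices sorted by standard deviation.

    The descending default is an assumption of this sketch.
    """
    stds = dimension_stddevs(vectors)
    return sorted(range(len(stds)), key=lambda j: stds[j], reverse=descending)


def reorder(vectors, order):
    """Apply the same dimension permutation to every vector."""
    return [[v[j] for j in order] for v in vectors]


vectors = [
    [1.0, 0.0, 5.0],
    [1.0, 2.0, -5.0],
    [1.0, 4.0, 5.0],
]
order = std_order(vectors)
sorted_vectors = reorder(vectors, order)
```

In this toy input the first dimension is constant, so under the assumed descending order it moves to the end of every vector. Because the same permutation is applied to all vectors, dot products and cosine similarities between them are unchanged.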
Pages: 5