StdSort: Efficient Pre-Processing for Faster Vector Similarity Join Using Standard Deviation

Times cited: 0
Authors
Kim, Hyun Joon [1]
Lee, Sang-goo [1]
Affiliation
[1] Seoul National University, School of Computer Science and Engineering, Seoul, South Korea
Source
ACM IMCOM 2015, Proceedings | 2015
Keywords
Vector Similarity Join; Prefix Filtering; Length Filtering; All-Pair Similarity Search; Vector Pre-Processing; Similarity Join Pre-Processing
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Vector similarity join is an important operation used in duplicate detection, entity resolution, and other data analysis tasks. Because it is essential in many fields, it has been researched extensively. In this paper we propose an efficient data pre-processing technique called StdSort. It exploits the fact that different dimensions of the vectors have different standard deviations. When applied to the prefix filtering and length filtering techniques, StdSort can expedite the vector similarity join process. It requires O(n) pre-processing time, the same as the existing pre-processing method. Our experiments show that, compared with the existing pre-processing method, StdSort reduces both the overall time taken by the similarity join and the number of candidate similar pairs.
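The abstract describes StdSort only at a high level: per-dimension standard deviations are used to reorder the vector dimensions before prefix and length filtering. Below is a minimal Python sketch of that reordering step under stated assumptions, not the authors' implementation. Sparse vectors are modeled as dicts from dimension index to weight, missing entries count as zero, and because the abstract does not say whether dimensions are sorted by increasing or decreasing standard deviation, the direction is a parameter. The helper names (`dimension_std`, `std_sort_order`, `reorder`) are hypothetical, and the filtering step itself is not shown.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical data model: a sparse vector maps dimension index -> weight.
SparseVector = Dict[int, float]


def dimension_std(vectors: List[SparseVector], num_dims: int) -> List[float]:
    """Population standard deviation of every dimension (absent entries count as 0).

    A single pass over the nonzero entries accumulates sums and sums of squares.
    """
    n = len(vectors)
    if n == 0:
        return [0.0] * num_dims
    sums = [0.0] * num_dims
    sq_sums = [0.0] * num_dims
    for vec in vectors:
        for dim, w in vec.items():
            sums[dim] += w
            sq_sums[dim] += w * w
    stds = []
    for d in range(num_dims):
        mean = sums[d] / n
        # Clamp at 0 to absorb small negative values caused by rounding.
        var = max(sq_sums[d] / n - mean * mean, 0.0)
        stds.append(math.sqrt(var))
    return stds


def std_sort_order(vectors: List[SparseVector], num_dims: int,
                   descending: bool = True) -> List[int]:
    """Dimension indices ordered by standard deviation.

    ASSUMPTION: the abstract does not give the sort direction, so it is a parameter.
    Ties are broken by dimension index so the order is deterministic.
    """
    stds = dimension_std(vectors, num_dims)
    sign = -1.0 if descending else 1.0
    return sorted(range(num_dims), key=lambda d: (sign * stds[d], d))


def reorder(vectors: List[SparseVector],
            order: List[int]) -> List[List[Tuple[int, float]]]:
    """Rewrite each vector as (new_dim, weight) pairs sorted by the new dimension order."""
    rank = {old: new for new, old in enumerate(order)}
    return [sorted((rank[d], w) for d, w in vec.items()) for vec in vectors]


vectors = [{0: 1.0, 2: 0.5}, {0: 1.0, 1: 0.9}, {0: 1.0, 2: 0.1}]
order = std_sort_order(vectors, num_dims=3)
print(order)  # [1, 2, 0]
print(reorder(vectors, order))
```

In this example dimension 0 is constant across all vectors (standard deviation 0), so it moves to the end of a descending order. In this sketch, computing the statistics is one pass over the nonzero entries, and sorting the d dimensions adds O(d log d).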
Pages: 5
Related papers (50 in total)
  • [1] The recursive combination filter approach of pre-processing for the estimation of standard deviation of RR series
    Mishra, Alok
    Swati, D.
    Australasian Physical & Engineering Sciences in Medicine, 2015, 38(3): 413-423
  • [3] Efficient and Scalable Processing of String Similarity Join
    Rong, Chuitian
    Lu, Wei
    Wang, Xiaoli
    Du, Xiaoyong
    Chen, Yueguo
    Tung, Anthony K. H.
    IEEE Transactions on Knowledge and Data Engineering, 2013, 25(10): 2217-2230
  • [4] Improved Segmentation of Cardiac MRI Using Efficient Pre-Processing Techniques
    Joshi, Nikita
    Jain, Sarika
    Journal of Information Technology Research, 2022, 15(1)
  • [5] Hardness of Approximating the Closest Vector Problem with Pre-Processing
    Alekhnovich, Mikhail
    Khot, Subhash A.
    Kindler, Guy
    Vishnoi, Nisheeth K.
    Computational Complexity, 2011, 20(4): 741-753
  • [7] Pre-processing using topographic mappings
    Wu, Y
    Fyfe, C
    Proceedings of the 2005 International Conference on Neural Networks and Brain, Vols 1-3, 2005: 1881-1884
  • [8] Speech enhancement using pre-processing
    Singh, L
    Sridharan, S
    IEEE TENCON'97 - IEEE Region 10 Annual Conference, Proceedings, Vols 1 and 2: Speech and Image Technologies for Computing and Telecommunications, 1997: 755-758
  • [9] SentReP: Sentiment Classification of Movie Reviews using Efficient Repetitive Pre-Processing
    Manek, Asha S.
    Pallavi, R. P.
    Bhat, Veena H.
    Shenoy, P. Deepa
    Mohan, M. Chandra
    Venugopal, K. R.
    Patnaik, L. M.
    2013 IEEE International Conference of IEEE Region 10 (TENCON), 2013
  • [10] Eco efficient optimization of pre-processing and metal smelting
    van Heukelem, AMH
    Reuter, MA
    Huisman, J
    Hagelüken, C
    Brusselaers, J
    Electronics Goes Green 2004 (Plus): Driving Forces for Future Electronics, Proceedings, 2004: 657-661