StdSort: Efficient Pre-Processing for Faster Vector Similarity Join Using Standard Deviation

Authors
Kim, Hyun Joon [1]
Lee, Sang-goo [1]
Affiliations
[1] Seoul Natl Univ, Sch Comp Sci & Engn, Seoul, South Korea
Source
ACM IMCOM 2015, Proceedings | 2015
Keywords
Vector Similarity Join; Prefix Filtering; Length Filtering; All-Pair Similarity Search; Vector Pre-Processing; Similarity Join Pre-Processing;
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Vector similarity join is an important operation used in duplicate detection, entity resolution, and other data analysis tasks. Because it is essential in many fields, it has been researched extensively. In this paper, we propose an efficient data pre-processing technique called StdSort. It exploits the fact that the dimensions of vectors have different standard deviation values. Applied to the prefix and length filtering technique, the StdSort method can expedite the vector similarity join process. It requires O(n) pre-processing time, which is equal to that of the existing pre-processing method. Through experiments, we show that StdSort reduces both the overall time taken for the similarity join operation and the number of candidate similar pairs compared with the existing pre-processing method.
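The abstract does not spell out how StdSort uses the per-dimension standard deviations, so the following is only an illustrative sketch, not the authors' implementation. It assumes that pre-processing means reordering the dimensions of every vector by their standard deviation across the dataset, and it assumes descending order; the function names `stdsort_order` and `reorder` are hypothetical.

```python
from statistics import pstdev


def stdsort_order(vectors):
    """Return dimension indices sorted by descending standard deviation.

    ASSUMPTION: descending order is a guess; the abstract does not state
    the direction, nor that this is exactly how StdSort orders dimensions.
    """
    num_dims = len(vectors[0])
    # Population standard deviation of each dimension over all vectors.
    stds = [pstdev([v[d] for v in vectors]) for d in range(num_dims)]
    return sorted(range(num_dims), key=lambda d: stds[d], reverse=True)


def reorder(vectors, order):
    """Apply one shared dimension ordering to every vector."""
    return [[v[d] for d in order] for v in vectors]


# Toy dataset: dimension 0 is constant, dimension 2 varies the most.
vectors = [
    [1.0, 0.0, 5.0],
    [1.0, 2.0, 1.0],
    [1.0, 4.0, 9.0],
]
order = stdsort_order(vectors)
reordered = reorder(vectors, order)
```

Because all vectors share the same ordering, pairwise similarity values such as dot products are unchanged; only the positions that a prefix filter would inspect first are affected.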
Pages: 5