Multidimensional Similarity Join Using MapReduce

Cited by: 1
Authors
Li, Ye [1]
Wang, Jian [1]
Hou, Leong U. [1]
Affiliations
[1] Univ Macau, Zhuhai Res Inst, Macau, Peoples R China
DOI
10.1007/978-3-319-39958-4_36
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Similarity join is arguably one of the most important operators in multidimensional data analysis. However, processing a similarity join is costly, especially for large-volume, high-dimensional data. In this work, we process the similarity join on MapReduce so that the join computation can scale horizontally. To balance the workload across all MapReduce nodes, we systematically select the most profitable feature using a novel data selectivity approach. Given the selected feature, we develop partitioning schemes for MapReduce processing based on two different optimization goals. The proposed techniques are extensively evaluated on real datasets.
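
The abstract gives no algorithmic detail, so the following is only a minimal single-process sketch of the general idea (choose one feature, partition on it, join within partitions), not the authors' method. Everything in it is an assumption made for illustration: the function names, the use of value spread as a stand-in for the paper's selectivity measure, the fixed-width stripes of width eps, and the Euclidean distance threshold.

```python
import math
from collections import defaultdict
from itertools import combinations


def select_feature(points):
    """Pick the dimension with the largest value spread.

    Assumption: spread is a simple stand-in heuristic, not the
    selectivity measure proposed in the paper.
    """
    dims = len(points[0])
    return max(
        range(dims),
        key=lambda d: max(p[d] for p in points) - min(p[d] for p in points),
    )


def map_phase(points, dim, eps, origin):
    """Emit (stripe_id, (point_id, point, is_home)) records.

    Stripes have width eps along the selected dimension. Each point goes
    to its home stripe and is replicated to the next stripe, so any two
    points within eps of each other meet in at least one stripe.
    """
    for pid, p in enumerate(points):
        home = math.floor((p[dim] - origin) / eps)
        yield home, (pid, p, True)
        yield home + 1, (pid, p, False)


def reduce_phase(records, eps):
    """Join the points of one stripe, reporting each pair exactly once."""
    for (i, p, p_home), (j, q, q_home) in combinations(records, 2):
        # A pair of two replicas was already handled in their home stripe.
        if not (p_home or q_home):
            continue
        if math.dist(p, q) <= eps:
            yield (min(i, j), max(i, j))


def similarity_join(points, eps):
    """Simulate map, shuffle, and reduce in one process."""
    dim = select_feature(points)
    origin = min(p[dim] for p in points)
    groups = defaultdict(list)  # shuffle: group records by stripe id
    for key, record in map_phase(points, dim, eps, origin):
        groups[key].append(record)
    result = set()
    for records in groups.values():
        result.update(reduce_phase(records, eps))
    return sorted(result)


points = [(0.0, 0.0), (0.5, 0.1), (3.0, 0.0), (3.4, 0.2), (10.0, 5.0)]
pairs = similarity_join(points, eps=1.0)
```

Fixed-width stripes keep the sketch short but do nothing for load balance on skewed data, which is the problem the abstract says the partitioning scheme is designed to address.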
Pages: 457-468
Number of pages: 12
Related papers
50 in total
  • [21] A Set Similarity Self-Join Algorithm with FP-tree and MapReduce
    Feng Y.
    Wu K.
    Huang Z.
    Feng Y.
    Chen H.
    Bai J.
    Ming Z.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2023, 60 (12): : 2890 - 2906
  • [22] An efficient parallel top-k similarity join for massive multidimensional data using Spark
    Chen, Dehua
    Shen, Changgan
    Feng, Jieying
    Le, Jiajin
    International Journal of Database Theory and Application, 2015, 8 (03): : 57 - 68
  • [23] Metric Similarity Joins Using MapReduce
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (03) : 656 - 669
  • [24] Handling data skew in join algorithms using MapReduce
    Myung, Jaeseok
    Shim, Junho
    Yeon, Jongheum
    Lee, Sang-goo
    EXPERT SYSTEMS WITH APPLICATIONS, 2016, 51 : 286 - 299
  • [25] A Survey on Parallel Join Algorithms Using MapReduce on Hadoop
    Barhoush, Malek Mahmoud
    AlSobeh, Anas Mohammad
    Al Rawashdeh, Ahmad
    2019 IEEE JORDAN INTERNATIONAL JOINT CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATION TECHNOLOGY (JEEIT), 2019, : 381 - 388
  • [26] XML Structural Similarity Search Using MapReduce
    Yuan, Peisen
    Sha, Chaofeng
    Wang, Xiaoling
    Yang, Bin
    Zhou, Aoying
    Yang, Su
    WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2010, 6184 : 169 - +
  • [27] Detecting Text Similarity Using MapReduce Framework
    Birjali, Marouane
    Beni-Hssane, Abderrahim
    Erritali, Mohammed
    Madani, Youness
    EUROPE AND MENA COOPERATION ADVANCES IN INFORMATION AND COMMUNICATION TECHNOLOGIES, 2017, 520 : 383 - 389
  • [28] Privacy preserving similarity joins using MapReduce
    Ding, Xiaofeng
    Yang, Wanlu
    Choo, Kim-Kwang Raymond
    Wang, Xiaoli
    Jin, Hai
    INFORMATION SCIENCES, 2019, 493 : 20 - 33
  • [29] PHiDJ: Parallel Similarity Self-Join for High-Dimensional Vector Data with MapReduce
    Fries, Sergej
    Boden, Brigitte
    Stepien, Grzegorz
    Seidl, Thomas
    2014 IEEE 30TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2014, : 796 - 807
  • [30] Metric Similarity Joins Using MapReduce (Extended abstract)
    Chen, Gang
    Yang, Keyu
    Chen, Lu
    Gao, Yunjun
    Zheng, Baihua
    Chen, Chun
    2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 1787 - 1788