Join processing with threshold-based filtering in MapReduce

Authors
Taewhi Lee
Hye-Chan Bae
Hyoung-Joo Kim
Affiliations
[1] Electronics and Telecommunications Research Institute, BigData Software Platform Research Department
[2] Samsung Electronics Co., Ltd., Media Solution Center
[3] Seoul National University, Department of Computer Science and Engineering
Keywords
Join processing; Threshold-based filtering; MapReduce; Hadoop
Abstract
Data analytics tasks, particularly those involving heterogeneous data, often require join operations on datasets collected from different sources. MapReduce, one of the most popular frameworks for large-scale data processing, is not well suited to joining multiple datasets, because it often produces a large number of redundant intermediate results, irrespective of the size of the joined records. Although several existing approaches attempt to reduce the number of such redundant results using Bloom filters, they may be inefficient if large portions of records are joined or the number of distinct keys is large. To alleviate this problem, we propose a join processing method with threshold-based filtering in MapReduce, called TMFR-Join (short for “Threshold-based Map-Filter-Reduce Join”). TMFR-Join applies filters according to their performance, which is estimated in terms of false-positive rates. It also provides a general framework for exploiting various filtering techniques that support certain desired operations. The experimental results indicate that the performance of TMFR-Join is close to that of the better of the existing join processing techniques with and without filters.
Pages: 793-813
Number of pages: 20
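
The abstract mentions two ideas: Bloom filters that discard non-joining records early, and a decision to apply a filter only when its estimated false-positive rate is acceptable. The sketch below illustrates those ideas on a single machine, in plain Python. It is not the TMFR-Join algorithm itself; the names `BloomFilter`, `filtered_join`, and `fpr_threshold`, the default parameter values, and the simple "skip the filter if the estimated rate exceeds the threshold" rule are all assumptions made for illustration. The false-positive estimate uses the standard approximation (1 - e^(-kn/m))^k for k hash functions, n inserted keys, and m bits.

```python
import hashlib
import math
from collections import defaultdict


class BloomFilter:
    """Minimal Bloom filter (hypothetical helper, not from the paper)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability
        self.count = 0  # number of keys inserted (caller inserts distinct keys)

    def _positions(self, key):
        # Double hashing: derive k positions from two halves of one digest.
        digest = hashlib.sha256(str(key).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1
        self.count += 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

    def estimated_fpr(self):
        # Standard approximation: (1 - e^(-k*n/m))^k
        k, n, m = self.num_hashes, self.count, self.num_bits
        return (1.0 - math.exp(-k * n / m)) ** k


def filtered_join(left, right, num_bits=1024, num_hashes=3, fpr_threshold=0.1):
    """Equi-join two lists of (key, value) pairs.

    Returns (results, filter_used). The threshold rule here is an assumed
    simplification: the filter is applied only if its estimated
    false-positive rate does not exceed fpr_threshold.
    """
    bloom = BloomFilter(num_bits, num_hashes)
    for key in {k for k, _ in left}:  # insert each distinct key once
        bloom.add(key)

    use_filter = bloom.estimated_fpr() <= fpr_threshold

    # "Map/filter" step: drop right-side records that cannot possibly join.
    if use_filter:
        right = [(k, v) for k, v in right if k in bloom]

    # "Reduce" step: group the left side by key and emit matching pairs.
    groups = defaultdict(list)
    for k, v in left:
        groups[k].append(v)
    results = [(k, lv, rv) for k, rv in right for lv in groups.get(k, [])]
    return results, use_filter


left = [(1, "a"), (2, "b")]
right = [(1, "x"), (3, "y"), (2, "z")]
joined, filter_used = filtered_join(left, right)
print(joined, filter_used)
```

Because a Bloom filter never produces false negatives, the join result is the same whether or not the filter is applied; only the amount of intermediate data changes. Calling `filtered_join(left, right, num_bits=8)` makes the estimated false-positive rate exceed the assumed threshold, so the sketch falls back to an unfiltered join.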