Join processing with threshold-based filtering in MapReduce

Cited by: 0
Authors
Taewhi Lee
Hye-Chan Bae
Hyoung-Joo Kim
Affiliations
[1] Electronics and Telecommunications Research Institute, BigData Software Platform Research Department
[2] Samsung Electronics Co., Ltd., Media Solution Center
[3] Seoul National University, Department of Computer Science and Engineering
Keywords
Join processing; Threshold-based filtering; MapReduce; Hadoop
DOI
Not available
Abstract
Data analytics, particularly on heterogeneous data, often requires join operations on datasets collected from different sources. MapReduce, one of the most popular frameworks for large-scale data processing, is not well suited to joining multiple datasets, because it often produces a large number of redundant intermediate results, irrespective of the size of the joined records. Several existing approaches use Bloom filters to reduce the number of these redundant results, but they can be inefficient when a large portion of the records is joined or when the number of distinct keys is large. To alleviate this problem, we propose a join processing method with threshold-based filtering in MapReduce, called TMFR-Join (short for "Threshold-based Map-Filter-Reduce Join"). TMFR-Join applies filters according to their performance, which is estimated in terms of false-positive rates. It also provides a general framework for exploiting various filtering techniques that support certain desired operations. The experimental results indicate that the performance of TMFR-Join is close to that of the better of the existing join processing techniques, both with and without filters.
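The general idea of applying a filter only when its estimated false-positive rate is acceptable can be illustrated with a minimal single-process sketch. This is not the TMFR-Join algorithm itself: the abstract does not specify the threshold criterion or the MapReduce job layout, so the names `BloomFilter`, `filtered_join`, and `fpr_threshold`, the bit-array size, and the decision rule below are all assumptions made for illustration. The false-positive estimate uses the standard approximation (1 - e^(-kn/m))^k for a Bloom filter with m bits, k hash functions, and n inserted keys.

```python
import hashlib
import math


class BloomFilter:
    """A small Bloom filter over string keys (illustrative, not tuned)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability
        self.count = 0  # number of keys added (caller adds each key once)

    def _positions(self, key):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1
        self.count += 1

    def __contains__(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

    def estimated_fpr(self):
        # Standard approximation: (1 - e^(-k*n/m))^k
        k, n, m = self.num_hashes, self.count, self.num_bits
        return (1.0 - math.exp(-k * n / m)) ** k


def filtered_join(left, right, num_bits=1024, num_hashes=3, fpr_threshold=0.1):
    """Equi-join two lists of (key, value) pairs.

    A Bloom filter is built on the keys of `left`. It is used to prune
    `right` only if its estimated false-positive rate is at or below
    `fpr_threshold` (a hypothetical decision rule, assumed here).
    Returns (joined rows, whether the filter was used, records kept).
    """
    bloom = BloomFilter(num_bits, num_hashes)
    index = {}
    for key, value in left:
        if key not in index:
            bloom.add(key)  # add each distinct key once
        index.setdefault(key, []).append(value)

    use_filter = bloom.estimated_fpr() <= fpr_threshold
    # Records from `right` that survive filtering; in a MapReduce setting
    # these would be the intermediate records sent to the reducers.
    candidates = [(k, v) for k, v in right if not use_filter or k in bloom]
    joined = [(k, lv, rv) for k, rv in candidates for lv in index.get(k, [])]
    return joined, use_filter, len(candidates)
```

With a small key set and a generous bit array, the estimated false-positive rate is far below the threshold, so the filter is applied and most non-matching records are pruned before the join. Setting `fpr_threshold=0.0` disables the filter in this sketch, and every record of `right` is passed through, which mirrors the case where filtering would cost more than it saves.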
Pages: 793-813
Page count: 20