Join processing with threshold-based filtering in MapReduce

Cited by: 0
Authors
Taewhi Lee
Hye-Chan Bae
Hyoung-Joo Kim
Affiliations
[1] BigData Software Platform Research Department, Electronics and Telecommunications Research Institute
[2] Media Solution Center, Samsung Electronics Co., Ltd.
[3] Department of Computer Science and Engineering, Seoul National University
Keywords
Join processing; Threshold-based filtering; MapReduce; Hadoop
DOI
Not available
Abstract
Data analytics, particularly when it involves heterogeneous data, often requires join operations on datasets collected from different sources. MapReduce, one of the most popular frameworks for large-scale data processing, is not well suited to joining multiple datasets, because it often produces a large number of redundant intermediate results, irrespective of the size of the joined records. Several existing approaches attempt to reduce these redundant results using Bloom filters, but they may be inefficient when a large portion of the records is joined or when the number of distinct keys is large. To alleviate this problem, we propose a join processing method with threshold-based filtering in MapReduce, called TMFR-Join (short for “Threshold-based Map-Filter-Reduce Join”). TMFR-Join applies filters according to their performance, which is estimated in terms of false-positive rates. It also provides a general framework for exploiting various filtering techniques that support certain desired operations. The experimental results indicate that the performance of TMFR-Join is close to that of the better of the existing join processing techniques, whether those techniques use filters or not.
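The idea of applying a filter only when its estimated false-positive rate is acceptable can be illustrated with a small, single-process sketch. This is not the TMFR-Join algorithm from the paper, whose details are not given in the abstract; it is a minimal illustration under stated assumptions. The names `BloomFilter`, `filtered_join`, and `fpr_threshold` are hypothetical, and the false-positive estimate uses the classic approximation (1 - e^(-kn/m))^k for a Bloom filter with m bits, k hash functions, and n inserted keys.

```python
import hashlib
import math


class BloomFilter:
    """Toy Bloom filter (hypothetical helper, not from the paper)."""

    def __init__(self, num_bits, num_hashes):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity
        self.count = 0  # number of keys added (caller adds distinct keys)

    def _positions(self, key):
        # Derive k bit positions by salting a SHA-256 hash with the index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1
        self.count += 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))

    def estimated_fpr(self):
        # Classic approximation: (1 - e^(-k*n/m))^k
        return (1.0 - math.exp(-self.k * self.count / self.m)) ** self.k


def filtered_join(left, right, num_bits=1024, num_hashes=3, fpr_threshold=0.05):
    """Equi-join two lists of (key, value) pairs.

    The filter built on the left keys is applied to the right records only
    if its estimated false-positive rate is at or below fpr_threshold
    (an assumed, illustrative policy).
    """
    bloom = BloomFilter(num_bits, num_hashes)
    for key in {k for k, _ in left}:
        bloom.add(key)
    use_filter = bloom.estimated_fpr() <= fpr_threshold

    # Records that would be sent on to the reduce side.
    shuffled = [(k, v) for k, v in right
                if not use_filter or bloom.might_contain(k)]

    index = {}
    for key, value in left:
        index.setdefault(key, []).append(value)
    result = [(k, lv, rv) for k, rv in shuffled for lv in index.get(k, [])]
    return result, use_filter, len(shuffled)
```

With a roomy filter (for example the default 1024 bits for two keys) the estimate is tiny and the filter is applied; with a cramped one (for example 8 bits) the estimate exceeds the threshold and every record is passed through unfiltered. The join result is the same either way, since the filter only affects how many records are forwarded.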
Pages: 793-813
Page count: 20
Citation
Lee, Taewhi; Bae, Hye-Chan; Kim, Hyoung-Joo. Join processing with threshold-based filtering in MapReduce. Journal of Supercomputing, 2014, 69(2): 793-813.