A cluster-based data deduplication technology

Cited by: 1
Authors
Tseng, Chuan-Mu [1]
Ciou, Jheng-Rong [2]
Liu, Tzong-Jye [2]
Affiliations
[1] Jeh Teh Jr Coll Med Nursing & Management, Dept Appl Digital Media, Miaoli, Taiwan
[2] Feng Chia Univ, Dept Informat Engn & Comp Sci, Taichung, Taiwan
Keywords
Bloom filter; cluster; data deduplication
DOI
10.1109/CANDAR.2014.22
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Data deduplication technology usually identifies redundant data quickly and correctly by using a Bloom filter. A Bloom filter can determine whether a chunk may already be stored, but it can return false positives. To rule out a false positive, a new chunk must be compared with the chunks that have already been stored. To reduce the time spent ruling out Bloom filter false positives, current research stores chunk IDs in many small index tables. However, the target chunk ID is stored in only one index table, so the time spent searching the other index tables is wasted. In this paper, we cluster the stored chunks to reduce the time needed to rule out the false positives induced by the Bloom filter.
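The lookup path described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how chunks are clustered, so the cluster key used here (the first byte of a SHA-1 chunk ID) is purely an assumption, as are the names `BloomFilter`, `ClusteredChunkStore`, and `tables_searched` and the filter sizes. The sketch only shows the general idea that a Bloom filter hit is verified against one small index table instead of all of them.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over byte strings (sizes are illustrative)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly stored" (false positives can occur);
        # False means "definitely not stored".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class ClusteredChunkStore:
    """Hypothetical store: one small index table of chunk IDs per cluster."""

    def __init__(self, num_clusters=16):
        self.bloom = BloomFilter()
        self.num_clusters = num_clusters
        self.tables = [set() for _ in range(num_clusters)]
        self.tables_searched = 0  # counts index-table lookups

    def _cluster_of(self, chunk_id):
        # ASSUMPTION: cluster by the first byte of the chunk ID.
        # The paper's actual clustering rule is not given in the abstract.
        return chunk_id[0] % self.num_clusters

    def store(self, chunk):
        """Return True if the chunk is new, False if it is a duplicate."""
        chunk_id = hashlib.sha1(chunk).digest()
        table = self.tables[self._cluster_of(chunk_id)]
        if self.bloom.might_contain(chunk_id):
            # Verify the Bloom filter hit in a single table only.
            self.tables_searched += 1
            if chunk_id in table:
                return False  # confirmed duplicate
            # Otherwise the hit was a false positive; store the chunk.
        self.bloom.add(chunk_id)
        table.add(chunk_id)
        return True


store = ClusteredChunkStore()
print(store.store(b"alpha"))   # new chunk
print(store.store(b"alpha"))   # duplicate, verified in one table
```

Under these assumptions, each Bloom filter hit costs one index-table lookup regardless of how many tables exist, which is the saving the abstract attributes to clustering. This sketch compares chunk IDs only; a real system might also compare chunk contents.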
Pages: 226-230
Number of pages: 5