A cluster-based data deduplication technology

Cited by: 1
Authors
Tseng, Chuan-Mu [1]
Ciou, Jheng-Rong [2]
Liu, Tzong-Jye [2]
Affiliations
[1] Jeh Teh Jr Coll Med Nursing & Management, Dept Appl Digital Media, Miaoli, Taiwan
[2] Feng Chia Univ, Dept Informat Engn & Comp Sci, Taichung, Taiwan
Keywords
Bloom filter; cluster; data deduplication
DOI
10.1109/CANDAR.2014.22
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data deduplication technology usually identifies redundant data quickly and correctly by using a Bloom filter. A Bloom filter can determine whether a chunk may already be stored, but it can return false positives. To rule out a false positive, a new chunk must be compared with the chunks that have already been stored. To reduce the time spent ruling out Bloom filter false positives, current research uses many small index tables to store chunk IDs. However, the target chunk ID is stored in only one index table, so searching the other index tables wastes a great deal of time. In this paper, we cluster the stored chunks to reduce the time needed to rule out the false positives induced by the Bloom filter.
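The lookup path described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how clusters are formed, so the assignment rule used here (first byte of the chunk ID modulo the number of clusters) is purely hypothetical, as are the names `BloomFilter`, `ClusteredChunkStore`, and every parameter value. The sketch only shows the general idea that a Bloom filter hit is confirmed against a single cluster's index rather than against every index table.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; bit positions are derived from SHA-256 (an assumption)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # True means "possibly present" (false positives can occur);
        # False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(key))


class ClusteredChunkStore:
    """Hypothetical store: one chunk-ID index per cluster."""

    def __init__(self, num_clusters=4):
        self.bloom = BloomFilter()
        self.clusters = [set() for _ in range(num_clusters)]

    @staticmethod
    def chunk_id(chunk):
        return hashlib.sha1(chunk).digest()

    def _cluster_of(self, cid):
        # Assumed clustering rule, not taken from the paper.
        return cid[0] % len(self.clusters)

    def store(self, chunk):
        """Return True if the chunk was new and stored, False if it was a duplicate."""
        cid = self.chunk_id(chunk)
        index = self.clusters[self._cluster_of(cid)]
        # A Bloom filter hit is confirmed against one cluster index only.
        if self.bloom.might_contain(cid) and cid in index:
            return False
        self.bloom.add(cid)
        index.add(cid)
        return True


store = ClusteredChunkStore()
results = [store.store(b"chunk-a"), store.store(b"chunk-a"), store.store(b"chunk-b")]
print(results)
```

Because each chunk ID maps to exactly one cluster in this sketch, a Bloom filter false positive costs a single index lookup instead of a scan over all index tables.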
Pages: 226-230
Page count: 5