A new content-defined chunking algorithm for data deduplication in cloud storage

Cited by: 36
Authors
Widodo, Ryan N. S. [1]
Lim, Hyotaek [2]
Atiquzzaman, Mohammed [3]
Affiliations
[1] Dongseo Univ, Dept Ubiquitous IT, Busan 617716, South Korea
[2] Dongseo Univ, Div Comp Engn, Busan 617716, South Korea
[3] Univ Oklahoma, Sch Comp Sci, Norman, OK 73019 USA
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2017 / Vol. 71
Funding
National Research Foundation of Singapore;
Keywords
Data deduplication; Cloud storage; Content-defined chunking; Hash-less chunking; Asymmetric window;
DOI
10.1016/j.future.2017.02.013
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Chunking is the process of splitting a file into smaller pieces called chunks. In applications such as remote data compression, data synchronization, and data deduplication, chunking is important because it determines the duplicate detection performance of the system. Content-defined chunking (CDC) is a method that splits files into variable-length chunks, where the cut points are defined by internal features of the files. Unlike fixed-length chunks, variable-length chunks are more resistant to byte shifting. CDC therefore increases the probability of finding duplicate chunks within a file and between files. However, CDC algorithms require additional computation to find the cut points, which can be computationally expensive for some applications. In our previous work (Widodo et al., 2016), the hash-based CDC algorithm used in the system took more processing time than the other processes in the deduplication system. This paper proposes a high-throughput hash-less chunking method called Rapid Asymmetric Maximum (RAM). Instead of using hashes, RAM uses byte values to declare the cut points. The algorithm uses a fixed-size window and a variable-size window to find a maximum-valued byte, which is the cut point. The maximum-valued byte is included in the chunk and located at the boundary of the chunk. This configuration allows RAM to perform fewer comparisons while retaining the CDC property. We compared RAM with existing hash-based and hash-less deduplication systems. The experimental results show that the proposed algorithm achieves higher throughput and more bytes saved per second than the other chunking algorithms. (C) 2017 Elsevier B.V. All rights reserved.
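The cut-point rule described in the abstract can be illustrated with a minimal Python sketch. It is a reading of the abstract only, not the authors' implementation: the function names `ram_cut_point` and `ram_chunks`, the default window size, the use of `>=` for the comparison, and the absence of a maximum chunk size are all assumptions made for illustration.

```python
from typing import List


def ram_cut_point(data: bytes, start: int, window: int) -> int:
    """Return the exclusive end index of the chunk that begins at `start`.

    Assumed rule: take the maximum byte value inside a fixed-size window,
    then scan the bytes after it (the variable-size window) and cut right
    after the first byte that is at least as large as that maximum.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    fixed_end = start + window
    if fixed_end >= len(data):
        # Not enough data left for a variable-size window: emit the tail.
        return len(data)
    max_value = max(data[start:fixed_end])
    for i in range(fixed_end, len(data)):
        if data[i] >= max_value:
            # The extreme byte is included and ends the chunk.
            return i + 1
    return len(data)


def ram_chunks(data: bytes, window: int = 4) -> List[bytes]:
    """Split `data` into variable-length chunks using the rule above."""
    chunks = []
    start = 0
    while start < len(data):
        end = ram_cut_point(data, start, window)
        chunks.append(data[start:end])
        start = end
    return chunks


original = bytes([1, 5, 2, 3, 7, 1, 1, 9, 2])
shifted = bytes([0]) + original  # one byte inserted at the front

chunks_original = ram_chunks(original, window=3)
chunks_shifted = ram_chunks(shifted, window=3)
```

In this small example, inserting one byte at the front changes only the first chunk; the second chunk is identical in both inputs, which is the resistance to byte shifting that the abstract attributes to content-defined chunking. Each position is examined with plain byte comparisons and no hash is computed.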
Pages: 145 - 156
Number of pages: 12