ObjDedup: High-Throughput Object Storage Layer for Backup Systems With Block-Level Deduplication

Cited by: 3
Authors
Jackowski, Andrzej [1]
Slusarczyk, Lukasz [1]
Lichota, Krzysztof [1]
Welnicki, Michal [1]
Wijata, Rafal [1]
Kielar, Mateusz [1]
Kopec, Tadeusz [1]
Dubnicki, Cezary [1]
Iwanicki, Konrad [2]
Affiliations
[1] LLC 9LivesData, PL-02796 Warsaw, Poland
[2] Univ Warsaw, Fac Math Informat & Mech, PL-00927 Warsaw, Poland
Keywords
Metadata; Engines; Throughput; Object recognition; Cloud computing; Aerospace electronics; Quality of service; Backup storage; deduplication; object storage; secondary storage
DOI
10.1109/TPDS.2023.3250501
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The immense popularity of object storage is also affecting the market of backup. Not only have novel backup solutions emerged that utilize cloud-based object storage as backends, but also support for object storage interfaces is increasingly expected from traditional dedicated backup appliances. This latter trend especially concerns systems with data deduplication, as they can offer compelling gains in storage capacity and throughput. However, such systems have been designed for interfaces and workloads that are markedly different from those encountered in object storage. Notably, they expect data to be written in portions that are orders of magnitude longer than those in the novel object-storage-oriented backup applications. In this light, we contribute twofold. First, contrasting the properties of object storage interfaces with usage patterns from 686 commercial deployments of backup appliances, we identify specific issues an implementation of such an interface has to address to offer adequate performance in a backup system with block-level deduplication. In particular, we show that a major challenge is efficient metadata management. Second, we present distributed data structures and algorithms to handle object metadata in backup systems with block-level deduplication. Subsequently, we implement them as an object storage layer for our HYDRAstor backup system. In comparison to object storage without in-line deduplication, our solution achieves 1.8-3.93x higher write throughput. Compared to object storage on top of a state-of-the-art file-based backup system, it processes 5.26-11.34x more object put operations per time unit.
Pages: 2180-2197
Page count: 18