An End-to-end High-performance Deduplication Scheme for Docker Registries and Docker Container Storage Systems

Cited by: 1
Authors
Zhao, Nannan [1]
Lin, Muhui [2]
Albahar, Hadeel [3]
Paul, Arnab K. [4]
Huang, Zhijie [5]
Abraham, Subil [6]
Chen, Keren [7]
Tarasov, Vasily [8]
Skourtis, Dimitrios [8]
Anwar, Ali [9]
Butt, Ali R. [7]
Affiliations
[1] Northwestern Polytech Univ Shenzhen, Res & Dev Inst, Xian, Peoples R China
[2] Alibaba Grp, Hangzhou, Peoples R China
[3] Sabah Al Salem Univ City, Kuwait Univ, Kuwait, Kuwait
[4] BITS Pilani, KK Birla Goa Campus, Zuarinagar 403726, Goa, India
[5] Northwestern Polytech Univ, Xian 710129, Shaanxi, Peoples R China
[6] Oak Ridge Natl Lab, Oak Ridge, TN 37830 USA
[7] Virginia Tech, Blacksburg, VA 24061 USA
[8] IBM Res Almaden, San Jose, CA 95120 USA
[9] Univ Minnesota, Twin Cities Campus, Minneapolis, MN 55455 USA
Funding
U.S. National Science Foundation; National Natural Science Foundation of China
Keywords
Docker registry; Docker storage driver; Linux file system; deduplication
DOI
10.1145/3643819
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The wide adoption of Docker containers for supporting agile and elastic enterprise applications has led to a broad proliferation of container images. The associated storage performance and capacity requirements place heavy pressure on the infrastructure of container registries, which store and distribute images, and on container storage systems on the Docker client side, which manage image layers and store ephemeral data generated at container runtime. The storage demand is worsened by the large amount of duplicate data in images. Moreover, container storage systems that use Copy-on-Write (CoW) file systems as storage drivers exacerbate the redundancy. Exploiting the high file redundancy in real-world images is a promising approach to drastically reduce the growing storage requirements of container registries and improve the space efficiency of container storage systems. However, existing deduplication techniques significantly degrade the performance of both registries and container storage systems because of data reconstruction overhead as well as the deduplication cost. We propose DupHunter, an end-to-end deduplication scheme that deduplicates layers for both Docker registries and container storage systems while maintaining a high image distribution speed and container I/O performance. DupHunter is divided into three tiers: registry tier, middle tier, and client tier. Specifically, we first build a high-performance deduplication engine at the registry tier that not only natively deduplicates layers for space savings but also reduces layer restore overhead. Then, we use deduplication offloading at the middle tier to eliminate the redundant files from the client tier and avoid bringing deduplication overhead to the clients. To further reduce the data duplicates caused by CoWs and improve the container I/O performance, we utilize a container-aware storage system at the client tier that reserves space for each container and arranges the placement of files and their modifications on the disk to preserve locality. Under real workloads, DupHunter reduces storage space by up to 6.9x and reduces GET layer latency by up to 2.8x compared to the state of the art. Moreover, DupHunter can improve the container I/O performance by up to 93% for reads and 64% for writes.
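As a rough illustration of the file-level layer deduplication the abstract refers to, the following Python sketch stores each unique file body once and keeps a per-layer recipe for restoring the layer. It is not DupHunter's implementation: the class `FileDedupStore`, its methods, and the in-memory layer representation (a dict of path to bytes) are hypothetical names chosen for this example.

```python
import hashlib


class FileDedupStore:
    """Toy content-addressed store (hypothetical): each unique file body is kept once."""

    def __init__(self):
        self.blobs = {}    # SHA-256 hex digest -> file bytes
        self.recipes = {}  # layer id -> {path: digest}

    def put_layer(self, layer_id, files):
        """Deduplicate a layer given as {path: bytes} and record its recipe."""
        recipe = {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            # Store the body only if this content has not been seen before.
            self.blobs.setdefault(digest, data)
            recipe[path] = digest
        self.recipes[layer_id] = recipe

    def get_layer(self, layer_id):
        """Restore a layer by looking up every file named in its recipe."""
        recipe = self.recipes[layer_id]
        return {path: self.blobs[digest] for path, digest in recipe.items()}

    def stored_bytes(self):
        """Total size of the unique file bodies currently held."""
        return sum(len(body) for body in self.blobs.values())


# Two layers that share one file (made-up contents for illustration).
layer_a = {"/bin/sh": b"shell-binary", "/etc/os-release": b"ID=demo"}
layer_b = {"/bin/sh": b"shell-binary", "/app/main.py": b"print('hi')"}

store = FileDedupStore()
store.put_layer("a", layer_a)
store.put_layer("b", layer_b)
```

The restore step in `get_layer` is where the reconstruction overhead mentioned in the abstract arises: every GET has to reassemble the layer from its individual files, a cost that plain layer storage does not pay.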
Pages: 35