Towards a Signature Based Compression Technique for Big Data Storage

Authors
Costa, Constantinos [1]
Chrysanthis, Panos K. [1]
Costa, Marios [1]
Stavrakis, Efstathios [2]
Nicolaou, Nicolas [2]
Affiliations
[1] Rinnoco Ltd, Limassol, Cyprus
[2] Algolysis Ltd, Limassol, Cyprus
Source
2023 IEEE 39TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING WORKSHOPS, ICDEW | 2023
Keywords
signature based; compression; column stores; hybrid store
DOI
10.1109/ICDEW58674.2023.00022
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Although the volume of stored data doubles every year, storage capacity costs decline only at a rate of less than 1/5 per year. At the same time, data is stored in multiple physical locations and remotely retrieved from multiple sites. Thus, minimizing data storage costs while maintaining data fidelity and efficient retrieval is still a key challenge in database systems. In addition to the raw big data, its associated metadata and indexes equally demand tremendous storage that impacts the I/O footprint of data centers. In this vision paper, we propose a new signature-based compression (SIBACO) technique that is able to: (i) incrementally store big data in an efficient way; and (ii) improve the retrieval time for data-intensive applications. SIBACO achieves higher compression ratios by combining and compressing columns differently based on the type and distribution of data and can be easily integrated with column and hybrid stores. We evaluate our proposed tool using real datasets showing that SIBACO outperforms "monolithic" compression schemes in terms of storage cost.
Pages: 100-104
Page count: 5
Related papers
  • [1] Data Reduction Based on Compression Technique for Big Data in IoT
    Abdulzahra, Suha Abdulhussein
    Al-Qurabat, Ali Kadhum M.
    Idrees, Ali Kadhum
    2020 INTERNATIONAL CONFERENCE ON EMERGING SMART COMPUTING AND INFORMATICS (ESCI), 2020, : 103 - 108
  • [2] Bucket Based Data Deduplication Technique for Big Data Storage System
    Kumar, Naresh
    Rawat, Rahul
    Jain, S. C.
    2016 5TH INTERNATIONAL CONFERENCE ON RELIABILITY, INFOCOM TECHNOLOGIES AND OPTIMIZATION (TRENDS AND FUTURE DIRECTIONS) (ICRITO), 2016, : 267 - 271
  • [3] A New Rockburst Experiment Data Compression Storage Algorithm Based on Big Data Technology
    Zhang, Yu
    Wang, Yan-Ge
    Bai, Yan-Ping
    Li, Yong-Zhen
    Lv, Zhao-Yong
    Ding, Hong-Wei
    INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2019, 25 (03): : 561 - 572
  • [4] NEW TECHNIQUE FOR COMPRESSION AND STORAGE OF DATA
    HAHN, B
    COMMUNICATIONS OF THE ACM, 1974, 17 (08) : 434 - 436
  • [5] Algorithm for Fuzzy based Compression of Gray JPEG Images for Big Data Storage
    Kaur, Navneet
    Bawa, Navneet
    PROCEEDINGS OF THE 2016 2ND INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2016, : 518 - 523
  • [6] Hardware Based Compression in Big Data
    Jain, Deepak
    McFadden, Gordon
    Will, Brian
    2016 DATA COMPRESSION CONFERENCE (DCC), 2016, : 605 - 605
  • [7] An efficient ECG data compression technique based on predefined signature and envelope vector banks
    Gürkan, H
    Güz, Ü
    Yarman, BS
    2005 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS), VOLS 1-6, CONFERENCE PROCEEDINGS, 2005, : 1334 - 1337
  • [8] Cluster Mapping Compression Storage of Monitoring Big Data in Distribution Network Based on Hive
    Qu, Zhi-jian
    Chen, Ding-long
    Peng, Xiang
    Wang, Qun-feng
    Zhao, Liang
    MATERIALS, INFORMATION, MECHANICAL, ELECTRONIC AND COMPUTER ENGINEERING (MIMECE 2016), 2016, : 366 - 371
  • [9] Towards Resilient and Efficient Big Data Storage: Evaluating a STEM Repository Based on HDFS
    Saenko, Igor
    Kotenko, Igor
    30TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2022), 2022, : 290 - 297
  • [10] A Systematic Review of Different Data Compression Technique of Cloud Big Sensing Data
    Rani, I. Sandhya
    Venkateswarlu, Bondu
    SECOND INTERNATIONAL CONFERENCE ON COMPUTER NETWORKS AND COMMUNICATION TECHNOLOGIES, ICCNCT 2019, 2020, 44 : 222 - 228