A novel lossless encoding algorithm for data compression: genomics data as an exemplar

Authors
Al-okaily, Anas [1 ]
Tbakhi, Abdelghani [2 ]
Affiliations
[1] King Hussein Canc Ctr, Dept Cell Therapy Appl Genom, Amman, Jordan
[2] McMaster Univ, Dept Pathol & Mol Med, Hamilton, ON, Canada
Keywords
compression; Huffman encoding; LZ; genomics; BWT; SEQUENCES; FORMAT
DOI
10.3389/fbinf.2024.1489704
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Data compression is a challenging and increasingly important problem. As the amount of data generated daily continues to increase, efficient transmission and storage have never been more critical. In this study, a novel encoding algorithm is proposed, motivated by the compression of DNA data and its associated characteristics. The proposed algorithm follows a divide-and-conquer approach: it scans the whole genome, classifies subsequences based on similarities in their content, and bins similar subsequences together. The data in each bin is then compressed independently. This approach differs from currently known approaches, namely entropy-based, dictionary-based, predictive, and transform-based methods. Proof-of-concept performance was evaluated using a benchmark dataset of seventeen genomes ranging in size from kilobytes to gigabytes. The results showed a considerable improvement in the compression of each genome, saving several megabytes compared to state-of-the-art tools. Moreover, the algorithm can be applied to the compression of other data types, mainly text, numbers, images, audio, and video, which are being generated daily in unprecedentedly massive volumes.
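The scan, classify, bin, and compress-per-bin idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not specify the similarity criterion, the subsequence length, or the per-bin encoder. The fixed chunk length, the `classify` key (the set of distinct symbols in a chunk), the use of `zlib` as the per-bin compressor, and all function names are assumptions made only for illustration.

```python
import zlib
from collections import defaultdict


def classify(chunk: str) -> str:
    # Hypothetical similarity key (an assumption, not the paper's criterion):
    # chunks that use the same set of symbols land in the same bin.
    return "".join(sorted(set(chunk)))


def compress_binned(seq: str, chunk_len: int = 8):
    """Split seq into chunks, bin them by content, compress each bin alone."""
    bins = defaultdict(list)
    order = []  # records (bin key, chunk length) so the input can be rebuilt
    for i in range(0, len(seq), chunk_len):
        chunk = seq[i:i + chunk_len]
        key = classify(chunk)
        bins[key].append(chunk)
        order.append((key, len(chunk)))
    # Each bin is compressed independently of the others (zlib is a stand-in).
    packed = {k: zlib.compress("".join(v).encode("ascii")) for k, v in bins.items()}
    return order, packed


def decompress_binned(order, packed) -> str:
    """Invert compress_binned: decompress every bin, then re-interleave chunks."""
    streams = {k: zlib.decompress(b).decode("ascii") for k, b in packed.items()}
    pos = defaultdict(int)  # read offset into each decompressed bin
    out = []
    for key, n in order:
        start = pos[key]
        out.append(streams[key][start:start + n])
        pos[key] = start + n
    return "".join(out)


seq = "AAAAAAAA" * 3 + "ACGTACGT" * 2 + "ACACACAC" + "GG"
order, packed = compress_binned(seq)
restored = decompress_binned(order, packed)
```

Because `order` records which bin each chunk went to, the round trip is lossless by construction. A real encoder would also have to store `order` compactly, and that cost is ignored here.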
Pages: 8