A novel lossless encoding algorithm for data compression-genomics data as an exemplar

Cited by: 0
Authors
Al-okaily, Anas [1 ]
Tbakhi, Abdelghani [2 ]
Affiliations
[1] King Hussein Canc Ctr, Dept Cell Therapy Appl Genom, Amman, Jordan
[2] McMaster Univ, Dept Pathol & Mol Med, Hamilton, ON, Canada
Keywords
compression; Huffman encoding; LZ; genomics; BWT; sequences; format
DOI
10.3389/fbinf.2024.1489704
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Data compression is a challenging and increasingly important problem. As the amount of data generated daily continues to grow, efficient transmission and storage have never been more critical. In this study, a novel encoding algorithm is proposed, motivated by the compression of DNA data and its characteristics. The proposed algorithm follows a divide-and-conquer approach: it scans the whole genome, classifies subsequences based on similarities in their content, and bins similar subsequences together. The data in each bin are then compressed independently. This approach differs from currently known approaches, namely entropy, dictionary, predictive, and transform-based methods. Proof-of-concept performance was evaluated using a benchmark dataset of seventeen genomes ranging in size from kilobytes to gigabytes. The results showed a considerable improvement in the compression of each genome, saving several megabytes compared to state-of-the-art tools. Moreover, the algorithm can be applied to the compression of other data types, mainly text, numbers, images, audio, and video, which are being generated daily in unprecedented volumes.
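The bin-then-compress idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the fixed block length, the similarity rule (the set of symbols a block contains), and the use of zlib as the per-bin compressor are all assumptions chosen only to make the idea concrete, and the names `signature`, `compress_binned`, and `decompress_binned` are hypothetical.

```python
import zlib
from collections import defaultdict

BLOCK = 64  # hypothetical block length, chosen only for illustration


def signature(block: str) -> str:
    """Classify a block by the set of symbols it contains (an assumed similarity rule)."""
    return "".join(sorted(set(block)))


def compress_binned(seq: str, block: int = BLOCK):
    """Split seq into blocks, bin blocks by signature, and compress each bin separately."""
    bins = defaultdict(list)
    order = []  # (signature, length) of every block in original order, needed to undo the binning
    for i in range(0, len(seq), block):
        chunk = seq[i:i + block]
        sig = signature(chunk)
        bins[sig].append(chunk)
        order.append((sig, len(chunk)))
    # Each bin is compressed independently of the others (zlib is a stand-in compressor).
    packed = {sig: zlib.compress("".join(chunks).encode("ascii"))
              for sig, chunks in bins.items()}
    return packed, order


def decompress_binned(packed, order) -> str:
    """Decompress every bin, then re-interleave the blocks in their original order."""
    streams = {sig: zlib.decompress(blob).decode("ascii") for sig, blob in packed.items()}
    offsets = defaultdict(int)  # read position inside each bin's stream
    out = []
    for sig, n in order:
        start = offsets[sig]
        out.append(streams[sig][start:start + n])
        offsets[sig] = start + n
    return "".join(out)


if __name__ == "__main__":
    # Toy input: regular ACGT runs interrupted by a stretch of unknown bases (N).
    seq = "ACGT" * 40 + "N" * 100 + "ACGTACGT" * 20
    packed, order = compress_binned(seq)
    assert decompress_binned(packed, order) == seq  # the scheme is lossless
    print(sorted(packed))  # signatures of the bins that were created
```

The block order has to be stored alongside the compressed bins, so any real gain depends on the bins compressing better than the unsplit sequence by more than that overhead; the sketch makes no claim about compression ratio.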
Pages: 8
Related papers
50 in total
  • [21] An encoding method for both image compression and data lossless information hiding
    Wang, Zhi-Hui
    Chang, Chin-Chen
    Chen, Kuo-Nan
    Li, Ming-Chu
    JOURNAL OF SYSTEMS AND SOFTWARE, 2010, 83 (11) : 2073 - 2082
  • [22] An Improved Lossless ECG Data Compression using ASCII Character Encoding
    Gurve, Dharmendra
    Saini, B. S.
    Saini, Indu
    PROCEEDINGS OF THE 2016 IEEE INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, SIGNAL PROCESSING AND NETWORKING (WISPNET), 2016, : 758 - 764
  • [23] Novel lossless compression of Doppler weather radar data
    Huang, Yun-Xian
    Ma, Shuo
    Ai, Wei-Hua
    Jiefangjun Ligong Daxue Xuebao/Journal of PLA University of Science and Technology (Natural Science Edition), 2012, 13 (02): : 232 - 236
  • [24] A novel generalized particle model for lossless data compression
    Shuai, Dianxun
    Shuai, Qing
    SNPD 2006: SEVENTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING ARTIFICIAL INTELLIGENCE, NETWORKING, AND PARALLEL/DISTRIBUTED COMPUTING, PROCEEDINGS, 2006, : 202 - +
  • [25] A new fast encoding algorithm for data compression
    Liu, LJ
    Zou, XC
    Shen, XB
    CHINESE JOURNAL OF ELECTRONICS, 2004, 13 (01): : 35 - 39
  • [26] Novel Data Compression Algorithm for Process Data
    Purohit, Amit
    2014 IEEE CONFERENCE ON CONTROL APPLICATIONS (CCA), 2014, : 784 - 789
  • [27] A Novel Data Compression Algorithm for Dynamic Data
    Gupta, Rahul
    Gupta, Ashutosh
    Agarwal, Suneeta
    2008 IEEE REGION 8 INTERNATIONAL CONFERENCE ON COMPUTATIONAL TECHNOLOGIES IN ELECTRICAL AND ELECTRONICS ENGINEERING: SIBIRCON 2008, PROCEEDINGS, 2008, : 266 - +
  • [28] DMBRLE: A Lossless Compression Algorithm for Solar Irradiance Data Acquisition
    Roy, Soumya
    Panja, Subhas Chandra
    Patra, Sankar Narayan
    2015 IEEE 2ND INTERNATIONAL CONFERENCE ON RECENT TRENDS IN INFORMATION SYSTEMS (RETIS), 2015, : 450 - 454
  • [29] Implementation of LZW Data Lossless Compression Algorithm Based on VB
    Yuan Qinghui
    Nie Xiujun
    Yuan Qingfei
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND ENGINEERING (ICCSE 2016), 2016, 68 : 37 - 44
  • [30] A lossless data compression algorithm for real-time database
    Huang, Wenjun
    Wang, Weimin
    Xu, Hui
    WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS, 2006, : 6645 - +