LEC-Codec: Learning-Based Genome Data Compression

Times cited: 0
Authors
Sun, Zhenhao [1]
Wang, Meng [2]
Wang, Shiqi [1]
Kwong, Sam [2]
Affiliations
[1] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
[2] Lingnan Univ, Sch Data Sci, Hong Kong, Peoples R China
Keywords
Genomics; Bioinformatics; Encoding; Context modeling; Symbols; Predictive models; Codecs; Computational modeling; Complexity theory; Termination of employment; Data compression; Learning-based method; Lossless genome compression; Non-reference method
DOI
10.1109/TCBB.2024.3473899
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
In this paper, we propose a Learning-based gEnome Codec (LEC) designed for high efficiency and enhanced flexibility. LEC integrates several techniques, including Group of Bases (GoB) compression, multi-stride coding, and bidirectional prediction, all aimed at optimizing the trade-off between coding complexity and performance in lossless compression. The model used in the proposed codec is data-driven: deep neural networks infer a probability for each symbol, enabling fully parallel encoding and decoding with complexity that can be configured for diverse applications. Experimental results across a set of configurations that trade compression ratio against inference speed show that the proposed method achieves high compression efficiency and offers improved flexibility in real-world applications.
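The abstract's core idea is that a model assigns a probability to each base and an entropy coder turns those probabilities into bits, so better prediction yields fewer bits than the 2 bits per base of plain A/C/G/T packing. The Python sketch below illustrates only that principle, under stated assumptions. It swaps in a simple adaptive order-k count model for LEC's deep neural network, and it reports the ideal code length (the sum of -log2 p) rather than running a real arithmetic coder. The function name `ideal_code_length_bits` and its `order` parameter are hypothetical. Unlike LEC, which the abstract describes as fully parallel, this toy model updates sequentially.

```python
import math
import random
from collections import defaultdict

BASES = "ACGT"
INDEX = {b: i for i, b in enumerate(BASES)}


def ideal_code_length_bits(seq: str, order: int = 2) -> float:
    """Return the sum of -log2 p(base | previous `order` bases) over `seq`.

    Stand-in model (an assumption, not LEC's network): adaptive counts with an
    add-one prior per context. An arithmetic coder fed the same probabilities
    emits roughly this many bits, so the total approximates the compressed size.
    """
    counts = defaultdict(lambda: [1, 1, 1, 1])  # one count vector per context
    total_bits = 0.0
    for i, base in enumerate(seq):
        context = seq[max(0, i - order):i]  # shorter context at the start
        c = counts[context]
        p = c[INDEX[base]] / sum(c)  # model's probability for the actual base
        total_bits -= math.log2(p)   # ideal code length of this symbol
        c[INDEX[base]] += 1          # update only after coding, as a decoder could
    return total_bits


if __name__ == "__main__":
    random.seed(0)
    repetitive = "ACGT" * 250
    uniform = "".join(random.choice(BASES) for _ in range(10_000))
    for name, s in (("repetitive", repetitive), ("uniform random", uniform)):
        bpb = ideal_code_length_bits(s) / len(s)
        print(f"{name}: {bpb:.3f} bits/base (plain 2-bit packing: 2.000)")
```

On the periodic sequence the context model quickly learns the pattern and needs far fewer than 2 bits per base; on uniformly random bases no model can beat 2 bits per base, and the adaptive counts add a small learning overhead. A learned codec such as LEC aims to sit between these extremes on real genomes by predicting more accurately than fixed-order counts.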
Pages: 2447-2458
Number of pages: 12