A new approach in DNA sequence compression: Fast DNA sequence compression using parallel chaos game representation

Cited by: 5
Authors
Poor, Nafise Ramezani [1]
Yaghoobi, Mahdi [1]
Affiliations
[1] Islamic Azad Univ, Comp Engn Dept, Mashhad Branch, Mashhad, Iran
Keywords
Chaos game representation; Parallel chaos game representation; DNA sequence; Huffman coding;
DOI
10.1016/j.eswa.2018.09.012
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A DNA sequence is a long string containing significant hidden genetic information that is of interest to biological researchers working in areas such as comparative genomics, medicine, and engineering. With the rapid growth of DNA research, users face challenges in transferring, maintaining, and storing the data. Because these sequences are very large, they require a great deal of storage space, so a method is needed to reduce the space required. Data compression can be an efficient way to reduce the size of DNA sequences, lowering both storage and transfer bandwidth requirements. The effectiveness and importance of compression methods can be seen in the compression of existing sequences in databases, in image and video compression, and in standards such as DICOM. The proposed algorithm is a hybrid consisting of four phases: in phase 1 it divides the sequence into subsequences and applies a parallel chaos game representation (CGR) approach; in phase 2 it replaces high-frequency substrings using a dictionary method; in phase 3 it applies a parallel Huffman coding approach; and in phase 4 it builds a structure based on the Huffman results. Because the algorithm runs in parallel and creates a dictionary for each subsequence, compression speed increases. In addition, because CGR provides all possible patterns, there is no need to search for patterns, which reduces computational complexity and time. Using this method, the benchmark DNA string "MPOMTCG" achieved a compression ratio of 1.6. (C) 2018 Elsevier Ltd. All rights reserved.
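Two of the building blocks named in the abstract, chaos game representation and Huffman coding, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the corner assignment for A, C, G, T is a common CGR convention assumed here, the function names are hypothetical, and the subsequence splitting, parallel execution, and dictionary substitution phases are omitted.

```python
import heapq
from collections import Counter

# Assumed corner layout (a common CGR convention, not taken from the paper).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR trajectory: each point lies halfway between the
    previous point and the corner assigned to the current base."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, partial code table).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(seq, code):
    """Concatenate the code words of all symbols in seq."""
    return "".join(code[s] for s in seq)


def decode(bits, code):
    """Invert encode(); relies on the prefix-free property of Huffman codes."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

In a pipeline like the one the abstract outlines, the symbols passed to `huffman_code` would presumably be bases or dictionary tokens from one subsequence, with each subsequence handled by a separate worker; that arrangement is an assumption here, not a detail confirmed by the record.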
Pages: 487-493
Number of pages: 7