A new approach in DNA sequence compression: Fast DNA sequence compression using parallel chaos game representation

Times cited: 5
Authors
Poor, Nafise Ramezani [1]
Yaghoobi, Mahdi [1]
Affiliation
[1] Islamic Azad University, Computer Engineering Department, Mashhad Branch, Mashhad, Iran
Keywords
Chaos game representation; Parallel chaos game representation; DNA sequence; Huffman coding
DOI
10.1016/j.eswa.2018.09.012
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
DNA sequences are long strings that contain hidden, significant genetic information of interest to biological researchers in laboratories and in fields such as genome comparison, medicine, and engineering. With the rapid growth of DNA research, users face challenges in transferring, maintaining, and storing these data. Because such sequences are large, they require substantial storage space, so a method is needed to reduce that requirement. Data compression can be an efficient way to reduce the size of DNA sequences, lowering both storage space and transfer bandwidth requirements. The effectiveness and importance of compression methods can be seen in the compression of existing sequence databases, in image and video compression, and in standards such as DICOM. The proposed algorithm is a hybrid consisting of four phases: in phase 1 it divides the sequence into subsequences and applies a parallel chaos game representation (CGR) approach; in phase 2 it replaces high-frequency substrings using a dictionary method; in phase 3 it applies a parallel Huffman coding approach; and in phase 4 it builds a structure based on the Huffman results. Because the algorithm runs in parallel and creates a dictionary for each subsequence, it increases compression speed. In addition, because CGR provides all possible patterns, there is no need to search for patterns, which reduces computational complexity and time. Using this method, the benchmark DNA sequence MPOMTCG achieved a compression ratio of 1.6. (C) 2018 Elsevier Ltd. All rights reserved.
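The abstract names the four phases but not their details, so the following Python sketch is a hypothetical reconstruction for illustration only, not the authors' implementation. It assumes: an A/C/G/T corner layout for the CGR square; that phase 1's CGR is used as a frequency CGR, meaning k-mer counts are read off the cells of a 2^k x 2^k grid rather than searched for; a fixed k-mer length `k`, dictionary size `dict_size`, and block length `block_len`; greedy left-to-right substitution of dictionary k-mers in phase 2; and, as a stand-in for phase 4, simply keeping each block's Huffman code table beside its bitstring. All function and parameter names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import count
import heapq

# Assumed corner layout for the unit square (x, y); the paper's layout may differ.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BASE_AT = {corner: base for base, corner in CORNERS.items()}


def fcgr_kmer_counts(seq, k):
    """Phase 1 (sketch): count k-mers with a frequency CGR on a 2^k x 2^k grid.

    A CGR step moves halfway toward the current base's corner. In integer grid
    units that is a right shift with the corner bit placed in the top position,
    so once k bases have been read, the occupied cell encodes the last k bases.
    Integer arithmetic avoids the floating-point drift of long CGR walks.
    """
    top = k - 1
    ix = iy = 0
    cells = Counter()
    for i, base in enumerate(seq):
        cx, cy = CORNERS[base]
        ix = (ix >> 1) | (cx << top)
        iy = (iy >> 1) | (cy << top)
        if i >= top:  # at least k bases read, so the cell is a complete k-mer
            cells[(ix, iy)] += 1
    return Counter({cell_to_kmer(cell, k): n for cell, n in cells.items()})


def cell_to_kmer(cell, k):
    """Read a grid cell back as a k-mer; bit k-1 holds the most recent base."""
    ix, iy = cell
    recent_first = [
        BASE_AT[((ix >> (k - 1 - j)) & 1, (iy >> (k - 1 - j)) & 1)] for j in range(k)
    ]
    return "".join(reversed(recent_first))


def huffman_code(freqs):
    """Phase 3 (sketch): build a prefix-free Huffman code from symbol counts."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    tie = count()  # unique tie-breaker so the heap never compares the dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def compress_block(block, k, dict_size):
    """Phases 2-4 (sketch) for one subsequence: per-block dictionary + Huffman."""
    dictionary = {kmer for kmer, _ in fcgr_kmer_counts(block, k).most_common(dict_size)}
    tokens, i = [], 0
    while i < len(block):  # greedy left-to-right substitution (an assumption)
        if block[i:i + k] in dictionary:
            tokens.append(block[i:i + k])
            i += k
        else:
            tokens.append(block[i])
            i += 1
    code = huffman_code(Counter(tokens))
    # Phase 4 stand-in: keep the code table next to the '0'/'1' bitstring.
    return code, "".join(code[t] for t in tokens)


def compress(seq, block_len=1000, k=4, dict_size=8, workers=4):
    """Split into subsequences and compress them concurrently."""
    blocks = [seq[i:i + block_len] for i in range(0, len(seq), block_len)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: compress_block(b, k, dict_size), blocks))


def decode_block(code, bits):
    """Invert one block's Huffman bitstring back to bases."""
    inverse = {c: s for s, c in code.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)
```

For example, `compress(seq, block_len=1000, k=4, dict_size=8)` returns one `(code, bits)` pair per subsequence, and applying `decode_block` to each pair and joining the results restores `seq`. Threads keep the sketch self-contained, but because of CPython's GIL they do not speed up CPU-bound work; a `ProcessPoolExecutor`, with `functools.partial` in place of the lambda, would be the usual swap for real parallelism.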
Pages: 487-493
Number of pages: 7