A Fourier-Based Data Minimization Algorithm for Fast and Secure Transfer of Big Genomic Datasets

Cited by: 1
Authors
Aledhari, Mohammed [1]
Di Pierro, Marianne [2]
Saeed, Fahad [1]
Affiliations
[1] Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
[2] Western Michigan Univ, Grad Coll, Kalamazoo, MI 49008 USA
Funding
National Science Foundation (USA);
Keywords
DNA;
DOI
10.1109/BigDataCongress.2018.00024
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
DNA sequencing plays an important role in bioinformatics research and matters for all organisms, especially humans, from multiple perspectives. These include understanding how specific mutations increase or decrease the risk of developing a disease or condition, and finding the connections between genotype and phenotype. Advances in high-throughput sequencing techniques, tools, and equipment have sharply reduced the cost of DNA sequencing and, as a result, have generated big genomic datasets. However, these advances pose great challenges for genomic data storage, analysis, and transfer. Accessing, manipulating, and sharing big genomic datasets present major challenges in terms of time and size, as well as privacy. Because data size plays an important role in addressing these challenges, data minimization techniques have recently attracted much interest in the bioinformatics research community, and it is critical to develop new ways to minimize data size. This paper presents a new real-time data minimization mechanism for big genomic datasets that shortens transfer time in a more secure manner, even if a data breach occurs. Our method applies random sampling from Fourier transform theory to big genomic datasets generated in real time, in both FASTA and FASTQ formats, and assigns the lowest possible code-word to the most frequent characters in the datasets. Our results indicate that the proposed data minimization algorithm reduces the size of FASTA datasets by up to 79% and is 98-fold faster and more secure than the standard data-encoding method. For FASTQ datasets, the results show a size reduction of up to 45% and a 57-fold speedup over the standard data-encoding approach. Based on our results, we conclude that the proposed data minimization algorithm provides the best performance among current data-encoding approaches for big genomic datasets generated in real time.
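The abstract's idea of giving the most frequent characters the lowest possible code-word can be illustrated with a small sketch. This is not the authors' algorithm: it assumes that "lowest possible code-word" means the shortest bit string, it swaps in classic Huffman coding as a stand-in, and it leaves out the Fourier random-sampling step entirely. The function names and the toy sequence are hypothetical.

```python
import heapq
from collections import Counter
from itertools import count


def build_codewords(sequence: str) -> dict:
    """Assign prefix-free bit strings so frequent characters get short codes.

    Assumption: plain Huffman coding, used only as an illustration of
    frequency-based code-word assignment (not the paper's method).
    """
    freq = Counter(sequence)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}

    tie = count()  # unique tie-breaker so the dicts are never compared
    heap = [(n, next(tie), {ch: ""}) for ch, n in freq.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))

    return heap[0][2]


def encode(sequence: str, codes: dict) -> str:
    """Concatenate the code-word of every character."""
    return "".join(codes[ch] for ch in sequence)


def decode(bits: str, codes: dict) -> str:
    """Invert encode(); works because the codes are prefix-free."""
    inverse = {code: ch for ch, code in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


if __name__ == "__main__":
    toy = "AAAAAAAACCCCGGTN"  # hypothetical FASTA-like fragment
    codes = build_codewords(toy)
    bits = encode(toy, codes)
    print(codes)
    print(len(bits), "bits versus", 8 * len(toy), "bits as 8-bit characters")
```

On skewed inputs such as the toy fragment, the encoded length falls well below the 8 bits per character of a plain text encoding, which is the effect the abstract attributes to frequency-based code-word assignment. The percentages reported in the abstract come from the authors' full method and should not be expected from this sketch.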
Pages: 128-134
Page count: 7