A Fourier-Based Data Minimization Algorithm for Fast and Secure Transfer of Big Genomic Datasets

Times cited: 1

Authors:

Aledhari, Mohammed [1]

Di Pierro, Marianne [2]

Saeed, Fahad [1]

Affiliations:

[1] Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA

[2] Western Michigan Univ, Grad Coll, Kalamazoo, MI 49008 USA

Source:

2018 IEEE International Congress on Big Data (IEEE BigData Congress) | 2018

Funding:

National Science Foundation (USA);

Keywords:

DNA;

DOI:

10.1109/BigDataCongress.2018.00024

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

DNA sequencing plays an important role in the bioinformatics research community. It matters for all organisms, and for humans in particular, from several perspectives: for example, understanding how specific mutations increase or decrease the risk of developing a disease or condition, or uncovering the connections between genotype and phenotype. Advances in high-throughput sequencing techniques, tools, and equipment, together with a dramatic decrease in DNA sequencing costs, have led to the generation of big genomic datasets. However, these advances pose major challenges for genomic data storage, analysis, and transfer: accessing, manipulating, and sharing big genomic datasets is costly in both time and size, and it raises privacy concerns. Because data size underlies all of these challenges, data minimization techniques have recently attracted much interest in the bioinformatics research community, and new ways to reduce data size are needed. This paper presents a new real-time data minimization mechanism for big genomic datasets that shortens transfer time and keeps the data more secure even if a data breach occurs. Our method applies random sampling based on Fourier transform theory to big genomic datasets generated in real time, in both FASTA and FASTQ formats, and assigns the shortest possible code-words to the most frequent characters in the datasets. Our results indicate that the proposed data minimization algorithm reduces the size of FASTA datasets by up to 79%, while being 98-fold faster and more secure than the standard data-encoding method. For FASTQ datasets, it reduces size by up to 45% and is 57-fold faster than the standard data-encoding approach.
Based on these results, we conclude that the proposed data minimization algorithm provides the best performance among current data-encoding approaches for big genomic datasets generated in real time.
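The abstract does not detail the encoding step beyond giving the shortest code-words to the most frequent characters. The Python sketch below illustrates that general idea under stated assumptions: character frequencies are estimated from a uniform random sample of sequence positions (a hypothetical reading of "random sampling"), and a standard Huffman code stands in for the paper's code-word assignment. It omits the Fourier-transform component entirely and is not the authors' algorithm; the helper names `sample_frequencies`, `huffman_codes`, `encode`, and `size_reduction` are hypothetical. Size reduction is measured against 8-bit-per-character ASCII storage.

```python
import heapq
import random
from collections import Counter


def sample_frequencies(seq, sample_size, seed=0, alphabet=None):
    """Estimate character frequencies from a uniform random sample of positions.

    Hypothetical helper: this sampling scheme is an assumption, not the paper's.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(seq))
    positions = rng.sample(range(len(seq)), k)
    counts = Counter(seq[i] for i in positions)
    # Add-one smoothing so characters missed by the sample still get a code.
    # The alphabet must cover every character in seq; it defaults to a full scan,
    # whereas a streaming encoder would fix it in advance (e.g. "ACGTN").
    for ch in (alphabet if alphabet is not None else set(seq)):
        counts[ch] += 1
    return counts


def huffman_codes(freqs):
    """Standard Huffman coding: more frequent characters get shorter code-words."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        (only,) = freqs
        return {only: "0"}
    # Heap entries are (weight, unique tie-breaker, {char: partial code}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)  # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(seq, codes):
    """Concatenate code-words into a bit string ('0'/'1' characters, for readability)."""
    return "".join(codes[ch] for ch in seq)


def size_reduction(n_chars, n_bits, bits_per_char=8):
    """Fractional size reduction relative to fixed-width (8-bit ASCII) storage."""
    return 1 - n_bits / (n_chars * bits_per_char)


if __name__ == "__main__":
    seq = "ACGTAAAACCGTAAAAN" * 1000  # toy FASTA-like sequence, not real data
    codes = huffman_codes(sample_frequencies(seq, sample_size=500))
    bits = encode(seq, codes)
    print(codes)
    print(f"size reduction vs. 8-bit ASCII: {size_reduction(len(seq), len(bits)):.1%}")
```

Huffman coding is used here only because it is the textbook way to give frequent symbols shorter code-words; the percentage printed by the demo depends on the toy input and says nothing about the 79% and 45% figures reported in the paper.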


Pages: 128-134

Number of pages: 7
