A Fourier-Based Data Minimization Algorithm for Fast and Secure Transfer of Big Genomic Datasets

Cited by: 1
Authors
Aledhari, Mohammed [1]
Di Pierro, Marianne [2]
Saeed, Fahad [1]
Affiliations
[1] Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
[2] Western Michigan Univ, Grad Coll, Kalamazoo, MI 49008 USA
Funding
National Science Foundation (United States);
Keywords
DNA;
DOI
10.1109/BigDataCongress.2018.00024
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
DNA sequencing plays an important role in the bioinformatics research community. It matters for all organisms, and especially for humans, from multiple perspectives. These include understanding how specific mutations increase or decrease the risk of developing a disease or condition, and finding the implications and connections between genotype and phenotype. Advances in high-throughput sequencing techniques, tools, and equipment, together with the tremendous decrease in DNA sequencing costs, have led to the generation of big genomic datasets. However, these advances pose great challenges for genomic data storage, analysis, and transfer. Accessing, manipulating, and sharing the generated big genomic datasets present major challenges in terms of time and size, as well as privacy. Data size plays an important role in addressing these challenges. Accordingly, data minimization techniques have recently attracted much interest in the bioinformatics research community, and it is critical to develop new ways to minimize data size. This paper presents a new real-time data minimization mechanism for big genomic datasets that shortens the transfer time in a more secure manner, even if a data breach occurs. Our method applies the random sampling of Fourier transform theory to real-time generated big genomic datasets in both FASTA and FASTQ formats, and assigns the lowest possible code-word to the most frequent characters of the datasets. Our results indicate that the proposed data minimization algorithm reduces the size of FASTA datasets by up to 79% and is 98-fold faster and more secure than the standard data-encoding method. The results also show a size reduction of up to 45% for FASTQ datasets, with a 57-fold speedup over the standard data-encoding approach. Based on our results, we conclude that the proposed data minimization algorithm provides the best performance among current data-encoding approaches for big real-time generated genomic datasets.
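The abstract's idea of giving the most frequent characters the lowest code-word can be illustrated with a minimal sketch. It assumes that "lowest" means shortest and uses standard Huffman coding as a stand-in; the abstract does not name the coding scheme, and the Fourier random-sampling step is not shown. The function names `build_codes`, `encode`, and `decode` and the sample sequence are hypothetical, chosen only for illustration.

```python
import heapq
from collections import Counter


def build_codes(sequence):
    """Map each symbol to a prefix-free bit string via Huffman coding.

    Assumption: Huffman coding stands in for the paper's code-word
    assignment, which the abstract does not specify.
    """
    freq = Counter(sequence)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1, then merge them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(sequence, codes):
    """Concatenate the code-word of every symbol in the sequence."""
    return "".join(codes[s] for s in sequence)


def decode(bits, codes):
    """Invert encode(); valid because the codes are prefix-free."""
    inverse = {c: s for s, c in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


# Hypothetical skewed sequence: A is far more frequent than T.
sample = "AAAAAAAACCCCGGT"
codes = build_codes(sample)
bits = encode(sample, codes)
```

On this skewed sample the most frequent base receives a 1-bit code-word and the rarest bases receive 3-bit code-words, so the encoded string is shorter than a fixed 2-bit-per-base encoding would be. Real FASTA and FASTQ files also contain header lines and, for FASTQ, quality scores, which this sketch does not treat separately.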
Pages: 128-134
Number of pages: 7