A Fourier-Based Data Minimization Algorithm for Fast and Secure Transfer of Big Genomic Datasets

Cited by: 1
Authors
Aledhari, Mohammed [1]
Di Pierro, Marianne [2]
Saeed, Fahad [1]
Affiliations
[1] Western Michigan Univ, Dept Comp Sci, Kalamazoo, MI 49008 USA
[2] Western Michigan Univ, Grad Coll, Kalamazoo, MI 49008 USA
Funding
National Science Foundation (USA)
Keywords
DNA
DOI
10.1109/BigDataCongress.2018.00024
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
DNA sequencing plays an important role in the bioinformatics research community. It is important for all organisms, especially humans, from multiple perspectives. These include understanding how specific mutations increase or decrease the risk of developing a disease or condition, and finding the connections between genotype and phenotype. Advances in high-throughput sequencing techniques, tools, and equipment have sharply reduced DNA sequencing costs and thereby helped generate big genomic datasets. However, these advances pose great challenges for genomic data storage, analysis, and transfer. Accessing, manipulating, and sharing the resulting datasets presents major challenges in terms of time and size, as well as privacy. Data size plays an important role in addressing these challenges, so data minimization techniques have recently attracted much interest in the bioinformatics research community, and it is critical to develop new ways to minimize data size. This paper presents a new real-time data minimization mechanism for big genomic datasets that shortens transfer time and keeps the transfer more secure even if a data breach occurs. Our method applies random sampling from Fourier transform theory to real-time generated big genomic datasets in both FASTA and FASTQ formats, and assigns the shortest possible codeword to the most frequent characters in the datasets. Our results indicate that the proposed data minimization algorithm reduces FASTA dataset size by up to 79% and is 98-fold faster and more secure than the standard data-encoding method. For FASTQ datasets, the results show a size reduction of up to 45% and a 57-fold speedup over the standard data-encoding approach. Based on our results, we conclude that the proposed data minimization algorithm provides the best performance among current data-encoding approaches for big real-time generated genomic datasets.
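The abstract's idea of giving the most frequent characters the shortest codewords can be illustrated with a minimal sketch. This is not the authors' algorithm: the record does not specify how codewords are constructed or how the Fourier-based random sampling is applied, so the sketch below substitutes standard Huffman coding as a stand-in for the frequency-based step only. The function names `build_codebook`, `encode`, and `decode` are hypothetical.

```python
import heapq
from collections import Counter


def build_codebook(sequence):
    """Return a prefix-free {char: bitstring} map (Huffman coding, used here
    as an assumed stand-in); more frequent characters get shorter codes."""
    counts = Counter(sequence)
    if not counts:
        return {}
    if len(counts) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, unique tie-breaker, {char: partial code}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


def encode(sequence, codebook):
    """Concatenate the codeword of every character in the sequence."""
    return "".join(codebook[ch] for ch in sequence)


def decode(bits, codebook):
    """Invert encode(); works because the codebook is prefix-free."""
    inverse = {code: ch for ch, code in codebook.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    seq = "AAAAAAAACCCCGGT"  # toy nucleotide string, not real data
    book = build_codebook(seq)
    bits = encode(seq, book)
    print(book, len(bits), "bits vs", 8 * len(seq), "bits as 8-bit characters")
```

On the toy string, the most frequent base (`A`) receives a one-bit code and the rarest bases receive three-bit codes, which is the effect the abstract describes. The reported 79% and 45% reductions come from the paper's own method and datasets and should not be expected from this sketch.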
Pages: 128-134
Number of pages: 7
Related papers
50 in total
  • [31] A fast DBSCAN algorithm for big data based on efficient density calculation
    Hanafi, Nooshin
    Saadatfar, Hamid
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 203
  • [32] The fast clustering algorithm for the big data based on K-means
    Xie, Ting
    Zhang, Taiping
    INTERNATIONAL JOURNAL OF WAVELETS MULTIRESOLUTION AND INFORMATION PROCESSING, 2020, 18 (06)
  • [33] Fourier-based reconstruction via alternating direction total variation minimization in linear scan CT
    Cai, Ailong
    Wang, Linyuan
    Yan, Bin
    Zhang, Hanming
    Li, Lei
    Xi, Xiaoqi
    Li, Jianxin
    NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION A-ACCELERATORS SPECTROMETERS DETECTORS AND ASSOCIATED EQUIPMENT, 2015, 775 : 84 - 92
  • [34] A hybrid encryption algorithm based approach for secure privacy protection of big data in hospitals
    Li, Wei
    Huang, Qian
    EGYPTIAN INFORMATICS JOURNAL, 2024, 28
  • [35] Accurate Range Migration for Fast Quantitative Fourier-Based Image Reconstruction With Monostatic Radar
    Tajik, Daniel
    Kazemivala, Romina
    Nguyen, Jimmy
    Nikolova, Natalia K.
    IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, 2022, 70 (09) : 4273 - 4283
  • [36] Secure Distribution of Big Data Based on BitTorrent
    Xiao, Limin
    Xu, Chunjie
    Wu, Yanfei
    Qin, Jingchao
    Qin, Guangjun
    Zhu, Mingfa
    Ruan, Li
    Wang, Zhiyao
    Li, Mingquan
    Tan, Dongyu
    2013 IEEE 11TH INTERNATIONAL CONFERENCE ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING (DASC), 2013, : 82 - 90
  • [37] An Ultra-Fast Method for Clustering of Big Genomic Data
    Kenidra, Billel
    Benmohammed, Mohamed
    INTERNATIONAL JOURNAL OF APPLIED METAHEURISTIC COMPUTING, 2020, 11 (01) : 45 - 60
  • [38] Batch Processing and Data Streaming Fourier-based Convolutional Neural Network Accelerator
    Hu, Zibu
    Li, Shurui
    Schwartz, Russell L. T.
    Solyanik-Gorgone, Maria
    Nouri, Behrouz Movahhed
    Miscuglio, Mario
    Gupta, Puneet
    Dalir, Hamed
    Sorger, Volker J.
    EMERGING TOPICS IN ARTIFICIAL INTELLIGENCE (ETAI) 2022, 2022, 12204
  • [39] Fast search of art culture resources based on big data and cuckoo algorithm
    Xuewen Xia
    Personal and Ubiquitous Computing, 2020, 24 : 127 - 138
  • [40] Fast search of art culture resources based on big data and cuckoo algorithm
    Xia, Xuewen
    PERSONAL AND UBIQUITOUS COMPUTING, 2020, 24 (01) : 127 - 138