High-performance IoT streaming data prediction system using Spark: a case study of air pollution

Authors
Ho-Yong Jin
Eun-Sung Jung
Duckki Lee
Affiliations
[1] Hongik University,Department of Software and Communications Engineering
[2] Yonam Institute of Technology,Department of Smart Software
Keywords
Long Short-Term Memory (LSTM); Distributed deep learning; Distributed Keras (Dist-Keras); Apache Spark
Abstract
Internet-of-Things (IoT) devices are becoming prevalent, and some of them, such as sensors, generate continuous time-series data, i.e., streaming data. These IoT streaming data are one source of Big Data, and they require careful consideration for efficient processing and analysis. Deep learning is emerging as a solution for IoT streaming data analytics. However, a persistent problem in deep learning is that training neural networks takes a long time. In this paper, we propose a high-performance IoT streaming data prediction system that improves training speed and predicts in real time. We show the efficacy of the system through a case study of air pollution. The experimental results show that the modified LSTM autoencoder model outperforms a generic LSTM model. We found that achieving the best performance requires optimizing many parameters, including the learning rate, the number of epochs, the memory cell size, the input timestep size, and the number of features/predictors. In that regard, we show that high-performance data learning/prediction frameworks (e.g., Spark, Dist-Keras, and Hadoop) are essential for rapidly fine-tuning a model for training and testing before real deployment of the model as data accumulate.
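To give a rough sense of why tuning the parameters named in the abstract can be costly, the following minimal sketch enumerates a small search grid. The candidate values and the names `search_space` and `enumerate_configs` are hypothetical and chosen only for illustration; the abstract lists the parameters but not their ranges, and this is not the authors' tuning procedure.

```python
from itertools import product

# Hypothetical candidate values (assumption): the abstract names these
# parameters but does not state which values were tried.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "epochs": [50, 100],
    "memory_cells": [32, 64, 128],
    "timesteps": [6, 12, 24],
    "n_features": [1, 4, 8],
}


def enumerate_configs(space):
    """Return every combination of candidate values as a list of dicts."""
    names = list(space)
    value_lists = [space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


configs = enumerate_configs(search_space)
print(len(configs))  # 3 * 2 * 3 * 3 * 3 = 162 training runs
```

Even this small grid implies 162 separate training runs, which suggests why the abstract stresses fast training and distributed frameworks when a model has to be re-tuned as data accumulate.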
Pages: 13147-13154
Number of pages: 7