Efficient spatiotemporal interpolation with Spark machine learning

Cited by: 0
Authors
Weitian Tong
Lixin Li
Xiaolu Zhou
Jason Franklin
Affiliations
[1] Georgia Southern University,Department of Computer Science
[2] Georgia Southern University,Department of Geology and Geography
Source
Earth Science Informatics | 2019 / Volume 12
Keywords
Spatiotemporal interpolation; Spark; Machine learning; Inverse distance weighting (IDW); k-d tree; Bootstrap aggregating
DOI
Not available
Abstract
To better assess the relationships between environmental exposures and health outcomes, an appropriate spatiotemporal interpolation is critical. Traditional spatiotemporal interpolation methods either consider the spatial and temporal dimensions separately or incorporate both dimensions simultaneously by simply treating time as another dimension in space. Such interpolation results suffer from relatively low accuracy as the true space-time domain is skewed inappropriately and the distance calculation in such domain is not accurate. We employ the efficient k-d tree structure to store spatiotemporal data and adopt several machine learning methods to learn optimal parameters. To overcome the computational difficulty with large data sets, we implement our method on an efficient cluster computing framework – Apache Spark. Real world PM2.5 data sets are utilized to test our implementation and the experimental results demonstrate the computational power of our method, which significantly outperforms the previous work in terms of both speed and accuracy.
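As a rough illustration of the idea in the abstract, the sketch below shows inverse distance weighting over space-time points where time is rescaled before distances are computed. It is not the authors' implementation: the scaling factor `c`, the neighbor count `k`, the exponent `power`, and both function names are assumptions introduced here, the neighbor search is brute force rather than a k-d tree, and nothing is distributed with Spark.

```python
import heapq
import math


def st_distance(p, q, c):
    """Euclidean distance between (x, y, t) points after scaling time by c.

    c is a hypothetical time-scaling factor; the abstract only says that
    parameters are learned, not how time is scaled.
    """
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    dt = c * (p[2] - q[2])
    return math.sqrt(dx * dx + dy * dy + dt * dt)


def idw_interpolate(samples, query, c=1.0, k=3, power=2.0):
    """Estimate the value at `query` from the k nearest samples.

    samples: list of ((x, y, t), value) pairs.
    A brute-force neighbor search is used here for brevity; a k-d tree
    built on (x, y, c * t) would serve the same role on large data sets.
    """
    scored = ((st_distance(pt, query, c), val) for pt, val in samples)
    nearest = heapq.nsmallest(k, scored, key=lambda dv: dv[0])

    # A sample that coincides with the query determines the value exactly.
    for d, val in nearest:
        if d == 0.0:
            return val

    weights = [1.0 / d ** power for d, _ in nearest]
    weighted_sum = sum(w * val for w, (_, val) in zip(weights, nearest))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    # Made-up measurements: ((x, y, t), value).
    data = [((0.0, 0.0, 1.0), 10.0), ((1.0, 0.0, 0.0), 20.0)]
    # With c = 1 both samples are equally far from the query point;
    # with c = 2 the sample offset in time counts as farther away.
    print(idw_interpolate(data, (0.0, 0.0, 0.0), c=1.0, k=2))
    print(idw_interpolate(data, (0.0, 0.0, 0.0), c=2.0, k=2))
```

Changing `c` changes which samples count as near, which is why a choice of space-time scaling can affect accuracy; under these assumptions, `c`, `k`, and `power` are the kind of parameters one might tune by cross-validation.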
Pages: 87-96
Number of pages: 9