Efficient spatiotemporal interpolation with spark machine learning

Cited by: 0

Authors:

Weitian Tong

Lixin Li

Xiaolu Zhou

Jason Franklin

Affiliations:

[1] Georgia Southern University, Department of Computer Science

[2] Georgia Southern University, Department of Geology and Geography

Source:

Earth Science Informatics | 2019 / Volume 12

Keywords:

Spatiotemporal interpolation; Spark; Machine learning; Inverse distance weighting (IDW); k-d tree; Bootstrap aggregating

DOI:

Not available

Abstract:

To better assess the relationships between environmental exposures and health outcomes, an appropriate spatiotemporal interpolation is critical. Traditional spatiotemporal interpolation methods either consider the spatial and temporal dimensions separately or incorporate both dimensions simultaneously by simply treating time as another dimension in space. Such interpolation results suffer from relatively low accuracy because the true space-time domain is skewed inappropriately and distance calculations in such a domain are not accurate. We employ the efficient k-d tree structure to store spatiotemporal data and adopt several machine learning methods to learn optimal parameters. To overcome the computational difficulty with large data sets, we implement our method on an efficient cluster computing framework, Apache Spark. Real-world PM2.5 data sets are used to test our implementation, and the experimental results demonstrate the computational power of our method, which significantly outperforms the previous work in terms of both speed and accuracy.
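The abstract does not state the interpolation formula, so the following is only a minimal sketch of inverse distance weighting (IDW) over space-time points, not the authors' implementation. It assumes that time is rescaled by a factor `c` before distances are computed, and that `c`, the neighbor count `k`, and the exponent `power` are the kind of parameters that would be learned; all three names are hypothetical. For brevity it finds neighbors by brute-force search rather than with the k-d tree the abstract mentions, and it does not use Spark.

```python
import heapq
import math


def st_distance(p, q, c):
    """Euclidean distance between two (x, y, t) points, with time scaled by c."""
    return math.sqrt(
        (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 + (c * (p[2] - q[2])) ** 2
    )


def idw_interpolate(samples, query, c=1.0, k=3, power=2.0):
    """Estimate the value at `query` from the k nearest samples.

    samples: list of ((x, y, t), value) pairs
    query:   (x, y, t) point to interpolate
    c:       assumed time-scaling factor (hypothetical parameter)
    k:       number of nearest neighbors to use
    power:   IDW exponent applied to the distance
    """
    # Brute-force nearest-neighbor search; a k-d tree would replace this step.
    nearest = heapq.nsmallest(
        k, ((st_distance(pt, query, c), val) for pt, val in samples)
    )
    # If the query coincides with a sample, return that sample's value.
    for dist, val in nearest:
        if dist == 0.0:
            return val
    weights = [1.0 / dist ** power for dist, _ in nearest]
    weighted_sum = sum(w * val for w, (_, val) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

With two samples at equal scaled distance from the query, the estimate is their plain average; increasing `c` makes samples that are far apart in time contribute less.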

Pages: 87-96

Page count: 9
