Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data

Cited by: 14
Authors
Hu, Fei [1, 2]
Xu, Mengchao [1, 2]
Yang, Jingchao [1, 2]
Liang, Yanshou [1, 2]
Cui, Kejin [1, 2]
Little, Michael M. [3]
Lynnes, Christopher S. [3]
Duffy, Daniel Q. [3]
Yang, Chaowei [1, 2]
Affiliations
[1] George Mason University, NSF Spatiotemporal Innovation Center, Fairfax, VA 22030, USA
[2] George Mason University, Department of Geography and GeoInformation Science, Fairfax, VA 22030, USA
[3] NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA
Funding
U.S. National Science Foundation
Keywords
big data; data container; geospatial raster data management; GIS; system; performance
DOI
10.3390/ijgi7040144
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address this challenge, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis, and some have been developed or extended specifically for geospatial data. For example, Rasdaman was developed to handle raster data, and GeoSpark and SpatialHadoop extend Spark and Hadoop, respectively, to handle vector data. However, few studies have systematically compared and evaluated the features and performance of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical usage experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are drawn: (1) none of the data containers except ClimateSpark supports the HDF data format used in this paper well, so loading the data requires time- and resource-consuming preprocessing; (2) SciDB, Rasdaman, and MongoDB handle queries over small to medium data volumes well, whereas Spark and ClimateSpark can handle large data volumes with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operations and analytical functions, which the other containers lack; and (4) SciDB, Spark, and Hive offer better support for user-defined functions (UDFs) to extend system capabilities.
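The abstract compares the containers' logical and physical data models for multi-dimensional arrays. Array stores such as SciDB (chunks) and Rasdaman (tiles) partition an array into blocks, so a subset query only needs to read the blocks its bounding box intersects. The sketch below is a generic illustration of that idea under stated assumptions, not code from any evaluated system: `chunks_for_query` and `chunk_cells` are hypothetical helpers, and the (time, lat, lon) array shape and the regular chunk grid are assumed for the example.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple

# Assumption: a regular chunk grid; each query range is half-open [start, stop).
Range = Tuple[int, int]


def chunks_for_query(shape: Sequence[int], chunk_shape: Sequence[int],
                     query: Sequence[Range]) -> Iterator[Tuple[int, ...]]:
    """Yield the grid index of every chunk that intersects the query box."""
    per_dim = []
    for size, csize, (start, stop) in zip(shape, chunk_shape, query):
        # Clamp the query to the array bounds along this dimension.
        start, stop = max(0, start), min(size, stop)
        if start >= stop:
            return  # empty selection: no chunk is touched
        first, last = start // csize, (stop - 1) // csize
        per_dim.append(range(first, last + 1))
    # Every combination of touched chunk indices is a chunk that must be read.
    yield from product(*per_dim)


def chunk_cells(shape: Sequence[int], chunk_shape: Sequence[int],
                idx: Sequence[int]) -> int:
    """Number of cells in one chunk (edge chunks may be smaller)."""
    n = 1
    for size, csize, i in zip(shape, chunk_shape, idx):
        n *= min(csize, size - i * csize)
    return n


if __name__ == "__main__":
    shape = (24, 180, 360)      # hypothetical (time, lat, lon) grid
    chunk_shape = (1, 90, 90)   # hypothetical chunk layout
    query = [(0, 3), (45, 135), (0, 100)]
    hit = list(chunks_for_query(shape, chunk_shape, query))
    print(len(hit))                                               # 12
    print(sum(chunk_cells(shape, chunk_shape, c) for c in hit))   # 97200
```

Here the query requests 3 × 90 × 100 = 27,000 cells but touches 12 chunks holding 97,200 cells, which suggests why chunk shape relative to typical query shapes can affect query speed.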
Pages: 22