Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data

Cited by: 14
Authors
Hu, Fei [1, 2]
Xu, Mengchao [1, 2]
Yang, Jingchao [1, 2]
Liang, Yanshou [1, 2]
Cui, Kejin [1, 2]
Little, Michael M. [3]
Lynnes, Christopher S. [3]
Duffy, Daniel Q. [3]
Yang, Chaowei [1, 2]
Affiliations
[1] George Mason Univ, NSF Spatiotemporal Innovat Ctr, Fairfax, VA 22030 USA
[2] George Mason Univ, Dept Geog & GeoInformat Sci, Fairfax, VA 22030 USA
[3] NASA, Goddard Space Flight Ctr, Greenbelt, MD 20771 USA
Source
Funding
National Science Foundation (USA)
Keywords
big data; data container; geospatial raster data management; GIS; SYSTEM; PERFORMANCE;
DOI
10.3390/ijgi7040144
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address this challenge, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis. Data containers have also been developed or enhanced to handle geospatial data. For example, Rasdaman was developed to handle raster data, and GeoSpark/SpatialHadoop were enhanced from Spark/Hadoop to handle vector data. However, few studies have systematically compared and evaluated the features and performance of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are offered: (1) no data container except ClimateSpark has good support for the HDF data format used in this paper, so the others require time- and resource-consuming data preprocessing to load data; (2) SciDB, Rasdaman, and MongoDB handle queries over small and medium data volumes well, whereas Spark and ClimateSpark can handle large data volumes with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operations and analytical functions, while the others lack these functions; and (4) SciDB, Spark, and Hive have better support for user-defined functions (UDFs) to extend the system capability.
Pages: 22
Related papers
50 in total
  • [21] Sharing Environmental Geospatial Data Through an Open Source WebGIS
    Caradonna, Grazia
    Figorito, Benedetto
    Tarantino, Eufemia
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2015, PT III, 2015, 9157 : 556 - 565
  • [22] An open source web application for distributed geospatial data exploration
    Patrick A. Curry
    Nils Moosdorf
    Scientific Data, 6
  • [23] Multidimensional Visualization and Processing of Big Open Urban Geospatial Data on the Web
    Kilsedar, Candan Eylul
    Brovelli, Maria Antonia
    ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2020, 9 (07)
  • [24] A hierarchical indexing strategy for optimizing Apache Spark with HDFS to efficiently query big geospatial raster data
    Hu, Fei
    Yang, Chaowei
    Jiang, Yongyao
    Li, Yun
    Song, Weiwei
    Duffy, Daniel Q.
    Schnase, John L.
    Lee, Tsengdar
    INTERNATIONAL JOURNAL OF DIGITAL EARTH, 2020, 13 (03) : 410 - 428
  • [25] Evaluation of Data Management Systems for Geospatial Big Data
    Amirian, Pouria
    Basiri, Anahid
    Winstanley, Adam
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2014, PT V, 2014, 8583 : 678 - +
  • [26] Scalable Transformation of Big Geospatial Data into Linked Data
    Mandilaras, George
    Koubarakis, Manolis
    SEMANTIC WEB - ISWC 2021, 2021, 12922 : 480 - 495
  • [27] Evaluating Urban Vitality Based on Geospatial Big Data in Xiamen Island, China
    Chen, Shili
    Lang, Wei
    Li, Xun
    SAGE OPEN, 2022, 12 (04)
  • [28] BIG DATA, DATA SCIENCE AND THEIR CONTRIBUTIONS TO THE DEVELOPMENT OF THE USE OF OPEN SOURCE INTELLIGENCE
    dos Passos, Danielle Sandler
    SISTEMAS & GESTAO, 2016, 11 (04) : 392 - 396
  • [29] ADVANCES IN FUSION OF BIG GEOSPATIAL DATA
    Percivall, George
    Taylor, Trevor
    2017 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2017 : 380 - 383
  • [30] Geospatial Big Data: Challenges and Opportunities
    Lee, Jae-Gil
    Kang, Minseo
    BIG DATA RESEARCH, 2015, 2 (02) : 74 - 81