Evaluating the Open Source Data Containers for Handling Big Geospatial Raster Data

Cited by: 14
Authors
Hu, Fei [1, 2]
Xu, Mengchao [1, 2]
Yang, Jingchao [1, 2]
Liang, Yanshou [1, 2]
Cui, Kejin [1, 2]
Little, Michael M. [3]
Lynnes, Christopher S. [3]
Duffy, Daniel Q. [3]
Yang, Chaowei [1, 2]
Affiliations
[1] George Mason Univ, NSF Spatiotemporal Innovat Ctr, Fairfax, VA 22030 USA
[2] George Mason Univ, Dept Geog & GeoInformat Sci, Fairfax, VA 22030 USA
[3] NASA, Goddard Space Flight Ctr, Greenbelt, MD 20771 USA
Source
Funding
National Science Foundation (USA);
Keywords
big data; data container; geospatial raster data management; GIS; SYSTEM; PERFORMANCE;
DOI
10.3390/ijgi7040144
CLC number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Big geospatial raster data pose a grand challenge to data management technologies for effective big data query and processing. To address this challenge, various big data container solutions have been developed or enhanced to facilitate data storage, retrieval, and analysis. Data containers have also been developed or enhanced to handle geospatial data. For example, Rasdaman was developed to handle raster data, and GeoSpark and SpatialHadoop were extended from Spark and Hadoop, respectively, to handle vector data. However, few studies have systematically compared and evaluated the features and performance of these popular data containers. This paper provides a comprehensive evaluation of six popular data containers (i.e., Rasdaman, SciDB, Spark, ClimateSpark, Hive, and MongoDB) for handling multi-dimensional, array-based geospatial raster datasets. Their architectures, technologies, capabilities, and performance are compared and evaluated from two perspectives: (a) system design and architecture (distributed architecture, logical data model, physical data model, and data operations); and (b) practical use experience and performance (data preprocessing, data uploading, query speed, and resource consumption). Four major conclusions are offered: (1) no data container except ClimateSpark has good support for the HDF data format used in this paper, so the others require time- and resource-consuming data preprocessing to load data; (2) SciDB, Rasdaman, and MongoDB handle queries over small and medium data volumes well, whereas Spark and ClimateSpark can handle large volumes of data with stable resource consumption; (3) SciDB and Rasdaman provide mature array-based data operations and analytical functions, while the others lack these functions; and (4) SciDB, Spark, and Hive have better support for user-defined functions (UDFs) to extend system capability.
Pages: 22