A comparative study of cluster-based Big Data Cube implementations

Cited by: 3
Authors
Morielo Caetano, Andre Francisco [1 ]
Hirata, Celso Massaki [1 ]
Silva, Rodrigo Rocha [2 ,3 ,4 ]
Affiliations
[1] Inst Tecnol Aeronaut, Marechal Eduardo Gomes Sq 50, Sao Jose Dos Campos, Brazil
[2] Fac Tecnol Estado Sao Paulo, Carlos Barattino St 908, Mogi Das Cruzes, SP, Brazil
[3] Univ Coimbra, Paula Souza Ctr, Polo 2 Pinhal Marrocos, Coimbra, Portugal
[4] Univ Coimbra, Ctr Informat & Syst, Dept Informat Engn, Polo 2 Pinhal Marrocos, Coimbra, Portugal
Keywords
Datacube; OLAP; Cloud; Big Data; Survey; Distributed; Parallel; COMPUTATION; SPARK; MPI;
DOI
10.1016/j.future.2022.03.024
Chinese Library Classification (CLC) number
TP301 [Theory and methods];
Discipline classification code
081202;
Abstract
Research on Data Cube scalability is extensive, yet sparse. Scalable design patterns for Data Cube implementations are a trend as the technology shifts from centralized, fully materialized models to distributed, partially materialized ones. These implementations exploit cheaper and upgraded hardware in clusters of computer nodes. It is commonly understood that parallel and distributed hardware makes it possible to handle large amounts of multidimensional data for online analytical processing, up to billions of tuples or more, with increased performance and fault tolerance. However, the number of research works and their heterogeneity may overwhelm new initiatives in this field, as there is little discussion of the state of the art and of ways to improve on it. Moreover, the baseline for comparison in most works is often too limited and requires the reader to cross-check information across many articles to identify possible gaps. To help identify these gaps, we analyzed the works on Data Cube scalability and developed a comparative study that provides directions for new research on parallel and distributed implementations of data cubes. We identified features for comparison that include cube function, implementation technology, cube storage type, and various details of the experiments. We expect that these features and comparisons will help researchers identify research gaps and pave the way for future work in the field. © 2022 Elsevier B.V. All rights reserved.
Pages: 240-253
Page count: 14
Related papers
50 in total
  • [1] Cluster-based data filtering for manufacturing big data systems
    Li, Yifu
    Deng, Xinwei
    Ba, Shan
    Myers, William R.
    Brenneman, William A.
    Lange, Steve J.
    Zink, Ron
    Jin, Ran
    JOURNAL OF QUALITY TECHNOLOGY, 2022, 54 (03) : 290 - 302
  • [2] Cluster-Based Join for Geographically Distributed Big RDF Data
    Yang, Fan
    Crainiceanu, Adina
    Chen, Zhiyuan
    Needham, Don
    2019 IEEE INTERNATIONAL CONGRESS ON BIG DATA (IEEE BIGDATA CONGRESS 2019), 2019, : 170 - 178
  • [3] Computational Performance Analysis of Cluster-based Technologies for Big Data Analytics
    Khan, Mukhtakj
    Salman
    Iqbal, Nadeem
    2017 IEEE INTERNATIONAL CONFERENCE ON INTERNET OF THINGS (ITHINGS) AND IEEE GREEN COMPUTING AND COMMUNICATIONS (GREENCOM) AND IEEE CYBER, PHYSICAL AND SOCIAL COMPUTING (CPSCOM) AND IEEE SMART DATA (SMARTDATA), 2017, : 280 - 286
  • [4] A comparative study of cluster-based methods at finite strain
    Yang, Yang
    Zhang, Lei
    Tang, Shaoqiang
    ACTA MECHANICA SINICA, 2022, 38 (04)
  • [5] ROLAP implementations of the data cube
    Morfonios, Konstantinos
    Konakas, Stratis
    Ioannidis, Yannis
    Kotsis, Nikolaos
    ACM COMPUTING SURVEYS, 2007, 39 (04)
  • [6] Cluster-based visualisation of marketing data
    Lisboa, PJG
    Patel, S
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING IDEAL 2004, PROCEEDINGS, 2004, 3177 : 552 - 558
  • [7] Cluster-based analysis of FMRI data
    Heller, Ruth
    Stanley, Damian
    Yekutieli, Daniel
    Rubin, Nava
    Benjamini, Yoav
    NEUROIMAGE, 2006, 33 (02) : 599 - 608
  • [8] Cluster-based data relabelling for classification
    Wan, Huan
    Wang, Hui
    Scotney, Bryan
    Liu, Jun
    Wei, Xin
    INFORMATION SCIENCES, 2023, 648
  • [9] A cluster-based data deduplication technology
    Tseng, Chuan-Mu
    Ciou, Jheng-Rong
    Liu, Tzong-Jye
    2014 SECOND INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2014, : 226 - 230
  • [10] Cluster-Based Data Oriented Hashing
    Chafik, Sanaa
    Daoudi, Imane
    El Yacoubi, Mounim A.
    El Ouardi, Hamid
    PROCEEDINGS OF THE 2015 IEEE INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (IEEE DSAA 2015), 2015, : 1037 - 1043