A comparative study of cluster-based Big Data Cube implementations

Cited by: 3

Authors:

Morielo Caetano, Andre Francisco ^{[1]}

Hirata, Celso Massaki ^{[1]}

Silva, Rodrigo Rocha ^{[2,3,4]}

Affiliations:

[1] Inst Tecnol Aeronaut, Marechal Eduardo Gomes Sq 50, Sao Jose Dos Campos, Brazil

[2] Fac Tecnol Estado Sao Paulo, Carlos Barattino St 908, Mogi Das Cruzes, SP, Brazil

[3] Univ Coimbra, Paula Souza Ctr, Polo 2 Pinhal Marrocos, Coimbra, Portugal

[4] Univ Coimbra, Ctr Informat & Syst, Dept Informat Engn, Polo 2 Pinhal Marrocos, Coimbra, Portugal

Source:

FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2022 / Vol. 133

Keywords:

Datacube; OLAP; Cloud; Big Data; Survey; Distributed; Parallel; COMPUTATION; SPARK; MPI;

DOI:

10.1016/j.future.2022.03.024

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Research on Data Cube scalability is extensive, yet sparse. Scalable design patterns for Data Cube implementations are a trend as the technology shifts from centralized and fully materialized models to distributed and partially materialized ones. The implementations exploit cheaper and upgraded hardware in clusters of computer nodes. It is commonly understood that parallel and distributed hardware makes it possible to handle large amounts of multidimensional data for online analytical processing, up to billions of tuples or more, with increased performance and fault tolerance. However, the number of research works and their heterogeneity may overwhelm new initiatives in this field, as there is little discussion of the state of the art and of ways to improve it. Moreover, the baseline for comparison in most works is often too limited and requires the reader to cross-check information among many articles to identify possible gaps. To help identify these gaps, we analyzed the works on Data Cube scalability and elaborated a comparative study that provides directions for new research on parallel and distributed implementations of Data Cubes. We identified features for comparison that include cube function, implementation technology, cube storage type, and various details of the experiments. We expect that the features and comparisons will help researchers identify research gaps and pave the way for future work in the field. (C) 2022 Elsevier B.V. All rights reserved.


Pages: 240-253

Page count: 14
