On the Timed Analysis of Big-Data Applications

被引：5

作者：

Marconi, Francesco ^{[1
]}

Quattrocchi, Giovanni ^{[1
]}

Baresi, Luciano ^{[1
]}

Bersani, Marcello M. ^{[1
]}

Rossi, Matteo ^{[1
]}

机构：

[1] Politecn Milan, DEIB, Milan, Italy

来源：

NASA FORMAL METHODS, NFM 2018 | 2018年 / 10811卷

基金：

欧盟地平线“2020”;

关键词：

Big-Data Applications; Metric temporal logic; Formal verification; Apache Spark;

D O I：

10.1007/978-3-319-77935-5_22

中图分类号：

V [航空、航天];

学科分类号：

08 ; 0825 ;

摘要：

Apache Spark is one of the best-known frameworks for executing big-data batch applications over a cluster of (virtual) machines. Defining the cluster (i.e., the number of machines and CPUs) to attain guarantees on the execution times (deadlines) of the application is indeed a trade-off between the cost of the infrastructure and the time needed to execute the application. Sizing the computational resources, in order to prevent cost overruns, can benefit from the use of formal models as a means to capture the execution time of applications. Our model of Spark applications, based on the CLTLoc logic, is defined by considering the directed acyclic graph around which Spark programs are organized, the number of available CPUs, the number of tasks elaborated by the application, and the average execution times of tasks. If the outcome of the analysis is positive, then the execution is feasible-that is, it can be completed within a given time span. The analysis tool has been implemented on top of the Zot formal verification tool. A preliminary evaluation shows that our model is sufficiently accurate: the formal analysis identifies execution times that are close (the error is less than 10%) to those obtained by actually running the applications.

引用

页码：315 / 332

页数：18

共 50 条

[31] Lessons for big-data projects
Ewan Birney
Nature, 2012, 489 : 49 - 51
[32] SIMD parallel MCMC sampling with applications for big-data Bayesian analytics
Mahani, Alireza S.
Sharabiani, Mansour T. A.
COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2015, 88 : 75 - 99
[33] Harmony: An Approach for Geo-distributed Processing of Big-Data Applications
Zhang, Han
Ramapantulu, Lavanya
Teo, Yong Meng
2019 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2019, : 160 - 170
[34] Fine-Grained Dynamic Resource Allocation for Big-Data Applications
Baresi, Luciano
Leva, Alberto
Quattrocchi, Giovanni
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2021, 47 (08) : 1668 - 1682
[35] Stochastic Approximation: From Statistical Origin to Big-Data, Multidisciplinary Applications
Lai, Tze Leung
Yuan, Hongsong
STATISTICAL SCIENCE, 2021, 36 (02) : 291 - 302
[36] Editorial Note: HCI Systems for Big-Data Based Multimedia Applications
Hsieh, Wen-Hsiang
MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (23) : 25159 - 25159
[37] ARE YOU READY FOR BIG DATA? GOVERNANCE IN BIG-DATA RESEARCH
Scheepers, Floortje E.
Deschamps, Peter
JOURNAL OF THE AMERICAN ACADEMY OF CHILD AND ADOLESCENT PSYCHIATRY, 2016, 55 (10): : S309 - S309
[38] Efficient Storage of Big-Data for Real-Time GPS Applications
Akulakrishna, Pavan Kumar
Lakshmi, J.
Nandy, S. K.
2014 IEEE FOURTH INTERNATIONAL CONFERENCE ON BIG DATA AND CLOUD COMPUTING (BDCLOUD), 2014, : 1 - 8
[39] Big-Data Analysis, Cluster Analysis, and Machine-Learning Approaches
Alonso-Betanzos, Amparo
Bolon-Canedo, Veronica
SEX-SPECIFIC ANALYSIS OF CARDIOVASCULAR FUNCTION, 2018, 1065 : 607 - 626
[40] Applying could computing to analysis to the big-data stock system
Chen, Chiu-Chin
Liao, Chia-Chun
BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2019, 125 : 36 - 36

← 1 2 3 4 5 →