On the Timed Analysis of Big-Data Applications

被引:5
|
作者
Marconi, Francesco [1 ]
Quattrocchi, Giovanni [1 ]
Baresi, Luciano [1 ]
Bersani, Marcello M. [1 ]
Rossi, Matteo [1 ]
机构
[1] Politecn Milan, DEIB, Milan, Italy
来源
NASA FORMAL METHODS, NFM 2018 | 2018年 / 10811卷
基金
欧盟地平线“2020”;
关键词
Big-Data Applications; Metric temporal logic; Formal verification; Apache Spark;
D O I
10.1007/978-3-319-77935-5_22
中图分类号
V [航空、航天];
学科分类号
08 ; 0825 ;
摘要
Apache Spark is one of the best-known frameworks for executing big-data batch applications over a cluster of (virtual) machines. Defining the cluster (i.e., the number of machines and CPUs) to attain guarantees on the execution times (deadlines) of the application is indeed a trade-off between the cost of the infrastructure and the time needed to execute the application. Sizing the computational resources, in order to prevent cost overruns, can benefit from the use of formal models as a means to capture the execution time of applications. Our model of Spark applications, based on the CLTLoc logic, is defined by considering the directed acyclic graph around which Spark programs are organized, the number of available CPUs, the number of tasks elaborated by the application, and the average execution times of tasks. If the outcome of the analysis is positive, then the execution is feasible-that is, it can be completed within a given time span. The analysis tool has been implemented on top of the Zot formal verification tool. A preliminary evaluation shows that our model is sufficiently accurate: the formal analysis identifies execution times that are close (the error is less than 10%) to those obtained by actually running the applications.
引用
收藏
页码:315 / 332
页数:18
相关论文
共 50 条
  • [41] Solving a Big-Data Problem with GPU: The Network Traffic Analysis
    Barrionuevo, Mercedes
    Lopresti, Mariela
    Miranda, Natalia
    Piccoli, Fabiana
    JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY, 2015, 15 (01): : 30 - 39
  • [42] Social Big-Data Analysis of Particulate Matter, Health, and Society
    Song, Juyoung
    Song, Tae Min
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2019, 16 (19)
  • [43] Voter Privacy and Big-Data Elections
    Judge, Elizabeth F.
    Pal, Michael
    OSGOODE HALL LAW JOURNAL, 2021, 58 (01): : 1 - 55
  • [44] BIG-DATA VISUALIZATION FOR TRANSLATIONAL NEUROTRAUMA
    Nielson, Jessica
    Inoue, Tomoo
    Paquette, Jesse
    Lin, Amity
    Sacramento, Jeffrey
    Liu, Aiwen W.
    Guandique, Cristian F.
    Irvine, Karen-Amanda
    Gensel, John C.
    Beattie, Michael S.
    Bresnahan, Jacqueline C.
    Manley, Geoffrey T.
    Carlsson, Gunnar
    Lum, Pek Yee
    Ferguson, Adam R.
    JOURNAL OF NEUROTRAUMA, 2013, 30 (15) : A61 - A62
  • [46] A Minimax Approach for Classification with Big-data
    Krishnan, R.
    Jagannathan, S.
    Samaranayake, V. A.
    2018 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2018, : 1437 - 1444
  • [47] Persisting big-data: The NoSQL landscape
    Corbellini, Alejandro
    Mateos, Cristian
    Zunino, Alejandro
    Godoy, Daniela
    Schiaffino, Silvia
    INFORMATION SYSTEMS, 2017, 63 : 1 - 23
  • [48] Big-Data Science: Infrastructure Impact
    Monga, Inder
    Prabhat
    PROCEEDINGS OF THE INDIAN NATIONAL SCIENCE ACADEMY, 2018, 84 (02): : 359 - 370
  • [49] Big-Data Clustering with Genetic Algorithm
    Mortezanezhad, Afsaneh
    Daneshifar, Ebrahim
    2019 IEEE 5TH CONFERENCE ON KNOWLEDGE BASED ENGINEERING AND INNOVATION (KBEI 2019), 2019, : 702 - 706
  • [50] NodeMerge: Template Based Efficient Data Reduction For Big-Data Causality Analysis
    Tang, Yutao
    Li, Ding
    Li, Zhichun
    Zhang, Mu
    Jee, Kangkook
    Xiao, Xusheng
    Wu, Zhenyu
    Rhee, Junghwan
    Xu, Fengyuan
    Li, Qun
    PROCEEDINGS OF THE 2018 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY (CCS'18), 2018, : 1324 - 1337