Distributed Speculative Parallelization using Checkpoint Restart

Cited by: 2
Authors
Ghoshal, Devarshi [1]
Ramkumar, Sreesudhan R. [1]
Chauhan, Arun [1]
Affiliation
[1] Indiana Univ, Sch Informat & Comp, Bloomington, IN 47405 USA
Keywords
Speculative parallelization; clusters; checkpoint restart
DOI
10.1016/j.procs.2011.04.044
Chinese Library Classification number
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
Speculative software parallelism has gained renewed interest as a mechanism for leveraging multiple cores on emerging architectures. Two major mechanisms have been used to implement speculation-based parallelism in software: software transactional memory and speculative threads. We propose a third mechanism based on checkpoint restart, which recent developments in checkpoint restart technology have made an attractive alternative. The approach potentially combines the conceptual simplicity of transactional memory with the flexibility of speculative threads. Since many checkpoint restart systems work with large distributed-memory programs, it provides an automatic way to perform distributed speculation over clusters. Additionally, since checkpoint restart systems are primarily designed for fault tolerance, using the same system for speculation could also provide fault tolerance within speculative execution when it is embedded in large-scale applications where fault tolerance is desirable. In this paper we use a series of micro-benchmarks to study the performance of a speculative system based on the DMTCP checkpoint restart system and compare it against a thread-level speculative system. We highlight the relative merits of each approach and draw lessons that could guide future developments in speculative systems.
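The general idea of speculating from a checkpoint and restarting on a wrong guess can be illustrated with a minimal sketch. This is not the paper's implementation and does not use DMTCP: `copy.deepcopy` stands in for a process checkpoint, the run is sequential rather than parallel, and every name (`speculate`, `double_and_add`, `slow_value`) is hypothetical.

```python
import copy


def speculate(state, predicted, speculative_step, actual_value_fn):
    """Run speculative_step on a predicted value; roll back if the guess was wrong.

    Returns (resulting_state, committed). Callers should use the returned
    state, since the state passed in may hold discarded speculative work.
    """
    # Assumption: an in-memory deep copy plays the role of a checkpoint.
    checkpoint = copy.deepcopy(state)

    # Speculative work proceeds as if the prediction were correct.
    speculative_step(state, predicted)

    # The value being predicted becomes known (computed here sequentially;
    # a real system would compute it concurrently with the speculation).
    actual = actual_value_fn()

    if actual == predicted:
        return state, True  # prediction held: commit the speculative state

    # Mis-speculation: discard the work, restart from the checkpoint,
    # and redo the step with the actual value.
    state = checkpoint
    speculative_step(state, actual)
    return state, False


def double_and_add(state, x):
    # Hypothetical unit of work that depends on the predicted value.
    state["total"] += x * 2


def slow_value():
    # Hypothetical slow computation whose result is being predicted.
    return 21


good_state, good_committed = speculate({"total": 0}, 21, double_and_add, slow_value)
bad_state, bad_committed = speculate({"total": 0}, 5, double_and_add, slow_value)
```

Both calls end in the same final state; only the second pays for a rollback and re-execution, which is the cost a speculative system has to weigh against the parallelism gained when the prediction holds.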
Pages: 422-431
Number of pages: 10
Related papers
50 in total
  • [1] A Flexible Checkpoint/Restart Model in Distributed Systems
    Bouguerra, Mohamed-Slim
    Gautier, Thierry
    Trystram, Denis
    Vincent, Jean-Marc
    PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I, 2010, 6067 : 206 - +
  • [2] Speculative parallelization
    Gonzalez-Escribano, Arturo
    Llanos, Diego R.
    COMPUTER, 2006, 39 (12) : 126 - 128
  • [3] A SURVEY OF CHECKPOINT/RESTART TECHNIQUES ON DISTRIBUTED MEMORY SYSTEMS
    Shahzad, Faisal
    Wittmann, Markus
    Kreutzer, Moritz
    Zeiser, Thomas
    Hager, Georg
    Wellein, Gerhard
    PARALLEL PROCESSING LETTERS, 2013, 23 (04)
  • [4] Checkpoint and restart for distributed components in XCAT3
    Krishnan, S
    Gannon, D
    FIFTH IEEE/ACM INTERNATIONAL WORKSHOP ON GRID COMPUTING, PROCEEDINGS, 2004, : 281 - 288
  • [5] Fastpath Speculative Parallelization
    Spear, Michael F.
    Kelsey, Kirk
    Bai, Tongxin
    Dalessandro, Luke
    Scott, Michael L.
    Ding, Chen
    Wu, Peng
    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, 2010, 5898 : 338 - +
  • [6] Speculative Parallelization on GPGPUs
    Feng, Min
    Gupta, Rajiv
    Bhuyan, Laxmi N.
    ACM SIGPLAN NOTICES, 2012, 47 (08) : 293 - 294
  • [7] Docker Container Deployment in Distributed Fog Infrastructures with Checkpoint/Restart
    Ahmed, Arif
    Mohan, Apoorve
    Cooperman, Gene
    Pierre, Guillaume
    2020 8TH IEEE INTERNATIONAL CONFERENCE ON MOBILE CLOUD COMPUTING, SERVICES, AND ENGINEERING (MOBILE CLOUD 2020), 2020, : 55 - 62
  • [8] Transparent checkpoint-restart of distributed applications on commodity clusters
    Laadan, Oren
    Phung, Dan
    Nieh, Jason
    2005 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2006, : 52 - +
  • [9] DCR: A Fully Transparent Checkpoint/Restart Framework for Distributed Systems
    Ma, Can
    Huo, Zhigang
    Cai, Jingnan
    Meng, Dan
    2009 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING AND WORKSHOPS, 2009, : 225 - 234
  • [10] Multilevel Checkpoint/Restart for Large Computational Jobs on Distributed Computing Resources
    Gholami, Masoud
    Schintke, Florian
    2019 IEEE 38TH INTERNATIONAL SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS (SRDS 2019), 2019, : 143 - 152