Reconciling fault-tolerant distributed computing and systems-on-chip

Cited by: 22

Authors:

Fuegger, Matthias [1]

Schmid, Ulrich [1]

Affiliations:

[1] Tech Univ Wien, Embedded Comp Syst Grp E182 2, A-1040 Vienna, Austria

Source:

DISTRIBUTED COMPUTING | 2012 / Vol. 24 / Issue 06

Funding:

Austrian Science Fund;

Keywords:

Clock synchronization; Fault-tolerant distributed systems; Modeling approaches; VLSI; CLOCK SYNCHRONIZATION; SOFT ERRORS; DESIGN; IMPOSSIBILITY; ARCHITECTURE; CONSENSUS; CIRCUITS; ISSUES; TRENDS

DOI:

10.1007/s00446-011-0151-7

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Classic distributed computing abstractions are a poor match for the reality of digital logic gates, which are the elementary building blocks of Systems-on-Chip (SoCs) and other Very Large Scale Integrated (VLSI) circuits: massively concurrent, continuous computations undermine the concept of sequential processes executing sequences of atomic zero-time computing steps, and very limited computational resources at gate level make even simple operations prohibitively costly. In this paper, we introduce a modeling and analysis framework based on continuous computations and zero-bit message channels, and employ this framework for the correctness and performance analysis of a distributed fault-tolerant clocking approach for SoCs. Starting from a "classic" distributed Byzantine fault-tolerant tick generation algorithm, we show how to adapt it for direct implementation in clockless digital logic, rigorously prove its correctness, and derive analytic expressions for worst-case performance metrics such as synchronization precision and clock frequency. Both the algorithm's correctness and the achievable synchronization precision depend solely on the ratio of certain path delays rather than on absolute delay values. Since these ratios can be mapped directly to placement and routing constraints, there is typically no need to change the algorithm when migrating to a faster implementation technology and/or when using a slightly different layout in an SoC.


Pages: 323-355

Page count: 33

