Self-stabilizing Byzantine Fault-Tolerant Distributed Algorithms for Integrated Circuits
Self-stabilizing Byzantine Fault-Tolerant Distributed Algorithms for Integrated Circuits
Disciplines
Electrical Engineering, Electronics, Information Engineering (20%); Computer Sciences (60%); Mathematics (20%)
Keywords
-
Distributed algorithms,
VLSI Circuits,
Self-stabilization,
Byzantine fault-tolerance
This project proposes to address robustness issues in very-large scale integrated (VLSI) circuits by means of self- stabilizing Byzantine fault-tolerant distributed algorithms. It aims at the development of the foundations of a suitable framework for correctness proofs and performance analyses for such algorithms, and its application to some representative problems in VLSI circuits. Besides permanent failures due to e.g. manufacturing defects and electrical wear-out, transient faults caused e.g. by ionizing particles, cross-talk and temporarily out-of-spec operating conditions are a major source of concern for the dependability of modern VLSI circuits, in particular, in critical applications domains like aerospace. In sharp contrast to classic fault-tolerance techniques like triple- modular redundancy, self-stabilizing algorithms can recover even from unbounded transient faults, and Byzantine- fault-tolerant self-stabilizing algorithms even allow this to happen in the presence of permanent faults. VLSI circuits put unique challenges on Byzantine fault-tolerant self-stabilizing distributed algorithms, which have not been addressed by existing solutions at all. Besides the fact that millions of simple gates that continuously compute their outputs based on (past) inputs do not match the classic distributed state machine abstraction employed in distributed algorithms, VLSI circuits provide only very simple basic computation and communication features. As a consequence, neither existing distributed algorithms nor existing modeling and analysis frameworks can be carried over without substantial modification. Nevertheless, the Byzantine fault-tolerant self-stabilizing clock generation approach for VLSI circuits developed recently by the authors (in collaboration with Danny Dolev/Hebrew University and Christoph Lenzen/Weizmann Institute) revealed that it is principally feasible to do so. The goal of SIC is to develop the foundations of a framework for the rigorous modeling and analysis of Byzantine fault-tolerant self-stabilizing distributed algorithms for VLSI circuits. The work to be performed in SIC involves solving fundamental research questions, in particular, sound and reasonably complete mathematical descriptions of the stabilization process and digital models for metastability generation and propagation, which cannot be solely addressed by engineering solutions in the self-stabilizing context. This work is complemented by devising Byzantine fault-tolerant self-stabilizing solutions for some representative problems in VLSI circuits, like distributed clock generation and reliable broadcast. Using the envisioned modeling and analysis framework, correctness proofs and performance analyses of these algorithms under adequate Byzantine failure models will be provided. The practical implementability of our algorithms will finally be demonstrated by means of simulation and FPGA prototype implementations, which also facilitate some experimental evaluation.
The SIC project addressed algorithmic ways to improve robustness in circuits. Motivation. There is a large body of work on how to decrease probabilities of such faults on circuit-level, e.g., by choice of material and cell design, and on software-level. However, signicantly less was known about algorithmic techniques on circuit-level. Approach. Interestingly, VLSI circuits can be viewed as distributed systems at several levels of abstraction: gates that communicate via binary signals, up to blocks in a network on chip (NoC). The idea of the SIC project was to use techniques from distributed computing to improve robustness of VLSI circuits. Typically, however, processes are highly restricted in communication primitives and computational power in comparison to the processes that are classically studied in software inspired distributed systems. A focus of the SIC project was on self-stabilizing algorithms: a system solves a problem in a self-stabilizing way if, starting from an arbitrarily corrupted state, it eventually solves the problem. Self-stabilizing algorithms are of great interest for missions where state corruption cannot easily be removed by maintenance or manual reset; e.g., in space missions. Achievements. We tried to tackle the above problems from dierent directions and obtained the following results: (i) We proposed distributed computing models that allow for a formal analysis of algorithms, proving their correctness and performance bounds. Specically we designed a fault-tolerant self-stabilizing distributed clock generation algorithm, designed for implementation at gate-level, and proved it correct. (ii) We pointed out problems in previously existing binary-valued circuit delay models. In follow-up works we proposed new delay models that do not suer from these problems. (iii) Metastability is an inevitable third state any bi-stable device, like a ip-op, may be trapped in for an unbounded time. Devices that are metastable can have adverse eects on their successory circuit if they are read while being metastable. The classical workaround is to wait for metastability to resolve with sucient probability before reading. We demonstrated that waiting is not necessary and how to compute with a metastable state. We also studied ways to measure and characterize metastability. (iv) We combined the above techniques to come up with several robust on-chip services: robust distributed clock generation, clock distribution, clock frequency adaptation in presence of voltage droops, communication infrastructure on a chip, pausable clocks, etc. (v) We studied higher-level algorithms that solve consensus-like problems. The focus was on dynamic networks where the network topology may continuously change.
- Technische Universität Wien - 100%
Research Output
- 221 Citations
- 38 Publications
-
2018
Title A Faithful Binary Circuit Model with Adversarial Noise DOI 10.23919/date.2018.8342219 Type Conference Proceeding Abstract Author Függer M Pages 1327-1332 Link Publication -
2018
Title Gracefully degrading consensus and k-set agreement in directed dynamic networks DOI 10.1016/j.tcs.2018.02.019 Type Journal Article Author Biely M Journal Theoretical Computer Science Pages 41-77 Link Publication -
2018
Title Tight Bounds for Asymptotic and Approximate Consensus DOI 10.1145/3212734.3212762 Type Conference Proceeding Abstract Author Függer M Pages 325-334 Link Publication -
2017
Title A Model for the Metastability Delay of Sequential Elements DOI 10.1142/s0218126617400102 Type Journal Article Author Polzer T Journal Journal of Circuits, Systems and Computers Pages 1740010 Link Publication -
2017
Title Metastability Tolerant Computing DOI 10.1109/async.2017.9 Type Conference Proceeding Abstract Author Tarawneh G Pages 25-32 Link Publication -
2017
Title Measuring Metastability with Free-Running Clocks DOI 10.1109/async.2017.18 Type Conference Proceeding Abstract Author Najvir R Pages 18-24 -
2017
Title Measuring Metastability Using a Time-to-Digital Converter DOI 10.1109/ddecs.2017.7934582 Type Conference Proceeding Abstract Author Polzer T Pages 116-121 -
2017
Title New transience bounds for max-plus linear systems DOI 10.1016/j.dam.2016.11.003 Type Journal Article Author Charron-Bost B Journal Discrete Applied Mathematics Pages 83-99 -
2019
Title The Involution Tool for Accurate Digital Timingand Power Analysis DOI 10.1109/patmos.2019.8862165 Type Conference Proceeding Abstract Author Öhlinger D Pages 1-8 Link Publication -
2018
Title Using a Duplex Time-to-Digital Converter for Metastability Characterization of an FPGA DOI 10.1109/ddecs.2018.00032 Type Conference Proceeding Abstract Author Huemer F Pages 141-146 -
2018
Title Fast All-Digital Clock Frequency Adaptation Circuit for Voltage Droop Tolerance DOI 10.1109/async.2018.00025 Type Conference Proceeding Abstract Author Függer M Pages 68-77 Link Publication -
2018
Title Self-Stabilizing High-Speed Communication in Multi-Synchronous GALS Architectures DOI 10.1109/iolts.2018.8474221 Type Conference Proceeding Abstract Author Perner M Pages 157-164 -
2019
Title An Experimental Study of Metastability-Induced Glitching Behavior DOI 10.1142/s0218126619400061 Type Journal Article Author Polzer T Journal Journal of Circuits, Systems and Computers Pages 1940006 -
2021
Title Fast All-Digital Clock Frequency Adaptation Circuit for Voltage Droop Tolerance DOI 10.1109/tcad.2021.3097599 Type Journal Article Author Függer M Journal IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pages 2518-2531 Link Publication -
2019
Title A Faithful Binary Circuit Model DOI 10.1109/tcad.2019.2937748 Type Journal Article Author Függer M Journal IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems Pages 2784-2797 Link Publication -
2019
Title Transistor-Level Analysis of Dynamic Delay Models DOI 10.1109/async.2019.00019 Type Conference Proceeding Abstract Author Maier J Pages 76-85 Link Publication -
2018
Title Metastability-Containing Circuits DOI 10.1109/tc.2018.2808185 Type Journal Article Author Friedrichs S Journal IEEE Transactions on Computers Pages 1167-1183 Link Publication -
2018
Title Refined metastability characterization using a time-to-digital converter DOI 10.1016/j.microrel.2017.11.017 Type Journal Article Author Polzer T Journal Microelectronics Reliability Pages 91-99 Link Publication -
2014
Title Rigorously modeling self-stabilizing fault-tolerant circuits: An ultra-robust clocking scheme for systems-on-chip DOI 10.1016/j.jcss.2014.01.001 Type Journal Article Author Dolev D Journal Journal of Computer and System Sciences Pages 860-900 Link Publication -
2017
Title Metastability-Aware Memory-Efficient Time-to-Digital Converters DOI 10.1109/async.2017.12 Type Conference Proceeding Abstract Author Függer M Pages 49-56 Link Publication -
2014
Title Equivalence of Clock Gating and Synchronization with Applicability to GALS Communication DOI 10.1109/patmos.2014.6951873 Type Conference Proceeding Abstract Author Najvirt R Pages 1-8 -
2014
Title Fault-tolerant algorithms for tick-generation in asynchronous logic DOI 10.1145/2560561 Type Journal Article Author Dolev D Journal Journal of the ACM (JACM) Pages 1-74 Link Publication -
2020
Title On the radius of nonsplit graphs and information dissemination in dynamic networks DOI 10.1016/j.dam.2020.02.013 Type Journal Article Author Függer M Journal Discrete Applied Mathematics Pages 257-264 Link Publication -
2016
Title Fast consensus under eventually stabilizing message adversaries DOI 10.1145/2833312.2833323 Type Conference Proceeding Abstract Author Schwarz M Pages 1-10 Link Publication -
2016
Title Does Cascading Schmitt-Trigger Stages Improve the Metastable Behavior? DOI 10.1109/dsd.2016.56 Type Conference Proceeding Abstract Author Steininger A Pages 372-379 Link Publication -
2016
Title The Metastable Behavior of a Schmitt-Trigger DOI 10.1109/async.2016.19 Type Conference Proceeding Abstract Author Steininger A Pages 57-37 Link Publication -
2015
Title Experimental Validation of a Faithful Binary Circuit Model DOI 10.1145/2742060.2742081 Type Conference Proceeding Abstract Author Najvirt R Pages 355-360 Link Publication -
2015
Title Upper Miocene endemic lacustrine gastropod fauna of the Turiec Basin: addressing taxonomic, paleobiogeographic and stratigraphic issues DOI 10.1515/geoca-2015-0016 Type Journal Article Author Neubauer T Journal Geologica Carpathica Pages 139-156 Link Publication -
2016
Title HEX: Scaling honeycombs is easier than scaling clock trees DOI 10.1016/j.jcss.2016.03.001 Type Journal Article Author Dolev D Journal Journal of Computer and System Sciences Pages 929-956 Link Publication -
2015
Title Towards binary circuit models that faithfully capture physical solvability DOI 10.7873/date.2015.0799 Type Conference Proceeding Abstract Author Fugger M Pages 1455-1460 Link Publication -
2015
Title A Versatile and Reliable Glitch Filter for Clocks DOI 10.1109/patmos.2015.7347599 Type Conference Proceeding Abstract Author Najvirt R Pages 140-147 -
2015
Title A Pausible Clock with Crystal Oscillator Accuracy DOI 10.1109/ecctd.2015.7300051 Type Conference Proceeding Abstract Author Najvirt R Pages 1-4 -
2015
Title How to Synchronize a Pausible Clock to a Reference DOI 10.1109/async.2015.10 Type Conference Proceeding Abstract Author Najvirt R Pages 9-16 -
2015
Title Gracefully Degrading Consensus and k-Set Agreement in Directed Dynamic Networks DOI 10.1007/978-3-319-26850-7_8 Type Book Chapter Author Biely M Publisher Springer Nature Pages 109-124 -
2015
Title On the Appropriate Handling of Metastable Voltages in FPGAs DOI 10.1142/s021812661640020x Type Journal Article Author Polzer T Journal Journal of Circuits, Systems and Computers Pages 1640020 Link Publication -
2015
Title The effect of forgetting on the performance of a synchronizer DOI 10.1016/j.peva.2015.08.002 Type Journal Article Author Függer M Journal Performance Evaluation Pages 1-16 Link Publication -
2015
Title Time Complexity of Link Reversal Routing DOI 10.1145/2644815 Type Journal Article Author Charron-Bost B Journal ACM Transactions on Algorithms (TALG) Pages 1-39 Link Publication -
2016
Title Unfaithful Glitch Propagation in Existing Binary Circuit Models DOI 10.1109/tc.2015.2435791 Type Journal Article Author Fugger M Journal IEEE Transactions on Computers Pages 964-978 Link Publication