Modelling Fault-tolerant Asynchronous Logic (FATAL)
Disciplines
Electrical Engineering, Electronics, Information Engineering (40%); Computer Sciences (50%); Physics, Astronomy (10%)
Keywords
Fault-tolerant distributed algorithms,
Dependable VLSI,
Model-driven design and analysis,
Asynchronous digital circuits,
Radiation failures,
Metastability
The aim of the FATAL project is the development of the mathematical/formal foundations of a framework for the hierarchical modeling and analysis of fault-tolerant asynchronous VLSI circuits, using knowledge from fault-tolerant distributed algorithms in conjunction with the experimental assessment of both radiation-induced failures and metastability in modern VLSI technology. FATAL is a joint project between the Institut für Technische Informatik and the Institut für Elektrische Mess- und Schaltungstechnik at TU Wien. Designing VLSI circuits, which nowadays accommodate billions of transistors operating at GHz clock speeds, is becoming more and more difficult: In addition to managing ever-increasing functional complexity, chip designers are confronted with rising manufacturing costs and decreasing yield, the "erosion" of the convenient synchronous abstraction, power dissipation problems, and increasing failure rates. As a consequence, modern VLSI chips are increasingly viewed as more or less loosely coupled systems of interacting subsystems, as witnessed by the advent of Systems-on-Chip. Such devices, however, have much in common with the loosely coupled distributed computing systems that the fault-tolerant distributed algorithms community has studied for decades. This research has established a wealth of computing and failure models, algorithms & protocols, and theoretical results regarding the solvability of problems and achievable performance. Some recent work confirms that part of this knowledge can indeed be applied successfully in the VLSI context. A key feature of FATAL is support for model/specification-based design and analysis: Both the composition of existing circuits/specifications (bottom-up approach) and the decomposition of a higher-level circuit/specification into lower-level ones (top-down approach) will be supported.
Particular emphasis will be put on a detailed modeling of the relation between circuits and their environment. To master the proof complexity, these features will be accompanied by hierarchical correctness proofs and performance analysis techniques. In sharp contrast to current practice in VLSI design, which considers failures as something exceptional that occurs with (very) small probability, FATAL will be based on the deterministic approach usually employed in distributed algorithms research, where failures are considered part of normal operation: A circuit will be described by the set of all possible behaviors, even if such a behavior is extremely unlikely to be encountered in practice. Consequently, our approach facilitates deterministic correctness proofs and worst case performance analyses (probabilities can be assigned to behaviors later on, however). Among the challenges that must be met are the need for incorporating a continuous notion of time, the very fine-grained concurrency caused by a huge number of asynchronously computing logic gates, and the severe resource limitations encountered in VLSI circuits, which make even basic operations like sending a message or adding up two numbers very expensive. A pivotal part of FATAL is the definition of suitable (that is, realistic but easy to handle) failure models. Our goal is the development of a hierarchy of failure models of increasing severity, which can be used to describe the observable behavior of (faulty) circuits at a reasonably high level of abstraction. Identifying such models requires an in-depth exploration of the observable failures in modern VLSI chips. For "classic" sources of errors, there is a huge body of work to rely upon. This is not the case for radiation-induced errors, however, which are increasingly dominating the failure rate of deep submicron VLSI circuits. 
The same is true for metastability phenomena, which are particularly dangerous for fault-tolerant asynchronous circuits: Metastability can overcome any error containment boundary and, hence, invalidate any architectural dependability concept and correctness proof. FATAL will therefore include a systematic experimental evaluation of both radiation-induced failures and metastability. Two custom VLSI circuits will be specifically designed and manufactured for this purpose, which poses unique technological challenges: Both advanced analog and digital circuitry must be integrated on the same chip, which creates intricate analog design problems and prohibits the use of digital standard-cell libraries. Moreover, the design must be accompanied by very accurate simulation models, which rules out push-button design flows. Finally, both radiation-hardened and radiation-sensitive devices must be integrated on the same chip, a decidedly non-standard requirement.
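A toy example may help to illustrate the deterministic approach described above, in which a circuit is characterized by the set of all behaviors its failure model admits, however unlikely some of them are. The Python sketch below takes a 2-of-3 majority voter (a hypothetical choice for illustration, not a specific FATAL circuit) and a hypothetical failure model in which at most `max_faulty` of three replicas deliver an arbitrary bit. Correctness is then checked exhaustively over every admitted behavior instead of being estimated probabilistically.

```python
from itertools import product
from typing import Iterator, Tuple

Bits = Tuple[int, int, int]


def majority(a: int, b: int, c: int) -> int:
    """2-of-3 majority voter on single-bit inputs."""
    return (a & b) | (a & c) | (b & c)


def allowed_behaviors(correct: int, max_faulty: int) -> Iterator[Bits]:
    """Yield every replica-output triple admitted by a hypothetical failure
    model: at most `max_faulty` of the three replicas may deliver an
    arbitrary bit instead of `correct`. No probabilities are attached, so
    extremely unlikely triples are included as well."""
    for outputs in product((0, 1), repeat=3):
        deviating = sum(1 for o in outputs if o != correct)
        if deviating <= max_faulty:
            yield outputs


def voter_is_correct(max_faulty: int) -> bool:
    """Deterministic worst-case check: the voter must output the correct
    value for every admitted behavior and for both correct input values."""
    return all(
        majority(*outputs) == correct
        for correct in (0, 1)
        for outputs in allowed_behaviors(correct, max_faulty)
    )


if __name__ == "__main__":
    print(voter_is_correct(1))  # True: a single arbitrary replica is masked
    print(voter_is_correct(2))  # False: e.g. (1, 1, 0) when the correct value is 0
```

Going from `max_faulty=1` to `max_faulty=2` in this sketch mirrors, on a very small scale, the idea of a hierarchy of failure models of increasing severity: the same circuit is proven correct under the weaker model and shown to fail under the stronger one.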
The aim of the FATAL project was the development of the foundations of a framework for the hierarchical modeling and analysis of fault-tolerant asynchronous very-large-scale integrated (VLSI) circuits, in conjunction with the experimental study of radiation-induced failures and metastability. FATAL was a joint project of the Institute of Computer Engineering (E182) and the Institute of Electrodynamics, Microwave and Circuit Engineering (E354) at TU Vienna. Modern VLSI chips are increasingly considered as more or less loosely coupled systems of interacting subsystems (Systems-on-Chip), which have much in common with the distributed computing systems that have been studied by the fault-tolerant distributed systems community for decades. In FATAL, we utilized and adapted some of the existing computing and failure models, algorithms & protocols, and theoretical results regarding the solvability of problems and achievable performance in the VLSI context, and created specifically tailored new ones where needed. The major accomplishments of FATAL are: A novel continuous-time, discrete-value modeling and analysis framework specifically designed for fault-tolerant asynchronous circuits. Rather than on discrete zero-time state transitions, it is based on continuous computations; it supports hierarchical composition and decomposition of models and implementations, as well as modular correctness proofs at low abstraction levels.
In sharp contrast to current practice in VLSI design, failures are considered part of normal operation and are formally captured by allowing the behavior of a faulty circuit to deviate from its specification according to some failure model. Given that single-event transients (SETs), caused by ionizing particles hitting the transistors of a VLSI circuit, are the dominant source of errors in nanometer technology, substantial efforts in FATAL were devoted to SET measurement and modeling: Specifically designed on-chip analog sense amplifiers were used for the low-intrusive measurement of the detailed SET voltage pulse shapes occurring in typical digital target circuits (such as inverters and Muller C-gates) in micro-beam irradiation experiments. The results were used to calibrate a detailed physical circuit simulation and a double-exponential SET-injection-based analog simulation model. The latter has been used to validate the design of a comprehensive digital SET long-term monitoring ASIC. An inevitable problem in asynchronous digital circuits is metastability, which originates in the inability to always determine the precise order of state transitions occurring in different parts of a circuit. In the case of a state-holding device like a memory cell, this can lead to intermediate-valued states or fast oscillations. A substantial part of FATAL has been devoted to the experimental evaluation of metastability generation in modern VLSI technology, as well as to identifying ways of incorporating it into our all-digital FATAL modeling and analysis framework. We are happy to say that we made substantial scientific progress in all these major areas in FATAL, and will be able to continue our efforts in two recently granted FWF projects.
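The double-exponential SET injection model mentioned above is commonly written as a current pulse I(t) = Q/(tau_f - tau_r) * (exp(-t/tau_f) - exp(-t/tau_r)) injected at the struck circuit node. The Python sketch below evaluates this textbook form in isolation. The normalization to the collected charge Q, the function and parameter names, and the numeric values are assumptions for illustration only, not the calibrated parameters obtained in FATAL; an actual analog simulation would inject such a current into a transistor-level netlist.

```python
import math


def set_current(t: float, q_coll: float, tau_fall: float, tau_rise: float) -> float:
    """Double-exponential SET current pulse I(t) in amperes.

    Normalized so that the integral over t >= 0 equals the collected
    charge q_coll (coulombs). Requires tau_fall > tau_rise > 0.
    """
    if not tau_fall > tau_rise > 0:
        raise ValueError("need tau_fall > tau_rise > 0")
    if t < 0:
        return 0.0  # no current before the particle strike at t = 0
    scale = q_coll / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def peak_time(tau_fall: float, tau_rise: float) -> float:
    """Time of the current maximum, obtained from dI/dt = 0."""
    return tau_fall * tau_rise / (tau_fall - tau_rise) * math.log(tau_fall / tau_rise)


def collected_charge(q_coll: float, tau_fall: float, tau_rise: float,
                     t_end: float, steps: int = 100_000) -> float:
    """Trapezoidal integral of I(t) over [0, t_end]; approaches q_coll
    when t_end is many multiples of tau_fall."""
    dt = t_end / steps
    total = 0.5 * (set_current(0.0, q_coll, tau_fall, tau_rise)
                   + set_current(t_end, q_coll, tau_fall, tau_rise))
    for k in range(1, steps):
        total += set_current(k * dt, q_coll, tau_fall, tau_rise)
    return total * dt


if __name__ == "__main__":
    # Hypothetical parameters: 50 fC collected, 200 ps fall, 50 ps rise.
    Q, TF, TR = 50e-15, 200e-12, 50e-12
    tp = peak_time(TF, TR)
    print(f"peak at {tp * 1e12:.1f} ps, I_peak = {set_current(tp, Q, TF, TR) * 1e6:.1f} uA")
    print(f"integrated charge = {collected_charge(Q, TF, TR, 5e-9) * 1e15:.3f} fC")
```

The charge normalization is a convenient choice because Q can then be tied directly to the deposited charge of a particle strike, while the two time constants shape the pulse seen by the downstream logic.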
- Horst Zimmermann, Technische Universität Wien, associated research partner
Research Output
- 289 Citations
- 63 Publications
2022
Title Hrip1 enhances tomato resistance to yellow leaf curl virus by manipulating the phenylpropanoid biosynthesis and plant hormone pathway DOI 10.1007/s13205-022-03426-6 Type Journal Article Author Dong Y Journal 3 Biotech Pages 11 Link Publication -
2012
Title Architecture and design analysis of a digital single-event transient/upset measurement chip. Type Conference Proceeding Abstract Author Schmid U Et Al -
2012
Title Efficient radiation-hardening of a Muller C-element. Type Conference Proceeding Abstract Author Steininger A Conference Proceedings 2012 Single Event Effects Symposium, April -
2012
Title Designing Robust GALS Circuits with Triple Modular Redundancy DOI 10.1109/edcc.2012.25 Type Conference Proceeding Abstract Author Lechner J Pages 227-236 -
2012
Title Architecture and Design Analysis of a Digital Single-Event Transient/Upset Measurement Chip DOI 10.1109/dsd.2012.26 Type Conference Proceeding Abstract Author Veeravalli V Pages 8-17 -
2012
Title Radiation-Tolerant Combinational Gates - An Implementation Based Comparison DOI 10.1109/ddecs.2012.6219036 Type Conference Proceeding Abstract Author Veeravalli V Pages 115-120 -
2012
Title Protecting Pipelined Asynchronous Communication Channels Against Single Event Upsets DOI 10.1109/iccd.2012.6378683 Type Conference Proceeding Abstract Author Lechner J Pages 480-481 -
2012
Title A Robust Asynchronous Interfacing Scheme with Four-Phase Dual-Rail Coding DOI 10.1109/acsd.2012.29 Type Conference Proceeding Abstract Author Lechner J Pages 122-131 -
2012
Title Analogously tunable delay line for on-chip measurements with sub-picosecond resolution in 90 nm CMOS DOI 10.1049/el.2012.0371 Type Journal Article Author Schidl S Journal Electronics Letters Pages 910-911 -
2012
Title Position dependent measurement of single event transient voltage pulse shapes under heavy ion irradiation DOI 10.1049/el.2011.3767 Type Journal Article Author Schweiger K Journal Electronics Letters Pages 171-172 -
2012
Title Pulse Shape Measurements by On-Chip Sense Amplifiers of Single Event Transients Propagating Through a 90 nm Bulk CMOS Inverter Chain DOI 10.1109/tns.2012.2223233 Type Journal Article Author Hofbauer M Journal IEEE Transactions on Nuclear Science Pages 2778-2784 -
2012
Title Projekt FATAL, 2012. Type Journal Article Author Hofbauer M Journal Vienna Scientific Cluster Brochure 2012 -
2018
Title A Faithful Binary Circuit Model with Adversarial Noise DOI 10.23919/date.2018.8342219 Type Conference Proceeding Abstract Author Függer M Pages 1327-1332 Link Publication -
2016
Title Unfaithful Glitch Propagation in Existing Binary Circuit Models DOI 10.1109/tc.2015.2435791 Type Journal Article Author Fugger M Journal IEEE Transactions on Computers Pages 964-978 Link Publication -
2015
Title Building reliable systems-on-chip in nanoscale technologies DOI 10.1007/s00502-015-0319-0 Type Journal Article Author Steininger A Journal e & i Elektrotechnik und Informationstechnik Pages 301-306 -
2011
Title Brief announcement DOI 10.1145/1989493.1989510 Type Conference Proceeding Abstract Author Charron-Bost B Pages 129-130 -
2011
Title Full Reversal Routing as a Linear Dynamical System DOI 10.1007/978-3-642-22212-2_10 Type Book Chapter Author Charron-Bost B Publisher Springer Nature Pages 101-112 -
2010
Title How to Speed-up Fault-Tolerant Clock Generation in VLSI Systems-on-Chip via Pipelining DOI 10.1109/edcc.2010.35 Type Conference Proceeding Abstract Author Függer M Pages 230-239 -
2009
Title Brief announcement DOI 10.1145/1582716.1582762 Type Conference Proceeding Abstract Author Dielacher A Pages 276-277 -
2009
Title Brief announcement: How to speed-up fault-tolerant clock generation in VLSI systems-on-chip via pipelining. Type Conference Proceeding Abstract Author Dielacher A -
2009
Title A Metastability-Free Multi-synchronous Communication Scheme for SoCs DOI 10.1007/978-3-642-05118-0_40 Type Book Chapter Author Polzer T Publisher Springer Nature Pages 578-592 -
2009
Title On the stability and robustness of non-synchronous circuits with timing loops. Type Conference Proceeding Abstract Author Fuegger M Conference 3rd Workshop on Dependable and Secure Nanocomputing, Jun. 2009 -
2009
Title 08371 Summary - fault-tolerant distributed algorithms on VLSI chips. Type Conference Proceeding Abstract Author Charron-Bost B Conference Dagstuhl Seminar Proceedings, Dagstuhl, Germany, 2009. Schloss Dagstuhl -Leibniz-Zentrum fuer Informatik, Germany -
2009
Title On the Threat of Metastability in an Asynchronous Fault-Tolerant Clock Generation Scheme DOI 10.1109/async.2009.15 Type Conference Proceeding Abstract Author Fuchs G Pages 127-136 -
2009
Title How to speedup fault-tolerant clock generation in VLSI systems-on-chip via pipelining. Type Journal Article Author Dielacher A Journal Research Report 15/2009 -
2012
Title Reliable gateway for radiation experiments on a VLSI chip. Type Conference Proceeding Abstract Author Fritz B Conference Proceedings Austrochip -
2011
Title Reconciling fault-tolerant distributed computing and systems-on-chip DOI 10.1007/s00446-011-0151-7 Type Journal Article Author Függer M Journal Distributed Computing Pages 323-355 Link Publication -
2011
Title Single event effect measurements in 90nm CMOS circuits at the microbeam facility for the project FATAL. Type Journal Article Author Hofbauer M Journal GSI Scientific Report 2011 -
2011
Title Brief announcement: Full reversal routing as a linear dynamical system. Type Conference Proceeding Abstract Author Charron-Bost B -
2011
Title Partial is Full DOI 10.1007/978-3-642-22212-2_11 Type Book Chapter Author Charron-Bost B Publisher Springer Nature Pages 113-124 -
2011
Title On the Performance of a Retransmission-Based Synchronizer DOI 10.1007/978-3-642-22212-2_21 Type Book Chapter Author Nowak T Publisher Springer Nature Pages 234-245 -
2011
Title Fault-Tolerant Algorithms for Tick-Generation in Asynchronous Logic: Robust Pulse Generation DOI 10.1007/978-3-642-24550-3_14 Type Book Chapter Author Dolev D Publisher Springer Nature Pages 163-177 -
2014
Title Rigorously modeling self-stabilizing fault-tolerant circuits: An ultra-robust clocking scheme for systems-on-chip DOI 10.1016/j.jcss.2014.01.001 Type Journal Article Author Dolev D Journal Journal of Computer and System Sciences Pages 860-900 Link Publication -
2015
Title Time Complexity of Link Reversal Routing DOI 10.1145/2644815 Type Journal Article Author Charron-Bost B Journal ACM Transactions on Algorithms (TALG) Pages 1-39 Link Publication -
2016
Title HEX: Scaling honeycombs is easier than scaling clock trees DOI 10.1016/j.jcss.2016.03.001 Type Journal Article Author Dolev D Journal Journal of Computer and System Sciences Pages 929-956 Link Publication -
2014
Title Measuring SET Pulsewidths in Logic Gates using Digital Infrastructure DOI 10.1109/isqed.2014.6783331 Type Conference Proceeding Abstract Author Veeravalli V Pages 236-242 -
2014
Title Architecture for Monitoring SET Propagation in 16-bit Sklansky Adder DOI 10.1109/isqed.2014.6783354 Type Conference Proceeding Abstract Author Veeravalli V Pages 412-419 -
2014
Title Protection of Muller-Pipelines from Transient Faults DOI 10.1109/isqed.2014.6783315 Type Conference Proceeding Abstract Author Naqvi S Pages 123-131 -
2013
Title HEX DOI 10.1145/2486159.2486192 Type Conference Proceeding Abstract Author Dolev D Pages 164-175 -
2013
Title SET Propagation in Micropipelines DOI 10.1109/patmos.2013.6662165 Type Conference Proceeding Abstract Author Polzer T Pages 126-133 -
2013
Title Muller C-Element Metastability Containment DOI 10.1007/978-3-642-36157-9_11 Type Book Chapter Author Polzer T Publisher Springer Nature Pages 103-112 -
2013
Title Supply Voltage Dependent On-Chip Single-Event Transient Pulse Shape Measurements in 90-nm Bulk CMOS Under Alpha Irradiation DOI 10.1109/tns.2013.2245679 Type Journal Article Author Hofbauer M Journal IEEE Transactions on Nuclear Science Pages 2640-2646 -
2013
Title Efficient Construction of Global Time in SoCs despite Arbitrary Faults DOI 10.1109/dsd.2013.97 Type Conference Proceeding Abstract Author Lenzen C Pages 142-151 Link Publication -
2013
Title HEX: Scaling Honeycombs is Easier than Scaling Clock Trees. Type Conference Proceeding Abstract Author Dolev D -
2013
Title Unfaithful Glitch Propagation in Existing Binary Circuit Models DOI 10.1109/async.2013.9 Type Conference Proceeding Abstract Author Függer M Pages 191-199 Link Publication -
2013
Title Modular Redundancy in a GALS System using Asynchronous Recovery Links DOI 10.1109/async.2013.23 Type Conference Proceeding Abstract Author Lechner J Pages 23-30 -
2013
Title An Approach for Efficient Metastability Characterization of FPGAs through the Designer DOI 10.1109/async.2013.14 Type Conference Proceeding Abstract Author Polzer T Pages 174-182 -
2013
Title Performance of Radiation Hardening Techniques under Voltage and Temperature Variations DOI 10.1109/aero.2013.6497390 Type Conference Proceeding Abstract Author Veeravalli V Pages 1-12 -
2013
Title Digital Late-Transition Metastability Simulation Model DOI 10.1109/dsd.2013.21 Type Conference Proceeding Abstract Author Polzer T Pages 121-128 -
2012
Title LFSR implementation using C-elements. Type Conference Proceeding Abstract Author Steininger A Conference Proceedings MEMICS 2012 -
2012
Title Designing robust GALS circuits with triple modular redundancy. Type Conference Proceeding Abstract Author Lechner J -
2012
Title Messung der Auswirkungen von ionisierender Strahlung auf 90 nm CMOS Schaltungen. Type Journal Article Author Giesen U Et Al Journal Technical report, Physikalisch Technische Bundesanstalt -
2012
Title Brief Announcement: The Degrading Effect of Forgetting on a Synchronizer DOI 10.1007/978-3-642-33536-5_9 Type Book Chapter Author Függer M Publisher Springer Nature Pages 90-91 -
2012
Title A robust asynchronous interfacing scheme with four-phase dual-rail coding. Type Conference Proceeding Abstract Author Lechner J -
2012
Title Monitoring single event transient effects in dynamic mode. Type Conference Proceeding Abstract Author Steininger A Conference 1st Workshop on Manufacturable and Dependable Multicore Architectures at Nanoscale (MEDIAN'12) -
2012
Title Towards self-stabilizing byzantine fault-tolerant clock generation in systems-on-chip. Type Conference Proceeding Abstract Author Dolev D Conference 2012 NITRD National Workshop on the New Clockwork for Time-Critical Systems, October 25-26, Baltimore (USA) -
2013
Title Unfaithful Glitch Propagation in Existing Binary Circuit Models DOI 10.48550/arxiv.1311.1423 Type Preprint Author Függer M -
2013
Title On the performance of a retransmission-based synchronizer DOI 10.1016/j.tcs.2012.04.035 Type Journal Article Author Nowak T Journal Theoretical Computer Science Pages 25-39 Link Publication -
2013
Title Particle strikes in C-gates: Relevance of SET shapes. Type Conference Proceeding Abstract Author Najvirt R Conference Proceedings 2nd Workshop on Manufacturable and Dependable Multicore Architectures at Nanoscale (MEDIAN'13) -
2013
Title Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures, SPAA '13 DOI 10.1145/2486159 Type Journal Article -
2013
Title An infrastructure for accurate characterization of single-event transients in digital circuits DOI 10.1016/j.micpro.2013.04.011 Type Journal Article Author Veeravalli V Journal Microprocessors and Microsystems Pages 772-791 Link Publication -
2013
Title Metastability Characterization for Muller C-Elements DOI 10.1109/patmos.2013.6662170 Type Conference Proceeding Abstract Author Polzer T Pages 164-171 -
2013
Title Byzantine self-stabilizing clock distribution with HEX: Implementation, simulation, clock multiplication. Type Conference Proceeding Abstract Author Lenzen C Et Al Conference Proc. Sixth IARIA International Conference on Dependability (DEPEND'13)