SBCCI 2005
Florianópolis, Brazil
Going Beyond TMR for
Protection Against Multiple Faults
Carlos A. L. Lisbôa Erik Schüler Luigi Carro
Why Multiple Faults ?
• Future technologies (2010 and beyond)
• very small transistors and fewer electrons to form the channel (→ SETs)
• transient pulses due to radiation attack will last longer than the propagation delays of gates
• devices will be more sensitive to the effects of
electromagnetic noise, neutrons and alpha particles
Single Event Upset Origin
[Figure: single event upset in a stored bit pattern (1 0 1 0 0 0 0 1)]
Why Multiple Faults ?
Consequence:
Gates will behave statistically,
producing correct outputs only a
fraction of the time.
Outline
• How to deal with multiple faults ?
• Why beyond TMR ?
• Making TMR (more) reliable
• Dealing with multiple simultaneous faults
• Conclusions
How to Deal with Multiple Faults ?
• New paradigm: multiple simultaneous faults
• new fault tolerance techniques will be required (TMR will no longer provide enough protection)
• How to deal with this problem ?
• new materials and manufacturing technologies must be developed
OR
• new design approaches must be taken (our bet !)
Why Beyond TMR ?
• TMR protects only against single faults in one of the modules
[Figure: TMR with three modules and a voter; Module 2 gives a wrong output, the voter output is still correct]
Why Beyond TMR ?
• When a single fault occurs in the voter circuit, the voter output may be wrong
[Figure: the modules give correct outputs, but with a fault in the voter its output is uncertain ("correct output ?")]
Why Beyond TMR ?
• Single fault injection experiment
• Module: 4x4-bit array multiplier, 96 gates
• Voter: 32 gates
• Total using TMR: 3 x 96 + 32 = 320 gates
• Injected faults: stuck-at-0 (sa-0) and stuck-at-1 (sa-1), 320 of each
• Injection tool: CACO-PS
Why Beyond TMR ?
• Experimentation:
• injection of single faults with all possible 256 input combinations
• Single fault in one of the 3 modules:
• no propagation to voter output
• Single fault in the voter:
• 6 to 24 faults (in 640) propagated to voter output, depending on the input value: 0.9375% to 3.75% !
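The module-fault part of this experiment can be approximated in a few lines. This is a minimal sketch, not the CACO-PS setup: it assumes a fault can be modeled as a stuck-at on one output bit of one module (the gate-level netlist is not given here), and the helper names are hypothetical. It also rechecks the gate count and the percentages quoted above.

```python
from itertools import product

WIDTH = 8  # a 4x4-bit multiplier produces an 8-bit product


def multiply(a, b):
    """Fault-free 4x4-bit multiplication."""
    return (a * b) & 0xFF


def majority(x, y, z):
    """Bitwise 2-out-of-3 majority vote (ideal, fault-free voter)."""
    return (x & y) | (x & z) | (y & z)


def stuck_at(value, bit, level):
    """Force one bit of value to 0 or 1 (assumed output-level fault model)."""
    mask = 1 << bit
    return value | mask if level else value & ~mask


def count_propagated_module_faults():
    """Count single module faults that reach the voter output."""
    propagated = 0
    for a, b in product(range(16), repeat=2):  # all 256 input combinations
        good = multiply(a, b)
        for module, bit, level in product(range(3), range(WIDTH), (0, 1)):
            outputs = [good, good, good]
            outputs[module] = stuck_at(good, bit, level)
            if majority(*outputs) != good:
                propagated += 1
    return propagated


total_gates = 3 * 96 + 32   # three modules plus the voter
injected = 2 * total_gates  # one sa-0 and one sa-1 per gate

print(total_gates, injected)                    # 320 640
print(count_propagated_module_faults())         # 0
print(100 * 6 / injected, 100 * 24 / injected)  # 0.9375 3.75
```

With an ideal voter, no single module fault propagates; the 6 to 24 propagated faults come from faults injected in the voter itself, which this sketch does not model.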
Making TMR (more) reliable
• Known solutions imply
• area, performance and / or power penalties
• deadlock: how to protect the output generator ?
• Proposed solution:
• use TMR to cope with single faults in the modules
• replace the digital voter by an analog voter that
• uses a comparator to generate the output
• can support some noise, nevertheless producing
The Analog Voter
[Figure: the analog voter, built up over several slides]
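The voter circuit itself is only shown in the figures, so the following is a behavioral sketch under stated assumptions, not the actual design: it assumes the module outputs are averaged as voltages and that a comparator checks the average against half the supply voltage. The supply value and the function name are hypothetical.

```python
VDD = 3.3  # assumed supply voltage, not taken from the slides


def analog_vote(voltages, vdd=VDD):
    """Average the module output voltages, then compare with vdd / 2.

    Assumed behavior: the comparator outputs logic 1 when the average
    is above the midpoint, logic 0 otherwise.
    """
    mean = sum(voltages) / len(voltages)
    return 1 if mean > vdd / 2 else 0


# Two modules at logic 1, one faulty module at logic 0: the vote is still 1.
print(analog_vote([3.3, 3.3, 0.0]))  # 1

# Same situation with noise on every input: the average (2.1 V) stays
# above the 1.65 V threshold, so the output is unchanged.
print(analog_vote([3.0, 2.9, 0.4]))  # 1

# Two modules at logic 0, one faulty module at logic 1.
print(analog_vote([0.2, 0.1, 3.1]))  # 0
```

In this model the noise margin comes from the distance between the averaged voltage and the comparator threshold, which is one way to read "can support some noise".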
Electrical Simulation: Without Faults
(SPICE and CMOS 0.35 µm)
Minimum Area Comparator
Injection of faults
in the comparator (*)
Electrical Simulation: Multiple Faults
(SPICE and CMOS 0.35 µm)
Monte Carlo Parameter Variation
(up to 30%): t_ox and V_t
Why Multiple Faults ?
• with smaller devices
• fewer electrons to form the channel
• increased sensitivity to noise
• gates will behave statistically,
producing correct outputs only
a fraction of the time
Why Beyond TMR ?
• When multiple simultaneous faults occur in different modules, the voter output may be wrong (and deemed correct !)
[Figure: TMR with two modules producing wrong outputs and one producing a correct output; the voter output is labeled "correct output"]
Dealing with Multiple Simultaneous Faults: n-MR
The Analog Voter with 5 Inputs (for 5-MR)
Conclusions
• The proposed analog voter is tolerant to multiple simultaneous faults
• It can be used to replace digital voters in n-MR solutions to cope with multiple faults
• n-MR solutions will withstand up to (n-1)/2 simultaneous faults in the circuit
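The (n-1)/2 bound can be checked exhaustively for small n. This is a minimal sketch assuming an ideal majority voter, odd n, and a worst case in which every faulty replica outputs the wrong bit; the function names are hypothetical.

```python
from itertools import combinations


def majority_vote(bits):
    """Return the value held by more than half of the replicas (n odd)."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def max_tolerated_faults(n):
    """(n - 1) / 2 for odd n, as stated in the conclusions."""
    return (n - 1) // 2


def survives(n, k, correct=1):
    """True if every placement of k wrong replicas among n is outvoted."""
    for faulty in combinations(range(n), k):
        bits = [correct ^ 1 if i in faulty else correct for i in range(n)]
        if majority_vote(bits) != correct:
            return False
    return True


for n in (3, 5, 7):
    t = max_tolerated_faults(n)
    print(n, t, survives(n, t), survives(n, t + 1))
# 3 1 True False
# 5 2 True False
# 7 3 True False
```

The n = 3 row is the TMR case: one faulty module is outvoted, two simultaneous faulty modules are not.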
Future Work
• prototype several circuits in silicon, such as filters, using n-MR and the analog voter
• test those circuits under multiple faults
• this work is part of a larger research effort
• study new additional techniques to implement complete fault tolerant processors using future technologies
• bring analog design techniques to the digital
Thank You ! Questions ?
Contact: calisboa@inf.ufrgs.br