Improved dependability for dynamically reconfigurable hardware
Restoration of the reliability index via replication and error correction
J. M. Martins Ferreira [jmf@fe.up.pt]
FEUP / DEEC, Rua Roberto Frias, P-4200-465 Porto
HIBU, Frogsvei 41, N-3603 Kongsberg
Manuel G. Gericota [mgg@dee.isep.ipp.pt]
ISEP / DEE, Rua Dr. António Bernardino de Almeida, P-4200-072 Porto
[ this presentation is available online at
Outline of the presentation
• Introduction and motivation
• Causes of failure
• Concurrent fault detection
• Fault detection latency and fault tolerance
• Fault masking and fault correction
• Research directions
• Conclusion
Introduction and motivation
• Dynamically reconfigurable FPGAs:
– Production tests cannot guarantee fault-free operation
– Application areas include mission-critical systems
– The cost / benefit of spatial redundancy is different from static implementations
Causes of failure
• Post-production failure modes may be permanent or temporary. Examples:
– Electromigration phenomena may lead to permanent physical damage
– Single-event upsets (SEUs) may cause permanent malfunction if not mitigated (modification of SRAM contents changes design and data information)
Fault detection
• Dynamic reconfiguration enables concurrent fault detection
– Modifications in the configuration memory may be tested by scrubbing
– Structural faults that emerge in the field may be detected by release-to-test strategies
Fault detection: Scrubbing
• Errors in the on-chip configuration memory may be detected by partial readback (and corrected by partial reconfiguration)
• Scrubbing prevents “design” errors that might lead to functional failure
• Data stored in flip-flop registers is not writable via the configuration memory, so scrubbing does not correct “data” errors
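The readback-and-correct loop behind scrubbing can be sketched as follows. This is a minimal model, not the authors' implementation: the configuration memory is represented as a list of byte frames, and the names `scrub`, `readback` and `golden` are assumptions made for illustration. Rewriting a list entry stands in for the partial reconfiguration of one frame.

```python
from typing import List


def scrub(readback: List[bytes], golden: List[bytes]) -> List[int]:
    """Compare each frame read back from the device with its golden copy.

    Frames that differ are overwritten with the golden frame (a stand-in
    for partial reconfiguration). Returns the indices of corrected frames.
    """
    if len(readback) != len(golden):
        raise ValueError("readback and golden images must have the same frame count")
    corrected = []
    for index, (frame, reference) in enumerate(zip(readback, golden)):
        if frame != reference:
            readback[index] = reference  # hypothetical partial reconfiguration
            corrected.append(index)
    return corrected
```

For example, flipping one bit in the second frame of a three-frame image makes `scrub` return `[1]` and leaves the image equal to the golden copy. Flip-flop contents are outside this model, which mirrors the limitation stated above.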
Fault detection: Release-to-test
• The basic idea underlying release-to-test strategies consists of non-intrusively replicating a given functional block in another area and making the original resources available for test
[Figure: release-to-test cycle: replication of functionality, relocation, test of the released resources (resources under test), rotation of free resources]
Replication of active resources
• Concurrent fault detection based on release-to-test approaches must provide functional and state replication
• Replication at CLB-level
– Facilitates state transfer and requires a minimal amount of spare resources
– The relative position of the replicated CLB and its replica has an impact on propagation delay
CLB replication
• Replicating the functional configuration of a CLB is done with minimal overhead
• In free-running clock circuits, placing the inputs of the two CLBs in parallel ensures common state acquisition
• Gated-clock circuits need an auxiliary block to provide state transfer
Example: Replicate and release-to-test in a 24-bit binary counter
[Figure: counter slices CLB_R21C7.S0 to CLB_R24C7.S0 chained through the dedicated carry lines (CIN / COUT)]
[Figure: maximum frequency of operation (MHz) versus number of relocations, for vertical and horizontal rotation; the annotated carry path shows the delays Tbxcy and Tbyp]
Validation: ITC’99 benchmarks
Circuit                                      Logic                                   Carry logic
Reference  Primary inputs  Primary outputs   Number of gates  Number of flip-flops   Lines  Segments
B01        2+2             2                 47               5                      0      0
B02        1+2             1                 29               4                      0      0
B03        4+2             4                 150              30                     0      0
B04        11+2            8                 606              66                     4      14
B05        1+2             36                977              34                     4      16
B06        2+2             6                 61               9                      0      0
B07        1+2             8                 422              49                     2      6
B08        9+2             4                 168              21                     0      0
B09        1+2             1                 160              28                     0      0
B10        11+2            6                 190              17                     0      0
B11        7+2             6                 484              31                     1      4
B12        5+2             6                 1037             121                    0      0
B13        10+2            10                343              53                     1      4
ITC’99 benchmarks: ∆f and size
         Maximum frequency       Total size of the reconfiguration   Ratio between the total size of the reconf.
         deviation (%)           files (bytes)                       files by CLB (%) (horizontal>vertical)
Circuit  Vertical   Horizontal   Vertical      Horizontal
B01      -5,5       0,0          48 350        56 102                16,0
B02      0,0        0,0          7 016         10 623                51,4
B03      -1,9       -4,9         120 705       138 484               14,7
B04      -6,1       -29,3        548 595       665 419               21,3
B05      -17,3      -36,9        1 130 985     1 286 031             13,7
B06      -2,7       0,0          45 291        53 503                18,1
B07      -23,6      -37,8        354 367       425 214               20,0
B08      -5,8       -5,8         150 093       178 339               18,8
B09      -1,8       -4,9         112 107       129 855               15,8
B10      -7,5       -7,6         195 571       245 455               25,5
B11      -10,5      -36,0        500 261       614 093               22,8
B12      0,0        -1,2         1 275 804     1 631 953             27,9
B13      -4,3       -42,8        258 827       332 954               28,6
B14      -13,5      -47,8        5 195 444     6 070 485             16,8
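The ratio column appears to be the relative excess of the horizontal file size over the vertical one. That formula is inferred from the tabulated values rather than stated on the slide, and the function name below is an assumption. A minimal sketch:

```python
def size_ratio_percent(vertical_bytes: int, horizontal_bytes: int) -> float:
    """Relative excess of horizontal over vertical size, in percent.

    Inferred formula (an assumption): 100 * (horizontal - vertical) / vertical.
    """
    return 100.0 * (horizontal_bytes - vertical_bytes) / vertical_bytes


# Total reconfiguration file sizes (bytes) taken from the table: (vertical, horizontal)
SIZES = {
    "B02": (7016, 10623),
    "B13": (258827, 332954),
    "B14": (5195444, 6070485),
}

for circuit, (vertical, horizontal) in SIZES.items():
    print(circuit, round(size_ratio_percent(vertical, horizontal), 1))
```

Rounded to one decimal place, the three circuits give 51.4, 28.6 and 16.8, which agrees with the tabulated ratios.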
ITC’99 benchmarks: size per CLB
         Number of       Mean size of the reconfiguration   Ratio between the mean size value of the reconf.
         occupied CLBs   files by CLB (bytes)               files by CLB (%) (horizontal>vertical)
Circuit                  Vertical     Horizontal
B01      6               8 058        9 350                 16,03
B02      1               7 016        10 623                51,41
B03      11              10 973       12 589                14,73
B04      54              10 159       12 322                21,30
B05      103             10 980       12 485                13,71
B06      5               9 058        10 700                18,13
B07      31              11 431       13 716                19,99
B08      17              8 829        10 490                18,82
B09      12              9 342        10 821                15,83
B10      20              9 778        12 272                25,51
B11      39              12 827       15 745                22,75
B12      119             10 721       13 713                27,92
B13      37              6 995        8 998                 28,64
Structural fault detection in CLBs
• Test vector application / response capturing is carried out via the 1149.1 boundary-scan interface
[Figure: 1149.1 data registers (bypass, instruction, configuration) multiplexed between TDI and TDO, plus a User Test Register that chains the CLBs under test (IN / OUT); the BSCAN_VIRTEX primitive exposes UPDATE, SHIFT, RESET, TDI, SEL1, SEL2, DRCK1, DRCK2, TDO1 and TDO2]
Fault detection latency
Partial reconfiguration file size and reconfiguration time for each step in the replication of synchronous circuits with clock enable (TCK = 20 MHz)

Step                                                                                  Number of bytes   Time (ms)
Copy of the internal logic functionality and place of the input signals in parallel   11 289            9,705
Replication using the auxiliary relocation block:
    BY_C=1∧CC=1                                                                       441               0,379
    CC=0                                                                              277               0,238
    BY_C=0                                                                            277               0,238
Connect of the clock enable inputs of both CLBs                                       2 145             1,844
Disconnect of all the auxiliary relocation circuit signals                            2 217             1,906
Place of the CLB outputs in parallel                                                  4 129             3,550
Disconnect of the original CLB outputs                                                1 333             1,146
Disconnect of the original CLB inputs and test configuration                          18 392            15,813
Partial reconfiguration file size and reconfiguration time for each step in the replication of synchronous circuits with free-running clock and of combinational circuits (TCK = 20 MHz)

Step                                                                                  Number of bytes   Time (ms)
Copy of the internal logic functionality and place of the input signals in parallel   12 163            10,457
Place of the CLB outputs in parallel                                                  3 993             3,433
Disconnect of the original CLB outputs                                                1 073             0,923
Disconnect of the original CLB inputs and test configuration                          18 392            15,813
Total                                                                                 35 621            30,625
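As a consistency check on the free-running clock figures, the per-step values can be summed and compared with the reported total. The short step labels and the `totals` helper are assumptions made for this sketch; the numbers are copied from the table.

```python
# (bytes, milliseconds) per step, copied from the free-running clock table
STEPS = {
    "copy logic, inputs in parallel": (12163, 10.457),
    "outputs in parallel": (3993, 3.433),
    "disconnect original outputs": (1073, 0.923),
    "disconnect original inputs + test configuration": (18392, 15.813),
}


def totals(steps):
    """Sum the byte counts and the reconfiguration times over all steps."""
    total_bytes = sum(size for size, _ in steps.values())
    total_ms = sum(time for _, time in steps.values())
    return total_bytes, total_ms


total_bytes, total_ms = totals(STEPS)
print(total_bytes, round(total_ms, 3))
```

The byte total matches the reported 35 621. The summed step times come to 30,626 ms against the reported 30,625 ms, a difference consistent with rounding of the individual step times.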
Fault detection latency
Partial reconfiguration file size and reconfiguration time of the test configurations (TCK = 20 MHz)

Configuration   Number of bytes   Time (ms)
2nd             3 115             2,678
3rd             623               0,536
4th             634               0,545
5th             613               0,527
6th             512               0,440
Total           5 497             4,726

Shifting time for test vector application (TCK = 20 MHz)

Number of vectors   Length (bits)   Total (bits)   Application time (ms)
40                  13              520            0,066

Shifting time for the test vector responses from a CLB under test (TCK = 20 MHz)

Number of cells of the BS register in a XCV200   Number of vectors   Shifting time (ms)
1 022                                            40                  4,088
Mean time for the test of a 1 176 CLBs matrix
Occupation type: 25% synchronous + 50% combinational + 25% empty

TCK = 20 MHz   43 679,188 ms
TCK = 33 MHz   26 472,235 ms

The mean time to test the full CLB matrix is also the worst case fault detection latency
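The two matrix test times are consistent with a latency that scales inversely with the TCK frequency. The sketch below assumes the whole test time is TCK-bound, which is inferred from the two reported values rather than stated on the slide; the function and constant names are illustrative.

```python
MATRIX_CLBS = 1176                 # CLBs in the matrix, from the slide
MEAN_TEST_MS_AT_20MHZ = 43679.188  # reported mean test time at TCK = 20 MHz


def scale_to_tck(time_ms: float, tck_from_mhz: float, tck_to_mhz: float) -> float:
    """Rescale a TCK-bound time to another TCK frequency (assumed inverse scaling)."""
    return time_ms * tck_from_mhz / tck_to_mhz


def mean_time_per_clb(total_ms: float, clbs: int = MATRIX_CLBS) -> float:
    """Average test time spent on each CLB of the matrix."""
    return total_ms / clbs


print(round(scale_to_tck(MEAN_TEST_MS_AT_20MHZ, 20, 33), 3))
print(round(mean_time_per_clb(MEAN_TEST_MS_AT_20MHZ), 3))
```

Rescaling the 20 MHz figure to 33 MHz reproduces the reported 26 472,235 ms, and the 20 MHz figure corresponds to roughly 37 ms per CLB on average.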
Fault detection latency vs. fault masking
• A fault detection latency higher than 40 s may be acceptable in some applications, but may be a problem in many others
• Fault masking by spatial redundancy may solve the problem until the defective CLB is flagged or the soft error is corrected
[Figure: three modules feeding a majority voter]
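Fault masking by majority voting can be sketched in software as below. This is an illustration only: in the FPGA the voter is combinational logic, and the `majority_vote` name and its return convention are assumptions. Besides the voted value, it reports which modules disagree, which is the information an online error detector would need.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def majority_vote(outputs: Sequence[int]) -> Tuple[int, List[int]]:
    """Return the majority value and the indices of the disagreeing modules."""
    if len(outputs) < 3 or len(outputs) % 2 == 0:
        raise ValueError("NMR voting needs an odd number of modules, at least 3")
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise ValueError("no strict majority among module outputs")
    dissenting = [index for index, out in enumerate(outputs) if out != value]
    return value, dissenting
```

With three modules producing `[1, 1, 0]`, the voted output is `1` and module 2 is reported as disagreeing; the fault is masked while the faulty module is located.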
Spatial redundancy
• N-NMR implementations enhance reliability by tolerating voter failure
• Earlier NMR implementations were a form of static redundancy, but dynamic reconfiguration brings added value
– Just-in-time implementation saves space
– The reliability index may be restored
[Figure: T-TMR, with three modules and replicated majority voters]
Fault detection and correction in N-NMR via replication of CLBs
• The CLB testing approach previously described enables the identification of a defective CLB (structural fault)
• Replication will be used to remove the defective CLB from operation (and to reestablish the reliability index)
N-NMR plus online error detection
[Figure: triplicated modules L and M with majority voters and a scan chain capturing their outputs]
• An internal scan chain capturing the module and voter outputs enables the detection of incoherencies
• Fault detection latency still exists, but fault masking prevents system malfunction
Error correction
• If an incoherency is detected:
– A scrubbing procedure is launched to read-and-compare the configuration bitstream for the affected area
– If no error is found, each CLB in the affected module / voter is tested (a defective CLB will be replicated and removed from service)
• Error correction via CLB replication or scrubbing reestablishes the reliability index
[Figure: triplicated modules L and M with majority voters and scan chain]
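The decision flow above can be sketched as a small routine. All four parameters are hypothetical hooks introduced for illustration (the slides do not define such an interface): `scrub_area` returns True when a configuration error was found and corrected, `clb_is_faulty` runs the structural test on one CLB, and `replicate_and_retire` relocates the CLB and removes it from service.

```python
from typing import Callable, Iterable


def correct_error(
    area_clbs: Iterable[str],
    scrub_area: Callable[[], bool],
    clb_is_faulty: Callable[[str], bool],
    replicate_and_retire: Callable[[str], None],
) -> str:
    """Handle an incoherency flagged by the scan chain for one module / voter."""
    # Step 1: read-and-compare the configuration bitstream of the affected area.
    if scrub_area():
        return "scrubbed"
    # Step 2: no configuration error, so test each CLB for structural faults.
    for clb in area_clbs:
        if clb_is_faulty(clb):
            replicate_and_retire(clb)
            return "replicated:" + clb
    # Neither step located the cause.
    return "no-fault-found"
```

Either successful outcome corresponds to the reliability index being reestablished; the third return value covers the case the slide leaves open, where neither scrubbing nor CLB testing locates a cause.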
Research directions
• One-chip “self-healing” architectures may be achieved via self-reconfiguration (a microprocessor core controls the self-reconfiguration port and scan chains)
• Dual-chip or multi-chip architectures may monitor the error detection circuitry of
Conclusion
• The CLB replication and test procedure proposed enables concurrent non-intrusive fault detection, but fault detection latency prevents true fault tolerance
• Combining the proposed fault detection techniques with spatial redundancy enables low overhead fault tolerance for DR-FPGAs (and self-healing for SR-FPGAs)
Conclusion
• Dependability will also be improved by runtime defragmentation of the FPGA logic space (using the proposed CLB replication and test procedure)