Improved dependability for dynamically reconfigurable hardware: restoration of the reliability index via replication and error correction

(1)

J. M. Martins Ferreira [ jmf@fe.up.pt ]

FEUP / DEEC, Rua Roberto Frias, P-4200-465 Porto

HIBU, Frogsvei 41, N-3603 Kongsberg

Manuel G. Gericota [ mgg@dee.isep.ipp.pt ]

ISEP / DEE, Rua Dr. António Bernardino de Almeida, P-4200-072 Porto

[ this presentation is available online at

(2)

Outline of the presentation

• Introduction and motivation

• Causes of failure

• Concurrent fault detection

• Fault detection latency and fault tolerance

• Fault masking and fault correction

• Research directions

• Conclusion

(3)

Introduction and motivation

• Dynamically reconfigurable FPGAs:

– Production tests cannot guarantee fault-free operation

– Application areas include mission-critical systems

– The cost / benefit of spatial redundancy is different from static implementations

(4)

(5)

Causes of failure

• Post-production failure modes may be permanent or temporary. Examples:

– Electromigration phenomena may lead to permanent physical damage

– Single-event upsets (SEUs) may cause permanent malfunction if not mitigated (modification of SRAM contents changes design and data information)

(6)

Fault detection

• Dynamic reconfiguration enables concurrent fault detection

– Modifications in the configuration memory may be tested by scrubbing

– Structural faults that emerge in the field may be detected by release-to-test strategies

(7)

Fault detection: Scrubbing

• Errors in the on-chip configuration memory may be detected by partial readback (and corrected by partial reconfiguration)

• Scrubbing prevents “design” errors that might lead to functional failure

• Data stored in flip-flop registers is not writable via the configuration memory, so scrubbing does not correct “data” errors
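
The readback-and-correct loop can be sketched as follows. This is a minimal software model under stated assumptions, not a vendor API: the configuration memory is represented as a list of byte frames, and `scrub`, `golden` and `live` are hypothetical names introduced for illustration only.

```python
def scrub(live, golden):
    """Compare each configuration frame against its golden copy.

    Frames that differ are rewritten from the golden copy (modelling
    partial reconfiguration). Returns the indices of the repaired frames.
    """
    repaired = []
    for index, (frame, reference) in enumerate(zip(live, golden)):
        if frame != reference:          # partial readback and compare
            live[index] = reference     # partial reconfiguration of that frame
            repaired.append(index)
    return repaired


# Hypothetical 4-frame configuration memory with one upset bit.
golden = [bytes([0x0F, 0xA5]), bytes([0x3C, 0x00]),
          bytes([0xFF, 0x10]), bytes([0x55, 0xAA])]
live = list(golden)
live[2] = bytes([0xFF, 0x10 ^ 0x04])    # emulate an SEU in frame 2

print(scrub(live, golden))              # indices of the frames that were rewritten
```

Flip-flop contents are deliberately outside this model, which matches the limitation stated on the slide: only "design" information held in configuration frames is restored.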

(8)

Fault detection: Release-to-test

• The basic idea underlying release-to-test strategies is to replicate a given functional block non-intrusively in another area, and to make the original resources available for test

[Figure: replication of functionality, relocation, rotation of free resources, resources under test]
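
A minimal sketch of the rotation idea, assuming a single row of CLB positions with exactly one free position. `rotate_and_test` and the `test_slot` callback are hypothetical names; on a real device the relocation is performed by partial reconfiguration rather than by list assignment.

```python
def rotate_and_test(slots, test_slot):
    """Rotate the single free position across `slots`, testing each one once.

    `slots` holds one function name per CLB position, with exactly one
    None entry marking the free CLB. `test_slot` receives the index of a
    released position and returns True if that position passes the test.
    Returns the list of positions that failed.
    """
    free = slots.index(None)
    failed = []
    for _ in range(len(slots)):
        if not test_slot(free):                 # released resource is tested
            failed.append(free)                 # sketch only records the failure
        neighbour = (free - 1) % len(slots)     # next block to relocate
        slots[free] = slots[neighbour]          # replicate functionality
        slots[neighbour] = None                 # release the original resources
        free = neighbour
    return failed


layout = ["counter", "decoder", "fsm", None]
print(rotate_and_test(layout, lambda position: True))
print(layout)
```

A real flow would keep a failing position out of service instead of relocating a function into it; that policy is left out to keep the sketch short.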

(9)

Replication of active resources

• Concurrent fault detection based on release-to-test approaches must provide functional and state replication

• Replication at CLB level

– Facilitates state transfer and requires a minimal amount of spare resources

– The relative position of the replicated CLB and its replica has an impact on propagation delay

(10)

CLB replication

• Replicating the functional configuration of a CLB is done with minimal overhead

• In free-running clock circuits, placing the inputs of the two CLBs in parallel ensures common state acquisition

• Gated-clock circuits need an auxiliary block to provide state transfer
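
The difference between the two cases can be illustrated with a small behavioural model. This is a sketch under simplifying assumptions: `FlipFlop`, `replicate_free_running` and `replicate_gated` are hypothetical names, and the auxiliary block is reduced to a forced load of the held state.

```python
class FlipFlop:
    """D flip-flop with clock enable, modelling the storage element of a CLB."""

    def __init__(self, q=0):
        self.q = q

    def clock(self, d, ce=1):
        if ce:                  # state is only updated when the clock is enabled
            self.q = d
        return self.q


def replicate_free_running(original, replica, d):
    """Inputs in parallel: one clock edge gives both flip-flops the same state."""
    original.clock(d)
    replica.clock(d)


def replicate_gated(original, replica, d, ce):
    """With ce == 0 the parallel inputs are not captured, so the held state
    is copied through an auxiliary path (simplified auxiliary block)."""
    original.clock(d, ce)
    if ce:
        replica.clock(d, ce)              # normal capture through parallel inputs
    else:
        replica.clock(original.q, ce=1)   # auxiliary path copies the held state


ff_a, ff_b = FlipFlop(q=1), FlipFlop(q=0)
replicate_gated(ff_a, ff_b, d=0, ce=0)
print(ff_a.q, ff_b.q)
```

Without the auxiliary path, the replica would keep its own initial state for as long as the clock enable stays inactive, which is why parallel inputs alone are not sufficient in the gated case.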

(11)

Example: Replicate and release-to-test in a 24-bit binary counter

[Figure: counter slices CLB_R21C7.S0, CLB_R22C7.S0, CLB_R23C7.S0 and CLB_R24C7.S0 (ports BX, YB, CIN, COUT) connected by dedicated carry lines]

(12)

Example: Replicate and release-to-test in a 24-bit binary counter

[Figure: maximum frequency of operation (MHz, axis 0 to 160) versus number of relocations (0 to 12), for vertical rotation and horizontal rotation; annotated paths U1/C6/C12/C1/O, U1/C6/C14/C1/O and U1/C6/C16/C1/O with delays Tbxcy and Tbyp across slices CLB_R21C7.S0 to CLB_R24C7.S0]

(13)

Validation: ITC’99 benchmarks

Circuit                                     Logic                                  Carry logic
Reference  Primary inputs  Primary outputs  Number of gates  Number of flip-flops  Lines  Segments
B01        2+2             2                47               5                     0      0
B02        1+2             1                29               4                     0      0
B03        4+2             4                150              30                    0      0
B04        11+2            8                606              66                    4      14
B05        1+2             36               977              34                    4      16
B06        2+2             6                61               9                     0      0
B07        1+2             8                422              49                    2      6
B08        9+2             4                168              21                    0      0
B09        1+2             1                160              28                    0      0
B10        11+2            6                190              17                    0      0
B11        7+2             6                484              31                    1      4
B12        5+2             6                1037             121                   0      0
B13        10+2            10               343              53                    1      4

(14)

ITC’99 benchmarks: ∆f and size

Maximum frequency deviation (%), total size of the reconfiguration files (bytes), and ratio between the total size of the reconf. files by CLB (%) (horizontal>vertical)

Circuit  Deviation (%)          Total size (bytes)      Ratio (%)
         Horizontal  Vertical   Horizontal  Vertical
B01      0,0         -5,5       56 102      48 350      16,0
B02      0,0         0,0        10 623      7 016       51,4
B03      -4,9        -1,9       138 484     120 705     14,7
B04      -29,3       -6,1       665 419     548 595     21,3
B05      -36,9       -17,3      1 286 031   1 130 985   13,7
B06      0,0         -2,7       53 503      45 291      18,1
B07      -37,8       -23,6      425 214     354 367     20,0
B08      -5,8        -5,8       178 339     150 093     18,8
B09      -4,9        -1,8       129 855     112 107     15,8
B10      -7,6        -7,5       245 455     195 571     25,5
B11      -36,0       -10,5      614 093     500 261     22,8
B12      -1,2        0,0        1 631 953   1 275 804   27,9
B13      -42,8       -4,3       332 954     258 827     28,6
B14      -47,8       -13,5      6 070 485   5 195 444   16,8

(15)

ITC’99 benchmarks: size per CLB

Number of occupied CLBs, mean size of the reconfiguration files by CLB (bytes), and ratio between the mean size value of the reconf. files by CLB (%) (horizontal>vertical)

Circuit  Number of occupied CLBs  Mean size (bytes)        Ratio (%)
                                  Horizontal  Vertical
B01      6                        9 350       8 058        16,03
B02      1                        10 623      7 016        51,41
B03      11                       12 589      10 973       14,73
B04      54                       12 322      10 159       21,30
B05      103                      12 485      10 980       13,71
B06      5                        10 700      9 058        18,13
B07      31                       13 716      11 431       19,99
B08      17                       10 490      8 829        18,82
B09      12                       10 821      9 342        15,83
B10      20                       12 272      9 778        25,51
B11      39                       15 745      12 827       22,75
B12      119                      13 713      10 721       27,92
B13      37                       8 998       6 995        28,64
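
The figures in the size tables appear consistent with two simple relations: the mean size per CLB is the total size divided by the number of occupied CLBs, and the ratio is the relative increase of horizontal over vertical rotation. The sketch below checks this reading on the B01 row; the function names are hypothetical, and the formulas are an interpretation of the tables rather than something stated on the slides.

```python
def size_ratio_percent(horizontal, vertical):
    """Relative size increase of horizontal over vertical rotation, in percent."""
    return 100.0 * (horizontal - vertical) / vertical


def mean_size_per_clb(total_bytes, occupied_clbs):
    """Mean reconfiguration file size per occupied CLB, in bytes."""
    return total_bytes / occupied_clbs


# B01: mean sizes 9 350 and 8 058 bytes, total 56 102 bytes over 6 CLBs.
print(round(size_ratio_percent(9350, 8058), 2))
print(mean_size_per_clb(56102, 6))
```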

(16)

Structural fault detection in CLBs

• Test vector application / response capturing is carried out via the 1149.1 boundary-scan interface

[Figure: boundary-scan access to the User Test Register formed by the CLBs under test (IN, OUT); labels: TDI, TDO, MUX, bypass register, instruction register, configuration register; BSCAN_VIRTEX signals: UPDATE, SHIFT, RESET, TDI, SEL1, SEL2, DRCK1, DRCK2, TDO1, TDO2]
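
The apply-and-capture loop can be sketched in software as below. This is only a behavioural illustration: it does not model the BSCAN_VIRTEX primitive or the TAP controller, and `shift_in`, `run_test` and the 2-input example function are assumptions made for the example.

```python
def shift_in(register, bits):
    """Shift `bits` serially into `register` (new bits enter at index 0).

    The register length stays constant; the updated contents are returned.
    """
    for bit in bits:
        register = [bit] + register[:-1]
    return register


def run_test(clb_function, vectors, expected):
    """Apply each vector to the CLB under test and compare the captured
    response with the fault-free response. Returns failing vector indices."""
    register = [0] * len(vectors[0])
    failures = []
    for index, vector in enumerate(vectors):
        # Shifting the vector in reverse order leaves it in natural order.
        register = shift_in(register, reversed(vector))
        response = clb_function(tuple(register))
        if response != expected[index]:
            failures.append(index)
    return failures


vectors = [(0, 0), (0, 1), (1, 0), (1, 1)]
fault_free = lambda v: v[0] & v[1]      # hypothetical LUT configured as AND
stuck_at_0 = lambda v: 0                # same LUT with its output stuck at 0
expected = [fault_free(v) for v in vectors]
print(run_test(stuck_at_0, vectors, expected))
```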

(17)

Fault detection latency

Partial reconfiguration file size and reconfiguration time for each step in the replication of synchronous circuits with free-running clock and of combinational circuits (TCK = 20 MHz)

Number of bytes  Time (ms)  Step
12 163           10,457     Copy of the internal logic functionality and place of the input signals in parallel
3 993            3,433      Place of the CLB outputs in parallel
1 073            0,923      Disconnect of the original CLB outputs
18 392           15,813     Disconnect of the original CLB inputs and test configuration
35 621           30,625     Total

Replication using the auxiliary relocation block

Partial reconfiguration file size and reconfiguration time for each step in the replication of synchronous circuits with clock enable (TCK = 20 MHz)

Number of bytes  Time (ms)  Step
11 289           9,705      Copy of the internal logic functionality and place of the input signals in parallel
441              0,379      BY_C=1∧CC=1
277              0,238      CC=0
277              0,238      BY_C=0
2 145            1,844      Connect of the clock enable inputs of both CLBs
2 217            1,906      Disconnect of all the auxiliary relocation circuit signals
4 129            3,550      Place of the CLB outputs in parallel
1 333            1,146      Disconnect of the original CLB outputs
18 392           15,813     Disconnect of the original CLB inputs and test configuration

(18)

Fault detection latency

Partial reconfiguration file size and reconfiguration time of the test configurations (TCK = 20 MHz)

Configuration  Number of bytes  Time (ms)
2nd            3 115            2,678
3rd            623              0,536
4th            634              0,545
5th            613              0,527
6th            512              0,440
Total          5 497            4,726

Shifting time for test vector application (TCK = 20 MHz)

Number of vectors  Length (bits)  Total (bits)  Application time (ms)
40                 13             520           0,066

Shifting time for the test vector responses from a CLB under test (TCK = 20 MHz)

Number of cells of the BS register in an XCV200  Number of vectors  Shifting time (ms)
1 022                                            40                 4,088

Mean time for the test of a 1 176-CLB matrix

Occupation type: 25% synchronous + 50% combinational + 25% empty

TCK = 20 MHz  43 679,188 ms
TCK = 33 MHz  26 472,235 ms

The mean time to test the full CLB matrix is also the worst case fault detection latency
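
The two mean test times quoted for the CLB matrix are consistent with a time that scales inversely with the TCK frequency. The sketch below checks that relation and derives the mean time per CLB; the function names are hypothetical, and the inverse-proportionality assumption is an interpretation of the figures, not a statement from the slides.

```python
def scale_to_tck(time_ms, tck_from_mhz, tck_to_mhz):
    """Rescale a time measured at one TCK frequency to another frequency,
    assuming the time is inversely proportional to TCK."""
    return time_ms * tck_from_mhz / tck_to_mhz


def mean_time_per_clb(total_ms, clb_count):
    """Mean test time per CLB, in milliseconds."""
    return total_ms / clb_count


matrix_time_20mhz = 43679.188   # ms, full 1 176-CLB matrix at TCK = 20 MHz
print(round(scale_to_tck(matrix_time_20mhz, 20, 33), 3))
print(round(mean_time_per_clb(matrix_time_20mhz, 1176), 3))
```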

(19)

Fault detection latency vs. fault masking

• A fault detection latency higher than 40 s may be acceptable in some applications, but may be a problem in many others

• Fault masking by spatial redundancy may solve the problem until the defective CLB is flagged / soft error is corrected

• See [figure: three Module blocks feeding a Majority voter]

(20)

Spatial redundancy

• N-NMR implementations enhance reliability by tolerating voter failure

• Earlier NMR implementations were a form of static redundancy, but dynamic reconfiguration brings added value

– Just-in-time implementation saves space

– The reliability index may be restored

[Figure: T-TMR: three Module blocks and their Majority voters]
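
Fault masking with replicated voters can be sketched with a bitwise majority function. This is an illustrative model, assuming integer-valued module outputs and an identity next stage; `majority`, `voter_stage`, `two_stages` and the XOR fault masks are hypothetical names introduced for the example.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote over three integer values."""
    return (a & b) | (a & c) | (b & c)


def voter_stage(values, voter_faults=(0, 0, 0)):
    """Three voters each vote on the same three values.

    `voter_faults` holds one XOR mask per voter; a non-zero mask emulates
    a faulty voter (hypothetical fault-injection hook).
    """
    vote = majority(*values)
    return tuple(vote ^ mask for mask in voter_faults)


def two_stages(module_outputs, voter_faults):
    """First voter stage (possibly faulty) followed by a fault-free stage.

    The next-stage modules are modelled as the identity function, so the
    second stage votes directly on the first-stage voter outputs.
    """
    return voter_stage(voter_stage(module_outputs, voter_faults))


print(two_stages((7, 3, 7), (0, 0, 0)))   # one faulty module copy
print(two_stages((7, 7, 7), (0, 1, 0)))   # one faulty voter
```

With a single voter, the second case would propagate the wrong value; with three voters, the following stage outvotes the faulty one.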

(21)

Fault detection and correction in N-NMR via replication of CLBs

• The CLB testing approach previously described enables the identification of a defective CLB (structural fault)

• Replication will be used to remove the defective CLB from operation (and to reestablish the reliability index)

(22)

N-NMR plus online error detection

[Figure: triplicated modules L and M with majority voters, and a scan chain capturing their outputs]

• An internal scan chain capturing the module and voter outputs enables the detection of incoherencies

• Fault detection latency still exists, but fault masking prevents system malfunction
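
Locating the disagreeing copy from the captured values can be sketched as below, assuming a single fault per triplicated group so that a clear majority exists. `find_incoherent` and the element names are hypothetical; the dictionary stands in for the values shifted out of the scan chain.

```python
from collections import Counter


def find_incoherent(captured):
    """Return the names whose captured value differs from the majority value.

    `captured` maps an element name (one copy of a module or voter) to the
    value read for it through the scan chain.
    """
    majority_value = Counter(captured.values()).most_common(1)[0][0]
    return sorted(name for name, value in captured.items()
                  if value != majority_value)


# Hypothetical capture for the three copies of module L.
print(find_incoherent({"L.0": 0b1011, "L.1": 0b1011, "L.2": 0b1001}))
```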

(23)

Error correction

• If an incoherency is detected:

– A scrubbing procedure is launched to read-and-compare the configuration bitstream for the affected area

– If no error is found, each CLB in the affected module / voter is tested (a defective CLB will be replicated and removed from service)

• Error correction via CLB replication or scrubbing reestablishes the reliability index

[Figure: triplicated modules L and M with majority voters, and a scan chain capturing their outputs]
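
The decision flow above can be written as a short routine. This is a sketch only: `correct_error` and its three callbacks are hypothetical placeholders for the scrubbing, CLB test and CLB replication procedures, and the `area` dictionary layout is an assumption made for the example.

```python
def correct_error(area, scrub, test_clb, replicate):
    """Decision flow after an incoherency is detected in `area`.

    scrub(frames)   -> True if a configuration error was found and rewritten
    test_clb(clb)   -> True if the CLB passes the structural test
    replicate(clb)  -> relocates the CLB functionality to a spare CLB
    Returns a short description of the action taken.
    """
    if scrub(area["frames"]):
        return "configuration rewritten"
    for clb in area["clbs"]:
        if not test_clb(clb):
            replicate(clb)
            return f"{clb} replicated and removed from service"
    return "no fault located"


relocated = []
area = {"frames": [0, 1], "clbs": ["R1C1", "R1C2"]}
print(correct_error(area,
                    scrub=lambda frames: False,
                    test_clb=lambda clb: clb != "R1C2",
                    replicate=relocated.append))
```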

(24)

Research directions

• One-chip “self-healing” architectures may be achieved via self-reconfiguration (a microprocessor core controls the self-reconfiguration port and scan chains)

• Dual-chip or multi-chip architectures may monitor the error detection circuitry of

(25)

Conclusion

• The CLB replication and test procedure proposed enables concurrent non-intrusive fault detection, but fault detection latency prevents true fault tolerance

• Combining the proposed fault detection techniques with spatial redundancy enables low-overhead fault tolerance for DR-FPGAs (and self-healing for SR-FPGAs)

(26)

Conclusion

• Dependability will also be improved by runtime defragmentation of the FPGA logic space (using the proposed CLB replication and test procedure)

(27)