Background and Evolution of the Proposed Technique

3 MATRIX MULTIPLICATION HARDENING

3.3 PROPOSED TECHNIQUE

3.3.1 Background and Evolution of the Proposed Technique

The concept of checking the outputs of a system, in parallel with its operation, for only a subset of its possible inputs is known as fingerprinting (MOTWANI, 1995). It can be applied to the general case of a circuit that must be hardened against soft errors, providing tolerance against transient faults caused by pulses that affect parts of the circuit, even when the transient pulse lasts longer than the delay of several gates. Figure 3.1 illustrates this idea.

In contrast with other solutions based on checker circuits, such as the one proposed by Austin (1999), the random checker used in fingerprinting does not provide full fault detection. It performs some of the functions of the main circuit on only a small set of the possible inputs, and is therefore able to detect errors at the output statistically, with a given probability. The main goal of this approach is to provide an acceptable level of fault detection, according to the concepts of error tolerance, using a circuit that is significantly smaller than the main circuit under inspection, thereby keeping the area overhead low.

Figure 3.1. Fingerprinting - generic scheme (blocks: main circuit and random checker; signals: inputs, output, and error)

The underlying concept presented here is generic, and can be adopted for several different applications or circuits, with the subset of inputs, the operations performed by the checker, the performance, area, and power overheads varying according to the application. In this work, it has been applied to harden a matrix multiplier circuit, as shown in the following paragraphs.

Rūsiņš Freivalds (1977) proved that probabilistic machines can execute some specific computations faster than deterministic ones, and that they can compute approximations of a function in a fraction of the time required to compute the same function deterministically. A technique for faster verification of the correctness of matrix multiplication algorithms, also credited to Freivalds, is described in Motwani (1995).

In summary, Freivalds’ technique uses matrix-by-vector multiplication to reduce the computation time needed to verify the results produced by a given matrix multiplication algorithm. Given n×n matrices A and B, and the matrix C, the product of A and B computed by the algorithm under test, the following computations are performed:

1. Randomly create a vector r in which the values of the elements are only 0 or 1.

2. Calculate Cr = C × r

3. Calculate ABr = A × (B × r)

Freivalds proved that, whenever A×B ≠ C, the probability of Cr being equal to ABr is ≤ ½. In other words, when the product matrix is incorrect, a single comparison of Cr with ABr detects the error with probability ≥ ½. The proof is shown in Motwani (1995).

Furthermore, if steps 1 to 3 above are performed k times independently (with independently chosen values of the vector r), the probability that an incorrect product passes every comparison becomes ≤ (½)^k. Using this technique, the verification of the result can be done in less time than the original multiplication, since the conventional matrix multiplication algorithm requires O(n³) time, while multiplication of a matrix by a vector is performed in O(n²) time. However, since this is a statistical technique, there is no assurance that errors will always be detected.
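The steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the function names (`mat_vec`, `mat_mul`, `freivalds_check`) and the `rng` parameter are assumptions introduced here, and matrices are represented as plain lists of rows.

```python
import random


def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by vector v in O(n^2) operations."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def mat_mul(A, B):
    """Conventional O(n^3) product, used here only to build example data."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def freivalds_check(A, B, C, k=1, rng=random):
    """Return True if C passes k independent checks against A x B.

    A False result proves that C != A x B. A True result only means that no
    mismatch was observed: an incorrect C can pass with probability <= (1/2)**k.
    """
    n = len(C)
    for _ in range(k):
        r = [rng.randint(0, 1) for _ in range(n)]  # step 1: random 0/1 vector
        Cr = mat_vec(C, r)                         # step 2: C x r
        ABr = mat_vec(A, mat_vec(B, r))            # step 3: A x (B x r)
        if Cr != ABr:
            return False
    return True
```

For example, `freivalds_check(A, B, mat_mul(A, B), k=10)` returns `True` for any integer matrices `A` and `B`, while a corrupted product is rejected with a probability that grows with `k`.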

3.3.1.2 Improving Freivalds’ technique

The analysis of the technique proposed by Freivalds shows that the probability of detecting one error in C is ≅ ½ because the randomly generated elements of the vector r have the same ½ probability of being 0 or 1. Assuming that the element of C which has an erroneous value is Cij, in the calculation of Cr this element is multiplied by a single element rj of the vector, and is therefore either canceled during the generation of Cr (if rj is equal to 0) or not (if rj is equal to 1).

Given that the elements of the vector r can be randomly chosen, if we perform the computation with a second vector, rc, in which each element is the binary complement of the corresponding element in r, the elements of C that were canceled in the first computation will not be canceled in the second one, and vice versa. Therefore, if Cij has an erroneous value, we will have either A×(B×r) ≠ C×r or A×(B×rc) ≠ C×rc, and the probability of detecting an error in a single element of C will be equal to 1: if the erroneous value is masked in the calculation of ABr and Cr, it is not masked when ABrc and Crc are calculated, and vice versa.

This property allows the detection of every error in which a single element of C is faulty, with only two executions of the Freivalds technique, as demonstrated in the following box.

Theorem: The use of complementary vectors r and rc allows the detection of all single faults with two executions of Freivalds’ technique.

The computation of the products A×(B×r) and C×r in the Freivalds technique generates two vectors that must be compared. Assuming that matrices A and B have n×n elements, the r and rc vectors will have n elements each and the value of an element i of the above products is given by:

$$(ABr)_i = \sum_{j=1}^{n} \left( (a_{i1} b_{1j} + a_{i2} b_{2j} + \dots + a_{in} b_{nj}) \, r_j \right)$$

$$(Cr)_i = c_{i1} r_1 + c_{i2} r_2 + \dots + c_{in} r_n$$

As demonstrated in Motwani (1995), when no error occurs in the calculation of C, we have ABr = Cr regardless of the values of ri, and the comparison for equality holds true. However, when C contains an error, there is a probability ≤ ½ that the comparison will still hold true. That happens because the values of ri are selected randomly from {0, 1} and, therefore,

Pr[ri = 0] = Pr[ri = 1] = ½.

This way, there is a 50% chance that an erroneous value Cij will be masked during the calculation of Cr, and, in this case, ABr is erroneously considered to be equal to Cr.

When the ri values are generated randomly and their complements are used to set the corresponding elements of vector rc, the events ri = 1 and rci = 1 are mutually exclusive and exactly one of them always occurs, so we have:

$$\Pr[r_i = 1 \lor \mathit{rc}_i = 1] = \Pr[r_i = 1] + \Pr[\mathit{rc}_i = 1] = \tfrac{1}{2} + \tfrac{1}{2} = 1$$
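The effect of the complementary vector can be illustrated with a short Python sketch. The function names and the `rng` parameter are hypothetical, chosen here for illustration, and the sketch assumes integer matrices stored as lists of rows, with at most one faulty element in C.

```python
import random


def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def complementary_check(A, B, C, rng=random):
    """Run the comparison with a random 0/1 vector r and its complement rc.

    Returns True when both comparisons hold, False as soon as one fails.
    """
    n = len(C)
    r = [rng.randint(0, 1) for _ in range(n)]
    rc = [1 - x for x in r]  # binary complement of r
    for v in (r, rc):
        if mat_vec(C, v) != mat_vec(A, mat_vec(B, v)):
            return False
    return True
```

If only the element Cij is wrong, it is multiplied by rj in the first pass and by rcj in the second, and one of the two is always 1, so the mismatch appears in row i of one of the comparisons whatever random vector is drawn.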

Further exploring the extension of Freivalds’ technique proposed here, it becomes clear that, since the technique is valid for any randomly selected vector r, it must also be valid for the specific vector r1 = {1, 1, ..., 1}. In this case, the complementary vector is r0 = {0, 0, ..., 0}, and we have:

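$$C \times r_1 = \left\{ \sum_{j=1}^{n} C_{1j}, \; \dots, \; \sum_{j=1}^{n} C_{nj} \right\} \qquad (1)$$

$$A \times (B \times r_1) = \left\{ \sum_{j=1}^{n} \sum_{k=1}^{n} A_{1k} B_{kj}, \; \dots, \; \sum_{j=1}^{n} \sum_{k=1}^{n} A_{nk} B_{kj} \right\} \qquad (2)$$

and

$$C \times r_0 = 0 \qquad (3)$$

$$A \times (B \times r_0) = 0 \qquad (4)$$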

From expressions (3) and (4) above, one can see that the condition C×r0 ≠ A×(B×r0) will always be false. Therefore, the test of the compound condition A×(B×r1) ≠ C×r1 or A×(B×r0) ≠ C×r0 can be simplified to A×(B×r1) ≠ C×r1, significantly reducing the cost of the verification process, because the computation of expressions (3) and (4) is no longer necessary. In addition, the computation of expressions (1) and (2) no longer requires multiplying by the elements of r1, since they are all equal to one.

From (1) and (2), we can also conclude that, since in the multiplication process $C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$, if one of the Cij elements has an erroneous value, the condition A×(B×r1) ≠ C×r1 will be true, and the error will always be detected.
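The reason a single faulty element cannot escape the all-ones check can be written out explicitly. The following is a short worked derivation under the single-error assumption; the symbols C' (the faulty product) and e (the nonzero deviation) are introduced here for illustration only.

```latex
% Assumption: C' differs from C = A \times B only in element (i, j), by e \neq 0.
\begin{align*}
(C' \times r_1)_i &= \sum_{m=1}^{n} C'_{im}
                   = \sum_{m=1}^{n} C_{im} + e \\
                  &= \sum_{m=1}^{n} \sum_{k=1}^{n} A_{ik} B_{km} + e
                   = \bigl(A \times (B \times r_1)\bigr)_i + e
                   \neq \bigl(A \times (B \times r_1)\bigr)_i
\end{align*}
```

The argument depends on e being the only deviation in row i: several errors in the same row whose deviations sum to zero would leave the row sum unchanged.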

Figure 3.2: Operations used in the verification of the product (Expressions 5, 6, and 7)

Therefore, the verification of the product matrix can be performed by calculating only the following:

• Vector Cr, where $(Cr)_i = C_{i1} + C_{i2} + \dots + C_{in}$ (5)

• Vector Br, where $(Br)_i = B_{i1} + B_{i2} + \dots + B_{in}$ (6)

• Vector ABr, where $(ABr)_i = \sum_{k=1}^{n} A_{ik} \, (Br)_k$ (7)

Then, vectors ABr and Cr must be compared; if they are different, there was an error in the multiplication, and the whole matrix multiplication algorithm must be repeated.
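The optimized verification described by expressions (5) to (7) can be sketched as follows. This is a minimal software illustration, not the hardware design discussed in this work: the names `row_sums`, `verify_product`, and `checked_multiply`, as well as the `max_attempts` limit, are assumptions made for the example.

```python
def row_sums(M):
    """Sum each row of M, i.e. multiply M by the all-ones vector."""
    return [sum(row) for row in M]


def verify_product(A, B, C):
    """Return True when ABr equals Cr for r = (1, ..., 1)."""
    Cr = row_sums(C)                                          # expression (5)
    Br = row_sums(B)                                          # expression (6)
    ABr = [sum(a * b for a, b in zip(row, Br)) for row in A]  # expression (7)
    return ABr == Cr


def checked_multiply(A, B, multiply, max_attempts=3):
    """Hypothetical wrapper: repeat the multiplication until the check passes."""
    for _ in range(max_attempts):
        C = multiply(A, B)
        if verify_product(A, B, C):
            return C
    raise RuntimeError("product failed verification on every attempt")
```

The check needs only row sums of C and B plus one matrix-by-vector product, so no random vector has to be generated and no multiplication by the elements of r is performed.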

The above conclusions have been confirmed through exhaustive simulated fault injection experiments using MATLAB (MATHWORKS, 2006). This optimized technique therefore provides a method that can detect all single-element errors in a matrix multiplication operation with very low overhead.

In terms of computation time overhead, Table 3.2 shows the number of operations (considering that the cost of multiplications is 4 times the cost of additions and comparisons) required to multiply and check matrices with different dimensions (n), obtained in this experiment.

Table 3.2. Computational cost scaling with n

| n | Multiplication: 4n³ + n²(n-1) | Verification: 5n² + 3n(n-1) | % Verification Overhead |
|---|---|---|---|
| 2 | 36 | 26 | 72 |
| 4 | 304 | 116 | 38 |
| 8 | 2,496 | 488 | 20 |
| 16 | 20,224 | 2,000 | 10 |
| 32 | 162,816 | 8,096 | 5 |
| 64 | 1,306,624 | 32,576 | 2 |

The figures in Table 3.2 make clear that, for larger matrices (n ≥ 4), the verification cost of the proposed technique is far below the 100% imposed by duplicated execution of the multiplication algorithm, falling from 38% at n = 4 to 2% at n = 64. It is also much lower than that of other, more expensive techniques, thereby confirming the low overhead of the verification.
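The entries of Table 3.2 can be reproduced directly from the two cost formulas in its header. The short Python sketch below does so; the function names are chosen here for illustration, and the overhead is assumed to be the verification cost as a percentage of the multiplication cost, rounded to the nearest integer.

```python
def multiplication_cost(n):
    """Cost formula from the table header: 4n^3 + n^2 (n - 1)."""
    return 4 * n**3 + n**2 * (n - 1)


def verification_cost(n):
    """Cost formula from the table header: 5n^2 + 3n (n - 1)."""
    return 5 * n**2 + 3 * n * (n - 1)


def overhead_percent(n):
    """Verification cost as a percentage of the multiplication cost."""
    return 100 * verification_cost(n) / multiplication_cost(n)


# Print one line per table row: n, multiplication, verification, overhead (%).
for n in (2, 4, 8, 16, 32, 64):
    print(n, multiplication_cost(n), verification_cost(n), round(overhead_percent(n)))
```

Because the multiplication cost grows with n³ and the verification cost with n², the overhead roughly halves each time n doubles.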
