4. IMPLEMENTATION OF VECTOR DECOMPOSITION

Gram-Schmidt

The Gram-Schmidt decomposition mainly returns a single vector, so its recomposition is not as close to the original vector (in the sense of the norm of the difference). However, its results are always physical, so this decomposition technique was kept as well.

4.7 Evaluation of the correctness of a decomposition

4.7.1 Correct and incorrect decompositions

Several verifications were made to evaluate the quality of each loss map. They provided statistics on the results of the different algorithms. Additionally, one extra vector was decomposed on each set, as a way to cross-check the loss maps against a common reference. Each time, the error between the chosen vector and its recomposition was calculated, which allows the sets of loss maps to be compared.

Then, every loss map of every set was decomposed on all the other sets. This way, loss maps giving bad results — that is, leading to wrong decompositions on every other vector set — could be removed.

The error on the recomposition, presented in § 3.5, evaluates the difference between a vector and its recomposition in the m-vector space, but gives no information on how correct the decomposition is. For a vector corresponding to a known scenario, an incorrect decomposition (wrong linear combination of vectors) could produce a recomposition closer to the vector than the correct one.

The point of this section is to try to show a link between the value of the error on the recomposition and the correctness of this recomposition. It could for instance be expressed as a threshold value of the error, below which the decomposition is considered as correct.

4.7.2 Method and results

Different series of cross-checks were done: all the loss maps were decomposed on different vector sets, and the errors on the recompositions were calculated. Since the scenario associated with each loss map is known, each decomposition could be classified as correct or incorrect. In case of factors larger than one (SVD), the decomposition was considered correct if the highest factor was the correct one.
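The correctness criterion described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: the function name, the argument layout and the scenario labels are hypothetical, and "highest factor" is taken literally as the largest value.

```python
from typing import Sequence


def is_correct_decomposition(
    factors: Sequence[float],
    scenarios: Sequence[str],
    true_scenario: str,
) -> bool:
    """Return True if the largest factor belongs to the known scenario.

    factors[i] is the coefficient found for the reference vector whose
    scenario label is scenarios[i] (labels are hypothetical).
    """
    if not factors or len(factors) != len(scenarios):
        raise ValueError("factors and scenarios must be non-empty and of equal length")
    # Index of the highest factor in the decomposition
    best = max(range(len(factors)), key=lambda i: factors[i])
    return scenarios[best] == true_scenario


# Toy example with made-up factors: the highest factor (1.3) is on "V"
print(is_correct_decomposition([0.2, 1.3, 0.1], ["H", "V", "other"], "V"))
```

A factor larger than one does not change the outcome here: only the position of the highest factor is compared with the known scenario.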

Figure 4.18: Distributions of correct (left) and incorrect (right) decompositions for G-S (top) and SVD (bottom) of every loss map of 2010 on every vector set (of the same loss maps). There are 120 decompositions in total. The averages are lower for the SVD (better recomposition), and the averages for correct and incorrect decompositions are close to each other. See text for more observations.

Then, the distributions of the errors for correct and incorrect decompositions were plotted, and their averages calculated. The further apart these two averages are, the better the discrimination between correct and incorrect decompositions will be.
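As an illustration of this step, the sketch below splits a list of recomposition errors by correctness and computes the two averages. The function name and the input values are made up for the example; they are not taken from the loss-map data.

```python
from statistics import mean
from typing import Sequence, Tuple


def group_means(
    errors: Sequence[float], correct_flags: Sequence[bool]
) -> Tuple[float, float]:
    """Return (mean error of correct, mean error of incorrect) decompositions."""
    if len(errors) != len(correct_flags):
        raise ValueError("errors and correct_flags must have the same length")
    correct = [e for e, ok in zip(errors, correct_flags) if ok]
    incorrect = [e for e, ok in zip(errors, correct_flags) if not ok]
    if not correct or not incorrect:
        raise ValueError("both groups must contain at least one entry")
    return mean(correct), mean(incorrect)


# Made-up errors: two correct decompositions, two incorrect ones
mean_correct, mean_incorrect = group_means(
    [0.10, 0.20, 0.40, 0.50], [True, True, False, False]
)
# The gap between the two averages indicates how well they can be separated
print(mean_correct, mean_incorrect, mean_incorrect - mean_correct)
```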

The first cross-checks were done with the loss maps of 2010 (cf. tab. 4.1), both for SVD and G-S. Every loss map of each vector set was decomposed on all the other vector sets: there are 120 decompositions in total. The results are presented in fig. 4.18. The averages for correct and incorrect decompositions are 0.2105 and 0.2188 for G-S, and 0.1631 and 0.1845 for SVD. They are too close to each other to allow a good separation; there is a significant overlap between the two distributions.

Another observation is that every distribution is separated into two groups. Many values are under 0.15: these are true positives for the correct decompositions, but false positives for the incorrect ones, where the decomposition is incorrect but the recomposition is close to the original vector. Similarly, a group of values seems to be centered around 0.45, especially for the correct G-S decompositions. For the incorrect SVD decompositions, these are true negatives. The similar values, for the correct decompositions, are false negatives: far from the original vector, yet correct.

Figure 4.19: Distributions of correct (left) and incorrect (right) decompositions for G-S (top) and SVD (bottom) of every loss map of 2010 on the average vector set. There are 24 decompositions in total. The averages are lower for the SVD (better recomposition), and the averages for correct and incorrect decompositions are still too close to each other to allow a good separation. See text for more observations.

A second cross-check was done by decomposing every loss map of 2010 on the average vector set (only for H and V): 24 decompositions in total (cf. fig. 4.19). The first observation is that there are more correct decompositions than incorrect ones. However, there is still some overlap, and the averages are still too close to each other to allow a good separation (0.1885 and 0.2186 for G-S; 0.1691 and 0.2124 for SVD). This is because the loss maps of 2010 were not sorted in the way the 2011 ones were, that is, by the order in which they were measured and by energy. The importance of these differences was only understood later.

4.7.3 Distributions of the error for the 2011 loss maps

In this case, all loss maps of 2011 were decomposed on the average set of selected loss maps (cf. § 4.3), which is the set used in real data decompositions. In the resulting distributions, the majority of correct and incorrect decompositions are clearly separated.

Figure 4.20: Distributions of correct (left) and incorrect (right) decompositions for G-S (top) and SVD (bottom) of every loss map of 2011 on the set of selected loss maps. There are 104 decompositions in total. The red lines represent the value of the geometric mean of the averages for correct and incorrect decompositions (see text). The majority of correct and incorrect decompositions are now clearly separated, even though there are still some false negatives (decompositions that are correct but far from the original vector) and some false positives.

For SVD, the average error is 0.1289 for the correct decompositions and 0.2732 for the incorrect ones (cf. fig. 4.20). The arithmetic mean of these two values is 0.201, and the geometric mean is 0.187 (the geometric mean is often preferred here, because it better accounts for the fact that there are many low values). Using 0.19 as a threshold, there are 51 entries below this value for the correct decompositions, corresponding to 72% true positives, and 20 entries above it, representing 28% false negatives. For the incorrect decompositions, there are 26 entries above the threshold (79% true negatives) and 7 below (21% false positives).
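The two candidate thresholds can be recomputed from the quoted SVD averages. The sketch below is only a numerical check: the function name is hypothetical, and the inputs are the rounded averages given in the text, so the last digit of the geometric mean may differ slightly from the quoted value.

```python
import math
from typing import Tuple


def candidate_thresholds(
    mean_correct: float, mean_incorrect: float
) -> Tuple[float, float]:
    """Return (arithmetic mean, geometric mean) of the two group averages."""
    arithmetic = (mean_correct + mean_incorrect) / 2
    geometric = math.sqrt(mean_correct * mean_incorrect)
    return arithmetic, geometric


# SVD averages quoted in the text for the 2011 loss maps
svd_arithmetic, svd_geometric = candidate_thresholds(0.1289, 0.2732)
print(f"arithmetic = {svd_arithmetic:.3f}, geometric = {svd_geometric:.3f}")
# At the 0.01 precision of the distributions, the geometric mean gives 0.19
print(round(svd_geometric, 2))
```

The geometric mean of two positive values never exceeds their arithmetic mean, which is why it sits closer to the low-error group.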

For G-S, the "correct" average is 0.1935, and the "incorrect" one is 0.2881. The arithmetic mean of these two values is 0.241, and the geometric mean is 0.236 (there is no difference between the arithmetic and geometric means at the precision of 0.01 used in the distribution). For the correct decompositions, there are 38 entries below the threshold (54% true positives) and 33 above (46% false negatives); for the incorrect decompositions, there are 24 above (73% true negatives) and 9 below (27% false positives).
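The percentages quoted for both methods follow directly from the entry counts. The sketch below recomputes them; the function name and dictionary keys are hypothetical, and only the counts are taken from the text.

```python
from typing import Dict


def classification_rates(
    correct_below: int,
    correct_above: int,
    incorrect_above: int,
    incorrect_below: int,
) -> Dict[str, float]:
    """Convert entry counts on each side of the threshold into rates."""
    n_correct = correct_below + correct_above
    n_incorrect = incorrect_above + incorrect_below
    return {
        "true_positive": correct_below / n_correct,     # correct, low error
        "false_negative": correct_above / n_correct,    # correct, high error
        "true_negative": incorrect_above / n_incorrect,  # incorrect, high error
        "false_positive": incorrect_below / n_incorrect,  # incorrect, low error
    }


# Counts quoted in the text (104 decompositions for each method)
svd = classification_rates(51, 20, 26, 7)
gs = classification_rates(38, 33, 24, 9)
for name, rates in (("SVD", svd), ("G-S", gs)):
    print(name, {key: f"{100 * value:.0f}%" for key, value in rates.items()})
```

With these counts, SVD separates the correct decompositions better than G-S (72% against 54% true positives), while the true-negative rates are closer (79% against 73%).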