

06/17/1993: A bomb killed Fatah Officer ’Ali Iskandar and his wife and wounded three other family members when it exploded at their home in the ’Ayn al-Hulway Palestinian Refugee camp near Sidon. The perpetrators were unknown.

The associated location is given as

The attack occurred in the ’Ayn al-Hulway Palestinian Refugee camp near Sidon

We therefore reduced this noise by removing the following generic words and phrases from the agentive and spatial dimensions: the attack, the incident, occurred, took place, in front of, near, in, at, of, on, and, the, a, an, unspecified location, undisclosed, was / were targeted.

As is standard in string manipulation, we also removed punctuation such as quotes, full stops, commas and hyphens from all the dimensions.

We trimmed the strings and replaced multiple whitespace characters with a single space.
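In code, this normalisation step might look like the following sketch (the function name and the decision to lowercase before matching are our own assumptions; the stop list is the one given above):

```python
import re

# Generic words and phrases stripped from the agentive and spatial
# dimensions (the stop list given in the text; "was / were targeted"
# is expanded into its two forms).
STOP_PHRASES = [
    "the attack", "the incident", "occurred", "took place", "in front of",
    "near", "in", "at", "of", "on", "and", "the", "a", "an",
    "unspecified location", "undisclosed", "was targeted", "were targeted",
]

def clean_value(text: str) -> str:
    """Normalise a dimension value: drop stop phrases and punctuation,
    trim, and collapse runs of whitespace.  Lowercasing is an assumption
    made here so that phrase matching is case-insensitive."""
    text = text.lower()
    # Remove longest phrases first so "the attack" is not half-eaten
    # by "the"; \b keeps "in" from matching inside "Sidon".
    for phrase in sorted(STOP_PHRASES, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    # Quotes, full stops, commas and hyphens, as described in the text.
    text = re.sub(r"[\"'.,\u2018\u2019\u201c\u201d-]", " ", text)
    return re.sub(r"\s+", " ", text).strip()
```

Applied to the location string from the example above, this leaves only the informative tokens of the camp name and town.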

7.6 Measuring the quality of the results

In order to test whether our AI is making a difference, and whether it is “intelligent enough”, we need to measure its IQ, that is, how intelligent it has become; in other words, to identify at what point it becomes useful or mature. We therefore need to measure the quality of its results.

To do this, we must first define the similarity between an event extracted from the document by the chain, E1 = (C1, T1, G1, A1), and the corresponding “perfect” event E2 = (C2, T2, G2, A2).

In our experiments, E2 is the event as detailed in the GTD files, and here we have defined a fairly simple similarity measure which is specific to the GTD–WebLab mapping. However, in production, the comparison would be made between the extracted event and the event as corrected by the analyst.

In this case, the similarity measure would certainly need to be rethought¹. For instance, when comparing with the GTD, the geographical information is either correct or not, but in production it may be partially correct, and, of course, the documents could contain several events instead of at most one per document.

¹ Of course, BIMBO being completely modular, slotting in a new similarity measure would have no impact on the rest of the framework.

This can be complicated by the fact that just as the analyst may add information that was not in the original document, the GTD contains event information that is not included in the summary, and thus could not possibly have been extracted. We try to avoid penalising the AI for this.

Numerous methods have been proposed for measuring similarity (Pandit et al. (2011); Tversky and Gati (1978); Cohen et al. (2013); Dutkiewicz et al. (2013), to give but a few), but we wanted a measure taking into account the events’ four dimensions, and capable of differentiating between different types of event. We therefore put weights a, b, c and d on the dimensions and define:

Overall similarity between two events

σ(E1, E2) = (a·σC(C1, C2) + b·σG(G1, G2) + c·σT(T1, T2) + d·σA(A1, A2)) / (a + b + c + d)    (7.1)

Semantic similarity Because we mapped the values of the WebLab semantic dimension onto those of the GTD (see Table 7.4), we can define the semantic similarity σC(C1, C2) to be 1 if there is a common element between the semantic dimensions of E1 and E2, and 0 otherwise. For example, for C1 = {Bombing} and C2 = {Attack, Bombing}, we obtain C1 ∩ C2 = {Bombing}, so σC(C1, C2) is 1.
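As a sketch, the overlap test used for the semantic (and spatial) dimensions and the weighted combination of equation (7.1) can be written as follows (function names are our own):

```python
def overlap_similarity(d1: set, d2: set) -> float:
    """1 if the two dimension value sets share an element, else 0
    (used for the semantic and spatial dimensions)."""
    return 1.0 if d1 & d2 else 0.0

def overall_similarity(sims, weights=(1, 1, 1, 1)) -> float:
    """Weighted mean of the four per-dimension similarities
    (sigma_C, sigma_G, sigma_T, sigma_A), i.e. equation (7.1)."""
    a, b, c, d = weights
    s_c, s_g, s_t, s_a = sims
    return (a * s_c + b * s_g + c * s_t + d * s_a) / (a + b + c + d)
```

With equal weights, the four per-dimension scores are simply averaged, as in the full comparison example at the end of this section.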

Geographical (spatial) similarity We define the geographical similarity σG(G1, G2) in the same way, that is, σG(G1, G2) is 1 if there is a common element between the geographical dimensions of E1 and E2, and 0 otherwise. For example, for G1 = {Caen} and G2 = {Rouen}, G1 ∩ G2 = ∅, and so σG(G1, G2) is 0.

Temporal similarity The temporal similarity σT(T1, T2) is 1 for T1 ∩ T2 ≠ ∅. Otherwise, if there is no common element, we use partial and derived information. For example, let T1 = {7 October 1969}, T2 = {7/10/69}, T3 = {October} and T4 = {Tuesday}. We can see that T1 is the same as T2, just in a different format, so σT(T1, T2) is 1.

Deriving possible partial dates from T1 = {7 October 1969} gives {October 1969, 07-10, 10-07, 1969-10, 1969, October}.

October is a common element with T3, so we give σT(T1, T3) a value of 1/10. Finally, we can calculate that 7 October 1969 was a Tuesday, so σT(T1, T4) = 1/7. Note that the values 1/10 and 1/7 were chosen by experiment, and highlight the difficulty of giving an exact numerical value to a comparison.
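The derivation of partial forms and of the weekday can be sketched as follows (a minimal illustration, assuming the full date has already been parsed; the scores 1/10 and 1/7 are those given above, and the helper names are our own):

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
            "Saturday", "Sunday"]  # date.weekday(): Monday == 0

def partial_forms(d: date) -> set:
    """Partial date forms derived from a full date, as in the text:
    {October 1969, 07-10, 10-07, 1969-10, 1969, October}."""
    month = MONTHS[d.month - 1]
    return {f"{month} {d.year}",
            f"{d.day:02d}-{d.month:02d}", f"{d.month:02d}-{d.day:02d}",
            f"{d.year}-{d.month:02d}", str(d.year), month}

def temporal_similarity(d: date, t2: set) -> float:
    """Score a candidate value set t2 against a known full date d:
    exact match 1, partial-form match 1/10, weekday match 1/7."""
    exact = {f"{d.day} {MONTHS[d.month - 1]} {d.year}",   # 7 October 1969
             f"{d.day}/{d.month}/{d.year % 100}"}          # 7/10/69
    if exact & t2:
        return 1.0
    if partial_forms(d) & t2:
        return 1 / 10
    if WEEKDAYS[d.weekday()] in t2:
        return 1 / 7
    return 0.0
```

Only two exact formats are listed here for brevity; a production system would normalise many more date formats before comparison.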

Agentive similarity For the agentive similarity, we initially tried the Jaccard similarity, given by the number of elements common to each set divided by the total number of distinct elements:

Jacc(X, Y) = |X ∩ Y| / |X ∪ Y|

For example, if the first set A1 is {Al-Qaeda, Army of Islam} and the second A2 is {Al-Qaeda, The Foundation}, then

Jacc(A1, A2) = |{Al-Qaeda}| / |{Al-Qaeda, Army of Islam, The Foundation}| = 1/3

The problem is that this does not take into account spelling differences, which are very common, for example in the transcription of Arabic names, and both the analysts’ corrections and the GTD standardise the spellings.

If A2 contained {Al-Qaida} instead of {Al-Qaeda}, the Jaccard similarity would give

Jacc(A1, A2) = |∅| / |{Al-Qaeda, Army of Islam, Al-Qaida, The Foundation}| = 0/4 = 0

We therefore decided to use the Levenshtein distance (the minimum number of characters to delete, insert or replace to convert one string into the other) (Levenshtein, 1966) on each pair of agents a1, a2 in A1, A2.
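Both measures are standard; a minimal implementation, for experimenting with the examples above, might be:

```python
def jaccard(x: set, y: set) -> float:
    """Jaccard similarity: |X ∩ Y| / |X ∪ Y| (1.0 for two empty sets)."""
    return len(x & y) / len(x | y) if x | y else 1.0

def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character deletions, insertions or
    replacements to turn s into t (dynamic programming, O(|s|·|t|))."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # delete cs
                           cur[j - 1] + 1,            # insert ct
                           prev[j - 1] + (cs != ct))) # replace cs by ct
        prev = cur
    return prev[-1]
```

The row-by-row formulation keeps memory linear in the length of the second string, which matters little for short agent names but is the idiomatic way to write it.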

Levenshtein(Al-Qaeda, Al-Qaida) = 1 (one replacement)

But names are often more complex. For example, if we compare Jharkhand anti-Maoist Maobadi Protirodh Committee member with Jharkhand Party member, the standard Levenshtein distance is 36, and yet, logically, they are the same entity. Also, named entity (NE) extractors such as Gate can make partial matches, or over-match. We therefore make this Levenshtein distance “fuzzy” (FL(a1, a2)) by comparing substrings (Ginstrom, 2007).

This means that we can ignore the prefixes and suffixes, which are often the parts that are mismatched, whilst still accommodating slight differences in spelling.

We define the agentive similarity σA(A1, A2) as

σA(A1, A2) = 1 − min( FL(a1, a2) / min(|a1|, |a2|) ), ∀a1 ∈ A1, ∀a2 ∈ A2

if over a certain threshold θ, and 0 otherwise (in practice, θ = 0.45 gave the best results).
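One plausible implementation treats FL(a1, a2) as the best Levenshtein distance between the shorter string and any substring of the longer one, so that mismatched prefixes and suffixes cost nothing; the exact fuzzy variant of Ginstrom (2007) may differ, so the following is a sketch under that assumption:

```python
def levenshtein(s: str, t: str) -> int:
    """Standard edit distance (deletions, insertions, replacements)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (cs != ct)))
        prev = cur
    return prev[-1]

def fuzzy_levenshtein(a: str, b: str) -> int:
    """Our reading of FL: the smallest Levenshtein distance between the
    shorter string and any non-empty substring of the longer one, so
    unmatched prefixes/suffixes of the longer string are free."""
    short, long_ = sorted((a, b), key=len)
    return min(levenshtein(short, long_[i:j])
               for i in range(len(long_))
               for j in range(i + 1, len(long_) + 1))

def agentive_similarity(a1_set, a2_set, theta=0.45):
    """sigma_A: best pairwise fuzzy similarity, zeroed below theta."""
    best = max(1 - fuzzy_levenshtein(a1, a2) / min(len(a1), len(a2))
               for a1 in a1_set for a2 in a2_set)
    return best if best >= theta else 0.0
```

Enumerating all substrings is quadratic in the longer string, which is acceptable for short agent names; a production version would prune the substring lengths considered.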

For example, if A1 = {Dr Dolittle PhD} and A2 = {Doolittle}, the standard Levenshtein distance is 7, but the fuzzy Levenshtein distance is only 1, and so the similarity σA(A1, A2) is

σA(A1, A2) = 1 − FL(Dr Dolittle PhD, Doolittle) / |Doolittle| = 1 − 1/9 = 0.889

(the mismatched prefix “Dr ” and suffix “ PhD”, and the missing “o”, being absorbed by the fuzzy comparison).

Taking the problematic example above: if A1 = {Jharkhand anti-Maoist Maobadi Protirodh Committee member} and A2 = {Jharkhand Party member}, then the fuzzy Levenshtein calculation gives a distance of 11, and thus a similarity of 1 − 11/22 = 0.5.

Given two lists of agents, we do a pairwise matching, and use the maximum similarity.

We do not try here to associate named entities such as London / capital of England, but the modularity of the system would allow this.

Example 6 (Full comparison) If we take the two events described in Figure 7.7, we can see that:

• There is an overlap (Death Event) in the semantic description of the event, and so the semantic similarity is 1.

• The spatial similarity is 1, due to the common element West Bengal.

• We have to rely on the deduction of the day of the week, getting a temporal similarity of 0.143.

• Finally, the agentive similarity, as we have already seen, is 0.5.

If we put equal weights on the dimensions (a = b = c = d = 1), we have an overall similarity of

σ(WebLab Event, GTD Event) = (1.0 + 1.0 + 0.143 + 0.5) / 4 = 0.66075

between the extracted event, and the corresponding event in the GTD.