Document-constrained clustering - Automatic face naming and recognition

3.3 Automatic face naming and recognition

3.3.1 Document-constrained clustering

The underlying idea behind face naming using news documents is to find the “best” assignment of names and faces in each document. From the limited set of names in each caption and the large amount of data in which those people appear, we hope to estimate, for each document, the most probable assignment of names and faces. For a document with N detected names and F detected faces, considering name ambiguity and possible multiple instances of each person, every name-face pair is possible and should be considered. The number of possible assignments for the document is therefore 2^(N×F).

When the document is large, this number is too large to search exhaustively for the best solution. For instance, Figure 3.4 shows a document with 12 names and 12 faces. This would suggest around 2¹⁴⁴ ≈ 10⁴³ assignments to consider, which is intractable. Instead, the three following constraints can be used to reduce this number.

These constraints come from assumptions that we make about news documents.
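The size of the unconstrained search space can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the function name `unconstrained_count` is ours and is introduced for illustration.

```python
import math


def unconstrained_count(n_names: int, n_faces: int) -> int:
    """Count binary link matrices: each of the N*F name-face pairs is on or off."""
    return 2 ** (n_names * n_faces)


# Document of Figure 3.4: 12 detected names and 12 detected faces.
count = unconstrained_count(12, 12)
print(f"2^144 has {len(str(count))} decimal digits")
print(f"log10(2^144) = {math.log10(count):.2f}")
```

Python integers have arbitrary precision, so the count is exact and only the logarithm is rounded for display.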

Constraint (i): Faces can only be assigned to names detected in the caption

Now explicit, this first constraint has in fact been implicitly assumed in the previous paragraphs. It makes the problem of naming much easier, because the system only has to distinguish between co-occurring people, that is up to a dozen, and not among the thousands of different people that appear in the data set. However, if the caption is incorrectly analysed and the correct name for a face is missed, this constraint prevents the correct name from being associated with that face. This can also happen if a person is simply not mentioned in the caption although his name is present elsewhere in the data set. Therefore, constraint (i) inherently prevents the system from performing perfectly, even though it is the key to the success of our methods. Note that the subset of names occurring in the caption is shared by all the faces in the image.

[Caption of the news document in Figure 3.5:] Pictures of the Bali bombing suspects clockwise from top include: Umar (also known as Patek), Dulmatin (a.k.a. Amar Usman), M. Ali Imron (a.k.a. Alik), Idris (a.k.a. Jhoni Hendrawan, a.k.a. Gembrot), Umar (a.k.a. Wayan) and Imam Samudra. REUTERS/Handouts from Indonesian Police.

Figure 3.5: This news document from the Yahoo! News data set illustrates the challenges behind the ambiguity of names for people.

Constraint (ii): Faces can only be assigned to at most one name

As with the previous constraint, stating that a face should correspond to only one name is not always accurate. To illustrate this, we refer to Figure 3.5. In this example, we see that several names in the caption are valid for naming a single face. This assumption most commonly breaks when a person is the focus of a story and the journalist chooses to refer to this person using different named entities across the caption (for instance, Barack Obama later referred to as President Obama or the President of the United States of America). In doing so, the journalist conveys more information about the subject and also improves the style by avoiding repetitions. Importantly, under this constraint, associating faces and names in such a data set becomes a constrained clustering problem: each face can be assigned to at most one name, i.e. to a single cluster.

Figure 3.6: Two examples where the image contains multiple faces of the same person (left: Edmund Stoiber, right: Keanu Reeves), hence breaking the assumption behind constraint (iii). The image on the right shows an extreme case where a single name should be assigned to dozens of faces in the image. Simple lower and upper limits on the face sizes are often an adequate solution.

Constraint (iii): Names can only be assigned to at most one face in an image

Constraint (iii) assumes that a given person cannot appear several times in the same image. Equivalently, it means that several faces in the same image cannot be assigned to the same cluster. Although it sounds sensible and also helps to make the problem easier, this assumption is sometimes inconsistent with the observed data. Posters, mirrors, and computer-generated images are common causes of this inconsistency, as shown in Figure 3.6.

Null-assignments and constrained clustering

There are also cases where it is impossible to assign a face to a name, either because of the constraints or because the system estimates that none of the names is suitable for the face. A way to model this situation is to allow for an additional cluster, the null cluster. Its purpose is to collect all the faces that are not assigned to any named cluster; it is therefore not subject to constraint (iii). When a face is assigned to the null cluster, we also call this situation a null-assignment for the face.
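The three constraints and the null cluster can be expressed as a simple admissibility test. The sketch below assumes a representation of our own choosing, which the text does not prescribe: an assignment is a sequence with one entry per face, holding either the index of a caption name or `None` for the null cluster. The function name `is_admissible` is hypothetical.

```python
from typing import Optional, Sequence


def is_admissible(assignment: Sequence[Optional[int]], n_names: int) -> bool:
    """Check one document-level assignment against constraints (i) to (iii).

    assignment[f] is the caption-name index given to face f, or None for
    the null cluster. Constraint (ii) holds by construction, since each
    face carries a single entry.
    """
    named = [a for a in assignment if a is not None]
    # Constraint (i): only names detected in the caption are allowed.
    if any(not 0 <= a < n_names for a in named):
        return False
    # Constraint (iii): a name is used by at most one face; the null
    # cluster is exempt and may collect any number of faces.
    return len(named) == len(set(named))


print(is_admissible([0, None], n_names=3))     # one named face, one null-assignment
print(is_admissible([None, None], n_names=3))  # both faces in the null cluster
print(is_admissible([1, 1], n_names=3))        # same name for two faces
```

With this representation, only constraints (i) and (iii) need an explicit check.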

We can now summarise the modelling of news documents by explicitly showing in Figure 3.7 the admissible assignments for a typical document under the three constraints described above.

[Caption of the news document in Figure 3.7:] Brazilian presidential candidate Luiz Inacio Lula da Silva and the candidate for governor of the state of Sao Paulo, Jose Genoino, both of the Workers Party (PT) embrace during a news conference in Sao Paulo, Brazil on Monday Oct. 7, 2002. Unable to win a weekend election outright, the former labour boss is headed for an Oct. 27 showdown with the second-place finisher, candidate Jose Serra. (AP Photo/Dario Lopez-Mills).

Detected names are: Luiz Inacio Lula da Silva (n₁), Jose Serra (n₂) and Jose Genoino (n₃).

[Figure 3.7 grid: the admissible assignments of the names n₁, n₂, n₃ or the null cluster to the faces f₁ and f₂.]

Figure 3.7: For a document with 2 detected faces and 3 names, shown on the top, there are 13 admissible assignments, shown below.

The enumeration of possible assignments V(N,F) for a document with N names and F faces leads to the following formula:

\[
V(N,F) = \sum_{p=0}^{\min(N,F)} p! \binom{N}{p} \binom{F}{p}. \tag{3.1}
\]

This formula is interpreted as follows: for a number p of name-face assignments, pick p faces among F and p names among N, then pick one permutation (out of p!) of the names to associate with the faces. For most documents, this number is sufficiently small to allow for an exhaustive search for the best assignment. For the largest documents, it is still impractically large, but efficient techniques are proposed below in Section 3.3.4.
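As a sanity check, the count can be computed in closed form and compared with a brute-force enumeration. The sketch below assumes that Equation 3.1 is the sum over p of p! times the two binomial coefficients, as the interpretation above describes. The function names and the use of `None` for the null cluster are our own illustrative choices.

```python
from itertools import product
from math import comb, factorial


def count_admissible(n_names: int, n_faces: int) -> int:
    """Closed-form count: sum over p of p! * C(N, p) * C(F, p)."""
    return sum(
        factorial(p) * comb(n_names, p) * comb(n_faces, p)
        for p in range(min(n_names, n_faces) + 1)
    )


def enumerate_admissible(n_names: int, n_faces: int) -> list:
    """Brute force: one entry per face (name index or None), no name reused."""
    options = [None] + list(range(n_names))
    admissible = []
    for assignment in product(options, repeat=n_faces):
        named = [a for a in assignment if a is not None]
        if len(named) == len(set(named)):
            admissible.append(assignment)
    return admissible


# Document of Figure 3.7: 3 names and 2 faces.
print(count_admissible(3, 2))           # 13
print(len(enumerate_admissible(3, 2)))  # 13
```

For 3 names and 2 faces, the terms for p = 0, 1, 2 are 1, 6 and 6, which sum to the 13 assignments of Figure 3.7. The brute-force version is only practical for small documents.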

Many clustering algorithms can be adapted to handle such constraints. For instance, in bottom-up hierarchical clustering, we can propagate cannot-link constraints to unions of clusters as they are agglomerated. EM algorithms like K-means and mixtures of Gaussians can also be adapted by constraining the E-step to select a single admissible assignment. Below, we first detail how this can be done in the case of a Gaussian Mixture Model (GMM, Section 3.3.2), then we present in Section 3.3.3 our graph-based method.
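To make the idea of a constrained E-step concrete, the following sketch selects, for one document, the admissible assignment with the highest total score by exhaustive search. It is an illustration under our own assumptions and not the method of Section 3.3.2: the score matrix, the null scores and the function name `best_admissible_assignment` are hypothetical, and the scores stand in for whatever per-face log-likelihoods a model provides.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def best_admissible_assignment(
    scores: Sequence[Sequence[float]],
    null_scores: Sequence[float],
) -> Tuple[Tuple[Optional[int], ...], float]:
    """Exhaustively pick the admissible assignment with the highest total score.

    scores[f][n] is a hypothetical score (e.g. a log-likelihood) for giving
    name n to face f; null_scores[f] is the score of a null-assignment.
    """
    n_faces = len(scores)
    n_names = len(scores[0]) if n_faces else 0
    options = [None] + list(range(n_names))
    best: Tuple[Optional[int], ...] = ()
    best_total = float("-inf")
    for assignment in product(options, repeat=n_faces):
        named = [a for a in assignment if a is not None]
        if len(named) != len(set(named)):
            continue  # violates constraint (iii): a name used for two faces
        total = sum(
            null_scores[f] if a is None else scores[f][a]
            for f, a in enumerate(assignment)
        )
        if total > best_total:
            best, best_total = assignment, total
    return best, best_total


# Two faces that both prefer name 0: only one of them may keep it.
scores = [[-1.0, -5.0, -4.0], [-0.5, -6.0, -3.0]]
null_scores = [-2.0, -2.0]
assignment, total = best_admissible_assignment(scores, null_scores)
print(assignment, total)  # (None, 0) -2.5
```

In this toy example, the second face keeps name 0 and the first face falls back to the null cluster. The search visits (N+1)^F candidates, so it is only viable for small documents.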

From the document Données multimodales pour l’analyse d’image, Matthieu Guillaumin (pages 73-77)