
Universidade de Aveiro
Departamento de Eletrónica, Telecomunicações e Informática, 2017

Daniel Pedro Ferreira Lopes

Verificação Facial para um Sistema de Controlo de Acessos em Ambientes Não Estruturados

Face Verification for an Access Control System in Unconstrained Environment

Dissertation presented to the University of Aveiro in fulfilment of the requirements for obtaining the Master's degree in Electronics and Telecommunications Engineering, carried out under the scientific supervision of António José Ribeiro Neves, Professor at the Department of Electronics, Telecommunications and Informatics of the University of Aveiro.


o júri / the jury

presidente / president
Armando José Formoso de Pinho
Associate Professor with Habilitation, University of Aveiro

vogais / examiners committee
António José Ribeiro Neves
Assistant Professor, University of Aveiro (supervisor)

Jaime dos Santos Cardoso
Associate Professor with Habilitation, Faculty of Engineering, University of Porto


agradecimentos / acknowledgements

I would like to thank Professor Dr. António Neves, who always supported and guided me throughout this work and who was always fully available whenever I needed help.

To my friends, who gave me unique memories and stories that I will never forget. I hope these friendships last forever.

To Ricardo Ribeiro, my colleague and friend, who always helped me and spent many hours working by my side.

Finally, a VERY special thanks to my parents, my brother and my girlfriend, who always supported me both psychologically and financially. Without them, I would not have had the opportunities I have had over these years.

To all of you who contributed to my academic and personal success, a heartfelt thank you.


Palavras Chave: Reconhecimento facial, verificação facial, deteção facial, processamento de imagem, aquisição de imagem, calibração de câmaras.

Resumo: O reconhecimento facial tem vindo a receber bastante atenção ao longo dos últimos anos, não só na comunidade científica, como também no ramo comercial. Uma das suas várias aplicações é o seu uso num controlo de acessos onde um indivíduo tem uma ou várias fotos associadas a um documento de identificação (também conhecido como verificação de identidade).

Embora atualmente o estado da arte apresente muitos estudos que tanto apresentam novos algoritmos de reconhecimento como melhorias aos já desenvolvidos, existem mesmo assim muitos problemas ligados a ambientes não controlados, à aquisição de imagem e à escolha dos algoritmos de deteção e de reconhecimento mais eficazes.

Esta tese aborda um ambiente desafiador para a verificação facial: um cenário não controlado para o acesso a infraestruturas desportivas. Uma vez que não existem condições de iluminação controladas nem plano de fundo controlado, este torna-se um cenário complicado para a implementação de um sistema de verificação facial.

Esta tese apresenta um estudo sobre os mais importantes algoritmos de deteção e reconhecimento facial, assim como técnicas de pré-processamento, tais como o alinhamento facial e a equalização de histograma, com o objetivo de melhorar a performance dos mesmos. Também são apresentados dois métodos para a aquisição de imagens, envolvendo a seleção de imagens e a calibração da câmara.

São apresentados resultados experimentais detalhados, baseados em duas bases de dados criadas especificamente para este estudo. Com as técnicas de pré-processamento apresentadas, foi possível obter melhorias até 20% no desempenho dos algoritmos de reconhecimento referentes à verificação de identidade. Com os métodos apresentados para os testes ao ar livre, foram conseguidas melhorias na ordem dos 30%.


Keywords: Face recognition, face verification, face detection, image processing, image acquisition, camera calibration.

Abstract: Face recognition has received great attention over the last years, not only in the research community, but also on the commercial side. One of the many uses of face recognition is in access control systems, where a person has one or several photos associated with an identification document (also known as identity verification).

Although there are many studies nowadays, both presenting new algorithms and improving already developed ones, there are still many open problems regarding face recognition in uncontrolled environments, from the image acquisition conditions to the choice of the most effective detection and recognition algorithms, to name a few.

This thesis addresses a challenging environment for face verification: an unconstrained environment for access to sports infrastructures. As there are no controlled lighting conditions nor a controlled background, this is a difficult scenario in which to implement a face verification system.

This thesis presents a study of some of the most important facial detection and recognition algorithms, as well as some pre-processing techniques such as face alignment and histogram equalization, with the aim of improving their performance. It also introduces some methods for more efficient image acquisition based on image selection and camera calibration, specially designed for this problem.

Detailed experimental results are presented based on two new databases created specifically for this study. Using the pre-processing techniques, it was possible to improve the recognition algorithms' verification performance by up to 20%. With the methods presented for the outdoor tests, performance improved by up to 30%.


Contents

1 Introduction
1.1 Main Contributions
1.2 Thesis Structure

2 Face Detection and Recognition
2.1 Face Detection
2.2 Face Recognition
    Face Recognition techniques
2.3 Face Detection Algorithms
2.3.1 Haar Cascades
2.3.2 LBP Cascade
2.3.3 Histogram of Oriented Gradients (HOG)
2.4 Face Recognition Algorithms
2.4.1 EigenFaces
2.4.2 FisherFaces
2.4.3 Local Binary Patterns Histograms (LBP or LBPH)
2.5 Final Remarks

3 Software and Performance Evaluation on Verification
3.1 Design Patterns
3.2 Developed Interfaces
3.3 Test Programs
3.3.1 Detect multiple faces
3.3.2 Detect the predominant face
3.3.3 Recognize faces
3.4 Proposed Scenario
3.5 CAMBADA dataset
3.6 Detection Results
3.7 Remarks about Facial Recognition provided by OpenCV
3.8 Verification Results
3.8.1 Processing Time

4 Pre-processing techniques
4.1 Head Pose and Face Alignment
4.1.1 Face Alignment technique for Roll Head Movement
4.1.2 Face Alignment on EigenFaces
4.1.3 Face Alignment on FisherFaces
4.1.4 Face Alignment on Local Binary Patterns Histograms
4.1.5 Face Alignment on Verification Processing Time
4.2 Histogram Equalization
4.2.1 Histogram Equalization on EigenFaces
4.2.2 Histogram Equalization on FisherFaces
4.2.3 Histogram Equalization on Local Binary Patterns Histograms (LBP)
4.2.4 Histogram Equalization on Recognition Processing Time
4.3 Resolution of the Face Images
4.3.1 Face image resolutions on EigenFaces
4.3.2 Face image resolutions on FisherFaces
4.3.3 Face image resolutions on Local Binary Patterns Histograms (LBP)
4.3.4 Face image resolutions and Processing Time
4.3.5 Face image resolutions on Vector Storage Space
4.4 FisherFaces Tricks
4.5 Number of features on FisherFaces

5 Outdoor Tests
5.1 Camera and environment of the tests for the outdoor dataset
5.2 Camera Calibration
5.2.1 Exposure
5.2.2 Gain
5.2.3 Algorithm Proposed (Exposure+Gain)
5.3 Head pose for image selection
    Explanation
5.4 Experimental results
5.4.1 Camera Calibration Method Proposed
5.4.2 Head pose for image selection

6 Conclusions
6.1 Future Work


List of Figures

1.1 Face Verification used to identify persons entering a stadium. A photo is taken and checked against the person whom the NFC card or QR code belongs to. If there is a match, the turnstile unlocks and the person is allowed into the infrastructure.
1.2 Fundamental steps for a face recognition system [1].
2.1 An example of Face Detection in a digital image.
2.2 Most important approaches for Face Detection [5].
2.3 An example of Face Recognition in a digital image.
2.4 Most important approaches for Face Recognition.
2.5 Haar Features [39].
2.6 Integral image where (x,y) is the sum of the pixels in blue [40].
2.7 The first and second features selected by AdaBoost [11].
2.8 AdaBoost Classifier behavior [40].
2.9 Schematic description of the detection cascade [11].
2.10 LBP Operator [14].
2.11 Images from the various stages of generating a Histogram of Oriented Gradients feature vector. (a) Original pedestrian image, scaled to 20x40 pixels, (b) gradient image, (c) image divided into cells of 5x5 pixels, resulting in 4x8 cells, (d) resulting HOG descriptor for the image showing the gradient orientation histograms in each cell [15].
2.12 An overview of the feature extraction and object detection chain [15].
2.13 Representation of images on a single column with the EigenFaces method [41].
2.14 Projection (in red) is in the face space and near a face class. The axes are solely representative, as they only show that EigenFaces projects the points into a vector.
2.15 Projection (in red) is in the face space but far from the class. The axes are solely representative, as they only show that EigenFaces projects the points into a vector.
2.16 Projection (in red) is near a possible class but far from the face space. The axes are solely representative, as they only show that EigenFaces projects the points into a vector.
2.17 Projection (in red) is far from the class and far from the face space. The axes are solely representative, as they only show that EigenFaces projects the points into a vector.
2.18 A comparison of PCA and FLD for a two-class problem where the data for each class lies near a linear subspace [17].
2.19 Feature extraction with the LBP operator under different monotonic intensity changes on face photos [43].
2.20 The circular neighbourhood and possible captures [43].
2.21 Circular neighbour sets for three different values of P (number of neighbors) and R (radius value) [27].
2.22 Face image divided into 64 regions, with a histogram for every region [27].
3.1 Detection Factory represented on a Class Diagram.
3.2 Recognition Factory represented on a Class Diagram.
3.3 Input and outputs of the first test.
3.4 Input and output of the second test.
3.5 Input and output of the third test.
3.6 Activity diagram when a person checks in to enter the infrastructure.
3.7 Activity diagram when a person is trying to enter the infrastructure.
3.8 An example of the database used for class training for further recognition tests.
3.9 An example of three images used in the CAMBADA dataset.
3.10 Successful faces detected and false positives for each algorithm.
3.11 Ratio between false positives and faces detected (on the left) and ratio between the number of faces detected and the total of faces on the images (on the right).
3.12 ROC curve of performance for EigenFaces, FisherFaces and LBP on verification.
4.1 The three degrees of freedom of a human head can be described by the egocentric rotation angles pitch, roll, and yaw [31].
4.2 Orientation of the head in terms of pitch, roll, and yaw movements describing the three degrees of freedom of a human head [31].
4.4 68 points of Face Landmarking by dlib.
4.5 Face alignment technique for cropping and comparison.
4.6 ROC curves comparing verification results with and without the face alignment technique for the EigenFaces algorithm.
4.7 ROC curves comparing verification results with and without the face alignment technique for the FisherFaces algorithm.
4.8 ROC curves comparing verification results with and without the face alignment technique for the LBP algorithm.
4.9 Face with and without alignment.
4.10 Intensity histogram of a non-aligned face vs an aligned face.
4.11 Histogram equalization for an image with very low contrast.
4.12 Image with and without histogram equalization.
4.13 ROC curves comparing verification results with and without the histogram equalization technique for the EigenFaces algorithm.
4.14 ROC curves comparing verification results with and without the histogram equalization technique for the FisherFaces algorithm.
4.15 ROC curves comparing verification results with and without the histogram equalization technique for the LBP algorithm.
4.16 Sub-image with and without histogram equalization [44].
4.17 LBP operator on pixel x in a sub-image with and without equalization.
4.18 ROC curves comparing verification results of the EigenFaces algorithm with faces cropped at resolutions of 64 × 64, 128 × 128, 256 × 256 and 512 × 512 pixels.
4.19 ROC curves comparing verification results of the FisherFaces algorithm with faces cropped at resolutions of 64 × 64, 128 × 128, 256 × 256 and 512 × 512 pixels.
4.20 ROC curves comparing verification results of the LBP algorithm with faces cropped at resolutions of 64 × 64, 128 × 128, 256 × 256 and 512 × 512 pixels.
4.21 ROC curves showing the FisherFaces behavior when training the same person into two classes.
4.22 Line for projecting class points not calculated correctly [41].
4.23 ROC curves showing FisherFaces with 6, 8 and 10 features extracted.
5.1 An example of two images acquired for face detection and further construction of the dataset. The image on the left was acquired at 9 AM and the one on the right at 2 PM of the same day.
5.2 Images of a stapler taken with different exposure times.
5.4 State diagram of the calibration method proposed: a solution that combines the exposure with the camera gain in order to enhance the face found on the image.
5.5 Comparison between the automatic exposure calibration and the method proposed.
5.6 Person with little yaw rotation, where d1 ≈ d2.
5.7 Person with a high yaw rotation, where d1 ≫ d2.
5.8 ROC curves showing the EigenFaces algorithm performance with the auto-exposure calibration and with the calibration proposed.
5.9 ROC curves showing the FisherFaces algorithm performance with the auto-exposure calibration and with the calibration proposed.
5.10 ROC curves showing the LBP algorithm performance with the auto-exposure calibration and with the calibration proposed.
5.11 ROC curves showing the EigenFaces algorithm performance with and without the image selection method.
5.12 ROC curves showing the FisherFaces algorithm performance with and without the image selection method.
5.13 ROC curves showing the LBP algorithm performance with and without the image selection method.

List of Tables

3.1 Processing Time: FisherFaces vs LBP vs EigenFaces.
4.1 Processing time with and without face alignment of the face images.
4.2 Processing time with and without equalization on the cropped face images.
4.3 Processing time for different resolutions of the cropped face images.
4.4 Vector storage space according to the face image resolutions.


Chapter 1

Introduction

Face verification is one of the two tasks that can be performed with face recognition. A photo is taken of an unknown person together with a claim of identity, and the system decides whether the person is who he/she claims to be by comparing the photo taken with the database of photos linked to the claimed identity.

This process is especially useful when a person entering a private space carries some sort of identification (as shown in Figure 1.1). Traditionally, this process is done manually, involving a person (i.e. a security guard) doing the verification. However, over the years, interest has grown in automating this process. Although there are many advantages, such as faster and less expensive systems, there are still many open problems to solve before such a system becomes viable and reliable.

Figure 1.1: Face Verification used to identify persons entering a stadium. A photo is taken and checked against the person whom the NFC card or QR code belongs to. If there is a match, the turnstile unlocks and the person is allowed into the infrastructure.


Can it be done using face detection and recognition algorithms? Which algorithms are better for this application? Can some pre-processing techniques improve the performance? These are the types of questions studied in this thesis.

Figure 1.2 presents the fundamental steps in creating a face recognition system. As shown, a face recognition system starts by acquiring a digital image using a camera, then detects the faces in the image for further feature extraction and feature matching. Some key terms regarding this topic are explained next.

• Digital Image: An image or picture represented digitally, i.e. as a matrix of digitally represented numbers, also called pixels.

• Face Normalization: A method of normalizing face images (i.e. rotation, translation) in order to cancel head rotation.

• Feature Extraction: A process that extracts the most important features characterizing a person's face (i.e. distance between the nose and the eyes, distance between the eyes, among others) and presents them in the form of vectors or histograms.

• Feature Matching: Also known as comparison, a process that compares the vectors or histograms calculated from the face images stored in the database with the same information extracted from the face image presented to the recognition system. This process then calculates the similarity between those vectors.

Figure 1.2: Fundamental steps for a face recognition system [1].
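To make these steps concrete, the sketch below chains acquisition, detection and the entry point to matching using OpenCV. It is a minimal illustration only: the image name is a placeholder, the cascade file is the one shipped with OpenCV, and the matching step is deferred to the algorithms studied in the following chapters.

// Minimal sketch of the fundamental steps in Figure 1.2: acquisition,
// face detection, and the point where feature extraction/matching starts.
// Detection here uses OpenCV's stock Haar cascade; file names are
// illustrative.
#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

int main() {
    // 1. Image acquisition
    Mat frame = imread("person_at_turnstile.jpg");
    Mat gray;
    cvtColor(frame, gray, COLOR_BGR2GRAY);

    // 2. Face detection
    CascadeClassifier detector("haarcascade_frontalface_default.xml");
    std::vector<Rect> faces;
    detector.detectMultiScale(gray, faces);
    if (faces.empty()) return 1;

    // 3./4. Feature extraction and matching would happen here: the cropped
    // face is compared against the photos linked to the claimed identity.
    Mat face = gray(faces[0]);
    return 0;
}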

All of the development and the experimental analysis were done in the context of a project being developed by the company Exclusivkey, Lda, which involves the development of an innovative access system using face verification to control the access to sports infrastructures.

1.1 Main Contributions

The main goal of this work was to study and develop software that evaluates face detection and face recognition algorithms for person verification. In addition, some pre-processing techniques were developed and studied in order to improve the verification accuracy. Lastly, tests in an unconstrained environment were done in order to analyze the behavior of each algorithm. As the result of this work, the main contributions are:

• Implementation of a common software interface based on a design pattern for both detection and recognition algorithms.

• Performance evaluation of detection and recognition algorithms regarding verification.

• Development and study of pre-processing techniques in order to improve verification accuracy.

• Outdoor tests using a proposed camera calibration and image selection techniques.

• A scientific paper with the main studies and results achieved in this thesis, presented at the VipIMAGE 2017 conference in Porto, Portugal, on 18th-20th of October.

1.2 Thesis Structure

This document is split into the following chapters:

• Chapter 2 is dedicated to explaining the face detection and recognition problem. It also explains the algorithms used during the work and how they operate.

• Chapter 3 explains what design patterns are and how they were used in this work. It also presents the design pattern diagrams and the programs used to test them. Finally, it presents the performance evaluation and processing times for both detection and recognition algorithms.

• Chapter 4 presents some pre-processing techniques that may improve the accuracy of some of the recognition algorithms.

• Chapter 5 includes outdoor tests with and without a new camera calibration approach. It also presents an implementation of image selection by head pose.

• Chapter 6 presents some general conclusions about the developed work and what could be the future work on this subject.


Chapter 2

Face Detection and Recognition

This chapter introduces the problems of face detection and face recognition and their applications in everyday life. Moreover, some of the most important existing algorithms are explained, which are later used in the experimental results of this thesis.

2.1 Face Detection

Face detection determines the presence and location of a face in an image, by distinguishing the face from all other patterns present in the scene [2]. An example of the resulting detected faces is presented in Figure 2.1.

Figure 2.1: An example of Face Detection in a digital image.

Human face detection plays an important role in applications such as video surveillance, human computer interfaces, face recognition, and face image database management [3]. In recent years, face recognition has attracted much attention and its research has rapidly expanded, driven not only by engineers but also by neuroscientists, since it has many potential applications in computer vision, communication and automatic access control systems. Specifically, face detection is an important part of face recognition, as the first step of automatic face recognition. However, face detection is not a straightforward task, since the face can suffer from several variations in image appearance, such as pose variation (frontal, non-frontal), occlusion, image orientation, illumination conditions and facial expression [4].

Through the years, algorithms have been presented ranging from simple edge-based approaches to composite high-level approaches using pattern recognition methods [5]. Figure 2.2 shows the different approaches for face detection.

2.2 Face Recognition

Face recognition is a biometric method of identifying an individual by comparing an image containing a face with a database of labeled face images.

For almost every human, face recognition is an easy task. Experiments reported in [6] show that even one- to three-day-old babies are able to distinguish between known faces. Although it is easy for an average human, the same cannot be said of computers. One of the reasons face recognition has attracted so much research attention and sustained development over the past 30 years is its great potential in numerous government and commercial applications [7].

Figure 2.3: An example of Face Recognition in a digital image.

Face recognition can be split into two primary tasks [8]:

• Verification (one-to-one matching): when presented with a face image of an unknown individual along with a claim of identity, verifying whether the individual is who he/she claims to be.

• Identification (one-to-many matching): given an image of an unknown individual, determining that person's identity by comparing (possibly after encoding) that image with a database of (possibly encoded) images of known individuals.

Some of the many applications of facial recognition are [9]:

• Face ID: driver licenses, entitlement programs, immigration, national ID, passports, voter registration, welfare registration.

• Access Control: border-crossing control, facility access, vehicle access, smart kiosks and ATMs, computer access.

• Security: terrorist alert, secure flight boarding systems, stadium audience scanning, computer security, computer application security.

• Surveillance: advanced video surveillance, nuclear plant surveillance, park surveillance, neighbourhood watch, power grid surveillance, CCTV control.

• Law enforcement: crime stopping and suspect alert, shoplifter recognition, suspect tracking and investigation, suspect background check.

• Multimedia management: face-based search, face-based video segmentation and summarisation, event detection.

Face Recognition techniques

The method for image acquisition depends on the application. A surveillance application may capture face images with a video camera, while image database investigations may require static intensity images taken by a standard camera. Some other applications, such as access to high-security domains, may even necessitate forgoing the non-intrusive quality of face recognition by requiring the user to stand in front of a 3D scanner or an infra-red sensor [8].

As there are different use cases, the recognition techniques are also different. These techniques can be divided into categories, as shown in Figure 2.4.

Figure 2.4: Most important approaches for Face Recognition.

The algorithms studied in this thesis follow holistic and feature-based approaches, since the study works with 2D images. These approaches are explained next [10].


Holistic Approaches: in holistic approaches, the whole face region is taken into account as input data to the face recognition system.

Feature Based Approaches: in feature-based methods (also known as structural methods), local features such as the eyes, nose and mouth are extracted. Then, their locations and local statistics (geometric and/or appearance) are fed into a structural classifier.

2.3 Face Detection Algorithms

This section explains in some detail three algorithms for face detection: Haar Cascades, Local Binary Patterns Cascade and the Histogram of Oriented Gradients. These algorithms are used throughout this thesis in order to evaluate which one performs best for the proposed problem.

2.3.1 Haar Cascades

This method was first proposed in [11], showing a machine learning approach for visual object detection in which, in this case, faces are the trained object. At the beginning, the algorithm is fed with positive images (images of faces) and negative images (images without faces) in order to train the classifier. The more images used, the higher the accuracy will be. Then, Haar features are extracted as shown in Figure 2.5. Each feature is a single value obtained by subtracting the sum of the pixels under the white rectangle from the sum of the pixels under the black rectangle.

Figure 2.5: Haar Features [39].

This feature extraction can be computationally expensive, since it has to iterate over all the pixels of an image and their neighbors. To solve this problem, the algorithm introduces the concept of the integral image, which can be computed from an image using only a few operations per pixel.


An integral image is computed by calculating, for each pixel (x, y), the sum of the pixel values above and to the left of (x, y), inclusive (shown in blue in Figure 2.6). Equation 2.1 describes the value of the integral image at location (x, y), where ii(x, y) is the integral image and i(x, y) is the original image:

ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y')   (2.1)

Figure 2.6: Integral image where (x,y) is the sum of the pixels in blue [40].
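As a quick illustration of Equation 2.1, the following sketch computes the integral image in a single pass using the standard recurrence; OpenCV ships the same functionality as cv::integral().

// Minimal sketch: single-pass integral image computation (Equation 2.1).
// ii(x, y) = i(x, y) + ii(x-1, y) + ii(x, y-1) - ii(x-1, y-1)
#include <opencv2/opencv.hpp>
using namespace cv;

Mat integralImage(const Mat &img) {  // img: 8-bit grayscale
    Mat ii(img.rows, img.cols, CV_64F, Scalar(0));
    for (int y = 0; y < img.rows; y++) {
        for (int x = 0; x < img.cols; x++) {
            double up   = (y > 0) ? ii.at<double>(y - 1, x) : 0;
            double left = (x > 0) ? ii.at<double>(y, x - 1) : 0;
            double diag = (y > 0 && x > 0) ? ii.at<double>(y - 1, x - 1) : 0;
            ii.at<double>(y, x) = img.at<uchar>(y, x) + up + left - diag;
        }
    }
    return ii;
}

With the integral image available, the sum of the pixels inside any rectangle can be obtained with only four lookups, which is what makes evaluating thousands of Haar features per sub-window affordable.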

Still, most of the computed features are irrelevant. Figure 2.7 shows two important features. The first focuses on the eyes, which are often darker than the region of the cheeks, and the second on the fact that the eyes are darker than the center of the nose. Although these are important features, the same feature extraction on the cheeks or on the forehead is totally irrelevant. Since several features can be considered important, a classifier is needed that selects a small number of them. For that, AdaBoost is used.

Figure 2.7 shows the features selected by AdaBoost. The two features are shown in the top row and then overlaid on a typical training face in the bottom row. The first feature measures the difference in intensity between the region of the eyes and a region across the upper cheeks, capitalizing on the observation that the eye region is often darker than the cheeks. The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose [11].

Figure 2.7: The first and second features selected by AdaBoost [11].

As AdaBoost is used, the final classifier is the weighted sum of the weak classifiers (shown in Figure 2.8). The AdaBoost algorithm is described below [11]:

• Given example images (x_1, y_1), ..., (x_n, y_n), where y_i = 0 for negative examples and y_i = 1 for positive examples.

• Initialize the weights:

w_{1,i} = \frac{1}{2m}, \frac{1}{2l} \quad \text{for } y_i = 0, 1 \text{ respectively}   (2.2)

where l is the number of positives and m is the number of negatives.

• For t = 1, ..., T:

1. Normalize the weights:

w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}   (2.3)

2. For each feature j, train a classifier h_j which is restricted to using a single feature. The error is evaluated with respect to w_t: \epsilon_j = \sum_i w_i |h_j(x_i) - y_i|.

3. Choose the classifier h_t with the lowest error \epsilon_t.

4. Update the weights:

w_{t+1,i} = w_{t,i} \beta_t^{1-e_i}   (2.4)

where e_i = 0 if example x_i is classified correctly, e_i = 1 otherwise, and:

\beta_t = \frac{\epsilon_t}{1 - \epsilon_t}   (2.5)

• The final strong classifier is:

h(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \alpha_t h_t(x) \ge \frac{1}{2} \sum_{t=1}^{T} \alpha_t \\ 0 & \text{otherwise} \end{cases}   (2.6)

where

\alpha_t = \log \frac{1}{\beta_t}   (2.7)

Each round of this boosting algorithm selects a single feature out of roughly 180,000 potential features.


Figure 2.8: AdaBoost Classifier behavior [40].
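A compact sketch of the strong classifier of Equations 2.6 and 2.7 follows; representing each single-feature weak classifier as a generic function object is an illustrative simplification.

// Sketch: evaluating the final AdaBoost strong classifier (Equations 2.6-2.7).
// Each weak classifier returns 0 or 1; alpha_t = log(1/beta_t) is its weight.
#include <cmath>
#include <functional>
#include <vector>

struct Weak {
    std::function<int(const std::vector<double>&)> h; // single-feature test
    double beta;                                      // from Equation 2.5
};

int strongClassify(const std::vector<Weak> &weak, const std::vector<double> &x) {
    double score = 0, threshold = 0;
    for (const Weak &w : weak) {
        double alpha = std::log(1.0 / w.beta); // Equation 2.7
        score += alpha * w.h(x);
        threshold += 0.5 * alpha;
    }
    return score >= threshold ? 1 : 0;         // Equation 2.6
}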

Although AdaBoost drastically decreases the number of features applied to an image, there are still many features to extract. For example, at this stage, a 24x24 pixel image would still need 6000 features applied to it. So, to reduce the number of evaluated features further, the concept of Cascade Classifiers was applied.

Instead of applying all of the features to an image sub-window, Cascade Classifiers group the features into different stages of classifiers and apply them one by one, as shown in Figure 2.9. A series of classifiers is applied to every sub-window. The initial classifier eliminates a large number of negative examples with very little processing. Subsequent layers eliminate additional negatives but require additional computation. After several stages of processing, the number of sub-windows has been reduced radically. Further processing can take any form, such as additional stages of the cascade or an alternative detection system [11].


Figure 2.9: Schematic description of the detection cascade [11].
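The early-rejection structure of the cascade can be sketched in a few lines; each stage below stands for one boosted classifier such as the one sketched above, wrapped in a boolean predicate.

// Sketch: cascade evaluation with early rejection (Figure 2.9).
// A sub-window is a face candidate only if every stage accepts it;
// most windows are discarded by the first, cheapest stages.
#include <functional>
#include <vector>

using Stage = std::function<bool(const std::vector<double>&)>;

bool cascadeClassify(const std::vector<Stage> &stages,
                     const std::vector<double> &subWindow) {
    for (const Stage &stage : stages) {
        if (!stage(subWindow))
            return false; // rejected early: no further computation spent
    }
    return true;          // passed every stage: face candidate
}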

2.3.2 LBP Cascade

Local Binary Patterns Cascade [12] follows a process very similar to the Haar Cascades explained in Section 2.3.1 regarding its use of the AdaBoost algorithm and Cascade Classifiers. Despite these similarities, the feature extraction is different. The LBP operator [14] labels the pixels of an image by applying a 3×3 filter/kernel centered on each pixel, producing a binary number that characterizes the texture surrounding the pixel.

The main difference between Haar and LBP cascades is the set of features used to form the weak learners: LBP encodes the differences between the center pixel and its neighbors. These differences can be encoded with a single bit per neighbor, where '0' means the pixel is dimmer and '1' that it is brighter than the center pixel. Figure 2.10 shows an example of this operation. Thus, the computation can be done in integer space, which makes face detection faster as it produces only binary values.

Figure 2.10: LBP Operator [14].
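A minimal sketch of the 3×3 LBP operator of Figure 2.10 is shown below; border pixels are skipped for brevity.

// Sketch: basic 3x3 LBP operator. Each pixel is replaced by an 8-bit code:
// one bit per neighbor, '1' if the neighbor is at least as bright as the
// center pixel, '0' otherwise.
#include <opencv2/opencv.hpp>
using namespace cv;

Mat lbpImage(const Mat &gray) {  // gray: 8-bit single-channel image
    Mat out = Mat::zeros(gray.size(), CV_8U);
    const int dy[8] = {-1,-1,-1, 0, 1, 1, 1, 0};
    const int dx[8] = {-1, 0, 1, 1, 1, 0,-1,-1};
    for (int y = 1; y < gray.rows - 1; y++) {
        for (int x = 1; x < gray.cols - 1; x++) {
            uchar c = gray.at<uchar>(y, x), code = 0;
            for (int p = 0; p < 8; p++)
                if (gray.at<uchar>(y + dy[p], x + dx[p]) >= c)
                    code |= (1 << p);
            out.at<uchar>(y, x) = code;
        }
    }
    return out;
}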

2.3.3 Histogram of Oriented Gradients (HOG)

Histograms of Oriented Gradients are generally used in computer vision, pattern recognition and image processing to detect and recognize visual objects [15]. The extraction method of the HOG algorithm [16] is based on the idea that local objects (in this case faces) can be described by the distribution of local intensity gradients or edge directions. The image is divided into small regions called cells and, for each cell, a local 1-D histogram of gradient or edge orientations over the pixels of the cell is computed. The combined histogram entries form the representation. Then, a contrast normalization is applied to the cells to make the image more immune to illumination variance and shadowing. HOG descriptors are thus extracted by counting the occurrences of edge orientations in a local neighborhood (cell) of an image. Figure 2.11 shows this process.

Figure 2.11: Images from the various stages of generating a Histogram of Oriented Gradients feature vector. (a) Original pedestrian image, scaled to 20x40 pixels, (b) gradient image, (c) image divided into cells of 5x5 pixels, resulting in 4x8 cells, (d) resulting HOG descriptor for the image showing the gradient orientation histograms in each cell [15].

Finally, the HOG descriptors are sent to a linear SVM (Support Vector Machine) for face/non-face classification. The final scheme of the HOG face detector is shown in Figure 2.12: the detector window is tiled with a grid of overlapping blocks in which Histogram of Oriented Gradients feature vectors are extracted, and the combined vectors are fed to a linear SVM for object/non-object classification [15].

Figure 2.12: An overview of our feature extraction and object detection chain [15].
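In practice, pre-trained HOG face detectors are readily available; the sketch below uses dlib's frontal face detector, a HOG + linear SVM implementation of the kind described above. The image file name is illustrative.

// Sketch: detecting faces with dlib's pre-trained HOG + linear SVM detector.
#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_io.h>
#include <vector>

int main() {
    dlib::frontal_face_detector detector = dlib::get_frontal_face_detector();
    dlib::array2d<unsigned char> img;
    dlib::load_image(img, "faces.jpg");
    std::vector<dlib::rectangle> faces = detector(img); // one box per face
    return (int)faces.size();
}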

2.4 Face Recognition Algorithms

This section explains three of the most important algorithms for face recognition: EigenFaces, FisherFaces, and Local Binary Patterns. These algorithms are studied throughout this thesis as alternatives for the system under study.


2.4.1 EigenFaces

Since correlation methods are computationally expensive and require considerable amounts of storage, it is natural to pursue dimensionality reduction schemes [17]. The EigenFaces method [18] was developed to perform this dimensionality reduction for face recognition.

Consider a face image of N × N intensity values. This image can be represented as a vector of dimension N², stacking all columns into a single one, as shown in Figure 2.13. Seen from this perspective, each image is a point in an N²-dimensional space.

Figure 2.13: Representation of images on a single column with the EigenFaces method [41].

This set of points representing images is processed through Principal Component Analysis (PCA), which finds a linear projection that maximizes the scatter of all projected samples, i.e. vectors that describe the distribution of the face images in the space. These vectors are called eigenvectors, and each of them is a linear combination of the original face images.

Since the eigenvectors have the same dimension as the original images, they are referred to as EigenPictures [19] or EigenFaces [18].

The recognition process is performed by projecting the new face into the face space and calculating the distance to each already-projected point. The closer two points are, the more similar the corresponding face images.

It has been suggested in [20] that by throwing out the first several principal components, the variation due to lighting is reduced. The hope is that, if the first principal components capture the variation due to light, better clustering of the projected samples is achieved by ignoring them. Yet, it is unlikely that the first principal components correspond solely to variation in lighting; as a consequence, useful information may be lost.

Although this algorithm is not so commonly used nowadays, there is still current work combining feature extraction by principal component analysis with a feed-forward back-propagation neural network [21].


The EigenFaces algorithm is described next.

Considering a set of n faces (with c columns and l rows, and N = l = c), X = \{x_1, x_2, ..., x_n\}, the mean \mu can be computed as:

\mu = \frac{1}{n} \sum_{i=1}^{n} x_i   (2.8)

and the covariance matrix S as:

S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T   (2.9)

The faces differ from the average \mu by the vectors x_i - \mu. Considering \lambda_i the eigenvalues and v_i the eigenvectors of S:

S v_i = \lambda_i v_i, \quad i = 1, 2, ..., n   (2.10)

The eigenvectors are then sorted in descending order of their eigenvalues. The k principal components are the eigenvectors corresponding to the k largest eigenvalues.

The k principal components of x are given by:

y = W^T (x - \mu), \quad W = (v_1, v_2, ..., v_k)   (2.11)

The recognition is performed following these steps:

1. The training samples are projected into the PCA subspace.

2. The face to recognize is also projected into the PCA subspace.

3. The nearest-neighbour distance \xi between the projection of the face to recognize and the projections of the training images is found (using the Euclidean distance) and, according to a threshold \delta, leads to one of four possibilities [18]:

• The projection is near the face space and near a face class (distance \xi lower than \delta), shown in Figure 2.14.

• The projection is near the face space but not near a known face class (distance \xi higher than \delta), shown in Figure 2.15.

• The projection is distant from the face space and near a face class (considered not to be a face), shown in Figure 2.16.

• The projection is distant from the face space and not near a known face class (considered not to be a face), shown in Figure 2.17.


Figure 2.14: Projection (in red) is in the face space and near a face class. The axes are solely representative, as they only show that EigenFaces projects the points into a vector.

Figure 2.15: Projection (in red) is in the face space but far from the class. The axes are solely representative, as they only show that EigenFaces projects the points into a vector.

Figure 2.16: Projection (in red) is near a possible class but far from the face space. The axes are solely representative, as they only show that EigenFaces projects the points into a vector.

Figure 2.17: Projection (in red) is far from the class and far from the face space. The axes are solely representative, as they only show that EigenFaces projects the points into a vector.
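A minimal sketch of this verification procedure, built on OpenCV's cv::PCA rather than the thesis library, is shown below; the number of components k and the threshold value are illustrative.

// Minimal sketch of EigenFaces verification (Equations 2.8-2.11).
// `train` holds one flattened grayscale face per row (CV_32F); `probe`
// is the flattened face to verify, with the same layout.
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cfloat>
using namespace cv;

bool verifyEigenFaces(const Mat &train, const Mat &probe,
                      int k = 10, double delta = 2500.0) {
    PCA pca(train, Mat(), PCA::DATA_AS_ROW, k); // mean + top-k eigenvectors
    Mat y = pca.project(probe);                 // Equation 2.11
    double xi = DBL_MAX;                        // nearest-neighbour distance
    for (int i = 0; i < train.rows; i++)
        xi = std::min(xi, norm(pca.project(train.row(i)), y, NORM_L2));
    return xi < delta;  // accept the identity claim if close enough
}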

2.4.2 FisherFaces

Although Principal Component Analysis uses a linear combination of features that maximizes the total variance of the data, it does not consider any classes, and information that is important for recognition may thus be discarded. An example of this problem is when the variance in the data is generated by an external source such as light. Components calculated by PCA do not necessarily carry discriminative information, so the samples may be projected all together and classification becomes very difficult.

Linear Discriminant Analysis (LDA) was first introduced in [22] for flower classification. In order to solve the problems presented by Principal Component Analysis, LDA uses a class-based approach where points in the same class should be as close as possible and different classes should be as far as possible from each other. This was later used in [17] for face recognition (getting the name FisherFaces).

It is worth mentioning that in the original work the author uses Fisher's Linear Discriminant (FLD), which takes a slightly different approach that does not make the assumptions LDA does, such as normally distributed classes or equal class covariances. Figure 2.18 shows a comparison between the PCA and FLD approaches.

Figure 2.18: A comparison of PCA and FLD for a two-class problem where the data for each class lies near a linear subspace [17].

Experiments made in [17] show that EigenFaces' performance decreases significantly when light variation occurs, whereas FisherFaces is not as affected. Also, head orientation and background changes might be other sources of performance decrease for EigenFaces.


In the current state of the art, there is a popular solution that uses Fisher vectors on densely sampled SIFT features [23]. This descriptor achieves better recognition scores and works well in large identification tasks.

The FisherFaces algorithm is described next.

Let X be a random vector of samples from c classes, with n images each:

X = \{X_1, X_2, ..., X_c\}   (2.12)

X_m = \{x_1, x_2, ..., x_n\}, \quad m = 1, 2, ..., c   (2.13)

The scatter matrices S_B and S_W are defined as:

S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^T   (2.14)

S_W = \sum_{i=1}^{c} \sum_{x_j \in X_i} (x_j - \mu_i)(x_j - \mu_i)^T   (2.15)

where \mu is calculated as in Equation 2.8 and \mu_i is defined as:

\mu_i = \frac{1}{|X_i|} \sum_{x_j \in X_i} x_j, \quad i \in 1, 2, ..., c   (2.16)

To maximize the class separability criterion, Fisher's algorithm finds the projection W_{opt} = [v_1 v_2 ... v_k]:

W_{opt} = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}   (2.17)

In [24], there is a solution for this optimization, given by the generalized eigenvalue problem:

S_B v_i = \lambda_i S_W v_i \Leftrightarrow S_W^{-1} S_B v_i = \lambda_i v_i   (2.18)

where v_i, i = 1, 2, ..., k, are the eigenvectors of S_W^{-1} S_B corresponding to the k largest eigenvalues \lambda_i.

After the projections are calculated and projected into the subspace, the Euclidean distance between the projection of the face to recognize and the classes in the database is computed, just like in EigenFaces.
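The scatter matrices of Equations 2.14-2.16 can be built directly from the data, as the sketch below shows; note that for raw face vectors S_W is typically singular, which is why in practice PCA is applied first, as described in [17]. Function and variable names are illustrative.

// Sketch: building the scatter matrices of Equations 2.14-2.16.
// `classes[i]` holds the samples of class i, one flattened face per row
// (CV_64F).
#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

void scatterMatrices(const std::vector<Mat> &classes, Mat &Sb, Mat &Sw) {
    int d = classes[0].cols, total = 0;
    Mat mu = Mat::zeros(1, d, CV_64F);
    Sb = Mat::zeros(d, d, CV_64F);
    Sw = Mat::zeros(d, d, CV_64F);

    for (const Mat &c : classes) {               // global mean (2.8)
        for (int r = 0; r < c.rows; r++) mu += c.row(r);
        total += c.rows;
    }
    mu /= total;

    for (const Mat &c : classes) {
        Mat mui;
        reduce(c, mui, 0, REDUCE_AVG);           // class mean (2.16)
        Mat dm = mui - mu;
        Sb += c.rows * (dm.t() * dm);            // between-class scatter (2.14)
        for (int r = 0; r < c.rows; r++) {
            Mat x = c.row(r) - mui;
            Sw += x.t() * x;                     // within-class scatter (2.15)
        }
    }
    // Equation 2.18 can then be solved numerically, e.g. with
    // eigenNonSymmetric(Sw.inv() * Sb, ...) in OpenCV >= 3.3.
}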


2.4.3 Local Binary Patterns Histograms (LBP or LBPH)

EigenFaces and FisherFaces use a holistic approach, treating the data as vectors in a high-dimensional image space. Local Binary Patterns (LBP), explained in Section 2.3.2, does not treat images as vectors, but instead describes local features of an object (in this case, facial features).

The LBP operator is robust against monotonic intensity transformations. Figure 2.19 shows the texture extraction under different average intensities, with little or no change in the extraction process.

Figure 2.19: Feature extraction with the LBP operator under different monotonic intensity changes on face photos [43].

The current state of the art presents solutions such as Local Ternary Patterns [25], basically a generalization of LBP that is more discriminative and less sensitive to noise in uniform regions.

The LBP algorithm is described next.

Considering (x_c, y_c) the central pixel with intensity i_c, i_p the intensity of the p-th neighbor pixel and P the number of neighbors, the LBP operator can be obtained as:

LBP(x_c, y_c) = \sum_{p=0}^{P-1} 2^p \, s(i_p - i_c)   (2.19)

where s is the sign function producing the binary values:

s(i_p - i_c) = \begin{cases} 1, & (i_p - i_c) \ge 0 \\ 0, & (i_p - i_c) < 0 \end{cases}   (2.20)

In [26] it was proposed to use a variable neighborhood in order to encode details at different scales. The main idea is to align an arbitrary number of neighbors on a circle with a variable radius, which enables capturing the neighborhoods shown in Figure 2.20.


Figure 2.20: The circular neighbourhood and possible captures [43].

Given a radius R, the neighbor position (x_p, y_p) relative to the central pixel (x_c, y_c) can be determined by:

x_p = x_c + R \cos\left(\frac{2\pi p}{P}\right)   (2.21)

y_p = y_c - R \sin\left(\frac{2\pi p}{P}\right)   (2.22)

Figure 2.21 shows different radii, which consequently change the neighbor pixels that are compared.

Figure 2.21: Circular neighbour sets for three different values of P (number of neighbors) and R (radius value) [27].

The LBP values calculated with this operator are accumulated into histograms in order to reveal patterns. These histograms produce feature vectors [26] containing the features which best define a person's face.

For a better and more efficient representation, the image is divided into n × n regions, resulting in n² histograms, one for each region, as represented in Figure 2.22.


Figure 2.22: Face image divided into 64 regions, with a histogram for every region [27].

For the comparison of two faces, one being the sample S and the other the model M, the feature vectors extracted from both need to be compared. This can be done in three ways [27]:

• Histogram intersection:

D(S, M) = \sum_{j=1}^{n^2} \sum_{i=1}^{P(P-1)+3} \min(S_{i,j}, M_{i,j})   (2.23)

• Log-likelihood statistic:

L(S, M) = -\sum_{j=1}^{n^2} \sum_{i=1}^{P(P-1)+3} S_{i,j} \log(M_{i,j})   (2.24)

• Chi-square statistic \chi^2:

\chi^2(S, M) = \sum_{j=1}^{n^2} \sum_{i=1}^{P(P-1)+3} \frac{(S_{i,j} - M_{i,j})^2}{S_{i,j} + M_{i,j}}   (2.25)

As some regions carry more important information than others (such as the region of the eyes), a weight can be assigned to each region. According to [26], the chi-square statistic \chi^2 performs slightly better than the other two. Applying weights to its equation, it becomes:

\chi^2(S, M) = \sum_{j=1}^{n^2} \sum_{i=1}^{P(P-1)+3} w_j \frac{(S_{i,j} - M_{i,j})^2}{S_{i,j} + M_{i,j}}   (2.26)

where w_j is the weight of region j.
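A direct translation of the weighted chi-square statistic of Equation 2.26 is sketched below; the matrix layout (one row of histogram bins per region) is an assumption made for illustration.

// Sketch: weighted chi-square distance of Equation 2.26. S and M hold the
// regional LBP histograms (one row per region, one column per bin) and
// `w` the per-region weights.
#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

double chiSquare(const Mat &S, const Mat &M, const std::vector<double> &w) {
    double dist = 0.0;
    for (int j = 0; j < S.rows; j++)            // regions
        for (int i = 0; i < S.cols; i++) {      // histogram bins
            double s = S.at<double>(j, i), m = M.at<double>(j, i);
            if (s + m > 0)                      // skip empty bins
                dist += w[j] * (s - m) * (s - m) / (s + m);
        }
    return dist; // smaller distance means more similar faces
}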


2.5 Final Remarks

Having studied each algorithm in detail, it is possible to say that each one behaves differently.

Regarding the face detection algorithms, major differences are expected between the HoG algorithm and the other two (Haar Cascades and LBP Cascade). This is due to the fact that Haar Cascades and LBP Cascades follow a very similar approach, while HoG's approach is quite different.

As for the recognition algorithms, a holistic approach will behave differently from a feature-based one, since the algorithms are quite different. As for the two holistic approaches (EigenFaces and FisherFaces), since one is based on the other, their behavior is not expected to differ that much. However, since this thesis focuses on an unconstrained environment, EigenFaces is expected to perform quite poorly, since many lighting variations will appear throughout the tests.


Chapter 3

Software and Performance Evaluation on Verification

In order to test all the algorithms in a quick and easy way, a software library was developed in the C++ programming language, incorporating the studied face detection and identification algorithms under the same programmer-facing interface. A certain level of abstraction was taken into account to make it easier to add new algorithms to the library. This chapter intends to explain some of the software patterns used and contextualize them in this specific case.

This chapter is also dedicated to the evaluation of each algorithm for facial detection and recognition and a comparison between them. It also explains the creation of the dataset used for the experimental results and makes some remarks about the recognition methods provided by OpenCV.

3.1 Design Patterns

In software engineering, design patterns appear as a solution to recurring problems in software design. They describe best practices used by software developers, providing general solutions that are documented in a way that does not depend on the specifics of a particular problem.

Design patterns are divided into three main categories [28]:

• Creational Patterns: provide ways of instantiating classes, using inheritance effectively in the instantiation process and using delegation in object creation. Some of these patterns are:

Builder: separates object construction from its representation.
Factory Method: creates an instance of several derived classes.
Object Pool: avoids expensive acquisition and release of resources by recycling objects that are no longer in use.
Prototype: a fully initialized instance to be copied or cloned.
Singleton: a class of which only a single instance can exist.

• Structural Patterns: concern class and object composition. Class creation patterns use inheritance to compose interfaces, while object creation patterns define ways to compose objects to obtain new functionality. Some of these patterns are:

Adapter: matches interfaces of different classes.
Bridge: separates an object's interface from its implementation.
Proxy: an object representing another object.

• Behavioral Patterns: are specifically concerned with communication between objects. Some of them are:

Chain of responsibility: a way of passing a request between a chain of objects.
Command: encapsulates a command request as an object.
Interpreter: a way to include language elements in a program.

3.2 Developed Interfaces

In this work, the factory pattern was used to provide a way to create an object without exposing the creation logic, referring to the newly created object through a common interface. Two factory patterns were designed, one for detection and another for recognition. The factories allow the user to choose which algorithm is going to be used, and some common functions are provided (i.e. detecting faces, training classes, comparing faces with classes, debugging, among others).

Figures 3.1 and 3.2 show the factory patterns for both detection and recognition. The interface is implemented using abstract classes, declaring the common functions as pure virtual functions.
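A condensed sketch of what such a factory can look like is shown below. The class and method names mirror the ones used by the test programs in the next section, but the stub implementation is illustrative rather than the thesis code.

// Sketch of the detection factory pattern: an abstract interface whose
// common operations are pure virtual functions, plus a factory function.
// Real subclasses would wrap the Haar, LBP and HOG detectors.
#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;

class FaceDetection {
public:
    virtual ~FaceDetection() {}
    virtual Mat detect_face(Mat image) = 0;                    // largest face
    virtual std::vector<Mat> detect_multi_face(Mat image) = 0; // all faces
    virtual void debug(Mat face) = 0;
};

// Stub standing in for HaarDetection / LbpDetection / HogDetection.
class StubDetection : public FaceDetection {
public:
    Mat detect_face(Mat image) override { return image; }
    std::vector<Mat> detect_multi_face(Mat image) override { return {image}; }
    void debug(Mat face) override { imshow("debug", face); waitKey(0); }
};

class DetectorFactory {
public:
    // method: 1-Haar 2-LBP 3-HoG (all map to the stub in this sketch)
    static FaceDetection *createDetector(unsigned int method) {
        (void)method;
        return new StubDetection();
    }
};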


3.3 Test Programs

In order to test some of the functions used in the detection and recognition factories, some test programs were developed; they help perceive the outcome of each function when a specific input is entered.

3.3.1 Detect multiple faces

The first test analyzes the detect_multi_face(Mat) function which, given an image, detects every face in the photo and returns them in a vector of images. The detection method is chosen before running the program.

// This test program exercises detect_multi_face(Mat)
// Input: image with one or more faces
// Detection method: 1-Haar 2-LBP 3-HoG
// Output: every detected face, cropped

unsigned int met = atoi(argv[1]);
FaceDetection *a = DetectorFactory::createDetector(met);
Mat image = imread("../TestImages/faces.jpg");
vector<Mat> cropped_faces = a->detect_multi_face(image);
char filename[100];
for (size_t j = 0; j < cropped_faces.size(); j++) {
    sprintf(filename, "test1_%zu.jpg", j + 1);
    a->debug(cropped_faces[j]);
    imwrite(filename, cropped_faces[j]);
}

Listing 3.1: Program that tests detect_multi_face(). All faces detected in the image are expected as output.

Figures 3.3a and 3.3b show the input and the output, respectively, of the first test program.


(a) Input

(b) Output

Figure 3.3: Input and Outputs of the first test.

3.3.2 Detect the predominant face

The second test analyzes the detect_face(Mat) function which, given an image, detects every face in the image and returns the one with the largest resolution. The detection method is chosen before running the program.

// This test program exercises detect_face(Mat)
// Input: image with one or more faces
// Detection method: 1-Haar 2-LBP 3-HoG
// Output: the largest face in the photo, cropped

unsigned int met = atoi(argv[1]);
FaceDetection *a = DetectorFactory::createDetector(met);
Mat image = imread("../TestImages/faces2.JPG");
Mat cropped_face = a->detect_face(image);
char filename[100];
sprintf(filename, "test2_%d.jpg", 1);
imwrite(filename, cropped_face);

Listing 3.2: Program that tests detect_face(). The largest face detected in the image is expected as output.

Figures 3.4a and 3.4b show the input and the output, respectively, of the second test program.


(a) Input

(b) Output

Figure 3.4: Input and Output of the second test.

3.3.3 Recognize faces

Finally, the third test evaluates the recognition process: it tests class training and the subsequent comparison for face recognition. The result is the level of confidence (explained in Section 3.7) between the trained class and the new face photo. The recognition method is also chosen before running the program, through terminal input.

// This test program exercises train(vector<Mat>, vector<int>)
// and comparator(Mat)
// Input: database with X face photos for class training,
//        and a face photo of the person to recognize
// Recognition method: 1-EigenFaces 2-FisherFaces 3-LBP
// Output: level of confidence between the trained class and the face photo

unsigned int met = atoi(argv[1]);
Size size(256, 256);
vector<Mat> person_to_label;
int person_to_find = 1;
vector<cv::String> fn;
vector<int> labels;

FaceRecognition *b = RecognitionFactory::createRecognition(met);
String filename = "./Artur/*.jpg";
Mat face_to_recognize = imread("../TestImages/Artur.jpg");
resize(face_to_recognize, face_to_recognize, size);
cvtColor(face_to_recognize, face_to_recognize, CV_BGR2GRAY);

glob(filename, fn, false);
size_t count = fn.size(); // number of jpg files in the images folder
for (size_t i = 0; i < count; i++) {
    person_to_label.push_back(imread(fn[i])); // save the person's images
    labels.push_back(person_to_find);
}
for (size_t i = 0; i < labels.size(); i++) {
    cvtColor(person_to_label[i], person_to_label[i], CV_BGR2GRAY);
    resize(person_to_label[i], person_to_label[i], size);
}
b->train(person_to_label, labels);
b->comparator(face_to_recognize);

Listing 3.3: Program that tests the train() and comparator() functions.

Figures 3.5a and 3.5b show the input and the output, respectively, of the third test program.

(a) Input
(b) Output on the terminal

Figure 3.5: Input and Output of the third test.

3.4 Proposed Scenario

Unlike many studies that use face recognition to label all the faces detected in a digital image, this thesis has a different testing purpose. All of the face recognition tests were done for verification, as part of the development of a system that verifies people accessing a building with an ID card or a ticket. The system is represented in Figures 3.6 and 3.7.




Taking the activity diagrams into account, the number of photos used to train each class was chosen. When the person registers, the process of preparing the ID card or ticket should be fast, in order not to create a very long line of people. This time is limited, and thus the camera might not be able to acquire a large number of face images. Also, when the machine is verifying the match between the image acquired at the moment and the images acquired when the person checked in, a heavy process might slow down the verification and, therefore, might also create a long line of people.

To address these constraints, it was defined that the recognition tests would use five face images for class training, as shown in Figure 3.8. This way, the results might not be as precise as when using more face photos, but it is a realistic option for the situation outlined. Also, the resolution of the face images is 256 × 256 unless stated otherwise.

As shown in Figure 3.6, the database entry for each person is only defined when the person registers, which gives some freedom to choose which face images are saved in the database. It is also possible to calibrate the camera in order to compensate for illumination changes, improving both detection and recognition. Some solutions are presented in Chapter 5.

Figure 3.8: An example of the database used for class training for further recognition tests.

3.5 CAMBADA dataset

The dataset used for the tests in this chapter and the next contains photos taken by the CAMBADA robotic soccer team with different cameras between 2006 and 2014. Figure 3.9 shows some of the images in this dataset.

Figure 3.9: An example of three images from the CAMBADA dataset.


The reason why this dataset was created and subsequently used is that the photos have different backgrounds and different illumination conditions, and not every person is looking directly at the camera, which makes it a good test set for both detection and recognition. Also, in the recognition process, the trained class and the photos tested for comparison are two to five years apart, with the class trained on faces taken in 2006 and 2007 and the comparison made with photos from 2009 to 2011. This difference of up to five years reflects the fact that, in the scenario described, not everyone needs to register every time they want to enter the infrastructure (especially in sports stadiums, where associated members may only renew their membership every five years). This gap was therefore used to test short-term ageing.

3.6 Detection Results

The three detection algorithms mentioned in Section 2.3 were tested in order to decide which one to use prior to recognition. The results are shown in Figure 3.10. For these tests, 14 different photos containing a total of 131 faces were used.

Figure 3.10: Successful faces detected and false positives for each algorithm.

Although HoG detected fewer faces, Haar and LBP produced a large number of false positives while HoG produced none. For a different perspective, it is possible to look at the ratio between false positives and successful detections, as well as the ratio between the faces detected and the total number of faces in the photos tested. The results are shown in Figure 3.11.


Figure 3.11: Ratio between false positives and faces detected (on the left) and ratio between the number of faces detected and the total number of faces in the images (on the right).

Evaluating both graphs, it is possible to conclude that the HoG algorithm is far better than Haar and LBP, as it produces no false positives and its detection ratio is close to 80%.

Based on these conclusions, HoG was chosen as the face detection algorithm for the subsequent implementations and facial recognition tests; a usage sketch is given below.
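As an illustration (not the exact test program used in this chapter), the following minimal sketch shows how a HoG-based face detector can be driven through the dlib library, whose get_frontal_face_detector() is one standard HoG implementation; the command-line usage shown is hypothetical.

#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_io.h>
#include <iostream>
#include <vector>

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: detect <image>" << std::endl; return 1; }

    // dlib's frontal face detector is a HoG + linear SVM sliding window.
    dlib::frontal_face_detector detector = dlib::get_frontal_face_detector();

    dlib::array2d<unsigned char> img;
    dlib::load_image(img, argv[1]);

    std::vector<dlib::rectangle> faces = detector(img);
    std::cout << "Faces detected: " << faces.size() << std::endl;
    return 0;
}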

3.7 Remarks about Facial Recognition provided by OpenCV

As the practical implementation of the algorithms for both detection and recognition was developed with the OpenCV library (due to its useful tools and implementations), it is important to clarify some aspects regarding the implementation of the recognition algorithms. OpenCV provides training and prediction implementations for EigenFaces, FisherFaces and LBP. Regarding training, a class is generated from multiple images of a person's face and contains the most important vectors obtained from that set of faces (the vectors and the extraction methods differ depending on which algorithm is chosen). Later on, this class is compared to a detected face, and a level of confidence is given as output. The higher the level of confidence, the more unlikely it is that the detected face belongs to the person trained in the class.
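A minimal sketch of this train/predict workflow is shown below, assuming the OpenCV 2.4 contrib API (in OpenCV 3.x the same classes live in the opencv_contrib "face" module under the cv::face namespace); verifyConfidence is a hypothetical helper name.

#include <opencv2/core/core.hpp>
#include <opencv2/contrib/contrib.hpp>
#include <vector>

using namespace cv;

double verifyConfidence(const std::vector<Mat>& trainFaces, const Mat& probeFace) {
    // One label only: a single class is trained for verification.
    std::vector<int> labels(trainFaces.size(), 1);

    Ptr<FaceRecognizer> model = createLBPHFaceRecognizer();
    model->train(trainFaces, labels);      // builds the class from the photos

    int predictedLabel = -1;
    double confidence = 0.0;
    model->predict(probeFace, predictedLabel, confidence);
    // 'confidence' behaves like a distance: the HIGHER the value, the LESS
    // likely the probe belongs to the trained class.
    return confidence;
}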


EigenFaces

$\mathrm{confidence} = \min_{n}\,\lVert y - y_n \rVert, \quad n = 1, 2, \dots, K$ (3.1)

where $y$ is the new projection (the face being compared) and $y_n$ are the $K$ projected samples representing the points of the trained class.

FisherFaces

$\mathrm{confidence} = \min_{n}\,\lVert W - W_n \rVert, \quad n = 1, 2, \dots, K$ (3.2)

where $W$ is the new projection (the face being compared) and $W_n$ are the $K$ projected samples representing the points of the trained class.

LBP

$\mathrm{confidence} = \chi^2(S, M) = \sum_{j=1}^{n^2} \sum_{i=1}^{P(P-1)+3} \frac{(S_{i,j} - M_{i,j})^2}{S_{i,j} + M_{i,j}}$ (3.3)

where $S$ is the sample histogram (the face being compared) and $M$ is the model histogram created for the trained class; $j$ runs over the $n^2$ regions into which the face image is divided, and $i$ over the $P(P-1)+3$ histogram bins of an LBP operator with $P$ sampling points.
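For concreteness, a direct implementation of Equation 3.3 over two concatenated LBP histograms could look like the hypothetical helper below (bins where both histograms are empty are skipped to avoid division by zero):

#include <vector>
#include <cstddef>

// Chi-square distance between a sample histogram S and a model histogram M,
// as in Equation 3.3; both vectors concatenate all regional histograms.
double chiSquareDistance(const std::vector<double>& S,
                         const std::vector<double>& M) {
    double distance = 0.0;
    for (std::size_t i = 0; i < S.size() && i < M.size(); i++) {
        double denom = S[i] + M[i];
        if (denom > 0.0)
            distance += (S[i] - M[i]) * (S[i] - M[i]) / denom;
    }
    return distance;
}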

These confidence values will be used throughout this thesis, as they indicate whether the verification was successful and allow setting a confidence threshold that distinguishes the person corresponding to the trained class from the others detected in an image.
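As a sketch of how such a threshold is applied (each operating point of the ROC curves in the next section corresponds to one threshold value; isSamePerson is a hypothetical helper name):

// Accept the identity claim when the confidence (a distance)
// falls below the chosen threshold.
bool isSamePerson(double confidence, double threshold) {
    return confidence < threshold;  // lower distance = better match
}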

3.8 Verification Results

The results for the EigenFaces, FisherFaces and LBP algorithms are shown in Figure 3.12 as Receiver Operating Characteristic (ROC) curves. It is worth mentioning that FisherFaces is trained with two different persons in two different classes. The reason for this decision is explained in Section 4.4.

Comparing the three algorithms, it is possible to see that FisherFaces and LBP have very similar performances, reaching a True Positive Rate (TPR) of 50% at a False Positive Rate (FPR) of 5%. As for EigenFaces, the performance is very poor, with the TPR always close to the FPR.


Figure 3.12: ROC curve of performance for EigenFaces, FisherFaces and LBP on verification.

As the FisherFaces algorithm is based on EigenFaces, it was expected that the performances would be similar, with FisherFaces being slightly better than EigenFaces. However, that did not happen. This was due to the fact that FisherFaces had two trained classes (one with the person used for verification and another with a random stranger) while EigenFaces was trained with only one. As both algorithms follow holistic approaches, the more classes are trained, the more features can be extracted and the better the recognition results, which makes these algorithms unsuitable for facial verification solutions, where only one class is meant to be trained.

Chapter 4 introduces some pre-processing techniques to improve these results.

3.8.1 Processing Time

Processing time is a major factor for real-time systems, and systems that involve recognition must take into account the time needed to recognize a person.

As a point of comparison for further tests, Table 3.1 presents the processing time of each algorithm. The CPU used was an Intel Core i7-3610QM @ 2.30GHz.


Algorithm              FisherFaces   LBP     EigenFaces
Processing Time (ms)      37.38      28.64      57.58

Table 3.1: Processing Time: FisherFaces vs LBP vs EigenFaces.

Comparing the values in Table 3.1, it is evident that, when used for verification, all of the algorithms perform quite well. As there is only one class, the procedure of comparing the class to a new face image is fast.

These times would not be as low if identification were needed instead of verification: the new face would have to be compared to more classes, thus taking more time to process. A sketch of how such per-comparison times can be measured is given below.
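The following hypothetical helper measures one predict() call in milliseconds, reusing the model/probe names from the earlier sketches (again assuming the OpenCV 2.4 contrib API):

#include <chrono>
#include <opencv2/core/core.hpp>
#include <opencv2/contrib/contrib.hpp>

// Measures a single prediction, as reported in Table 3.1.
double timePredictMs(cv::Ptr<cv::FaceRecognizer> model, const cv::Mat& probe) {
    int label = -1;
    double confidence = 0.0;
    auto t0 = std::chrono::steady_clock::now();
    model->predict(probe, label, confidence);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}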


Chapter 4

Pre-processing techniques

Although facial recognition algorithms are quite complex, there are ways to improve their performance without changing the algorithm itself. One of them is to modify the face images that reach the training and comparison processes in order to improve the quality of the verification. This chapter studies some pre-processing techniques to improve the performance of the verification system. Since the system is intended to work in real time, the techniques studied need to be fast and should therefore be kept as simple as possible.

4.1 Head Pose and Face Alignment

One of the biggest problems in recognizing a pattern in an image is its rotation and scale when comparing two images of the same object [29]. Facial recognition faces a similar problem.

The range of head motion for an average adult male encompasses a sagittal flexion and extension (i.e., forward to backward movement of the neck) from -60.4° to 69.6°, a frontal lateral bending (i.e., right to left bending of the neck) from -40.9° to 36.3°, and a horizontal axial rotation (i.e., right to left rotation of the head) from -79.8° to 75.3° [30]. The head pose at the time an image is acquired is crucial not only for detection, but also for recognition. The three degrees of freedom of the human head are shown in Figure 4.1. As any of these movements occurs, recognition rates start to drop. Studies in [32] show that the correlation between two face images decreases as the rotation angle between them increases: the vectors extracted start to differ from the ones extracted from the comparison image. Also, if there is some pitch or yaw movement, information may be lost as the face may be partially hidden. Examples of these movements are shown in Figure 4.2.


Figure 4.1: The three degrees of freedom of a human head can be described by the egocentric rotation angles pitch, roll, and yaw [31].

Figure 4.2: Orientation of the head in terms of pitch, roll, and yaw movements describing the three degrees of freedom of a human head [31].

A possible approach for the yaw movement is described in Section 5.3. This section is dedicated to the roll movement. In order to cancel this movement, a face alignment method is presented and explained in the next section.


4.1.1 Face Alignment technique for Roll Head Movement

Considering that most human faces have some sort of symmetry, it is possible to assume that, when a face has no roll movement, the angle Θ between the line connecting the centers of the two eyes and the horizontal axis is 0°. Figure 4.3 illustrates this angle.

Figure 4.3: The angle Θ represented on a person's face.

The angle shown is, however, an extreme case. In the CAMBADA dataset, the average roll angle is 4.02°.

The dlib library provides a package for real-time head pose estimation based on an ensemble of regression trees [33]. As dlib estimates the head pose by giving the coordinates of multiple points on the face (such as the beginning and the tip of the nose, the ear-to-ear distance and the eye coordinates shown in Figure 4.4), it is possible to cancel the roll angle.


Once the angle Θ is calculated through trigonometry (Equation 4.1) and the face detection is done, a rotation in the opposite direction is applied, centred on the center of the box where the face was detected. This step is crucial so that the background of the face square remains the same. Finally, the cropping around the face is done. Figure 4.5 shows these steps, and a code sketch is given at the end of this subsection. This process is done both when the images are cropped for training and before an image is sent for comparison.

Figure 4.5: Face alignment technique for cropping and comparison.

A face alignment method that is also based on the eyes is proposed in [34]. However, the input of that alignment process is a rectangular image and, after the alignment, the points that fall outside the image are removed and filled in with pixels. Unlike that method, the process proposed in this thesis performs the cropping and scaling only after the alignment, so there is no filling process and the image background remains the same.

$\Theta = \arcsin\!\left(\frac{\mathit{lefteye}_y - \mathit{righteye}_y}{\lVert \mathit{righteye}(x, y) - \mathit{lefteye}(x, y) \rVert}\right)$ (4.1)
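A minimal sketch of this alignment procedure is given below, assuming dlib's 68-point shape predictor (landmarks 36-41 and 42-47 correspond to the left and right eye, respectively) and OpenCV for the rotation and cropping; eyeCenter and alignFace are hypothetical helper names.

#include <dlib/image_processing.h>
#include <dlib/opencv.h>
#include <opencv2/imgproc/imgproc.hpp>
#include <cmath>

// Average of a run of landmarks, used to get the center of an eye.
static cv::Point2f eyeCenter(const dlib::full_object_detection& s,
                             int first, int last) {
    float x = 0.0f, y = 0.0f;
    for (int i = first; i <= last; i++) {
        x += s.part(i).x();
        y += s.part(i).y();
    }
    int n = last - first + 1;
    return cv::Point2f(x / n, y / n);
}

// Rotates the image so that the eye line becomes horizontal (Equation 4.1),
// centred on the face box so the background of the crop is preserved,
// and only then crops the face region.
cv::Mat alignFace(const cv::Mat& img, const dlib::rectangle& box,
                  dlib::shape_predictor& sp) {
    dlib::cv_image<dlib::bgr_pixel> dimg(img);
    dlib::full_object_detection shape = sp(dimg, box);

    cv::Point2f left = eyeCenter(shape, 36, 41);   // left eye landmarks
    cv::Point2f right = eyeCenter(shape, 42, 47);  // right eye landmarks

    // Roll angle of the eye line, in degrees (atan2 is a numerically robust
    // alternative to the arcsin form of Equation 4.1).
    double theta = std::atan2(right.y - left.y, right.x - left.x)
                   * 180.0 / CV_PI;

    cv::Point2f center((box.left() + box.right()) / 2.0f,
                       (box.top() + box.bottom()) / 2.0f);
    cv::Mat R = cv::getRotationMatrix2D(center, theta, 1.0);
    cv::Mat rotated;
    cv::warpAffine(img, rotated, R, img.size());

    cv::Rect crop(static_cast<int>(box.left()), static_cast<int>(box.top()),
                  static_cast<int>(box.width()), static_cast<int>(box.height()));
    crop &= cv::Rect(0, 0, rotated.cols, rotated.rows);  // clamp to the image
    return rotated(crop).clone();
}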

4.1.2 Face Alignment on EigenFaces

Figure 4.6 shows the differences produced by applying the face alignment to the EigenFaces algorithm, compared with the original results.

Figure 4.6: ROC curves comparing verification results with and without face alignment tech-nique for EigenFaces algorithm.

Observing the results, it is possible to say that face alignment did not bring any major improvement to the EigenFaces algorithm, which maintains a poor performance.

4.1.3 Face Alignment on FisherFaces

Figure 4.7 shows the changes produced by face alignment on the FisherFaces algorithm.

Unlike the changes seen on EigenFaces, the face alignment process has a positive impact on FisherFaces: the TPR increases from 23% to 55% at 0% FPR. This improvement is noticeable across all FPR values, making FisherFaces a more workable algorithm for verification.


Figure 4.7: ROC curves comparing verification results with and without face alignment tech-nique for FisherFaces algorithm.

4.1.4 Face Alignment on Local Binary Patterns Histograms

Figure 4.8 shows the performance of the LBP algorithm with and without face alignment.

Figure 4.8: ROC curves comparing verification results with and without face alignment tech-nique for LBP algorithm.
