Face detection on infrared thermal image

(1)

Universidade de Aveiro Departamento deElectrónica, Telecomunica¸cões e Informática, 2017

Ricardo Ferreira

Ribeiro

Dete¸

c˜

ao Facial em Imagem T´

ermica Infravermelha

Face Detection on Infrared Thermal Image

(2)

(3)

Universidade de Aveiro Departamento deElectrónica, Telecomunica¸cões e Informática, 2017

Ricardo Ferreira

Ribeiro

Dete¸

c˜

ao Facial em Imagem T´

ermica Infravermelha

Face Detection on Infrared Thermal Image

Disserta¸cão apresentada à Universidade de Aveiro para cumprimento dos requesitos necessários à obten¸cão do grau de Mestrado em Engenharia Eletrónica e Telecomunica¸cões, realizada sob a orienta¸cão cient´ıfica de Doutor António José Ribeiro Neves e do Doutor José Maria Fernandes, Professores auxiliares do Departamento de Eletrónica e Telecomunica¸cões e Informática da Universidade de Aveiro

(4)

(5)

o j´uri / the jury

presidente / president Professor Doutor Armando Jos´e Formoso de Pinho

Professor Associado C/ Agrega¸c˜ao, Universidade de Aveiro

vogais / examiners committee Doutor Jaime dos Santos Cardoso

Professor Associado C/ Agrega¸c˜ao, Universidade do Porto - Faculdade de Engen-haria

Professor Doutor Ant´onio Jos´e Ribeiro Neves

(6)

(7)

agradecimentos / acknowledgements

Pelo rumo que escolhi ao longo dos ´ultimos anos e por completar mais uma etapa importante na minha vida:

Gostaria de expressar sincera aprecia¸cão e gratidão ao meu orientador, Pro-fessor Dr. António Neves, pelos conselhos, pelo apoio, orienta¸cão, e por estar sempre dispon´ıvel e pronto a ajudar;

Gostaria de agradecer `a Daniela Freitas e aos meus amigos pelo apoio, ajuda e por estarem sempre presentes;

Agrade¸co também ao meu co-orientador, Professor Dr. José Fernandes, pelos seus conselhos e sugestões;

Acima de tudo, dedico esta disserta¸cão ao meus pais, José Ribeiro e Jerónima Ferreira, à minha irmã, Tânia Ribeiro, que merecem um espe-cial agradecimento pelo seu apoio e carinho, e por acreditarem sempre em mim mesmo nos maus momentos.

(8)

(9)

Abstract Infrared cameras or thermal imaging cameras are devices that use infrared radiation to capture an image. This kind of sensors are being developed for almost a century now. They started to be used in the military environment, but at that time it took too long to create a single image. Nowadays, the infrared sensors have reached a whole new technological level and are used for other than military purposes. These sensors are being used for face de-tection in this thesis.

When comparing the use of thermal images regarding color images, it is possible to see advantages and limitations, such as capture images in total darkness and high price, respectively, which will be explored throughout this document.

This work proposes the development or adaptation of several methods for face detection on infrared thermal images. The well known algorithm de-veloped by Paul Viola and Michael Jones, using Haar feature-based cascade classifiers, is used to compare the traditional algorithms developed for visi-ble light images when applied to thermal imaging.

Three different algorithms for face detection are presented. Face segmen-tation is the first step in these methods. A method for the segmensegmen-tation and filtering of the face in the infrared thermal images resulting in a binary image is proposed. In the first method, an edge detection algorithm is ap-plied to the binary image and the face detection is based on these contours. In the second method, a template matching method is used for searching and finding the location of a template image with the shape of a human head in the binary image. In the last one, a matching algorithm is used. This algorithm correlates a template with the distance transform of the edge image. This algorithm incorporates edge orientation information resulting in the reduction of false detection and the cost variation is limited.

The experimental results show that the proposed methods have promising outcome, but the second method is the most suitable for the performed experiments.

(10)

(11)

Resumo As câmaras infravermelhas ou as câmaras de imagem térmica são dispos-itivos que usam radia¸cão infravermelha para capturar uma imagem. Este tipo de sensores estão a ser desenvolvidos há quase um século. Come¸caram a ser usados para fins militares, mas naquela época demorava demasiado tempo para criar uma única imagem. Hoje em dia, os sensores infraver-melhos alcan¸caram um n´ıvel tecnológico totalmente novo e são usados para fins além de militares. Esses sensores estão ser usados para dete¸cão facial nesta disserta¸cão.

Comparando o uso de imagens t´ermicas relativamente a imagens coloridas, ´

e poss´ıvel ver vantagens e limita¸c˜oes, tal como a captura de imagens na escurido e o pre¸co elevado, respectivamente, que ser˜ao exploradas durante este documento.

Este trabalho propõe o desenvolvimento ou adapta¸cão de vários métodos para a dete¸cão facial em imagens térmicas. O conhecido algoritmo de-senvolvido por Paul Viola e Michael Jones, que utiliza cascatas de clas-sificadores de Haar baseado em caracter´ısticas, é usado para comparar os algoritmos tradicionais desenvolvidos para imagens de luz vis´ıvel quando aplicados a imagens térmicas.

São apresentados três métodos diferentes para a dete¸cão facial. A seg-menta¸cão do rosto é o primeiro passo nestes métodos. E proposto um´ método para a segmenta¸cão e filtragem do rosto nas imagens térmicas que tem como resultado uma imagem binária. No primeiro método, é aplicado um algoritmo de dete¸cão de contornos à imagem binária e a dete¸cão facial ´

e baseada nesses contornos. No segundo método, é usado um método de correspondência de padrões para pesquisar e encontrar a localiza¸cão de uma imagem padrão com a forma da cabe¸ca humana na imagem binária. No ´

ultimo, é usado um algoritmo de correspondência. Este algoritmo correla-ciona um padrão com a transformada de distância da imagem de contornos. Este algoritmo incorpora informa¸cões de orienta¸cão de contornos que resulta na redu¸cão de falsas dete¸cões e a varia¸cão do custo é limitada.

Os resultados experimentais mostram que os métodos propostos têm re-sultados promissores, mas o segundo método é o mais adequado para as

(12)

(13)

List of Figures

1.1 FLIR Lepton thermal camera module with breakout board. . . 2

1.2 An example of a thermal image and the same scene aquired by a RGB camera. 2 2.1 Electromagnetic spectrum with different divisions of infrared spectrum [1]. . . 5

2.2 Infrared and adjacent spectral regions with expanded view of thermal infrared region [5]. . . 6

2.3 Emissive power of a blackbody according to wavelength at several temperatures [6]. . . 7

2.4 Leslei’s cube with four faces at the same temperature represented in thermal and grayscale image [7]. . . 8

2.5 Atmospheric transmission in different infrared wavelengths [8]. . . 9

2.6 Simplified block diagram of an infrared camera [10]. . . 9

2.7 Structure of an infrared camera produced by NEC Corporation [11]. . . 10

2.8 Cross-sectional view of a microbolometer [13]. . . 11

2.9 Thermal images in grayscale(a) and pseudo color (b)(c) [14]. . . 11

2.10 Thermal camera applications . . . 13

3.1 FLIR Lepton LWIR camera module. . . 16

3.2 Example of an image with good uniformity (a) and images with degrade uni-formity (b)(c) [29]. . . 16

3.3 Hypothetical Illustration of Camera Output vs. Camera Temperature in Radiometry-disabled Mode [29]. . . 17

3.4 Hypothetical Illustration of Camera Output vs. Camera Temperature in Radiometry-enabled Mode [29]. . . 18

3.5 The Raspberry Pi 3 Model B, a low-cost single-board computer manufactured by the Raspberry Pi Foundation [30]. . . 19

3.6 Raspberry Pi 3 GPIO header [33]. . . 20

(16)

3.8 FLIR Lepton camera breakout board scheme [38]. . . 23

3.9 Designed 3D parts to support and fix both cameras to capture images from the same scene. . . 23

3.10 Final system. . . 24

4.1 Processing of the infrared radiation emitted by the scene until the image is displayed. . . 27

4.2 Quantization of an image. . . 28

4.3 The four types of Haar-like features used in the Viola-Jones algorithm [40]. . 29

4.4 Schematic depiction of a the detection cascade [40]. . . 30

4.5 Thermal image processing approach to be used by the algorithms. . . 31

4.6 Example of the template in different scales. . . 33

5.1 Examples of images captured in Teaching Day. . . 35

5.2 Precision and Recall [56]. . . 38

5.3 Examples of detection using classifiers (a) and (b) trained for visible light images. 38 5.4 Examples of positive images. . . 39

5.5 Examples of negative images. . . 39

5.6 Examples of the detection using Haar Cascade algorithm with the trained clas-sifier for thermal images. . . 41

5.7 Examples of positive images containing only faces. . . 42

5.8 Examples of the detection using Haar Cascade algorithm with the improved classifier for thermal images. . . 43

5.9 Examples of images captured in the event students & teachers @deti. . . 44

5.10 Thermal image(a) followed by segmentation(b) and the morphological opera-tion(c) . . . 45

5.11 Examples of binary images using images captured in the event. . . 45

5.12 Examples of the results using Face Contours algorithm. . . 46

5.13 Face detection based on Template Matching. . . 47

5.14 Example of Distance Transform image of template(a) and query image(b). . . 47

5.15 Examples of the face detection using Chamfer Matching algorithm. . . 47

(17)

List of Tables

3.1 FLIR Lepton camera breakout board pins description [38]. . . 22 3.2 Connection between Raspberry Pi 3 GPIO pins and FLIR Lepton camera

breakout board. . . 22 5.1 Results of the test using classifiers trained in [54] for frontal face detection. . 37 5.2 Results of all tests made in this thesis using the Haar Cascade algorithm. . . 43 5.3 Results of the test using thermal images of the event with multiples detection. 48 5.4 Results of the test in a classroom for proposed algorithms. . . 50 5.5 Processing time of each proposed algorithm. . . 50

(18)

(19)

Chapter 1

Introduction

In the electromagnetic spectrum, the visible light spectrum is the only part that the human eye can see. Due to the fact that infrared radiation is invisible to the human eye, thermal cameras use infrared sensors to capture that radiation, transforming it into visible images. Many objects and even humans emit infrared radiation in function of the temperature: the higher the temperature, the higher the intensity of the emitted radiation.

The use of thermal infrared cameras has been increasing in various scientific areas. A survey providing an overview of the current applications is presented in [1]. Applications include animals, agriculture, buildings, gas detection, industrial, and military fields, as well as detection, tracking, and recognition of humans. In computer vision, thermal image analysis and processing is in constant development, being used more often in systems for detection and tracking of objects, humans, among others.

Considering the multiple uses that has the thermal imaging and the fact that the thermal infrared cameras are starting to be available in more cost effective solutions, the main idea of this thesis is to use this kind of sensors in a very concrete problem, face detection.

The infrared thermal camera used, presented in Figure 1.1, is a long-wavelength infrared (LWIR) camera module. This camera is connected to a Raspberry Pi 3 single board to be used as processing device. The infrared radiation is captured by the sensor and converted to a thermal image on the processing unit. In order to complement the thermal image, a digital camera is used to capture an image in the visual spectrum from the same scene (typically a Red, Green and Blue - RGB - image). Figure 1.2 shows the thermal image complemented with an RGB image.

In the context of this thesis, a paper was written and presented at “The Second Interna-tional Conference on Advances in Signal, Image and Video Processing”, SIGNAL 2017, in a special track “SIVPR: Time-constrained Signal, Image and Video Processing in Robotics”. This paper can be seen in Appendices.

(20)

Figure 1.1: FLIR Lepton thermal camera module with breakout board.

(a) Thermal image (b) RGB image

Figure 1.2: An example of a thermal image and the same scene aquired by a RGB camera.

1.1 Objectives

The overall goal of this thesis is to develop and adapt Computer Vision (CV) algorithms for face detection on thermal images. The infrared camera module uses a serial communication for image acquisition which requires output data processing. The main goal is to create a system for face detection using thermal infrared cameras and monitoring people taking into consideration the temperature of the face over time. In pursuit of this goal, this thesis has been divided into three different phases:

• Propose, use and test of a feature-based algorithm developed for images in the visible spectrum, which has an high efficient rate, directly in thermal images. Create and train a cascade function with a dataset of thermal images containing face of humans, in order to improve and test this algorithm in thermal imaging.

(21)

image for face detection.

• Test the four algorithms in a classroom environment over time and compare the results. The proposed algorithms should be tested using single and multi face detection. The stability and the processing time of these algorithms must be taken into account in order to conclude which proposed algorithm is the best.

1.2 Thesis Outline

This thesis is organized as follows. An introductory Chapter that provides a brief intro-duction to the thesis and presents the objectives.

Chapter 2 provides concepts and a literature review of relevant algorithms used over the chapters of the thesis. Infrared thermal radiation is discussed, as well as the nature of infrared thermal cameras and several applications.

Chapter 3 describes the overall hardware and software architectures detailing the infrared thermal camera and the other elements, concluding this chapter with the complete system.

Chapter 4 presents the proposed algorithms for face detection in infrared thermal images. Chapter 5 shows the experimental results of the tests performed in different environments, conditions and scenes.

A summary of work done, comparison of the algorithms, concluding remarks and sugges-tions for future work are made in Chapter 6.

At the end of this document, the Appendices provide the written paper on the subject of this thesis.

(22)

(23)

Chapter 2

Background

Infrared thermal cameras, also called thermographic cameras or simply thermal cameras, are used to capture radiation from the electromagnetic spectrum that the human eyes cannot see. This radiation called by infrared radiation was discovered in 1800, by William Herschel (1738-1822) [2].

2.1 Infrared Thermal Radiation

The infrared radiation is often referred as thermal radiation, due to all objects with a temperature above absolute zero emit infrared radiation [1]. In the electromagnetic spectrum, the infrared spectral region is located from the wavelengths of 0.7 to 1000 micrometers (µm), as can be seen in Figure 2.1.

Figure 2.1: Electromagnetic spectrum with different divisions of infrared spectrum [1].

(24)

band is often subdivided into different spectral regions depending on the characteristics of this radiation in each region. Figure 2.1 also shows this different regions that are [3][4]:

• Near-infrared (NIR, wavelength 0.7-1.4 µm) - Used in fiber optic telecommuni-cations and night vision devices.

• Short-wavelength infrared (SWIR, wavelength 1.4-3 µm) - Used in long distance telecommunications.

• Mid-wavelength infrared (MWIR, wavelength 3-8 µm) - Used in military appli-cation like guided missile technology. Sometimes called thermal infrared.

• Long-wavelength infrared (LWIR, wavelength 8-15 µm) - Used in thermal imag-ing that uses sensors to obtain images of objects. It is based on thermal emissions. This subdivision is also called the thermal infrared.

• Far-infrared (FIR, wavelength 15-1000 µm) - Used in military and astronomy applications.

For infrared or thermal imaging not all subdivisions of the infrared spectrum are used, only a small part called thermal infrared. Figure 2.2 shows an expanded view of this small region.

Figure 2.2: Infrared and adjacent spectral regions with expanded view of thermal infrared region [5].

According to [5], commercial cameras are available for three spectral ranges that are defined for thermography. These three ranges are: the long-wave (LW) region between 7 to 14 µm, the mid-wave (MW) region between 3 to 5 µm and short-wave (SW) region between

(25)

0.9 to 1.7 µm. The reason of the restriction to these wavelengths are the physics of detectors, the transmission properties of the atmosphere and considerations of the amount of thermal radiation to be expected. The thermal motion of charged particles in matter generate the thermal radiation.

As previously stated, every object that has a temperature above of 0 K (-273.15 ◦C) emits thermal radiation in a different wavelength depending on the temperature and material properties. A blackbody is a perfect emitter of thermal radiation. As definition “a blackbody allows all incident radiation to pass into it (no reflected energy) and internally absorbs all the incident radiation (no energy transmitted through the body). This is true for radiation of all wavelengths and for all angles of incidence. Hence the blackbody is a perfect absorber for all incident radiation” [6].

Planck’s law in general says that every physical body emits electromagnetic radiation. Due to its dependence on temperature, Planck radiation is considered thermal radiation. Figure 2.3 shows the spectral nature of Planck’s law, where the emissive power of a blackbody at several temperatures is related with the wavelength.

Figure 2.3: Emissive power of a blackbody according to wavelength at several temperatures [6].

The concept of emissivity is important to understand the thermal radiation emissions of a physical body. In a simple way, two objects with different emissivities at the same physical temperature will not have the same temperature in a thermal image. The emissivity also depends on the temperature of the surface as well as wavelength and angle. Kirchhoff’s law states that the emissivity of a ideal blackbody is 1, this means that a blackbody absorbs

(26)

all the radiation. Figure 2.4 shows a thermal image with a Leslie’s cube and the respective visible spectrum image in grayscale. All faces of the cube are at the same temperature. In grayscale image the face that has been painted black has more emissivity than the polished face. So the polished face of the cube emits less thermal radiation than the black face and reflects the radiation emitted by the hand. The polished faces are composed of shiny metal, such as aluminium, that are metals with low emissivity. The emissivity of the polished face of the images of the left is higher than the polished face of the right images and the white painted surface is nearly as emissive as a black surface.

Figure 2.4: Leslei’s cube with four faces at the same temperature represented in thermal and grayscale image [7].

Leslie’s cube is used to demonstrate and measure the variations in emissivities for different materials.

The transmission of radiation in the atmosphere is not the same in all wavelengths due to the existence of several gases that absorb some radiation. Molecules of the atmosphere, as CO2 and H2O, are able to absorb infrared radiation. This radiation is not transmitted in

some wavelength around 2.7 µm (H2O and CO2), around 4.2 µm (CO2), between 5.5 and 7

µm (H2O) and above 14 µm (H2O, CO2). This absorption defines the wavelength that the

thermal cameras are sensitive. Figure 2.5 illustrates this absorption over these wavelengths and demonstrates that CO2 and H2O dominate attenuation of the infrared radiation.

Michael Vollmer and Klaus-Peter Mollmann have a book “Infrared Thermal Imaging: Fun-damentals, Research and Applications” [5] that explains, among of others, physical principles and laws for infrared or thermal radiation.

(27)

Figure 2.5: Atmospheric transmission in different infrared wavelengths [8].

2.2 Infrared Thermal Cameras

Since infrared radiation was discovered, the infrared technology is in constant develop-ment. This radiation has been used in many applications in different scientific areas, for example medicine, astronomy, military, computer vision, among others. The development of the first infrared cameras began with an infrared line scanner. These line scanners were built for military requirements in the late 1940s. In the early 1950s, infrared thermal viewers took one hour to produce a single image [9].

Nowadays, infrared cameras are scanning devices that capture one point or one row of an image at a time, or use a staring array. This array uses a two-dimensional focal plane array (FPA) where all image elements are captured at the same time with each detector element in the array [1]. The FPA techonology is the most used in infrared detectors. Infrared cameras also include optical lens, electronic circuits, possibly a cooler for the detector or a shutter and software for processing and displaying images to store and transmit captured data in addition to the FPA detector. A simple diagram of an infrared camera is represented in Figure 2.6. Figure 2.7 shows the structure of an infrared thermal cameras.

Figure 2.6: Simplified block diagram of an infrared camera [10].

(28)

Figure 2.7: Structure of an infrared camera produced by NEC Corporation [11].

detectors normally operate in a wavelength of 2 to 5.6 µm range that corresponds to the SWIR and MWIR band [10]. These detectors are called photon detectors or quantum detectors and convert the absorbed infrared radiation directly into a change of the electronic energy distribution in a semiconductor by changing the free charge carrier concentration [1]. Cooled thermal cameras have typical operating temperature of around 77K. Detectors without cooling are not sensitive to the infrared radiation, being flooded or blinded by their own radiation [10]. A camera with this type of detectors has a high sensitivity and can deliver hundreds of HD resolution frames per second.

Uncooled cameras are usually composed of thermal detectors. Thermal detectors convert the absorbed infrared radiation into thermal energy causing a rise in the detector temperature. Most infrared cameras of this type have microbolometer detectors. Microbolometer FPAs are thermal sensors and can be created from metal or semiconductor material. Figure 2.8 shows the basic anatomy of a typical microbolometer pixel. Uncooled infrared detectors operate in a wavelength of 8 to 14 µm range, the LWIR band. These detectors have lower sensitivity and slower response time than cooled detectors but in compensation they are smaller, silent and low cost. In this thesis, the infrared thermal camera used, shown in Figure 1.1 of the Chapter 1, is uncooled and uses microbolometers that are the most used in commercial infrared cameras [12].

The output data of a thermal camera is typically an image in grayscale with depth form 8 to 16 bit per pixel. Radiometric cameras usually provide output data in a raw 16-bit intensity value. The concept of radiometry is explained in Chapter 3. These cameras are considered

(29)

Figure 2.8: Cross-sectional view of a microbolometer [13].

calibrated if they can measure temperatures through the output data. Some cameras also have a mode to visualize the output data in pseudo colors, 24 bit per pixel, due to the details of an image being more perceptible by the human eye to color images than grayscale images. Figure 2.9 shows an image captured by an infrared thermal camera using different colorizations.

Figure 2.9: Thermal images in grayscale(a) and pseudo color (b)(c) [14].

Thermal cameras are becoming a common tool for a wide range of science applications. These applications began in the military field but were increasingly expanded into other fields due to the lower cost, resolution and quality increase of the image, being more accessible for academic applications, industrial, research, agricultural, security, personal use, among others applications. Some applications are presented and described below. Figure 2.10 shows examples of thermal imaging for each application that will be mentioned.

• Animals and agriculture - In the study of wild animals, thermal cameras are used for diagnosis of diseases to control reproduction and lactation processes, as well as detection and estimation of population size [15]. Figure 2.10a) shows an example. In agriculture, thermal cameras are used to predict water stress in crops, planning irrigation scheduling, disease and pathogen detection in plants, predicting fruit yield, evaluating the maturing of fruits, bruise detection in fruits and vegetables [16], an example of this application is

(30)

shown in Figure 2.10b).

• Building inspection - Thermal cameras are used for inspection of air conditioning, electrical installations and detect heat loss from buildings [17]. An example of building inspection applications is shown in Figure 2.10c).

• Fire monitoring - Thermal cameras can be used for forest-fire detection [18]. These cameras can also be used for rapid search and rescue, to prevent firefighters from getting lost in the fire and smoke, and to help a sector leader or incident commander keep track of the positions of firefighters currently searching a building [19]. Figure 2.10d) shows an example.

• Gas detection - Detection of gas leaks of ammonia, ethylene, and methane by mea-suring the spectral region of the LWIR [20]. Figure 2.10e) shows an example.

• Humans - In this thesis, the thermal camera is used for face detection of an human. This type of cameras is also used for detection and tracking [14] and face recognition [21]. An example of detection of Humans is shown in Figure 2.10f).

• Medical - Some medical issues can be detected and analyzed using thermal imaging [22]. Thermal imaging has been shown to reveal tumors in an early state, especially with breast cancer [23]. Figure 2.10g) shows an example of the thermal image used for medical applications.

• Military - In the military field, there are many applications, such as gunfire, mines, snipers detection [24], among others [1]. Figure 2.10h) shows an image of snipers detec-tion.

In this thesis, the thermal imaging is used for face detection. Face detection is the first step in many applications, including face recognition, head pose analysis, or even some full person detection systems. Since the face is normally not covered by clothes or other objects, a thermal camera can capture the direct skin temperature through the infrared radiation emitted by the face [1]. In Chapter 4, some algorithms using thermal cameras for face detection are proposed and tested.

(31)

(a) Animals [25] (b) Agriculture [26] (c) Building inspection [27]

(d) Fire monitoring [19] (e) Gas detection [28] (f) Detection of Humans [14]

(g) Medical [22] (h) Military [24]

(32)

(33)

Chapter 3

System Description

In this chapter, the elements of the developed system for face detection are described. This system includes a thermal camera, the one used in this thesis is shown in Figure 1.1, a small single-board computer and a color camera (RGB) to complement the thermal image. The RGB camera was developed by the same company of the single-board computer.

In the next sections are described each system elements and presented the final system.

3.1 FLIR Lepton Camera Module

In this system, a FLIR Lepton LWIR (50◦ shutterless) camera module is used, shown in Figure 3.1. It integrates a fixed-focus lens assembly, a focal plane array (FPA) of 80x60 active pixels (progressive scan), and signal-processing electronics. The FPA is a LWIR mi-crobolometer sensor array, and it captures infrared radiation in a wavelength band from 8 to 14 µm and generates a uniform thermal image. By default, this camera is a non-radiometric version and also not calibrated, therefore it is not possible to obtain temperature values of each pixel. The output value also changes with the temperature value of the infrared sensor of the camera and the temperature of the scene. It also can be configured into other modes through a command and control interface (CCI). The CCI is hosted on a Two-Wire Interface (TWI) similar to Inter-Integrated Circuit (I2C) for software interface [29]. In the following sections, some operating states, modes and interfaces used in this system are described. 3.1.1 Flat-Field Correction

Lepton is factory calibrated to produce an output image that is highly uniform, such the one shown in Figure 3.2 (a), when viewing a uniform-temperature scene. However, drift effects over long periods of time degrade uniformity, resulting in imagery which appears more grainy (Figure 3.2 (b)) or blotchy (Figure 3.2 (c)). These drift effects can be caused by operating

(34)

Figure 3.1: FLIR Lepton LWIR camera module.

over a wide temperature range, for example below -10◦C and above 65◦C. When the camera is powered off, that can also have effect on image quality [29]. This camera is capable of automatically compensating using an internal algorithm called scene-based non-uniformity correction (NUC).

In this system, a high quality image is required, so a flat-field correction (FFC) is per-formed. FFC is a process whereby the NUC terms applied by the camera’s signal processing engine are automatically recalibrated to produce the most optimal image quality [29]. FFC is executed through a command via CCI and it is only commanded when the camera is imaging a uniform thermal scene, such as a wall.

Figure 3.2: Example of an image with good uniformity (a) and images with degrade uniformity (b)(c) [29].

3.1.2 Radiometry Modes

The radiometric modes affect the transfer function between incident flux (scene tempera-ture) and pixel output. There are two radiometry modes that affect output image: Radiom-etry Disabled and RadiomRadiom-etry Enabled.

(35)

Radiometry Disabled

With radiometry disabled, the output of a given pixel is intended to be near the middle of the 14-bit range (approx. 8192) when viewing a scene with a temperature equal to the temperature of the camera. Furthermore, the responsivity, which is defined as the change in pixel output value for a change in scene temperature, varies over the camera’s operating tem-perature range [29]. The resulting output for three different scene temtem-peratures is illustrated hypothetically in Figure 3.3.

Figure 3.3: Hypothetical Illustration of Camera Output vs. Camera Temperature in Radiometry-disabled Mode [29].

Radiometry Enabled

With radiometry enabled, Lepton performs internal adjustments to the signal level such that in principle the output is independent of the camera’s own temperature [29]. The result-ing output for three different scene temperatures is illustrated hypothetically in Figure 3.4.

The radiometry can be enabled using a command via CCI. In order to obtain the tem-perature values of each pixel, the camera has to be calibrated using external calculations and measurements and update this values into the camera. In this version of the FLIR Lepton cameras is not possible to calibrate because some of the functions necessary for this calibration are not available.

3.1.3 Video Over Serial Peripheral Interface

The Lepton camera module uses a protocol called Video over Serial Peripheral Interface (VoSPI) to transfer the output data through a Serial Peripheral Interface (SPI)

(36)

communica-Figure 3.4: Hypothetical Illustration of Camera Output vs. Camera Temperature in Radiometry-enabled Mode [29].

tion. The protocol is packet-based with no embedded timing signals and no requirement for flow control. The host (master) initiates all transactions and controls the clock speed. Data can be pulled from the Lepton (the slave) at a flexible rate [29]. Each video packet contains data for a single video line. For this case, there are 60 video packets per frame. The output format mode used is Raw14 that is appropriate for viewing 14-bit data. For this format each packet contains pixel data for all 80 pixels in a single output line. The first two bits of every packet are skipped because they are only used with telemetry enabled that, by default and in this case, is disabled [29].

More information and data about other modes and interfaces, that are not used in this thesis, can be accessed from [29].

3.2 Raspberry Pi 3 Model B

Raspberry Pi is a single-board computer. Although it is small in size and slower than a modern laptop or desktop, it is a low cost complete computer and provides the expected abilities that implies, at a low-power consumption level. In this thesis, the version used is the Raspberry Pi 3 Model B that is the third generation of these boards. An image of the Raspberry Pi is shown in Figure 3.5.

3.2.1 Hardware

The hardware has evolved through several generations and versions of the raspberry pi with changes in some of the features, for example memory, processor, peripheral-device

(37)

sup-Figure 3.5: The Raspberry Pi 3 Model B, a low-cost single-board computer manufactured by the Raspberry Pi Foundation [30].

port, improving the performance of the Raspberry Pi. In this system, the version used has the following main specifications [30][31][32]:

• Processor - Broadcom BCM2837 SoC with a 1.2 GHz 64-bit quad-core ARM Cortex-A53 processor, with 32kB L1 and 512kB L2 cache memory.

• GPU - Dual Core VideoCore IV Multimedia Co-Processor. Provides Open GL ES 2.0, hardware-accelerated OpenVG, and 1080p30 H.264 high-profile decode.

• Memory RAM - 1GB LPDDR2 memory (shared with GPU).

• Networking - Integrated 2.4GHz 802.11n wireless LAN, Bluetooth 4.1, Bluetooth Low Energy (BLE) and 10/100 BaseT Ethernet.

• GPIO Connector - 40-pin expansion header: 2x20 strip. Providing 27 GPIO pins as well as +3.3 V, +5 V and GND supply lines. Some of GPIO pins can be used for specific functions including Inter-Integrated Circuit (I2C), Serial Peripheral Interface (SPI), Universal Asynchronous Receiver/Transmitter (UART), Pulse-code Modulation (PCM) and Pulse Width Modulation (PWM). Figure 3.6 shows the arrangement of the pins.

• Ports - HDMI, 3.5mm analogue audio-video jack, 4 USB 2.0, Ethernet, Camera Serial Interface (CSI) and Display Serial Interface (DSI).

(38)

Figure 3.6: Raspberry Pi 3 GPIO header [33].

3.2.2 Software

The operating system used is the Rasbian which is recommended by Raspberry Pi Foun-dation. The Raspberry Pi 3 can also use other third party operating systems, for example a Debian-based Linux operating system, Ubuntu MATE, Windows 10 IoT Core, among others. For the development and use of the different algorithms and methods, the OpenCV library (Open Source Computer Vision Library) is used [34][35] available from [36]. OpenCV library is an open source computer vision and machine learning software library. This library has more than 500 functions and 2500 optimized algorithms that span many areas in computer vision, for example factory product inspection, medical imaging, security, user interface, camera calibration, stereo vision, and robotics [35]. In this thesis, the OpenCV library is used as support for the development and adaptation of face detection algorithms.

The programming language C++ is used. The algorithms and methods are compiled using GNU Compiler Collection (GCC) and it is built with CMake tool. CMake is an open source software tool designed to control the compilation process of a software system using platform-independent configuration files [34].

(39)

3.3 Raspberry Pi Camera Module

The Raspberry Pi Camera Module v2 has a Sony IMX219 sensor. The sensor has a native resolution of 8 megapixel, and has a fixed focus lens on board. This camera provides image quality, high sensitivity, low noise image capture and low-light performance. In terms of still images, the camera is capable of 3280x2464 pixel static images, and also supports 1080p at 30 frames per second (fps), 720p at 60 fps and 640x480 at 90 fps video [37]. It connects to the Raspberry Pi board via a ribbon cable to the CSI port. In this case, the camera is accessed through Video4Linux (V4L) API which is useful for use with the OpenCV library. Figure 3.7 shows the raspberry pi camera module v2.

Figure 3.7: Raspberry Pi camera module v2 [37].

3.4 Complete System

The main elements of this system are the Raspberry Pi and the FLIR Lepton camera module with breakout board. Without breakout board it is not possible to connect the camera module to Raspberry Pi. The breakout board is an easy to interface with the evaluation board to quickly evaluate the camera module. The Raspberry Pi has a non standard pinouts but using the breakout board it is easy to wire to the camera module [38]. In Table 3.1, the pins functions of the breakout board which are connected to the Raspberry Pi are represented. Figure 3.8 shows the scheme of the breakout.

The connection between the Raspberry Pi GPIO pins of the Figure 3.6 and the breakout board pins is made as shown in Table 3.2.

(40)

connec-Breakout board pins

Pin 1 - SPI CS Video Over SPI Chip Select, active low Pin 2 - SPI MOSI Video Over SPI Slave Data In

Pin 3 - SPI MISO Video Over SPI Slave Data out Pin 4 - SPI CLK Video Over SPI Slave Clock Pin 5 - GND Common Ground

Pin 6 - VIN 3-5V Supply input

Pin 7 - SDA Camera Control Interface Data, I2C Pin 8 - SCL Camera Control Interface Clock, I2C

Table 3.1: FLIR Lepton camera breakout board pins description [38].

Breakout board Raspberry Pi GPIO Pin 1 Pin 24 Pin 2 Pin 19 Pin 3 Pin 21 Pin 4 Pin 23 Pin 5 Pin 6 Pin 6 Pin 1 Pin 7 Pin 3 Pin 8 Pin 5

Table 3.2: Connection between Raspberry Pi 3 GPIO pins and FLIR Lepton camera breakout board.

(41)

Figure 3.8: FLIR Lepton camera breakout board scheme [38].

tion between the two elements of the system. In this thesis, this camera is used to complement the thermal image with an RGB image of the same scene. In order to always obtain the scene image of the frame capture by both cameras, two parts were designed using a design software. These two parts together have supports to put both cameras side by side, thus creating a single piece. Figure 3.9 shows the front part, back part and both parts together from left to right, respectively.

(a) Front part of the piece (b) Back part of the piece (c) Final piece

Figure 3.9: Designed 3D parts to support and fix both cameras to capture images from the same scene.

Both parts were printed on a 3D printer. In Figure 3.10 the final system is presented with both cameras mounted in the 3D piece and connected to the Raspberry Pi for the posterior use of the proposed algorithm and software of this thesis that will be described and tested in the next chapters.

(42)

(43)

Chapter 4

Face Detection

The main goal of this thesis is to use the infrared thermal camera for face detection. The use of these cameras for face detection has been limited because they are expensive and have low resolution when compared with color cameras. In the thermal images the face to be detected sometimes appear as a few pixels which is a disadvantage of the use of the thermal camera described in the previous chapter. In addition, advantages and other disadvantages will be presented in this chapter.

Moreover, in this chapter, algorithms are described for the implementation of face detec-tion using a infrared thermal camera. The first algorithm is based on features which uses a cascade classifier to search for features of interest. The other algorithms uses image segmen-tation. One of these algorithms uses only the contours of the segmented image. The last two algorithms use a template to find the best match in the segmented image.

4.1 Advantages and Limitations

In color or RGB cameras, the colours and visibility of the scene depend on an energy source, such as the sun or artificial light. The captured RGB images depend on the illumina-tion, with changing intensity, colour balance, direction and among others, which sometimes affects image processing in some applications. Furthermore, nothing can be captured in total darkness.

Thermal cameras are advantageous in many applications due to their ability to capture in-visible radiation seeing in total darkness, their robustness to illumination changes and shadow effects, and less intrusion on privacy [39]. Thereby they do not depend on any external energy source. As explained in Chapter 2, the human body emits infrared radiation in the form of heat and its temperature tends to remain constant. Generally, the part of the human body which is not covered is the face and it stands out from the background.

(44)

These cameras, when calibrated, are advantageous in temperature measurement compared to point-based methods since temperatures over a large area can be compared, although con-tact methods are more efficient [14]. In this work, the thermal camera used is not calibrated, so it is not possible to know the exact temperature in the region of interest.

Some algorithms for face detection using color or grayscale images currently have a very high efficiency rate but it is not possible to use them directly in the thermal images. Using these algorithms may result in some problems regarding thermal imaging, that can reduce this efficiency significantly, such as occlusion of the face with objects that emit infrared radiation, the uniform temperature in the face, objects with the shape and same temperature of the face, the face away from the camera, among others. For this reason, only one feature-based algorithm is used, Haar Cascade [40]. The Opencv library provides utilities and functions to use the Haar Cascade algorithm and training cascade classifiers to use in thermal imaging for face detection. This algorithm has an efficient feature selection, a scale and location invariant detector [40]. The face detection in this work is represented on a shape of a bounding box obtaining not only the face, but also some background noise that can affect the detected face for future work.

Comparing with visible light cameras, thermal cameras have a considerable higher cost. For example, a thermal camera with low resolution 80x60, such as the camera used in this thesis, costs hundreds of euros and increasing the resolution comes with a higher prices of many thousands of euros. These cameras need more training than visual cameras for correct usage. The emissivity and reflectivity of different materials and the impact of the atmosphere affects the captured thermal image. In order to provide accurate measurements of temperature, these cameras need to be calibrated according to the physical principles which is a complex process.

4.2 Proposed Approach

In this section, different algorithms and methods for face detection are described. The goal is to implement the algorithms on a real-time application. Therefore, these algorithms will be tested on a real-time system. The development of this work is made in C++ programming language, through the OpenCV library, which is an open source computer vision and machine learning software library.

The resources used for the acquisition of the thermal image and the way of how the image is generated are also described in this section. Figure 4.1 shows a block diagram with the process from the incidence of infrared radiation on the thermal camera to the detection represented in the image to be displayed, using the proposed algorithms. The first two blocks

(45)

are part of the integrated camera module. In order to receive the data, the SPI port of Raspberry Pi is configured. The large block encompasses all the steps for data processing, obtaining the thermal image. This image is processed by the proposed algorithms and it is displayed with face detection in shape of a rectangle. In the next section, each block of the diagram is described in detail.

Figure 4.1: Processing of the infrared radiation emitted by the scene until the image is displayed.

4.2.1 Image Acquisitions

The acquisition of the thermal image is made in real-time. This image acquisition process is based on a developing project by the company Pure Engineering [41].

The incident infrared radiation passes through the lens assembly and it is focused from the scene onto an 80x60 array of thermal detectors (microbolometers) which integrate the complete FPA. The serial stream from the FPA is received by a system on a chip (SoC) device, which provides signal processing and output formatting. The data is sent via the Lepton VoSPI protocol to the Raspberry Pi. When the camera is turned on, the SPI port of the Raspberry Pi must be opened for communication, in order to receive the data sent by the camera. It is needed to set the SPI mode, the bit per word and the SPI bus speed, so

(46)

that there is no communication failure and data is received correctly through the SPI port selected.

Through the VoSPI, the output data are received in an 14-bits data. To be able to display a visible thermal image, this data is converted into an image matrix arranged in an 8-bits data with one channel provided by OpenCV. In the 14-bits data, each pixel can assume a wide range of values and this range corresponds to temperature values between -10◦C to +65

◦_{C. As in this case it is only important the temperature of the scene captured by the camera,}

the data received has to be filtered.

The process of converting the 14-bits data into 8-bits data starts with the conditioning of the data, searching for the maximum and minimum values of the frame pixels received. Generally, in an 8-bits image with one channel the pixel values are between 0 and 255. In this image, the minimum value corresponds to 0 and the maximum value to 255. This conversion is also known as quantizing an image. Figure 4.2 represents an easy way to understand this conversion. The columns and rows of each pixel are calculated and it is created the visible thermal image. The first image of Figure 1.2 shows a thermal image obtained. This thermal image is the input image for the proposed algorithms.

Figure 4.2: Quantization of an image.

4.2.2 Haar Cascades

Haar Cascades is a machine learning approach, using Viola and Jones algorithm [40], for visual object detection where a cascade function is trained from positive images (images of faces) and negative images (images without faces). The performance of the trained classifier

(47)

will be better, as more images are used. This algorithm uses integral image representation with Haar-like features, shown in Figure 4.3. The value of each Haar-like feature is the difference between the sum of the pixels which lie within the white area and the sum of pixels in the gray area of the feature.

Edge features are shown in Figures 4.3(A) and 4.3(B). Figure 4.3(C) shows a line feature, and 4.3(D) a four-rectangle feature. In integral image, rectangle features can be computed very rapidly using an intermediate representation for the image [40]. This technique needs to be calculated only once for each image and enables fast and simple calculation of sums over rectangular areas. Each pixel (x, y) in the integral image ii is equal to the sum of pixels above and to the left of i(x, y) in the original image.

ii(x, y) = X

x0_≤x,y0_6y

i(x0, y0) (4.1)

Figure 4.3: The four types of Haar-like features used in the Viola-Jones algorithm [40]. The features are calculated in an image in all ways possible, by varying their scale and rotation of an over-complete set. The number of feature set is very large, more than the number of pixels. Therefore, AdaBoost is used for the reduction of features and choose the small set of Haar-like features in order to perform the classifier. In Viola-Jones algorithm [40] the computation time is reduce introducing the concept of Cascade of Classifier. Instead of applying all training features, the features are grouped and applied one by one at each stage of classifiers.

Figure 4.4 shows a schematic depiction of a the detection cascade. A series of classifiers are applied to every sub-window. The initial classifier eliminates a large number of negative examples with very little processing. Subsequent layers eliminate additional negatives but

(48)

Figure 4.4: Schematic depiction of a the detection cascade [40].

require additional computation. After several stages of processing the number of sub-windows have been reduced radically. Further processing can take any form such as additional stages of the cascade or an alternative detection system [40]. The final classifier is trained to detect objects of interest and eliminate the irrelevant background area.

4.2.3 Implementation of Different Methods for Face Detection

As far as we know, face detection on thermal images did not received too much attention on computer vision as the counterpart on visible light images. There are some possible ways of using some functionalities of the OpenCV library.

Figure 4.5 shows an approach for thermal image pre-processing before being applied the different face detection algorithms. Thermal image is, on a first stage, segmented and filtered with morphological operators in order to obtain a binary image for the later use of the three algorithms proposed. The algorithms Template Matching and Chamfer Matching use, in addition to the binary image, a template for recognizing this pattern in the thermal image.

Segmentation uses the Otsus method that is a thresholding binarization method [42] and filtering is performed using morphological operators, such as dilation, erosion, opening and closing [43]. In this thesis, the following algorithms are developed/adapted and implemented: • Face Contours - Acquisition and filtering of the contours in order to obtain the biggest

contour in the binary image and detect the face through it [44].

• Template Matching - Technique for finding areas of an image that match to a tem-plate image [45].

• Chamfer Matching - Technique to find the best alignment between two edge maps [46].

(49)

Figure 4.5: Thermal image processing approach to be used by the algorithms.

Segmentation and Filtering

The thermal image obtained goes through a pre-processing in order to create a binary image. Before being segmented, the thermal image is processed using Gaussian blur [47] to reduce image noise and smooth the image. Then Otsus method is used. Otsus method is a parameterless global thresholding binarization method [42]. This method involves iterating through all the possible threshold values, assuming that the image contains two classes of pixels following bi-modal histogram (foreground pixels and background pixels) and calculating the optimum threshold separating the two classes, i.e. the pixels that either fall in foreground or background. The aim is to find the threshold value where the sum of foreground and background spreads is at its minimum [48].

The binary image generated using this method may still contain small black parts that are not relevant for face detection using the proposed algorithms. So this image is filtered with morphological operators in order to remove this noise. The word morphology concerns shapes: in mathematical morphology we process images according to shape, by treating both as sets of points. In this way, morphological operators define local transformations that change pixel values that are represented as sets. The ways in which pixel values are changed is formalized by the definition of the hit or miss transformation [47]. The two basic morphological operators are Erosion and Dilation. The variant forms are Opening and Closing. In this thesis, these transformations are used in order to obtain a clean and uniform binary image.

(50)

Face Contours

Through the binary image an edges map is created, using the Canny edge detector algo-rithm [49], where the contours are found [44]. The Canny algoalgo-rithm is one of the most popular edge detection technique that obtains structural information from the image and reduces the data to be processed. Among the edge detection techniques, Canny edges detection algorithm performs better under different scenarios and under noisy conditions [50].

The edges image does not contain any information about the edges as entities in them-selves. The next step is to assemble the edge pixels into contours [35]. These contours are filtered in order to obtain just the contour of the larger area. Due to the contour of having some parts of the human body that are not relevant for face detection, for example neck and shoulders, the highest point of the contour is found. This point matches to the highest point on the face. Starting at this point, the two points that correspond to the largest width of the face are found. The detection of the face is made with this two points and the highest point in the face contour.

Template Matching

Template Matching is one of many techniques that uses a template to search and find the best match of this template, in an image. This is usually made by “sliding” a template with a similar pattern to the one or more that it is trying to find, across the image in horizontal scan and at each pixel, it calculates the distance between the template and the region of interest. There are different matching methods to perform the template matching technique. The method Normalized Cross-Correlation (NCC) is used, which according to [45] remains a viable choice for some if not all applications. According to [51], considering the image f with size Mx x My at the point (x, y), the intensity value of the image f is f (x, y). The template t has

a size of Nx x Ny. A common way to calculate the position (upos, vpos) of the template t in

the image f is to evaluate the NCC value γ at each point (u, v) which has been shifted by u steps in the x direction and by v steps in the y direction. The following equation gives a basic definition for the NCC [51]:

γ = P x,y f (x, y) − fu,v t(x − u, y − u) − t q P x,y f (x, y) − fu,v 2P x,y t(x − u, y − u) − t 2 (4.2)

Where f_u,v is the mean value of f (x, y) within the area of the template t shifted to (u, v). With similar notation t is the mean value of the template. Due to this normalization, γ(x, y) is independent to changes in brigtness or contrast of the image, which are related to the mean value and the standard deviation. In template matching, the position of the template in an

(51)

image is determined by searching the maximum of γ(x, y) [51].

Authors of [52] use an Image Pyramid to perform the template matching for objects with different sizes. In this work image pyramid is used to detect faces of different scales increasing the performance of the template matching. Due to the image captured by the camera being small in resolution, the pyramid image is created from a template with approximately the size of the image. Each level of the pyramid is generated downsizing the original template. Figure 4.6 shows the original template, that will be used, in the first image followed by images of some levels of the pyramid created. A good template is needed to obtain better results in the template matching.

Figure 4.6: Example of the template in different scales.

Chamfer Matching

The Chamfer Matching algorithm lies in shape matching using the edges maps and the respective Distance Transform [53] of a query image and a template image. The edges maps are created using the Canny edge detector algorithm [49] for simplicity and accuracy.

According to [46], the edge map of the query image is considered as U = ui and the edge

map of the template as V = vj, where ui and vj are the points of the edges of the image and

the template, respectively. The distance transform is applied, creating an image where each pixel value denotes the distance to the nearest point of U . The Distance Transform algorithm computes an approximation of the Euclidean distance.

To increase the efficiency, the matching cost can be computed via a distance transform image DTV(x) = minvj∈V | x − vj | which specifies the distance from each pixel to the nearest

edge pixel in V. The chamfer distance dCM(U, V ) is determined by the average of distances

between each point ui∈ U , and its nearest edge in V , as in the following equation

dCM(U, V ) = 1 n X ui∈U DTV(ui) (4.3) where n =| U |.

(52)

In [46], to improve robustness, several variants of chamfer matching have been introduced by incorporating edge orientation information into the matching cost. The better match between image and template is the location of the minimal chamfer distance. In this case, the used template is considered to be matched at locations where the chamfer distance is below a defined threshold. To improve the detection for faces with different sizes, not only one template is used. The original template is resized in different scales, as shown in Figure 4.6.

(53)

Chapter 5

Experimental Results

In this chapter, the experimental results to verify the effectiveness of the implemented algorithms are present. Using the Haar Cascades algorithm, tests were performed using captured images at the University of Aveiro event, “Teaching Day”. The other algorithms were tested using images from another event organized by the Department of Electronics, Telecommunications and Informatics (DETI) of the University of Aveiro, called “Students & Teachers @deti 2017”. Since the goal is the real time use, the proposed algorithms were tested under controlled conditions over time. The thermal images obtained in 8-bit grayscale in the image acquisition process were used as input and subsequently processed in each proposed algorithm. Examples of this type of images is represented in Figure 5.1.

(54)

5.1 Haar Cascades

In this thesis, the well-known traditional Viola-Jones algorithm for images of the visible spectrum was tested in thermal images using the same cascade classifiers for frontal face detection used for color images in order to show that this algorithm needs a well-trained classifier for the desired situation. Although it is intended to detect faces in both images, the image type is a very important factor to be considered. In visible images, the image is created based on the intensity of the primary colors, such as Red, Blue and Green, and the set of these colors generate a RGB image. In the case of the thermal image, the image is generated based on the emitted infrared radiation in the form of heat, as explained in Chapter 2.

5.1.1 Teaching Day

At the University of Aveiro, there are initiatives that have been carried out in the last years, of critical and shared debate around teaching and learning, such as Teaching Day. In the course of this thesis, it was carried out the 5th edition of the Teaching Day with the theme “Technology at the service of learning: opportunities and constraints”.

In the Teaching Day, students and teachers share practices and experiences of using nologies in the classroom and beyond, reflecting together the multiple ways in which tech-nology contributes to new ways of being, teaching and learning, as well as of communicating and experiencing.

During the presentation of posters, the thermal camera used in this thesis was capturing frames of people who stopped to read the posters. Thousands of images were captured during this day. These images were chosen in order to reduce the number of data to be analyzed, this is, some repeated images without people for face detection were eliminated. The total of images that were used in the tests were 8130 images. Figure 5.1 shows some examples of images captured during the Teaching Day. In all these images there are as many faces as images. Since the camera captures low resolution images, some of these faces appear as a few pixels less than 20 × 20. The classifiers provided by Opencv library were trained for an input pattern size of 20 × 20 or 24 × 24 in [54]. There are more or less 1602 frontal faces that can be detected. The faces were counted manually, seeing each frame of these images. Only faces that have a dimension of approximately equal to or greater than 20 × 20 pixels have been counted, because the classifiers were trained for an input pattern of this size.

The first test using the Haar Cascade algorithm was made with these classifiers. The Opencv library provides three different classifiers for frontal face detection on visible light images but only two of the three classifiers were used, which were trained for an input pattern size of 20 × 20. The other classifier was trained for an input pattern size of 24 × 24. According

(55)

to [54], 20 × 20 is the optimal input pattern size for frontal face detection. Table 5.1 shows the number of detections, the number of false positives, the precision and recall of the detection of the classifier (a) and (b). The variables precision and recall are calculated for an undestanding and measure of relevance [55]. Figure 5.2 represents how to the calculation of precision and recall are made. Examples of the face detection using theses classifiers are shown in Figure 5.3. The last image of each row is considered as false positive.

Cascade Classifiers Number of Faces Number of Detections False Positives Precision (%) Recall (%) a 1602 17 3 82.4 0.8 b 1602 16 1 93.8 0.9

Table 5.1: Results of the test using classifiers trained in [54] for frontal face detection.

Using cascade classifiers for visible light images in thermal images, the recall of the detec-tion is very low. This means that the algorithm using these classifiers returns almost none of the faces present in the images. Therefore it is not possible to use this algorithm directly in thermal image. In order to improve the results, it was made a Haar Cascade training for face detection in thermal images.

5.1.2 Training

With the purpose of demonstrating that using a machine learning for training a new classifier, it is possible to improve the face detection using thermal images. The OpenCV library provides an easy way for creating training samples from positive and negative images. It also provides the tool for training classifiers through these samples. Before creating the samples, the positive and negative thermal images were collected .

The dataset used, in this attempt of training classifiers for face detection on thermal images, is OTCBVS Benchmark Dataset Collection [57]. This dataset provides among of thermal images that can be used as positive and negative images. For positive images, the Terravic Facial IR Database [58] is used, that has thermal images of faces. There are images of 18 different faces with variations, such as rotation of the face, images captured indoor and outdoor, people with glasses and hat. For the training of the classifiers not all the images of the database were used, being selected only 1154 positive images. Figure 5.4 shows examples of these positive images. 757 negative images were collected from other database of the OTCBVS Benchmark Dataset Collection [57] and thermal images on the web. An example of these negative images are shown in Figure 5.5.

(56)

Figure 5.2: Precision and Recall [56].

(57)

Figure 5.4: Examples of positive images.

Figure 5.5: Examples of negative images.

vector containing five positive samples is created. In this vector, the samples are generated choosing the negative images randomly so that they varied in gradients. The vector of each positive image is created using the opencv createsamples application provided by OpenCV library. How the samples of each positive image should vary could be controlled by a number of command line parameters:

• Name of the output file containing the positive samples for training. • Source object image (positive image).

(58)

for randomly distorted versions of the object. • Number of positive samples to generate. • Width (in pixels) of the output samples. • Height (in pixels) of the output samples.

The vectors of positive images are merged in order to create a single vector with all samples. The vector created, in this case contained 5770 samples with window size of width 20 pixels and height 20 pixels. The next step is the training of the classifier, based on this vector, using the opencv traincascade application. Some of the important command line parameters for this application are the following:

• Data directory in which trained classifier is stored. • File name of the vector with positive samples.

• Background description file. This is the file containing the negative images. • Number of positive samples used in training for every classifier stage. • Number of negative samples used in training for every classifier stage. • Number of cascade stages to be trained.

• Minimal desired hit rate for each stage.

• Maximal desired false alarm rate for each stage. • The size of dedicated memory in MB.

• The type of Haar features set used in training. • Width of training samples (in pixels).

• Height of training samples (in pixels).

20 stages were trained. The number of training stages combined with minimal desired hit rate for each stage of the classifier will estimate the overall hit rate. The overall false alarm rate is estimated as the combination of the number of stages and the maximal desired false alarm rate for each stage of the classifier. As suggested in [54], the minimum hit rate and the maximum false alarm rate are set to 0.999 and 0.5 per stage, respectively, being expected an overall hit rate of about ≈ 0.98 and an overall false alarm rate of about ≈ 9.6 × 10−7. The full set of upright and 45 degree rotated feature set of the Figure 4.3 is used.

(59)

The computer used for training process of classifiers is a notebook computer having a 2.20 GHz Intel Core i7 with 8GB RAM.

Training Results

The second test using the Haar Cascade algorithm was made with the classifier trained in the previous section. In this test, the thermal images captured during the Teaching Day presented in Figure 5.1 were used. Face detection is unstable since it does not detect the face if the image does not contain the neck and shoulders of the person. For this reason, the number of faces increase to 4173, being the size 20 × 20 for the whole face, neck and shoulders. Examples of the face detection using the trained classifier are presented in Figure 5.6. In last row of the figure images with false positive detections are presented. The number of detections are 5049 and 2251 of these detections contain a face (true positives). The other 2798 detections are false positives. Therefore, the calculated precision and recall are 44.6% and 53.9%, respectively.

The study presented on [59] used the Viola and Jones algorithm [40] applied to thermal image. The results of this study show that in order to obtain a good accuracy, a large amount of training data is needed. In this study, the positive images only contain the face and not other members of the human body. However, the results of the study made in [59] have better performance comparing with the results presented in this thesis.

Figure 5.6: Examples of the detection using Haar Cascade algorithm with the trained classifier for thermal images.

(60)

5.1.3 Training Improvement

In other to attempt improving the performance of the detection, the positive images of the Figure 5.4 were edited and cropped to train a classifier with positive images containing only faces. Figure 5.7 shows examples of this new positive images. The parameters used in this training are the same of the previous training by changing only the positive images. 5770 samples with window size of 20 × 20 were also created.

Figure 5.7: Examples of positive images containing only faces.

For this test, there are more or less 1602 faces that can be detected. Examples of the face detection using the improved classifier are presented in Figure 5.8. Results of images with false positives detection are presented in the last row of the figure. 3829 possible faces were detected, but only 1464 corresponded to true positive. Consequently, the number of false positives is 2375. The precision and recall are 38.1% and 91.4%, respectively.

For the purpose of better understanding the results of all tests performed using the Haar Cascade algorithm, Table 5.2 presents the results for face detection of each classifier used. In this table, the cascade classifiers are represented by letters, being the letters “a” and “b” for the classifiers trained by [54], “c” and “d” the first and last classifiers trained in this thesis. The results shows that the training made to improve the first training detects most of the faces in the images. Although it has a lower precision than the first training, the detection only contains the face and not other members of the human body.

Face detection on infrared thermal image

Ricardo Ferreira

Ribeiro

Dete¸

c˜

ao Facial em Imagem T´

ermica Infravermelha

Face Detection on Infrared Thermal Image

Ricardo Ferreira

Ribeiro

Dete¸

c˜

ao Facial em Imagem T´

ermica Infravermelha

Face Detection on Infrared Thermal Image

Contents

List of Figures

List of Tables

Chapter 1

Introduction

1.1

Objectives

1.2

Thesis Outline

Chapter 2

Background

2.1

Infrared Thermal Radiation

2.2

Infrared Thermal Cameras

Chapter 3

System Description

3.1

FLIR Lepton Camera Module

3.2

Raspberry Pi 3 Model B

3.3

Raspberry Pi Camera Module

3.4

Complete System

Chapter 4

Face Detection

4.1

Advantages and Limitations

4.2

Proposed Approach

Chapter 5

Experimental Results

5.1

Haar Cascades