Department of Informatics and Applied Mathematics Bachelor in Computer Science
Ulcer Segmentation and Tissue Classification
using Color Texture Clustering
Vítor de Godeiro Marques
Natal-RN November 2018
Undergraduate dissertation submitted to the Department of Informatics and Applied Mathematics of the Center for Exact and Earth Sciences of the Federal University of Rio Grande do Norte as a partial requirement for obtaining the degree of bachelor in Computer Science.
Advisor
Prof. Dr. Bruno Motta de Carvalho
Co-Advisor
Prof. Dr. Bruno Santana da Silva
Federal University of Rio Grande do Norte – UFRN Department of Informatics and Applied Mathematics – DIMAp
Marques, Vítor de Godeiro.
Ulcer segmentation and tissue classification using color texture clustering / Vítor de Godeiro Marques. - 2018. 85 leaves: ill.
Monograph (Bachelor's degree in Computer Science) - Universidade Federal do Rio Grande do Norte, Centro de Ciências Exatas e da Terra, Departamento de Informática e Matemática Aplicada. Natal, 2018.
Advisor: Bruno Motta de Carvalho. Co-advisor: Bruno Santana da Silva.
1. Image processing - Monograph. 2. Larval therapy - Monograph. 3. Chronic wounds - Monograph. 4. Image segmentation - Monograph. 5. Tissue classification - Monograph. 6. Color image analysis - Monograph. 7. Clustering - Monograph. I. Carvalho, Bruno Motta de. II. Silva, Bruno Santana da. III. Title.
RN/UF/CCET CDU 004.932
Cataloging in publication. UFRN - Biblioteca Setorial Prof. Ronaldo Xavier de Arruda - CCET
Undergraduate dissertation Ulcer Segmentation and Tissue Classification using Color Texture Clustering, presented by Vítor de Godeiro Marques and accepted by the Department of Informatics and Applied Mathematics of the Center for Exact and Earth Sciences of the Federal University of Rio Grande do Norte, being approved by all members of the examining board specified below:
Prof. Dr. Bruno Motta de Carvalho
Advisor
Department of Informatics and Applied Mathematics Federal University of Rio Grande do Norte
Prof. Dr. Bruno Santana da Silva
Co-advisor
Instituto Metrópole Digital Federal University of Rio Grande do Norte
Prof. Dr. Anne Magaly de Paula Canuto
Department of Informatics and Applied Mathematics Federal University of Rio Grande do Norte
Prof. Dr. Selan Rodrigues dos Santos
Department of Informatics and Applied Mathematics Federal University of Rio Grande do Norte
Initially, I want to thank my family for all the support and encouragement to study that they have given me for so many years. Without your support I could not have completed this work. You have made me what I am. I would also like to thank the great friends who accompanied me on this journey from the beginning or joined it at some point.
I want to thank the great computer scientist, LaTeX developer, pixel magician and talented soccer player, prof. Bruno Motta, my advisor, for the great guidance and patience along the journey to finish this work. Thank you so much for introducing me to the new world of Computer Vision. You helped me whenever I needed it and supported me almost every day during those years.
I would like to thank prof. Bruno Santana, my co-advisor, who back in my freshman year welcomed me and guided me throughout my undergraduate studies, introducing me to the diverse computing “worlds” and making me more and more passionate about them. Thank you very much for the great guidance and patience along the journey to accomplish this work.
I am also hugely thankful to José Neto for the invaluable technical contributions and opinions for this work and for my academic life.
Finally, I would like to thank the Federal University of Rio Grande do Norte for all the support and structure offered for my personal growth as a citizen, student and researcher.
Ulcer segmentation and tissue classification using color texture clustering
Author: Vítor de Godeiro Marques
Advisor: Prof. Dr. Bruno Motta de Carvalho
Co-advisor: Prof. Dr. Bruno Santana da Silva
Resumo
Chronic wounds are ulcers whose healing process is difficult or nearly interrupted, which increases the risk of complications for the health of patients, such as amputations and infections. This research proposes a general, noninvasive methodology for the segmentation and analysis of chronic wound images, computing the areas affected by necrosis, in contrast to the invasive techniques commonly used for this calculation, such as manual planimetry with plastic film. We investigate algorithms to perform wound segmentation and the classification of tissues as Necrosis, Granulation or Slough. The proposed methodology uses histogram-based textural descriptions, which were compared using the Earth Mover’s Distance, and we propose a color space reduction methodology that increased accuracy, specificity, sensitivity and the Dice coefficient. We also developed a mobile application prototype to show that such an application can be used to support Larval Therapy on mobile devices.
Keywords: Larval therapy, Chronic wounds, Image segmentation, Tissue classification, Color image analysis, Clustering.
Ulcer Segmentation and Tissue Classification using Color Texture Clustering
Author: Vítor de Godeiro Marques
Advisor: Prof. Dr. Bruno Motta de Carvalho
Co-Advisor: Prof. Dr. Bruno Santana da Silva
Abstract
Chronic wounds are ulcers presenting a difficult or nearly interrupted cicatrization process that increases the risk of complications to the health of patients, such as amputations and infections. This research proposes a general noninvasive methodology for the segmentation and analysis of images of chronic wounds by computing the wound areas affected by necrosis, as opposed to the invasive techniques that are commonly used for this calculation, such as manual planimetry with plastic films. We investigated algorithms to perform the segmentation of wounds and the classification of tissues as Necrotic, Granulation or Slough. In the proposed methodology, we used histogram-based textural descriptions, which were compared using the Earth Mover’s Distance, and proposed a color space reduction methodology that increased the reported accuracies, specificities, sensitivities and Dice coefficients. We also developed a mobile app prototype to show that such an application can be employed to support Larval Therapy on mobile devices.
Keywords: Larval therapy, Chronic wounds, Image segmentation, Tissue classification, Color image analysis, Clustering.
List of figures
1.1 Examples of different tissue composition.
2.1 First digital image, a photograph of Russell Kirsch’s son.
2.2 Digital image representations.
2.3 Graphic representation of color models.
2.4 (a) An image consisting of five different textured regions. (b) The goal of texture classification is to label each textured region with the proper category label: the identities of the five texture regions present in (a). (c) The goal of texture segmentation is to separate the regions in the image which have different textures and identify the boundaries between them (TUCERYAN; JAIN, 1993).
2.5 Difference between Euclidean and Manhattan distances.
2.6 Examples of Flip image.
2.7 Examples of Rotation image.
2.8 Examples of Scale image.
2.9 Examples of Crop image.
2.10 Examples of CIFAR-10 dataset.
2.11 Examples of PASCAL VOC 2012 dataset.
2.12 Examples of SYNTHIA Dataset.
2.13 CNN Architecture.
2.14 Example of Convolution.
2.15 Example of Max Pooling.
4.1 Phases of Methodology.
4.2 Example of image acquired by the protocol of acquisition.
4.4 Example of the Watershed execution process.
4.5 Example of histogram reduction (slough=yellow, granulation=red, necrotic=blue).
4.6 Example of classification.
4.7 Example of object to retrieve scale.
4.8 Example of application design example.
4.9 Software architecture.
4.10 Examples of classification in prototype mobile application.
5.1 Examples of dataset.
5.2 Examples of Segmentation.
5.3 Examples the reduction of color space.
5.4 Examples of classification by clustering.
5.5 Examples of classification by Deep Learning.
5.6 Examples of classification by EMD, Deep Learning and MOWA.
5.7 Example of the area computation execution process.
List of tables
1.1 Comparing methods of debridement.
3.1 Summary of current works on wound segmentation.
3.2 Summary of current works on wound segmentation 2.
3.3 Summary of current works on wound classification.
3.4 Summary of current works on wound classification 2.
3.5 Summary of current works on wound mobile application.
5.1 Segmentation Results.
5.2 Clustering results for the original images and the images with color space reduction.
5.3 Best result of clustering.
5.4 Classification results for the original images and the images with color space reduction.
5.5 Detailed classification results for the U-Net architecture with color space reduction.
5.6 Comparing results for the better approaches with MOWA.
5.7 Detailed comparing results for the better approaches with MOWA.
List of abbreviations and acronyms
Adam – Adaptive Moment Estimation
AUC – Area Under the Curve
CIE – Commission Internationale de l’Éclairage
CNNs – Convolutional Neural Networks
DL – Deep Learning
DFUs – Diabetic Foot Ulcers
EMD – Earth Mover’s Distance
FC – Fully Connected Layers
HUOL – Hospital Universitário Onofre Lopes
MVP – Model View Presenter
MOWA – Mobile Wound Analyzer
MLP – Multilayer Perceptron
PUs – Pressure Ulcers
RBF – Radial Basis Function
ReLU – Rectified Linear Unit
Tanh – Hyperbolic Tangent
VLUs – Venous Leg Ulcers
Contents
1 Introduction
1.1 Objectives
1.2 Overview of work
2 Theoretical Background
2.1 Color Image Processing
2.1.1 Digital Images and Their Characteristics
2.1.2 Digital Image Segmentation
2.1.2.1 Thresholding
2.1.2.2 Region Growing
2.1.2.3 Clustering
2.1.2.4 Energy Based Methods
2.1.3 Texture Analysis
2.2 Deep Learning
2.2.1 Data augmentation
2.2.1.1 Flip
2.2.1.2 Rotation
2.2.1.3 Scale
2.2.2 Crop
2.2.3 Computer Vision Techniques by Deep Learning
2.2.3.1 Image Classification
2.2.3.2 Object Detection
2.2.4 Convolutional Neural network
2.2.4.1 Convolution Layers
2.2.4.2 Pooling Layers
2.2.4.3 Fully Connected Layers
2.3 Evaluation metrics
3 Related Works
3.1 Segmentation
3.2 Classification
3.3 Mobile Applications
4 Proposed Methodology
4.1 Image Acquisition
4.2 Pre-Processing
4.3 Segmentation
4.4 Color Space Reduction
4.5 Tissue Classification
4.6 Area Computation
4.7 Prototype of Mobile Application
4.7.1 Design
4.7.2 Development
4.7.3 Execution
5 Evaluation
5.1 Dataset
5.2 Segmentation
5.3 Reduction of Color Space
5.4.1 Clustering
5.4.2 CNN
5.4.3 Comparing Clustering, Mowa and CNN results
5.5 Area Computation
6 Final Remarks
1 Introduction
Chronic wounds are ulcers with a difficult or nearly interrupted cicatrization process, such as Pressure Ulcers (PUs), Venous Leg Ulcers (VLUs) and Diabetic Foot Ulcers (DFUs) in humans (PERCIVAL; SULEMAN, 2015). Chronic wounds increase the risk of complications to the health and well-being of patients, such as amputations and infections, and affect an estimated 6.5 million people (2% of the population) in the United States (SEN et al., 2009; FIFE; CARTER, 2012). Wounds related to diabetes resulted in approximately 73,000 lower limb amputations in 2010 (HEALTH; SERVICES et al., 2013). They cost the United States $174 billion in 2007, of which $116 billion were direct costs and $58.3 billion were indirect costs, such as loss of productivity, disability, and premature mortality (ASSOCIATION et al., 2008; DRIVER et al., 2010). The incidence of chronic wounds tends to increase due to the aging population and the growing incidence of risk factors such as diabetes and obesity (SEN et al., 2009).
The proper treatment of chronic wounds is essential to avoid complications and restore the tissue’s health. Up to 85% of all amputations related to diabetic foot ulcers could be avoided with proper clinical intervention and effective self-management by the patient (ALEXIADOU; DOUPIS, 2012; RICE et al., 2014). The treatment depends on the ulcer anatomy, which usually has three types of tissue (FALANGA, 2004): granulation (reddish tissue), necrosis (brown or blackish tissue, whose appearance may vary among patients) and slough (pale yellow or brownish yellow tissue). Figure 1.1 presents examples¹ of these tissues.
Several treatments are available for chronic wounds. Debridement is an important procedure to manage necrotic tissue and slough (STROHAL et al., 2013; MD; RPH; PHD, 2009). It removes non-viable tissues to clean the wound and to improve the healing process. The debridement procedures include autolytic, surgical, mechanical and hydrosurgical debridement, mechanical desloughing and larval therapy. Each one requires professionals with specific skills and knowledge.
¹ All wound images in this work are part of the dataset described in Section 5.1.
Figure 1.1: Examples of different tissue composition. (a) Mainly composed of granulation; (b) Mainly composed of necrotic tissue; (c) Mainly composed of slough; (d) Mixed composition.
Autolytic debridement occurs when the body uses its own enzymes to break down devitalized tissue (PERCIVAL; SULEMAN, 2015). This requires a moist environment, which can be achieved by using a set of different wound dressings that support autolytic debridement, such as hydrogels or hydrocolloids (CUSCHIERI et al., 2013). The necessary resources are expensive. Moreover, there is an infection risk through aerosolisation of the blood (BOWLING et al., 2009).
Surgical debridement is performed in a surgical center and involves bleeding of the wound and some pain for the patient. It has a high cost and is one of the faster debridement methods. Mechanical debridement removes the necrotic tissue using physical force through friction with instruments such as tweezers or gauze (BAHR et al., 2011). This procedure is painful and can also remove healthy granulating tissue. Hydrosurgical debridement combines mechanical and surgical debridement (PERCIVAL; SULEMAN, 2015). It is also expensive and, like the procedures it combines, carries an infection risk through aerosolisation of blood (BOWLING et al., 2009).
Larval therapy consists of the application of larvae of disinfected flies for the cleaning and removal of dead tissue in wounds. Fly larvae feed on necrotic tissue without damaging healthy tissue, and secrete substances that disinfect the wound, killing pathogenic bacteria and accelerating the healing process (ZARCHI; JEMEC, 2012; ARABLOO et al., 2016; BROWN et al., 2012). Typical larval therapy protocols recommend the usage of 5 to 25 larvae per cm² of necrotic tissue (NASSU; THYSSEN, 2015; FLEISCHMANN; GRASSBERGER; SHERMAN, 2004).
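As a rough illustration of how this recommended density translates into a number of larvae, the sketch below multiplies a necrotic area by the lower and upper densities quoted above. The function name `larvae_range` and the choice of rounding up to whole larvae are assumptions made for this example; they are not taken from any clinical protocol.

```python
import math


def larvae_range(necrotic_area_cm2, low_per_cm2=5, high_per_cm2=25):
    """Return the (minimum, maximum) number of larvae for a necrotic area in cm^2."""
    if necrotic_area_cm2 < 0:
        raise ValueError("area must be non-negative")
    # Assumption: round up so that a fractional result still covers the area.
    return (math.ceil(necrotic_area_cm2 * low_per_cm2),
            math.ceil(necrotic_area_cm2 * high_per_cm2))


print(larvae_range(12.0))  # (60, 300) for a 12 cm^2 necrotic region
```

The width of this range shows why an accurate estimate of the necrotic area matters: any error in the area is multiplied by the chosen density.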
All debridement procedures have their own benefits and disadvantages for patients, but Larval Therapy presents the lowest risks, as can be seen in the comparison shown in Table 1.1.
Table 1.1: Comparing methods of debridement.

| Debridement method | Advantages | Disadvantages | Risk |
|---|---|---|---|
| Autolytic | Useful in the prevention of devitalized tissue and slough. Does not require specialists. No pain to patient. | Slow process. Can lead to maceration and increases the risk of infection. | Increases maceration and infection. |
| Surgical | Maintenance debridement. Very selective in the tissues. | Must be carried out in the surgical room. Requires skilled personnel and specialized equipment. High cost. | Procedure causes a lot of pain to patients. |
| Mechanical | Does not require specialists. Procedure is easy. | Painful for the patient. Requires lots of dressings. | Pain on removal. Can remove healthy granulating tissue. |
| Hydrosurgical | Precisely targets the area for debridement. Short treatment. | Requires skilled personnel. High cost. | Potential infection risk (aerosolisation). |
| Larval therapy | Quick treatment times. Selective to necrotic tissue. Does not require a specialist. | Comparatively costly. Cannot be used for all patients (adherence). | – |
1.1 Objectives
Proper application of Larval Therapy requires computing the area of each tissue in the wound, to avoid wasting larvae or using fewer larvae than necessary. Visual examination of the wound is a simple and widely used method, although highly inaccurate. A somewhat more accurate alternative is the manual measurement of the wound's height and width, but it has the inconvenience of being invasive. As wounds usually have irregular shapes and lie on irregular surfaces, this approach still does not achieve very good accuracy. Image processing and Computer Vision techniques can be a great aid to tissue area calculation during Larval Therapy, with the potential to produce more accurate results while demanding less effort from health care professionals and reducing errors.
This work aims to propose a new method to segment and classify tissues present in chronic wounds (granulation, necrosis and slough). The main motivation for this work is its use in a Larval Therapy mobile application that will allow health professionals to accurately compute the necrotic areas of an ulcer in order to determine the optimal number of larvae needed to clean it. To achieve our objective, our proposal should be able to:
• Segment wound area;
• Classify wound tissues;
• Calculate wound tissue areas in cm²; and
• Evaluate the execution of algorithms on a mobile platform.
1.2 Overview of work
This document comprises six chapters, with this introduction being the first one. In Chapter 2, we introduce the concepts and definitions of digital image processing and Deep Learning that are necessary to understand the work presented and that we used during our studies and experiments. In Chapter 3, we present the related works grouped by main objective: wound segmentation (Section 3.1), wound tissue classification (Section 3.2) and mobile applications (Section 3.3).
Chapter 4 presents the method developed in this work to segment ulcers, classify tissues and calculate the area of chronic wound tissues, as well as the prototype of the developed mobile application. In Chapter 5, we present, analyze and explain the results of the experiments. In Chapter 6, we present our conclusions and discuss future work that may stem from this work and the papers generated from it.
2 Theoretical Background
Having motivated our problem, in this chapter we introduce some key concepts that were used in the development of this work. We present concepts related to Color Image Processing and Deep Learning, and explain the metrics used to evaluate this work.
2.1 Color Image Processing
This section discusses the main areas involved in this work: digital images and their characteristics, digital image segmentation and texture analysis.
2.1.1 Digital Images and Their Characteristics
The first digital image, dating back to 1957 (Figure 2.1), had 176×176 pixels and was scanned by an apparatus invented by Russell Kirsch. The low resolution was due to the fact that the computer was not capable of storing more information.
Kirsch’s convention is still the basis of digital images today (GONZALEZ; WOODS, 2012). It defines an image as a two-dimensional function of light intensity f(x, y), where x and y denote spatial coordinates and the value of f at any point (x, y) is called the image intensity or gray level at that point. When x, y and the intensity values of f are all finite, discrete quantities, we call f a digital image. When dealing with a color or multiband digital image, the value of f at any point (x, y) is a vector. Figure 2.2 illustrates standard digital image representations.
Figure 2.2: Digital image representations. (a) Gray scale image, in which pixel (x, y) holds a single value, e.g. f(x, y) = 255; (b) Color image, in which pixel (x, y) holds a vector of n values, e.g. f(x, y) = [255, ..., 255].
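The two representations can be sketched with NumPy arrays, which is how most Python imaging libraries store images. This is only an illustration: the 4×6 image size and the row-major indexing `[y, x]` are assumptions of the example, not something prescribed by the definition above.

```python
import numpy as np

# Gray scale image: f(x, y) is a single intensity value.
gray = np.zeros((4, 6), dtype=np.uint8)      # 4 rows (y) by 6 columns (x)
gray[1, 2] = 255                             # f(x=2, y=1) = 255

# Color image: f(x, y) is a vector with n components (n = 3 for RGB).
color = np.zeros((4, 6, 3), dtype=np.uint8)
color[1, 2] = [255, 255, 255]                # f(x=2, y=1) = [255, 255, 255]

print(gray.shape, color.shape)               # (4, 6) (4, 6, 3)
print(gray[1, 2], color[1, 2])               # 255 [255 255 255]
```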
Figure 2.3: Graphic representation of color models. (a) RGB; (b) CIELab; (c) HSV.
There are several color models for representing color images. In this work we used the following color models: RGB, CIELab and HSV. The RGB (red, green, blue) color space is an additive model that composes colors by combining amounts of the primary light colors. Black is the absence of all primary colors, while white is the sum of the maximum values of all primary colors. This model is the most common color representation system used in televisions and monitors (CHENG et al., 2001), and it is the standard for storing digital images. However, this model has a high correlation between channels: when the image intensity varies, all three channels of the model vary too (COMANICIU; MEER, 1997). Moreover, this color space is not perceptually uniform: two visually close colors may be farther apart in the 3D space than two other visually more distinct colors. Therefore, the similarity measure obtained from the distance between two points in this 3D space is not adequate for comparing two colors (CHENG et al., 2001).
The HSV color space is a nonlinear transformation of the RGB color system, consisting of the components hue (H), saturation (S) and value (V), the latter representing brightness. Colors are easier to describe in HSV than in RGB space. The V component is computed as in Equation 2.1, while the S and H components are given by Equations 2.2 and 2.3, respectively. When H < 0, the value is updated to H = H + 360.

\[ V = \max(R, G, B) \tag{2.1} \]

\[ S = \begin{cases} \dfrac{V - \min(R, G, B)}{V} & \text{if } V \neq 0 \\ 0 & \text{otherwise} \end{cases} \tag{2.2} \]

\[ H = \begin{cases} \dfrac{60\,(G - B)}{V - \min(R, G, B)} & \text{if } V = R \\ 120 + \dfrac{60\,(B - R)}{V - \min(R, G, B)} & \text{if } V = G \\ 240 + \dfrac{60\,(R - G)}{V - \min(R, G, B)} & \text{if } V = B \end{cases} \tag{2.3} \]
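A minimal Python sketch of Equations 2.1 to 2.3 follows. It assumes R, G and B are floats in [0, 1] and returns H in degrees. Setting H = 0 for gray pixels (where V equals the minimum and Equation 2.3 would divide by zero) is a convention chosen for this example; the equations above do not define that case.

```python
def rgb_to_hsv(r, g, b):
    """Convert RGB values in [0, 1] to (H in degrees, S, V)."""
    v = max(r, g, b)                      # Equation 2.1
    delta = v - min(r, g, b)
    s = delta / v if v != 0 else 0.0      # Equation 2.2
    if delta == 0:
        h = 0.0                           # assumption: hue of a gray pixel set to 0
    elif v == r:                          # Equation 2.3, one branch per maximum channel
        h = 60.0 * (g - b) / delta
    elif v == g:
        h = 120.0 + 60.0 * (b - r) / delta
    else:
        h = 240.0 + 60.0 * (r - g) / delta
    if h < 0:
        h += 360.0                        # wrap negative hues, as stated in the text
    return h, s, v


print(rgb_to_hsv(1.0, 0.0, 0.0))  # pure red: (0.0, 1.0, 1.0)
print(rgb_to_hsv(1.0, 0.0, 1.0))  # magenta: (300.0, 1.0, 1.0)
```

The magenta case exercises the H < 0 correction: the first branch yields -60, which is wrapped to 300.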
We can use non-linear transformations of the RGB space to solve the problem of the correlation between the channels (MEYER, 1992). Following this approach, CIELab is a color system with components L∗, a∗ and b∗, where L∗ represents brightness, and a∗ and b∗ dimensions represent the chromatic information. Positive values of a∗ represent red and negative represent green while positive values of b∗ represent yellow and negatives represent blue.
The CIE (Commission Internationale de l’Éclairage) developed the XYZ color model (L’ÉCLAIRAGE, 1932), which serves as the basis for perceptually uniform models such as CIELab. In this model any color can be specified by combining the variables X, Y and Z. These are obtained by a linear transformation of the RGB model according to Equation 2.4 and normalized according to Equation 2.5.
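\[ \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} 0.412453 & 0.357580 & 0.180423 \\ 0.212671 & 0.715160 & 0.072169 \\ 0.019334 & 0.119193 & 0.950227 \end{bmatrix} \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{2.4} \]

\[ X = \frac{X}{0.950456}, \qquad Z = \frac{Z}{1.088754} \tag{2.5} \]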
The CIELab color model can be obtained by non-linear transformations from X, Y and Z. According to (CHENG et al., 2001), this model can control color and intensity information in a more independent and simpler way than the RGB color model, besides allowing the direct color comparison based on geometric separation (distance) in 3D color space. The value of component L∗ is obtained from Equation 2.6, while the value of components a∗ and b∗ are obtained from Equation 2.7 using the function f defined in Equation 2.8.
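\[ L^{*} = \begin{cases} 116\,Y^{1/3} - 16 & \text{if } Y > 0.008856 \\ 903.3\,Y & \text{if } Y \leq 0.008856 \end{cases} \tag{2.6} \]

\[ a^{*} = 500\,\big(f(X) - f(Y)\big), \qquad b^{*} = 200\,\big(f(Y) - f(Z)\big) \tag{2.7} \]

\[ f(t) = \begin{cases} t^{1/3} & \text{if } t > 0.008856 \\ 7.787\,t + \dfrac{16}{116} & \text{if } t \leq 0.008856 \end{cases} \tag{2.8} \]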
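The chain of Equations 2.4 to 2.8 can be sketched in a few lines of Python. The sketch assumes R, G and B are floats in [0, 1] and applies the equations exactly as written, without any gamma correction step; the function name `rgb_to_cielab` is chosen for this example only.

```python
def rgb_to_cielab(r, g, b):
    """Convert RGB values in [0, 1] to (L*, a*, b*)."""
    # Equation 2.4: linear transformation from RGB to XYZ.
    x = 0.412453 * r + 0.357580 * g + 0.180423 * b
    y = 0.212671 * r + 0.715160 * g + 0.072169 * b
    z = 0.019334 * r + 0.119193 * g + 0.950227 * b
    # Equation 2.5: normalization of X and Z.
    x /= 0.950456
    z /= 1.088754

    def f(t):
        # Equation 2.8
        return t ** (1.0 / 3.0) if t > 0.008856 else 7.787 * t + 16.0 / 116.0

    # Equation 2.6
    lightness = 116.0 * y ** (1.0 / 3.0) - 16.0 if y > 0.008856 else 903.3 * y
    # Equation 2.7
    a_star = 500.0 * (f(x) - f(y))
    b_star = 200.0 * (f(y) - f(z))
    return lightness, a_star, b_star


for name, rgb in [("white", (1, 1, 1)), ("red", (1, 0, 0)), ("blue", (0, 0, 1))]:
    print(name, [round(c, 1) for c in rgb_to_cielab(*rgb)])
```

White maps to approximately L* = 100 with a* and b* close to zero, red yields a positive a*, and blue yields a negative b*, in agreement with the interpretation of the a* and b* axes given earlier.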
2.1.2 Digital Image Segmentation
Segmentation is an important step in digital image processing. It is used to divide an image into different regions in order to obtain a better interpretation of the image information. The main objective of image segmentation is to subdivide the image into semantic regions by grouping the pixels that share some predefined type of similarity (ALY; DERIS; ZAKI, 2011).
Image segmentation methods can be divided into supervised and unsupervised. In supervised segmentation, classes are predefined, resulting in a set of labeled images (training set). Based on this set, a classifier is trained to capture the characteristics of the pixels in each class. The classifier is then applied to new test images and the result is compared to the ground truth. In unsupervised segmentation there are no predefined classes; the classes are derived directly from the patterns in the data (LI; TAX; LOOG, 2012).
Image segmentation can also be based on one of two basic properties of the intensity values of images: discontinuity and similarity. Methods of the first category partition an image based on abrupt changes in intensity, such as isolated points, lines and edges, while methods of the second category partition an image into regions that are similar according to a set of predefined criteria (GONZALEZ; WOODS, 2012). In this work we address methods based on similarity, such as thresholding, region growing, clustering and energy-based methods.
2.1.2.1 Thresholding
Thresholding is an image segmentation method that converts a grayscale image into a binary image. This is accomplished by setting a threshold within the range of gray levels of the original image: the pixels with values below or equal to the threshold are assigned to one group and the remaining pixels to a second group. The resulting image is commonly called a binary or two-level image (KULKARNI, 2012). The basic equation of this procedure is given in Equation 2.9, where the threshold T separates the pixels into two classes. In this equation, f(x, y) is the original value of the pixel and g(x, y) is the resulting pixel value after the procedure, thus producing a binary image.
\[ g(x, y) = \begin{cases} 0, & f(x, y) \leq T \\ 1, & f(x, y) > T \end{cases} \tag{2.9} \]
Among the most used techniques of this type is Otsu’s algorithm (OTSU, 1979). Otsu binarization automatically calculates a threshold value from the image histogram of a bimodal image, that is, an image whose histogram has two peaks. In simple words, this is an image that presents an object and a background.
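The sketch below illustrates Equation 2.9 together with the idea behind Otsu's method, which selects the threshold that maximizes the between-class variance of the histogram. It is a plain-Python illustration on a tiny image stored as a list of lists with integer gray levels; the function names and the 2×3 example image are assumptions of this example.

```python
def threshold(image, t):
    """Equation 2.9: 0 where f(x, y) <= t, 1 where f(x, y) > t."""
    return [[0 if value <= t else 1 for value in row] for row in image]


def otsu_threshold(image, levels=256):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += hist[t]               # number of pixels with value <= t
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg      # number of pixels with value > t
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:         # keep the first maximum found
            best_t, best_var = t, between_var
    return best_t


image = [[10, 12, 200],
         [11, 210, 205]]
t = otsu_threshold(image)
print(t)                     # 12
print(threshold(image, t))   # [[0, 0, 1], [0, 1, 1]]
```

In this bimodal example every threshold between 12 and 199 separates the dark pixels from the bright ones equally well, and the sketch returns the first of them.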
2.1.2.2 Region Growing
The region growing procedure basically consists of joining adjacent pixels to form regions. The association between neighboring pixels or regions in the growing process is determined by a homogeneity criterion that must be satisfied for pixels and regions to be merged (KAMDI; KRISHNA, 2012). These criteria are built from features that incorporate information about intensity, color or texture (MUÑOZ et al., 2003).
The Watershed algorithm is one of the main techniques of this kind. In this work we use a region growing variant of the non-parametric marker-based Watershed algorithm described in (MEYER, 1992). For the execution of this algorithm it is necessary to add seeds in the images. From these selected points, the algorithm constructs the components of the new image, thus returning the segmented image.
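To make the idea of growing a region from a seed concrete, the sketch below implements a basic seeded region growing with a simple homogeneity criterion. It is not the marker-based Watershed variant used in this work; the 4-connected neighborhood, the intensity tolerance and the function name `region_grow` are assumptions chosen to keep the example short.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a region from `seed` (row, col) over 4-connected neighbors whose
    intensity differs from the seed intensity by at most `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue                      # neighbor lies outside the image
            if (nr, nc) in region:
                continue                      # already part of the region
            # Homogeneity criterion: intensity close to the seed intensity.
            if abs(image[nr][nc] - seed_value) <= tolerance:
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


image = [[10, 11, 90],
         [12, 95, 92],
         [11, 10, 91]]
print(sorted(region_grow(image, seed=(0, 0), tolerance=5)))
# [(0, 0), (0, 1), (1, 0), (2, 0), (2, 1)]
```

With one seed per object of interest, the pixels reached from each seed form the corresponding segment.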
2.1.2.3 Clustering
Clustering consists of grouping the pixels into homogeneous classes so that the samples of the same class are as similar as possible. Clustering can also be considered a form of data compression, where a large number of samples (pixels with different intensities) is converted into a small number of representative pixels. Several measures of similarity can be chosen to identify classes, such as distance, connectivity and intensity (SOWMYA; RANI, 2011). Clustering segmentation is a broad field with a number of proposed solutions, one of the main ones being K-means (KANUNGO et al., 2002). K-means clustering is a type of unsupervised learning, where K represents the number of clusters into which the data points are grouped. The algorithm works iteratively, assigning each data point to one of K groups based on the features that are provided.
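The iterative assignment described above can be sketched as follows. This is a minimal K-means on RGB pixels written in plain Python. The initial centers are passed in by the caller (a common alternative is to pick them at random), and both this choice and the fixed number of iterations are simplifications assumed for the example.

```python
def kmeans(points, initial_centers, iterations=10):
    """Cluster `points` (tuples of equal length) around the given centers.

    Returns the final centers and one cluster label per point."""
    centers = [tuple(float(v) for v in c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center
        # (squared Euclidean distance).
        for idx, p in enumerate(points):
            labels[idx] = min(
                range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])),
            )
        # Update step: each center becomes the mean of its assigned points.
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return centers, labels


# Three reddish and three bluish pixels, K = 2.
pixels = [(250, 10, 10), (240, 20, 15), (245, 5, 12),
          (10, 10, 240), (20, 15, 250), (5, 12, 245)]
centers, labels = kmeans(pixels, initial_centers=[pixels[0], pixels[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Replacing each pixel by the center of its cluster gives the compressed representation mentioned above, where many colors are reduced to K representative ones.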
2.1.2.4 Energy Based Methods
Energy-based segmentation methods establish an objective (energy) function that reaches a minimal value when the image is segmented with the “expected result” according to this function (YI; MOON, 2012). The GrabCut algorithm is arguably one of the most used techniques of this type. In this work we use the GrabCut variant described in (ROTHER; KOLMOGOROV; BLAKE, 2004). To execute this algorithm, it is necessary to draw a rectangle around the object. Everything outside this rectangle is taken as background and everything inside it is unknown. Optionally, the user can add seeds to the foreground and background, which are treated as hard labels, meaning that they will not change during the process. The image is then represented as a graph, and a min-cut algorithm cuts this graph into two parts, separating the source node from the sink node with the minimum value of the cost function, thus returning the segmented image.
2.1.3 Texture Analysis
Texture is a difficult concept to define and does not have a unique formal definition. Intuitively, texture provides some perceptible features for the human visual system, such as the spatial distribution and variation of luminosity, and describes the structural arrangement of surfaces and the relations between neighboring regions (PEDRINI; SCHWARTZ, 2008).
In the literature, there are several definitions of texture. Tamura, Mori and Yamawaki (TAMURA; MORI; YAMAWAKI, 1978) define texture as what constitutes a macroscopic region, whose structure is simply attributed to repetitive patterns in which the elements or primitives are arranged according to a placement rule. Cross and Jain (CROSS; JAIN, 1983) define texture as a region of an image whose set of local statistics, or local properties of the image function, vary slowly or approximately periodically.
Texture descriptors can be used for several applications, and among them we can highlight the classification, segmentation and synthesis of textures. The classification aims to create a map in which each region is identified as belonging to a certain class previously defined by means of a training set; the segmentation objective is to partition the image in regions that present similar characteristics and the texture synthesis is responsible for the determination of a model capable of generating a certain texture.
Figure 2.4: (a) An image consisting of five different textured regions. (b) The goal of texture classification is to label each textured region with the proper category label: the identities of the five texture regions present in (a). (c) The goal of texture segmentation is to separate the regions in the image which have different textures and identify the boundaries between them (TUCERYAN; JAIN, 1993).
For example, in Figure 2.4 (a), five different textures can be identified. The purpose of texture classification is to produce a classification map of the input image, where each uniform region of the texture is identified with the texture class to which it belongs, as shown in Figure 2.4 (b). The purpose of texture segmentation is to obtain a boundary map similar to that shown in Figure 2.4 (c).
In this work we focus on classification using texture. For this purpose we use a clustering technique that employs the Earth Mover’s Distance (EMD) (RUBNER; TOMASI; GUIBAS, 2000) as its metric (distance). The Earth Mover’s Distance measures the dissimilarity between signatures, which are compact representations of distributions. Here it is used to compare two histograms and quantify how different they are, yielding the value 0 when they are equal.
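For intuition, the sketch below computes the EMD in its simplest setting: two one-dimensional histograms over the same ordered bins, normalized to the same total mass, with |i - j| as the ground distance between bins i and j. In that special case the EMD reduces to the sum of absolute differences between the cumulative histograms. The general EMD between signatures requires solving a transportation problem, which this example does not attempt, and the function name `emd_1d` is hypothetical.

```python
def emd_1d(hist_a, hist_b):
    """EMD between two 1-D histograms defined over the same ordered bins.

    Both histograms are normalized to unit mass, and the ground distance
    between bins i and j is |i - j|."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    total_a, total_b = sum(hist_a), sum(hist_b)
    carried = 0.0  # mass that still has to be moved past the current bin
    work = 0.0     # accumulated mass times distance
    for a, b in zip(hist_a, hist_b):
        carried += a / total_a - b / total_b
        work += abs(carried)
    return work


print(emd_1d([1, 0, 0], [1, 0, 0]))  # 0.0, identical histograms
print(emd_1d([1, 0, 0], [0, 1, 0]))  # 1.0, all mass moves one bin
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0, all mass moves two bins
```

Unlike a bin-by-bin comparison, the EMD grows with how far the mass has to travel, so a small color shift between two histograms is penalized less than a large one.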
Different distances can be used to compute this dissimilarity, among them the Euclidean distance and the Manhattan distance. Given two images A and B with n color channels, the Euclidean and Manhattan distances are given by Equations 2.10 and 2.11, respectively. Figure 2.5 helps to visualize the difference between these metrics.
\[ \text{Euclidean\_Distance} = \sqrt{\sum_{i=1}^{n} (A_i - B_i)^2} \qquad (2.10) \]
\[ \text{Manhattan\_Distance} = \sum_{i=1}^{n} |A_i - B_i| \qquad (2.11) \]
(a) Euclidean (b) Manhattan
Figure 2.5: Difference between Euclidean and Manhattan distances.
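As a minimal sketch of Equations 2.10 and 2.11, the two distances can be computed for a pair of color vectors as follows. The function names and the sample values are hypothetical and were chosen only for illustration.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared channel differences (Equation 2.10)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of the absolute channel differences (Equation 2.11)."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical three-channel values, used only as an illustration.
color_a = (10, 20, 30)
color_b = (13, 24, 30)

print(euclidean_distance(color_a, color_b))
print(manhattan_distance(color_a, color_b))
```

For the same pair of points the Manhattan distance is never smaller than the Euclidean one, which is the difference illustrated in Figure 2.5.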
2.2
Deep Learning
Generalizing to new examples becomes exponentially more difficult when working with high-dimensional data, and the mechanisms used to achieve generalization in traditional machine learning are insufficient to learn complicated functions in high-dimensional spaces. This challenge has motivated the development of Deep Learning (DL) algorithms, which allow the computer to build complex concepts out of simpler concepts (GOODFELLOW; BENGIO; COURVILLE, 2016).
“Deep Learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones” (GOODFELLOW; BENGIO; COURVILLE, 2016).
Deep Learning is a key area of research in fields such as Image and Video Processing, Computer Vision (SCHMIDHUBER, 2012; LECUN et al., 1998) and Bioinformatics. Additionally, Deep Learning is widely used in industry: for example, Facebook uses it to identify faces in images and Google uses it to manage the energy at the company’s data centers.
In this section we present the basic concepts of Deep Learning. Initially we introduce data augmentation techniques, which are used to increase the performance of Deep Learning models. Next we present three Computer Vision tasks that are being solved with excellent performance through DL. Finally, we introduce Convolutional Neural Networks (CNNs), which are widely used to solve Computer Vision tasks.
2.2.1
Data augmentation
Data augmentation techniques increase the amount of available data. In terms of images, this means increasing the number of images in the datasets used. In principle, the more data we have, the better our Deep Learning architectures will perform. However, every data collection process has a cost, which can be measured in money, human effort, computational resources and, of course, time. Additionally, the amount of data is finite, even more so in the medical field (PEREZ; WANG, 2017), which is the focus of this work.
There are many ways to perform data augmentation. In images, you can rotate, translate the original image, change lighting conditions, crop it differently, so for one image you can generate multiple samples. In this subsection, we present some basic but powerful augmentation techniques that are very popular: flip, rotation, scale and crop.
2.2.1.1 Flip
You can flip images horizontally and vertically. Figure 2.6 shows examples of flipped images.
(a) Original image (b) Horizontally flipped image
(c) Vertically flipped image
(d) Horizontally and vertically flipped image
Figure 2.6: Examples of flipped images.
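The flips above can be sketched on an image stored as a list of rows. This is only an illustration: the function names are hypothetical, and a real pipeline would normally rely on an image library.

```python
def flip_horizontal(image):
    """Mirror the image left to right by reversing every row."""
    return [list(reversed(row)) for row in image]


def flip_vertical(image):
    """Mirror the image top to bottom by reversing the order of the rows."""
    return [list(row) for row in reversed(image)]


# Hypothetical 2 x 2 image, one value per pixel.
image = [[1, 2],
         [3, 4]]

print(flip_horizontal(image))
print(flip_vertical(image))
print(flip_vertical(flip_horizontal(image)))
```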
2.2.1.2 Rotation
One key thing to note about this operation is that image dimensions may not be preserved after rotation. For example if your image is a square, only rotating it at right angles will map the defined areas of the new image to the area occupied by the original image. Figure 2.7 shows examples of square images rotated at right angles.
(a) Original image (b) Image rotated by 90 degrees
(c) Image rotated by 180 degrees
(d) Image rotated by 270 degrees
Figure 2.7: Examples of Rotation image.
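A right-angle rotation can be sketched in the same list-of-rows representation. The function name is hypothetical. The non-square example shows why image dimensions are not preserved in general.

```python
def rotate_90_clockwise(image):
    """Rotate by 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(column) for column in zip(*reversed(image))]


# Hypothetical 2 x 3 image: after the rotation it becomes 3 x 2.
image = [[1, 2, 3],
         [4, 5, 6]]

rotated = rotate_90_clockwise(image)
print(rotated)
```

Applying the function two or three times gives the rotations by 180 and 270 degrees.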
2.2.1.3 Scale
The image can be scaled to increase or decrease its size. When scaling up, the resulting image is larger than the original one, so most image frameworks cut out a section of the new image with size equal to the original image. Figure 2.8 shows examples of scaled images.
(a) Original image (b) Image with size increased by 10% (c) Image with size increased by 20%
Figure 2.8: Examples of Scale image.
2.2.2
Crop
Cropping an image means randomly selecting a section of the original image, cutting it out and resizing this section to the original image size. This method is popularly known as random cropping. Figure 2.9 shows examples of random cropping.
(a) Original image (b) Image example of crop (c) Image example of crop
Figure 2.9: Examples of Crop image.
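Random cropping can be sketched as follows. The nearest-neighbor resize is an assumption made to keep the example self-contained (the interpolation method is not specified here), and all names are hypothetical.

```python
import random


def resize_nearest(image, out_h, out_w):
    """Resize a list-of-rows image with nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def random_crop(image, crop_h, crop_w, rng=None):
    """Cut a random crop_h x crop_w section and resize it to the original size."""
    rng = rng or random.Random()
    h, w = len(image), len(image[0])
    top = rng.randrange(h - crop_h + 1)
    left = rng.randrange(w - crop_w + 1)
    section = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    return resize_nearest(section, h, w)


# Hypothetical 4 x 4 image whose pixel values are 0..15.
image = [[4 * r + c for c in range(4)] for r in range(4)]
augmented = random_crop(image, 2, 2, rng=random.Random(0))
print(augmented)
```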
2.2.3
Computer Vision Techniques by Deep Learning
Deep learning has recently become one of the most popular sub-fields of machine learning, owing to its distributed data representation with multiple levels of abstraction. A diverse range of deep learning algorithms is being employed to solve conventional artificial intelligence problems and computer vision tasks. In this subsection we present computer vision tasks that are being solved by Deep Learning: Image Classification, Object Detection and Semantic Segmentation.
2.2.3.1 Image Classification
The problem of Image Classification is the following: given a set of images that are each labeled with a category, we are asked to predict the categories of a novel set of test images and measure the accuracy of the predictions. There are a variety of challenges associated with this task, such as viewpoint, scale, intra-class variations, image deformation, image occlusion, illumination conditions and background clutter.
Computer Vision researchers have come up with a data-driven approach to Image Classification. This approach provides the computer with many examples of each image class and then develops learning algorithms that look at these examples and learn the visual appearance of each class. The most popular architecture used for image classification is the Convolutional Neural Network.
In the literature there exist several datasets for image classification, such as the CIFAR-10 dataset (KRIZHEVSKY; NAIR; HINTON, ), composed of 60,000 color images of 32 × 32 pixels. Each image is labeled with one of 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck), with 6000 images per class. Figure 2.10 shows examples of this dataset.
In the literature there exist several CNN architectures for solving Image Classification problems, such as: AlexNet (KRIZHEVSKY; SUTSKEVER; HINTON, 2012), ZFNet (ZEILER; FERGUS, 2013), GoogLeNet (SZEGEDY et al., 2014), VGGNet (SIMONYAN; ZISSERMAN, 2014), ResNet (HE et al., 2015) and DenseNet (HUANG; LIU; WEINBERGER, 2016).
2.2.3.2 Object Detection
The problem of object detection consists of detecting instances of semantic objects of a certain class (such as humans, buildings or cars) in digital images and videos. It has applications in many areas of computer vision, such as face detection for video surveillance and pedestrian and object detection for autonomous cars, among many others.
In the literature there exist several datasets for object detection, such as the PASCAL VOC 2012 Dataset (EVERINGHAM et al., ), composed of 11,530 colour images containing 27,450 ROI annotated objects. Each annotated object is labeled with one of 20 classes. Figure 2.11 shows examples of this dataset.
(a) Bottle (b) Bus (c) Car (d) Dog (e) Chair
(f) Cow (g) Table (h) Bird (i) Aeroplane (j) Sheep
(k) Cat (l) Train (m) Horse (n) Motor bike (o) Person
(p) Tv monitor (q) Bicycle (r) Couch (s) Boat (t) Potted plant
In the literature there exist several deep architectures for solving Object Detection problems, such as: R-CNN (GIRSHICK et al., 2016), Fast R-CNN (GIRSHICK, 2015), YOLO (REDMON et al., 2015), SSD (LIU et al., 2015) and R-FCN (DAI et al., 2016).
2.2.3.3 Semantic Segmentation
One of the main areas of Computer Vision is the process of Segmentation. In particular, Semantic Segmentation tries to semantically understand the role of each pixel in the image (e.g., whether it belongs to a car, a motorbike, a bike or a person). As with other computer vision tasks, CNNs have had enormous success on segmentation problems.
In the literature there exist several datasets for semantic segmentation, such as the SYNTHIA Dataset (ROS et al., 2016), a synthetic dataset of urban scenes that consists of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations for 13 classes. The dataset has more than 200,000 HD images from video streams and more than 20,000 HD images from independent snapshots. Figure 2.12 shows examples of this dataset.
Figure 2.12: Examples of SYNTHIA Dataset.
In the literature there exist several deep architectures for solving Semantic Segmentation problems, such as the convolutional networks SegNet (BADRINARAYANAN; KENDALL; CIPOLLA, 2017), U-Net (RONNEBERGER; FISCHER; BROX, 2015), FCN-32s and FCN-8s (LONG; SHELHAMER; DARRELL, 2015). The FCN-32s and FCN-8s are inspired by the VGG-16 net, which is a 16-layer CNN architecture. These models are customized with different upsampling layers that magnify the output used in the original CNN model VGG-16 (LONG; SHELHAMER; DARRELL, 2015).
The encoder of SegNet is topologically identical to the convolutional layers of the VGG-16, but without its fully connected layers, making the network significantly smaller and easier to train than many other recent architectures (BADRINARAYANAN; KENDALL; CIPOLLA, 2017). U-Net was designed for segmenting biomedical images, since it offers an architecture that requires few images for training and produces more precise segmentations.
The need for only a few images stems from the fact that U-Net enlarges the initial data set by applying elastic deformations to the available training images, allowing the network to learn invariance to such deformations without the need to see these transformations in the original images.
2.2.4
Convolutional Neural Network
Convolutional Neural Networks (CUN et al., 1989), also known as Convolutional Networks, are one of the main architectures for Image Classification, Object Detection, Semantic Segmentation and Instance Segmentation. The name “Convolutional Neural Network” indicates that the network employs a mathematical operation called convolution, which is a specialized kind of linear operation. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers. Convolutional neural networks are computationally efficient because they use convolution and pooling operations and perform parameter sharing (GOODFELLOW; BENGIO; COURVILLE, 2016).
The CNN architecture consists mainly of three types of layers, namely convolutional layers, pooling layers and fully connected layers (SINHA; PANDEY; PATTNAIK, 2018). An example architecture can be seen in Figure 2.13, where each input image passes through a series of convolution layers with filters (kernels), pooling layers and Fully Connected (FC) layers before a Softmax function is applied to classify an object in the image, for example.
Conv → Pool → Conv → Pool → FC → FC → Softmax
Figure 2.13: CNN Architecture.
2.2.4.1 Convolution Layers
The Convolutional Layers of a Convolutional Neural Network extract features from the input data through the convolution operation. Convolution is a mathematical operation that merges two sets of information. In our case the convolution is applied to the input data using a convolution filter to produce a feature map.
parameters using weight sharing mechanisms, and the correlation between neighboring pixels is easily learned thanks to local connectivity. These advantages have led researchers to replace fully connected layers in order to speed up the learning process (SZEGEDY et al., 2015; OQUAB et al., 2015).
For any kind of neural network to be powerful, it needs to contain non-linearity. A CNN therefore passes the result of the convolution operations through a non-linear activation function, such as the Hyperbolic Tangent (Tanh), the sigmoid or the Rectified Linear Unit (ReLU). In modern neural networks, the default recommendation is to use the ReLU (JARRETT et al., 2009; NAIR; HINTON, 2009; GLOROT; BENGIO, 2010) because it mitigates the vanishing gradient problem.
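For reference, the activation functions mentioned above can be written in a few lines. This is a sketch using only the standard definitions of ReLU and the sigmoid, while Tanh is available directly as `math.tanh`. The sample inputs are hypothetical.

```python
import math


def relu(x):
    """Rectified Linear Unit: passes positive values, zeroes out the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Logistic sigmoid: squashes any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


for value in (-2.0, 0.0, 3.5):
    print(value, relu(value), sigmoid(value), math.tanh(value))
```

For large positive inputs the sigmoid and Tanh saturate, so their derivatives approach zero, whereas the derivative of the ReLU stays at 1 for every positive input.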
In Figure 2.14 we show an example of a convolution operation on a 2D image whose pixel values are 0 or 1, using a 3 × 3 filter. In this convolution there is no padding and the stride has its default value of 1. We perform the convolution by sliding the filter (kernel) over the input. At every location, we multiply the filter and the underlying image window element-wise and sum the result. This sum goes into the feature map.
(a) Input (5 × 5) (b) Filter/Kernel (3 × 3) (c) Feature Map (3 × 3)
Figure 2.14: Example of Convolution.
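The sliding-window procedure described above can be sketched as follows. The input and the filter below are a commonly used toy example with binary pixel values and are not necessarily the values shown in Figure 2.14. The function name is hypothetical. As in most Deep Learning frameworks, the kernel is not flipped, so strictly speaking this computes a cross-correlation.

```python
def convolve2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image without padding and sum the products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Element-wise product of the kernel and the current window, summed.
            total = sum(image[i * stride + u][j * stride + v] * kernel[u][v]
                        for u in range(k_h) for v in range(k_w))
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative 5 x 5 binary input and 3 x 3 filter (assumed values).
image = [[1, 1, 1, 0, 0],
         [0, 1, 1, 1, 0],
         [0, 0, 1, 1, 1],
         [0, 0, 1, 1, 0],
         [0, 1, 1, 0, 0]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)
```

With a 5 × 5 input, a 3 × 3 filter, no padding and stride 1, the feature map has (5 − 3) + 1 = 3 rows and 3 columns.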
2.2.4.2 Pooling Layers
The Pooling Layers reduce the dimensions of the feature maps and consequently the number of parameters of the network (SINHA; PANDEY; PATTNAIK, 2018), which shortens the training time and helps to combat overfitting. Pooling is usually applied after a convolution layer. Spatial pooling can be of different types, such as Max Pooling, Average Pooling and Sum Pooling.
In Figure 2.15 we show an example of max pooling using a 2 × 2 window and stride 2. Each color denotes a different window. Since both the window size and stride are 2, the windows are not overlapping.
(a) Input (4 × 4) (b) Result
Figure 2.15: Example of Max Pooling.
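Max pooling with a 2 × 2 window and stride 2 can be sketched as follows. The input values are an illustrative toy example and are not necessarily those of Figure 2.15. The function name is hypothetical.

```python
def max_pool2d(feature_map, window=2, stride=2):
    """Keep the maximum of each window; windows do not overlap when stride == window."""
    out_h = (len(feature_map) - window) // stride + 1
    out_w = (len(feature_map[0]) - window) // stride + 1
    return [[max(feature_map[i * stride + u][j * stride + v]
                 for u in range(window) for v in range(window))
             for j in range(out_w)]
            for i in range(out_h)]


# Illustrative 4 x 4 input (assumed values).
feature_map = [[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

pooled = max_pool2d(feature_map)
print(pooled)
```

Each output value summarizes four input values, so the 4 × 4 input is reduced to a 2 × 2 result.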
2.2.4.3 Fully Connected Layers
After the Convolution and Pooling layers we add a set of Fully Connected Layers, which are the final layers of the CNN. The feature maps are flattened into a vector and fed into these layers, which work like a conventional neural network and usually account for up to 90% of the parameters of the network (SINHA; PANDEY; PATTNAIK, 2018).
2.3
Evaluation metrics
Here we introduce the evaluation metrics used to validate our work: Accuracy (Equation 2.12), Sensitivity (Equation 2.13), Specificity (Equation 2.14) and Dice Coefficient (Equation 2.15). Accuracy is the proportion of samples that are classified correctly, while Sensitivity indicates the proportion of regions of a certain type of tissue (necrosis, granulation or slough) that are correctly identified. Specificity indicates the proportion of regions that are not of that type of tissue and are correctly identified as such, and the Dice Coefficient quantifies how similar segmentation results are to the ground truth or a gold standard.
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (2.12) \]
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \qquad (2.13) \]
\[ \text{Specificity} = \frac{TN}{TN + FP} \qquad (2.14) \]
\[ \text{Dice Coefficient} = \frac{2TP}{2TP + FP + FN} \qquad (2.15) \]
In these equations, TP represents true positive samples, TN represents true negative samples, FP denotes false positive samples and FN represents false negative samples.
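The four metrics can be computed directly from the confusion counts, as in the sketch below. The function name and the counts are hypothetical and serve only to illustrate Equations 2.12 to 2.15.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Accuracy, Sensitivity, Specificity and Dice Coefficient from confusion counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "dice": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts for one tissue class (e.g., pixels labeled as granulation).
metrics = evaluation_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that the Dice Coefficient does not use TN, so it is not inflated when the tissue of interest occupies only a small part of the image.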
3
Related Works
The related literature focuses on the segmentation of ulcer images to estimate wound area, to perform tissue classification and to assess the rate of wound healing. However, most of these works deal with images acquired under controlled conditions, and are either limited to the wound region or to specific types of wounds. These restrictions commonly make these techniques impractical for clinical use. Their results cannot be compared directly on the basis of the figures reported here, since each work used a different database for testing.
In this Chapter, we present related works grouped by the main objectives: wound segmentation, wound tissue classification and mobile applications.
3.1
Segmentation
Tables 3.1 and 3.2 present a summary of the works related to ulcer segmentation. Goyal et al. (GOYAL et al., 2017a) developed a new convolutional neural network architecture, DFUNet, for ulcer segmentation. The database consists of 397 foot images (292 images with ulcers and 105 images of healthy feet), and they evaluated the DFUNet by comparing it with GoogLeNet, AlexNet and LeNet. DFUNet achieved the best results overall, with 92.5% of Accuracy, 91.1% of Specificity, 93.4% of Sensitivity, 93.9% of F-Measure and 96.1% of Area Under the Curve (AUC).
Gholami et al. (GHOLAMI et al., 2017) provide a proof-of-concept tool to segment chronic wounds and to control a robot to construct a bioprinter (MURPHY; ATALA, 2014). This work uses 26 images of 15 people, with ground truth generated by a dermatologist. Several segmentation methods were used to determine the geometry of ulcers, including edge detection and morphological operations, region growing, Livewire, active contours and texture segmentation. Livewire achieved the best performance with, on average, 97.08% of Accuracy, 96.67% of Specificity, 99.68% of Sensitivity, 96.22 of Jaccard Index, 98.15 of Dice Similarity Coefficient and 32.26 of Hausdorff Distance.
Goyal et al. (GOYAL et al., 2017b) also developed another method for segmenting ulcers by using convolutional neural networks. The database consists of 600 foot images. They evaluated the FCN-AlexNet and the FCN-32s, FCN-16s and FCN-8s networks, which are based on the VGG-16 architecture. The FCN-8s achieved 87.3% of Dice, 99.9% of Specificity and 85.4% of Sensitivity, the FCN-16s achieved 89.7% of Dice, 98.8% of Specificity and 90.0% of Sensitivity, and the FCN-32s achieved 89.9% of Dice, 98.9% of Specificity and 90.4% of Sensitivity. The FCN-AlexNet achieved 86.9% of Dice, 98.5% of Specificity and 87.9% of Sensitivity.
Song and Sacan (SONG; SACAN, 2012) worked with 92 images, 78 of which were used for training and 14 for testing. They developed an integrated technique to identify the wound, using K-means Clustering, Edge Detection, Thresholding and Region Growing, followed by a Multilayer Perceptron (MLP) or a Radial Basis Function (RBF) network (BROOMHEAD; LOWE, 1988). The MLP obtained 71.4% of correct predictions and the RBF obtained 85.7% of correct predictions.
Table 3.1: Summary of current works on wound segmentation.
Addressing | Methodology | No. of Images | Results
(GOYAL et al., 2017a) | New convolutional neural network architecture | 397 | 92.5% of Accuracy, 91.1% of Specificity, 93.4% of Sensitivity, 93.9% of F-Measure and 96.1% of AUC.
(GHOLAMI et al., 2017) | Edge-based methods: Livewire | 26 | 97.08% of Accuracy, 96.67% of Specificity, 99.68% of Sensitivity, 96.22 of Jaccard Index, 98.15 of Dice Coefficient, 32.26 of Hausdorff Distance.
Table 3.2: Summary of current works on wound segmentation 2.
Addressing | Methodology | No. of Images | Results
(GOYAL et al., 2017b) | Convolutional neural networks: FCN-AlexNet, FCN-32s, FCN-16s and FCN-8s | 600 | FCN-8s achieved 87.3% of Dice, 99.9% of Specificity and 85.4% of Sensitivity. FCN-16s achieved 89.7% of Dice, 98.8% of Specificity and 90.0% of Sensitivity. FCN-32s achieved 89.9% of Dice, 98.9% of Specificity and 90.4% of Sensitivity. FCN-AlexNet achieved 86.9% of Dice, 98.5% of Specificity and 87.9% of Sensitivity.
(SONG; SACAN, 2012) | Neural Networks, K-means Clustering, Edge Detection, Thresholding and Region Growing | 92 | 71.4% of Accuracy in MLP, 85.7% of Accuracy in RBF.
3.2
Classification
Tables 3.3 and 3.4 present a summary of the work related to ulcer classification. Chakraborty et al. (CHAKRABORTY et al., 2016) developed a method for segmenting and classifying ulcers. The database consisted of 89 images, 34 of which were obtained by authors and 50 from the Medetec wound database. The methodology used employed color correction, noise filtering, color homogenization and wound segmentation through Fuzzy C-means; followed by tissue classification using Linear Discriminant Analysis (LDA). The overall Accuracy was 91.45%.
Chakraborty et al. (CHAKRABORTY; GUPTA; GHOSH, 2015) used a database with 50 images of five different types (20 pressure ulcers, 10 diabetic ulcers, 10 venous ulcers, 5 malignant ulcers and 5 pyoderma gangrenosum ulcers) with 560 × 401 pixels. The methodology comprised color correction, noise filtering, color homogenization and wound segmentation through FCM clustering, followed by tissue classification using a Bayesian classifier. The overall Accuracy was 87.11%.
Fauzi et al. (FAUZI et al., 2015) used 80 images with a resolution of 768 × 1024 pixels, obtained under uncontrolled conditions. At least two health professionals generated the ground truth of the images. The methodology consists of initially determining to which class the pixel under analysis belongs, using a distance-based probability map in a modified HSV color model. The tissue classification is then performed by filtering the false-positive regions, based on region growing or optimal thresholding techniques. The results had 75.1% of overall Accuracy.
Veredas et al. (VEREDAS et al., 2015) worked with 113 images taken in a controlled environment at a distance between 30 and 40cm. The methodology consists of initially applying a median filter to reduce noise, then performing wound segmentation through a series of procedures using K-means clustering. For the classification of tissues, three approaches were tested: SVM, feed-forward neural networks and random forests. The latter presented the best results, with 89.60% of Accuracy, 96.90% of Specificity and 82.30% of Sensitivity.
Mukherjee et al. (MUKHERJEE et al., 2014) apply a median filter with a 5 × 5 window for noise reduction during the pre-processing step. The wound segmentation is obtained through Fuzzy Divergence Based Thresholding, and the tissue classification is evaluated with SVM and Bayesian classifiers. The SVM with a third-order polynomial kernel obtained the better result: 86.94% of Accuracy in the granulation tissue, 90.47% in the slough tissue and 75.53% in the necrotic tissue, 87.61% of overall Accuracy and 0.793 of Kappa statistic.
Wannous et al. (WANNOUS; LUCAS; TREUILLET, 2011) segmented ulcers and classified tissues into the following categories: granulation, slough and necrosis. The database consisted of 50 images obtained in hospitals. The ground truth was generated by health professional experts. The segmentation of wounds is obtained through CSC, EGBIS, mean shift and J-SEG techniques. The latter obtained the best results. The features used in the classification process were generated from color and texture information. SVM obtained better results, with 88% of Accuracy, 94% of Specificity and 77% of Sensitivity. In this last step K-NN, Fuzzy K-NN and K-Means were also tested.
Veredas et al. (VEREDAS; MESA; MORENTE, 2010) worked with 113 images taken in a controlled environment at a distance of 30cm to 40cm with a Canon EOS 40D camera. After a median filter is applied to reduce noise, wound segmentation is performed through Hybrid Machine-Learning Architectures with neural networks, Bayesian classifiers, SVM and classifier committees. This approach obtained a performance of 91.5% of Accuracy, 94.7% of Specificity and 78.7% of Sensitivity.
Wannous et al. (WANNOUS; TREUILLET; LUCAS, 2007) used a dataset composed of 25 images with ground truth generated by specialists. After segmentation of ulcers, tissues were classified as granulation, slough or necrosis. Segmentation was performed by different techniques, such as CSC, mean shift and J-SEG. Classification of tissues is based on color and textural information via SVM classifier. Results had overlap scores of 69.2-78.8 granulation, 50.0-56.4 slough and 23.4-45.4 necrosis.
Finally, Galushka et al. (GALUSHKA et al., 2005) classified tissues into granulation, slough and necrosis. The methodology used case-based reasoning (CBR), with features from texture or color. Results had 58.3% of Accuracy and 0.20 of Kappa value from texture and 89.93% of Accuracy and 0.80 of Kappa value for color.
Table 3.3: Summary of current works on wound classification.
Addressing | Methodology | No. of Images | Results
(CHAKRABORTY et al., 2016) | Color correction, noise filtering, color homogenization, FCM clustering and Bayesian classifier | 50 | 87.11% of overall Accuracy
(CHAKRABORTY; GUPTA; GHOSH, 2015) | Color correction, noise filtering, color homogenization, Fuzzy C-means and Linear Discriminant Analysis | 50 | 91.45% of overall Accuracy
(FAUZI et al., 2015) | RYKW map, region growing and optimal thresholding | 80 | 75.1% of Accuracy
(VEREDAS et al., 2015) | Median filter, K-means clustering, SVM, feed-forward neural networks and random forest | 113 | 89.60% of Accuracy, 96.90% of Specificity, 82.30% of Sensitivity
Table 3.4: Summary of current works on wound classification 2.
Addressing | Methodology | No. of Images | Results
(MUKHERJEE et al., 2014) | Median filter, fuzzy divergence, SVM and Bayesian classifier | 74 | 87.61% of overall Accuracy, 0.793 of Kappa statistic
(WANNOUS; LUCAS; TREUILLET, 2011) | CSC, EGBIS, mean shift, J-SEG, K-NN, Fuzzy K-NN, K-Means and SVM | 50 | 88% of Accuracy, 94% of Specificity, 77% of Sensitivity
(VEREDAS; MESA; MORENTE, 2010) | Mean shift, region growing, neural networks and Bayesian classifiers | 113 | 91.5% of Accuracy, 94.7% of Specificity, 78.7% of Sensitivity
(WANNOUS; TREUILLET; LUCAS, 2007) | CSC, mean shift, J-SEG and SVM | 25 | 69.2-78.8 of granulation, 50.0-56.4 of slough, 23.4-45.4 of necrosis
(GALUSHKA et al., 2005) | Classification approach based on case-based reasoning | Not available | 89.93% of Accuracy, 0.80 of Kappa value
3.3
Mobile Applications
Table 3.5 presents a summary of the work related to mobile applications. Wang et al. (WANG et al., 2017) present a mobile application to segment ulcers and determine their area. This work used 100 pictures of 15 patients (mostly of foot soles), obtained with a capture box described in (WANG et al., 2015). The approach uses simple linear iterative clustering (SLIC) to segment the image into a number of superpixels. For each superpixel in a given sample image, a feature descriptor is generated based on local color and texture information. Subsequently, a two-stage cascaded SVM-based wound boundary determination method is applied to the extracted feature descriptors. Finally, a CRF-based algorithm is applied to refine the determined wound boundary. This approach achieves an overall performance of 73.3% of Sensitivity and 94.6% of Specificity.
Varma et al. (VARMA et al., 2016) present a mobile application to segment the ulcers in the images obtained (the dataset is composed of 30 images), determine the wound area and classify the tissues. The methodology consists of segmenting the ulcer via GrabCut and Otsu. In order to estimate the area they use the angle of view of the camera lens from any height x. The application, taking x as input, calculates the area in cm² using basic trigonometry. Ulcer tissue classification is performed via K-means with K = 3.
Hettiarachchi et al. (HETTIARACHCHI et al., 2013) presents a mobile application to segment the wound and determine its area. In this work, the database consisted of 20 images. First they apply a Gaussian filter with a 31 × 31 window for posterior wound targeting. To achieve that, edges are detected from the snake formulation of an Active Contour Model. Next, the flood fill algorithm is applied to fill areas with colors of each tissue (granulation, necrotic and slough) and calculate the area of the wounds from image histogram. This methodology showed an overall Accuracy of 90%.
Table 3.5: Summary of current works on wound mobile application.
Addressing | Methodology | No. of Images | Results
(WANG et al., 2017) | SLIC clustering, two-stage SVM-based and CRF-based algorithm | 100 | 94.6% of Specificity, 73.3% of Sensitivity
(VARMA et al., 2016) | Otsu thresholding, GrabCut, K-means clustering | 30 | Not available
(HETTIARACHCHI et al., 2013) | Gaussian filter, flood fill and snake formulation of the Active Contour Model | 20 | 90% of Accuracy
The related works presented in this chapter cannot be compared directly, because the dataset used in each work is private. Additionally, it is important to highlight that the studies follow specific acquisition protocols, so their results in normal clinical environments tend to be worse.
In general, the related works use small datasets, with fewer than 120 images, and usually rely on classic Artificial Intelligence and Image Processing techniques, such as: K-means, C-means, K-NN, SVM, MLP, J-Seg, Otsu, GrabCut.
Therefore, from the analysis of the related works, we conclude that it is necessary to develop a work that uses a dataset acquired in a normal clinical working environment and that can be used efficiently on different parts of the human body (e.g., lower and upper limbs). Such a work should also perform wound segmentation, tissue classification and the calculation of the area of the tissues present, dealing not only with a specific type of tissue but with a range of them (e.g., necrosis, granulation and slough).
4
Proposed Methodology
The methodology developed in this work aims to segment ulcers, classify tissues and calculate areas for the different tissue types. The flow of the proposed methodology can be seen in Figure 4.1. Initially, the acquired image is pre-processed to reduce noise. The result is then subjected to a semi-automatic procedure to perform wound segmentation. After that, the image is converted to the CIELab color model and has its color space reduced in order to speed up the subsequent processing. The wound tissues are then classified into granulation, necrosis and slough classes. Finally, the areas of the wound tissues are calculated based on the result of the previous step.
Figure 4.1: Flow of the proposed methodology (stages shown: Image Acquisition, Pre-Processing, Segmentation, Reduction of Color Space, Classification, Output, Area Computation).
4.1
Image Acquisition
In this step we define a specific protocol of image acquisition. The objective of this protocol is to avoid the introduction of noise in the image (e.g., parts of the human body other than the wound), to avoid adding objects to the image that can complicate the following steps and to avoid acquiring a background with colors similar to human skin. Additionally, the protocol aims to ensure equal conditions for all image acquisitions, in order to avoid variations in light and color that influence the wound segmentation and the classification of wound tissues. The protocol steps are listed below.
1. Take the photo with a distance of 30cm – 40cm from the wound;
2. Take the photo with a white or blue background to avoid adding unwanted objects to the background (e.g. parts of the human body or another object that is not the wound);
3. Take the picture without flash to avoid adding extra brightness (glare) in the image;
4. When taking the picture, place an object with a known size on the image at the same depth as the wound, in order to recover the scale of the pixels and correctly compute the area;
5. When taking the photo, use white lighting to avoid variation in the colors of the image;
6. When taking the photo, ensure that the entire wound is equally illuminated to avoid shadows in the image.
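Step 4 of the protocol allows pixel counts to be converted into physical areas. A minimal sketch of this conversion is shown below. It assumes square pixels and a reference object that lies in the same plane as the wound, and all names and numbers are hypothetical.

```python
def pixel_area_cm2(reference_length_cm, reference_length_px):
    """Area covered by one pixel, given a reference object of known length."""
    cm_per_pixel = reference_length_cm / reference_length_px
    return cm_per_pixel ** 2


def region_area_cm2(pixel_count, reference_length_cm, reference_length_px):
    """Physical area of a region made of pixel_count pixels."""
    return pixel_count * pixel_area_cm2(reference_length_cm, reference_length_px)


# Hypothetical measurement: a 2 cm marker spans 100 pixels in the photo,
# and 5000 pixels were classified as one tissue type.
area = region_area_cm2(5000, reference_length_cm=2.0, reference_length_px=100)
print(f"{area:.2f} cm^2")
```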
An example of an image acquired with this protocol can be seen in Figure 4.2. In this figure we simulate a wound, because it was not possible to acquire images of real wounds.
Figure 4.2: Example of image acquired by the protocol of acquisition.
4.2
Pre-Processing
This stage aims to detect human skin and wounds and remove the rest. First, we convert the image from the RGB to the HSV color system. We use the H and S components to detect human skin, according to (LI, 2005): a pixel is detected as human skin when its color components satisfy H ≤ 36 and 13 ≤ S ≤ 179. We then convert the resulting image from HSV back to the RGB color system. Examples of this procedure can be seen in Figure 4.3.
(a) Input images
(b) Preprocessed images
Figure 4.3: Example of the pre-processing.
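The skin rule above can be sketched with the standard `colorsys` module. The scale of the thresholds is not stated in the text, so this sketch assumes the 8-bit convention used by OpenCV (H in [0, 180) and S in [0, 255]). The function names and the sample colors are hypothetical.

```python
import colorsys


def is_skin(r, g, b):
    """Apply the rule H <= 36 and 13 <= S <= 179 to an 8-bit RGB pixel.

    Assumption: thresholds are on an OpenCV-like scale, H in [0, 180), S in [0, 255].
    """
    h, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    h_scaled = h * 180
    s_scaled = s * 255
    return h_scaled <= 36 and 13 <= s_scaled <= 179


def remove_non_skin(image):
    """Replace every pixel that fails the skin rule with black."""
    return [[pixel if is_skin(*pixel) else (0, 0, 0) for pixel in row]
            for row in image]


# Hypothetical pixels: a skin-like tone, a blue background and pure white.
row = [(224, 172, 105), (70, 110, 200), (255, 255, 255)]
print(remove_non_skin([row]))
```

The white pixel is rejected because its saturation is zero, and the blue one because its hue lies outside the accepted range.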
4.3
Segmentation
At this stage, our goal is to isolate the region where the wound is and to eliminate the regions where there are no skin injuries. We use a classic image segmentation method, the Watershed algorithm (MEYER, 1992), which is a region-growing algorithm, as described in Section 2.1.2.2. An example of the result of each step of the process can be seen in Figure 4.4.
(a) Input image (b) Selected points
(c) After segmentation (d) Region of interest
Figure 4.4: Example of the Watershed execution process.
4.4
Color Space Reduction
To reduce the dimensionality of the image histogram in RGB format, we first quantize each component of the RGB color model to 6 bits. Each component of every pixel is divided by 4, going from 8 bits (256 possible values) to 6 bits (64 possible values). With this reduction, the histogram shrinks from $256^3$ possible colors to $64^3$ possible colors, a decrease of 16515072 possibilities.
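The quantization and the resulting histogram sizes can be checked with a few lines. This is only an illustrative sketch; the function name is hypothetical and integer division is assumed.

```python
def quantize_6bit(pixel):
    """Map an 8-bit (r, g, b) pixel to 6 bits per component by integer division."""
    return tuple(component // 4 for component in pixel)


full_bins = 256 ** 3      # number of colors with 8 bits per component
reduced_bins = 64 ** 3    # number of colors with 6 bits per component
removed_bins = full_bins - reduced_bins
```

Here `removed_bins` evaluates to 16515072, matching the figure quoted above.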
The next step is to convert the image from RGB to the CIELab color space, using the equations presented in Section 2.1.1. We make this transformation because the RGB color space is not perceptually uniform, whereas distances in CIELab correspond more closely to perceived color differences, which matters when selecting the most representative colors of the histogram.
Then the number of occurrences of each color in the histogram is computed, and the colors that account for at least 0.05% of the occurrences are stored in a list of representative colors. Every other color of the image, that is, every color not in the list, is replaced by its closest representative color according to the Euclidean distance in the CIELab space (Equation 4.1), since this distance agrees better with the human visual system and gives better results when comparing colors (CHENG et al., 2015).
$$\mathrm{Dist}_{CIELab}(i, j) = \sqrt{(L_i - L_j)^2 + (a_i - a_j)^2 + (b_i - b_j)^2} \tag{4.1}$$
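The reduction step can be sketched as follows, assuming the image is already available as a flat list of (L, a, b) tuples. The function names and the fallback for the case where no color reaches the threshold are assumptions made for illustration.

```python
import math
from collections import Counter


def dist_cielab(c1, c2):
    """Euclidean distance between two (L, a, b) colors, as in Equation 4.1."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(c1, c2)))


def reduce_colors(lab_pixels, min_share=0.0005):
    """Replace rare colors by the nearest color with a share >= min_share (0.05%)."""
    counts = Counter(lab_pixels)
    total = len(lab_pixels)
    representative = [c for c, n in counts.items() if n / total >= min_share]
    if not representative:
        # Assumption: if no color is frequent enough, leave the image untouched.
        return list(lab_pixels)
    kept = set(representative)
    remap = {}
    for color in counts:
        if color in kept:
            remap[color] = color
        else:
            remap[color] = min(representative, key=lambda rep: dist_cielab(color, rep))
    return [remap[color] for color in lab_pixels]
```

In an image dominated by two colors, a single stray pixel of a third color is mapped to whichever of the two is closer in CIELab.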
An example of the color space reduction of a histogram can be seen in Figure 4.5, which shows the original image (a), the image with the reduced color space (c), and the corresponding points in the 3D CIELab color space for both images (b and d).
(a) Image with 94312 distinct colors (b) CIELab space of 94312 distinct colors
(c) Reduced image with 499 distinct colors
(d) CIELab space of 499 distinct colors
Figure 4.5: Example of histogram reduction (slough=yellow, granulation=red, necrotic=blue).
4.5 Tissue Classification
At this stage, we aim to classify the tissues present in wounds as Necrosis, Granulation or Slough. For this, we perform a color clustering using the Earth Mover’s Distance (EMD) (RUBNER; TOMASI; GUIBAS, 2000). The EMD measures the dissimilarity between two signatures, which are compact representations of distributions; here it compares two histograms and quantifies how different they are, as described in Section 2.1.3.
To use this technique, we selected a few images from the dataset to serve as the basis for training, which consists of extracting representative patches for each tissue class. Each new patch is compared with the patches already extracted for the class under analysis. If the distance obtained from the EMD is greater than 5, that is, if the patch differs enough from those already stored, it is added to that tissue class.
After training, the image to be classified is swept to extract patches. Each patch of the analyzed image is compared, using the EMD, with the entire set of patches extracted during training. If the EMD is equal to 0, the patch is assigned to the tissue class of the matching training patch. Otherwise, the patch is assigned to the class of the training patch with the smallest distance to it.
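The training and classification rules above can be sketched under a strong simplification: each patch is represented by a one-dimensional histogram with equal total mass, for which the EMD reduces to the sum of absolute cumulative differences. The work itself uses the signature-based EMD of (RUBNER; TOMASI; GUIBAS, 2000) on color data, so this is only an illustration. It also assumes that a training patch is kept when its distance to every stored patch of the class exceeds the threshold; all function names are hypothetical.

```python
from itertools import accumulate


def emd_1d(p, q):
    """EMD between two 1D histograms with equal total mass and unit bin spacing."""
    differences = (a - b for a, b in zip(p, q))
    return sum(abs(c) for c in accumulate(differences))


def train(patches_by_class, threshold=5.0):
    """Keep, per class, only patches farther than `threshold` from those kept so far."""
    model = {}
    for label, histograms in patches_by_class.items():
        kept = []
        for hist in histograms:
            # Assumption: the patch must be far from every stored patch.
            if all(emd_1d(hist, stored) > threshold for stored in kept):
                kept.append(hist)
        model[label] = kept
    return model


def classify(hist, model):
    """Return the class of the nearest training patch (immediately if EMD is 0)."""
    best_label, best_distance = None, float("inf")
    for label, kept in model.items():
        for stored in kept:
            distance = emd_1d(hist, stored)
            if distance == 0:
                return label
            if distance < best_distance:
                best_label, best_distance = label, distance
    return best_label
```

In this sketch the first patch of each class is always stored, since there is nothing yet to compare it with.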
An example of the classification process can be seen in Figure 4.6, which shows the input image (a), obtained from the segmentation and color space reduction steps, and the classification result (b).
(a) Classification input
(b) Classification output
Figure 4.6: Example of classification.
4.6 Area Computation
At this stage, our goal is to compute the area of the necrosis, slough and granulation tissues. For that, the wound image must include an object of known size, such as the object in Figure 4.7, which is a circle of radius 6 mm. The purpose of this object is to recover the correct absolute scale.
Figure 4.7: Example of an object used to retrieve the scale.
Then, we must determine the scale of the object in the picture relative to the real world. To do so, we extract the object from the image by detecting it in the HSV color space: we binarize the image based on the green color and locate the object based on its shape. We then measure the radius of the circle in pixels and, from it, obtain the real-world scale of the image.
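One possible way to recover the scale is sketched below. The green thresholds are hypothetical (the text does not give the values used), and the radius is estimated from the number of green pixels by assuming they form a filled disk, which replaces the shape-based detection described above.

```python
import colorsys
import math

# Hypothetical thresholds for "green"; hue in degrees, S and V in [0, 1].
HUE_MIN_DEG, HUE_MAX_DEG = 90.0, 150.0
SAT_MIN, VAL_MIN = 0.4, 0.2


def is_green(r, g, b):
    """Binarization rule: True if an 8-bit RGB pixel falls in the green range."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return HUE_MIN_DEG <= h * 360.0 <= HUE_MAX_DEG and s >= SAT_MIN and v >= VAL_MIN


def marker_radius_px(pixels):
    """Estimate the marker radius in pixels, assuming a filled green disk."""
    green_count = sum(is_green(*px) for row in pixels for px in row)
    return math.sqrt(green_count / math.pi)


def mm_per_pixel(radius_px, radius_mm=6.0):
    """Scale factor given the known marker radius (6 mm in Figure 4.7)."""
    return radius_mm / radius_px
```

For example, a marker measured at 12 pixels of radius gives a scale of 0.5 mm per pixel.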
Finally, we compute the area occupied by each tissue type. For this, we apply Green's formula (GREEN, 1828) to each of the classes obtained from the classification process. The area found for each wound tissue is then converted to real-world units using the scale obtained in the previous step.
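For a region bounded by a polygonal contour, Green's formula reduces to the well-known shoelace sum. The sketch below assumes that each tissue region is available as an ordered list of contour vertices in pixel coordinates and that the scale is given in millimeters per pixel; the function names are hypothetical.

```python
def polygon_area_px(contour):
    """Area enclosed by a closed polygon, in square pixels (shoelace formula)."""
    twice_area = 0.0
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def area_mm2(contour, mm_per_px):
    """Convert a contour area from square pixels to square millimeters."""
    # Areas scale with the square of the linear scale factor.
    return polygon_area_px(contour) * mm_per_px ** 2
```

A 10 by 10 pixel square at 0.5 mm per pixel therefore corresponds to 25 square millimeters.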
4.7 Prototype of Mobile Application
In this section we describe the mobile prototype developed to evaluate the execution of the approach presented in this work (using EMD for classification). We present the design of the application, what was developed, and the execution of the prototype.