DEPARTMENT OF PHYSICS
CATARINA SANTOS FERNANDES FIDALGO PEREIRA BSc in Biomedical Engineering Sciences
RISK PREDICTION ANALYSIS FOR
POST-SURGICAL COMPLICATIONS IN CARDIOTHORACIC SURGERY
IMAGE ANALYSIS OF THE PATIENT-REPORTED PHOTOGRAPHS OF THE WOUND
MASTER IN BIOMEDICAL ENGINEERING
NOVA University Lisbon
Adviser: Federico Guede Fernández
Head of Data Science, Value for Health CoLAB
Co-adviser: Ricardo Nuno Vigário
Associate Professor, NOVA University Lisbon
Examination Committee Chair: Carla Maria Quintão Pereira
Auxiliar Professor, NOVA University Lisbon
Adviser: Federico Guede Fernández
Head of Data Science, Value for Health CoLAB
Member: José Manuel Matos Ribeiro da Fonseca
Associate Professor with Aggregation, NOVA University Lisbon
Risk prediction analysis for post-surgical complications in cardiothoracic surgery Copyright © Catarina Santos Fernandes Fidalgo Pereira, NOVA School of Science and Technology, NOVA University Lisbon.
The NOVA School of Science and Technology and the NOVA University Lisbon have the right, perpetual and without geographical boundaries, to file and publish this dissertation through printed copies reproduced on paper or on digital form, or by any other means known or that may be invented, and to disseminate through scientific repositories and admit its copying and distribution for non-commercial, educational or research purposes, as long as credit is given to the author and editor.
To my beloved grandmother.
Acknowledgements
I would like to express my deepest gratitude to everyone who supported me throughout my academic years. First, I would like to thank my advisor, Federico Guede, for all his guidance and help, which was vital to complete this work. Next, I must also thank professor Ricardo Vigário for helping to change this work’s direction and for instigating my critical spirit while developing this thesis. I am very grateful to Ana Londral for putting her trust in me to develop this work and for integrating me in the Value for Health CoLAB team. With that said, I would like to thank all of the VOH team for the knowledge shared during that time; it was a pleasure learning how a real team works.
I cannot begin to express my gratitude to all my family, but especially to my mom for being such a strong woman and to my sister for her unconditional love. Also, to my dad for always pushing me to become better. I would like to express my deepest thanks to my grandma, Fernanda, for being the biggest inspiration I could have had in my life, for showing me what kindness is and for giving everything until the end.
I would like to pay my special regards to my best friend, Maria, for always being there for me after so many ups and downs. Next, to my boyfriend, Tiago, for believing in me and giving me so much love and support during these challenging times. To all my friends, Mara, Maria, Melo, Diogo, Coutinho, André, thank you for putting up with me during all this time; I could not have chosen a better Ohana to spend my college years with. Also, I would like to thank Raquel for being one of the five most important people around her and, last but not least, my number one rock, Inês, for sharing so much as I do.
“Everything is hard before it is easy.”
(Johann Wolfgang von Goethe)
Abstract
Cardiothoracic surgery patients are at risk of developing surgical site infections (SSIs), which cause hospital readmissions, increase healthcare costs and may lead to death. The first 30 days after hospital discharge are crucial for preventing these kinds of infections. As an alternative to a hospital-based diagnosis, an automatic digital monitoring system can help with the early detection of SSIs by analyzing daily images of patients’ wounds. However, analyzing a wound automatically is one of the biggest challenges in medical image analysis.
The proposed system is integrated into a research project called CardioFollow.AI, which developed a digital telemonitoring service to follow up the recovery of cardiothoracic surgery patients. The present work aims to tackle the problem of SSIs by predicting the existence of worrying alterations in wound images taken by patients, with the help of machine learning and deep learning algorithms. The developed system is divided into a segmentation model, which detects the wound region and categorizes the wound type, and a classification model, which predicts the occurrence of alterations in the wounds.
The dataset consists of 1337 images with chest wounds (WC), drainage wounds (WD) and leg wounds (WL) from 34 cardiothoracic surgery patients. To segment the images, an architecture with a MobileNet encoder and a Unet decoder was used to obtain the regions of interest (ROI) and assign the wound class. The subsequent classification model was divided into three sub-classifiers, one for each wound type, in order to improve the model’s performance. Color and textural features were extracted from the wound ROIs to feed one of three machine learning classifiers (random forest, support vector machine and K-nearest neighbors), which predicts the final output.
The segmentation model achieved a final mean IoU of 89.9%, a dice coefficient of 94.6% and a mean average precision of 90.1%, showing good results. As for the algorithms that performed classification, the WL classifier exhibited the best results with an 87.6% recall and 52.6% precision, while the WC classifier achieved a 71.4% recall and 36.0% precision. The WD classifier had the worst performance, with a 68.4% recall and 33.2% precision.
The obtained results demonstrate the feasibility of this solution, which can be a starting point for preventing SSIs through image analysis with artificial intelligence.
Keywords: Machine learning, Deep learning, Wound segmentation, Wound classification, Surgical site infections, Cardiothoracic surgery
Resumo
Patients undergoing cardiothoracic surgery are at risk of developing infections at the surgical wound site, which can consequently lead to hospital readmissions, increased healthcare costs and mortality. The first 30 days after hospital discharge are crucial for preventing these infections. Thus, as an alternative to in-hospital diagnosis, the daily use of an automatic digital monitoring system on images of surgical wounds can help with the early detection of these infections. However, the automatic analysis of wounds is one of the great challenges in medical image analysis.
The proposed system is part of a research project named CardioFollow.AI, which developed a digital telemonitoring service to follow up the recovery of cardiothoracic surgery patients. This work addresses the problem of surgical wound infection by detecting worrying alterations in the wound with the help of machine learning algorithms. The developed system is divided into a segmentation model, which detects the wound region and categorizes it by type, and a classification model, which predicts the existence of alterations in the wound.
The dataset consisted of 1337 images of chest wounds (WC), drainage tube wounds (WD) and leg wounds (WL) from 34 cardiothoracic surgery patients. Image segmentation was performed by combining Mobilenet as the encoder and Unet as the decoder, in order to obtain the regions of interest and assign the wound class. The following model was divided into three sub-classifiers, one for each wound type, in order to improve the model’s performance. Color and texture features were extracted from the wound region and fed into one of the machine learning models (Random Forest, Support Vector Machine and K-Nearest Neighbors) to predict the final classification.
The segmentation model showed good results, achieving a final mean IoU of 89.9%, a dice of 94.6% and a mean precision of 90.1%. Regarding the algorithms that performed the classification, the WL classifier exhibited the best results with 87.6% recall and 62.6% precision, while the WC classifier achieved a recall of 71.4% and 36.0% precision. Finally, the WD classifier had the worst performance, with a recall of 68.4% and 33.2% precision. The results obtained demonstrate the feasibility of this solution, which constitutes a starting point for preventing surgical wound infections through image analysis using artificial intelligence.
Keywords: Machine learning, Deep learning, Wound segmentation, Wound classification, Surgical site infections, Cardiothoracic surgery
Contents
List of Figures
List of Tables
Glossary
Abbreviations
1 Introduction
1.1 Context
1.2 Objectives
1.3 Document structure
2 Theoretical concepts
2.1 Cardiothoracic surgery
2.2 Infection
2.3 Surgical site infection
2.3.1 SSI classification
2.3.2 Physiopathology of SSIs
2.3.3 Risk factors for SSIs
2.4 Medical image analysis
2.5 Image analysis for segmentation
2.5.1 Edge or boundary-based methods
2.5.2 Region-based methods
2.5.3 Clustering
2.5.4 Thresholding
2.6 Artificial intelligence methods
2.6.1 Machine Learning
2.6.2 Deep learning
2.7 Evaluation metrics
2.7.1 Metrics for segmentation
2.7.2 Metrics for classification
3 Literature review
3.1 Wound image segmentation
3.1.1 Traditional methods for wound segmentation
3.1.2 Deep learning methods for wound segmentation
3.2 Wound image classification
3.2.1 Traditional machine learning methods for wound classification
3.2.2 Deep learning methods for wound classification
3.2.3 Hybrid methods
3.2.4 Similar studies
4 Methodology
4.1 Dataset
4.1.1 Data annotation
4.2 Proposed pipeline
4.3 Pre-processing
4.3.1 Color correction
4.3.2 Noise removal
4.4 Wound segmentation
4.4.1 Model architectures
4.4.2 Model selection and evaluation
4.4.3 Addressing class imbalance
4.4.4 Model training
4.4.5 Data augmentation
4.4.6 Post-processing
4.4.7 Final evaluation
4.5 Wound classification
4.5.1 Pre-processing
4.5.2 Feature extraction
4.5.3 Classification
5 Results and discussion
5.1 Wound segmentation
5.1.1 Model selection
5.1.2 Hyperparameter grid search
5.1.3 Data augmentation
5.1.4 Final results
5.2 Wound classification
5.2.1 Addressing class imbalance
5.2.2 Hyperparameter search and model selection
5.2.3 Model evaluation
5.2.4 Conclusions
6 Conclusion
6.1 Conclusion
6.2 Limitations
6.3 Future work
Bibliography
Appendices
A Wound segmentation
A.1 Initial model elimination
B Equations for feature extraction
B.1 Color features
B.2 Textural features
B.2.1 First order statistics features
B.2.2 GLCM features
List of Figures
2.1 Median sternotomy incision
2.2 Example of two SSIs from the dataset
2.3 Different operators for edge-based approaches
2.5 Graphic representation of sigmoid function
2.6 Typical CNN architecture
2.7 Convolution operation
2.8 Example of three types of pooling operations
2.9 Representation of fine-tuning approaches for transfer learning
2.10 Example of an auto-encoder structure
4.1 Example of the wound types present in the dataset
4.2 Example of the evolution of a patient’s WC
4.3 Reduction of the dataset’s dimensions
4.4 Example of an annotation made with the Labelme tool on a WC
4.5 Proposed architecture for the developed system
4.6 Example of a result from the pre-processing step
4.7 Structure of a residual block present in ResNet architectures
4.8 Illustration of the SegNet framework
4.9 Illustration of the Unet framework
4.10 Representation of the dataset split
4.11 Representation of the nested cross-validation
4.12 Total wound distribution
4.13 Example of a final output image from the segmentation model
5.1 Example of training and loss curves for the proposed architectures
5.3 Segmentation mask before and after the post-processing step
5.4 Class distribution over the three wound types, before and after the approaches to address class imbalance
List of Tables
2.1 Patient and operative risk factors for SSIs
2.2 Approaches to define the distance between instances (x and y) in the KNN algorithm
4.1 Hyper-parameter grid search values for MobileNet-Unet
4.2 Possible hyperparameter values for the proposed ML algorithms
5.1 Performance of the DL segmentation models for the evaluation metrics, mean IoU and dice coefficient
5.2 Comparison of the MobileNet-Unet performance with the various hyperparameters combinations for 40, 50 and 60 epochs
5.3 Performance comparison of the dataset with no augmentation and with the three different types of augmentations
5.4 Final evaluation of the segmentation model
5.5 Models performance for the oversampling techniques regarding both optimization metrics and their respective score
5.6 Results of the hyperparameter search and the performance achieved by the best models with the corresponding number of feature components
5.7 Final evaluation metrics for the best optimized algorithm regarding F1 and F2 optimization
A.1 Model architectures used for training the reduced dataset
A.2 Results of the mentioned models with a training set of five images and test set of two images
B.1 Parameters of FOS
B.2 GLCM notation
Glossary
CIE RGB A RGB color space, differentiated by a set of monochromatic (single-wavelength) primary colors.
HED A color space, Haematoxylin-Eosin-DAB, designed to analyze specific tissues in the medical field.
HSV A cylindrical color space based on how humans perceive light, where H corresponds to hue, S to saturation and V to value (brightness).
LAB A color space designed in a perceptually uniform space that represents color on three axes, L for lightness (luminance) and a and b for the color components (red, green, blue and yellow).
LUV Uniform color space defined by CIE in 1976, where L stands for luminance, whereas U and V represent chromatic values of color images.
RGB A color space with three channels, red, green and blue, based on the human eye sensitive receptors, in which every color is decomposed into combinations of the primary colors.
XYZ A color space based on the mathematical limit of human vision in terms of color, where the X, Y, Z channels are extrapolated from the R, G, B channels. X (0.412R+0.358G+0.180B) corresponds to a mix of cone response curves chosen to be orthogonal to luminance and near the red channel, Y (0.213R+0.715G+0.072B) represents luminance and Z (0.019R+0.119G+0.950B) corresponds to a channel near the blue channel.
YCbCr Color space used to compress the digital color information involved in video transmission. The Y (65.48R+128.55G+24.97B+16) channel represents luminance and the Cb (-37.78R-74.16G+111.93B+128) and Cr (111.96R-93.75G-18.21B+128) channels represent two chromas, blue-difference and red-difference, respectively.
YDbDr Color space similar to YCbCr but with different coefficients in the chroma channels, used in analog color television broadcasting, where Y (0.299R+0.587G+0.114B) stands for luminance and Db (-0.450R-0.883G+1.133B) and Dr (-1.333R+1.116G+0.217B) for chrominance.
YIQ Color space similar to YCbCr and YDbDr, where brightness is represented by Y (0.299R+0.587G+0.114B) and chrominance by I (0.595R-0.274G-0.321B) and Q (0.211R-0.522G+0.311B), which carry color information alongside luminance and are derived by rotating the UV vector by 33º.
YPbPr Color space used in video electronics which is a gamma correction of the YCbCr color space. Y (0.212R+0.701G+0.0865B) represents luminance, while Pb (-0.116R-0.383G+0.500B) and Pr (0.500R-0.445G-0.055B) correspond to the difference between blue and luminance and between red and luminance, respectively.
YUV Color space that reduces the bandwidth for chroma channels for encoding color images or videos. Y (0.299R+0.587G+0.114B) is the luminance component, linear-space brightness, and U (-0.147R-0.288G+0.436B) and V (0.615R-0.515G-0.100B) are the blue and red projections, respectively.
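Several of the color spaces above are defined as weighted sums of the R, G and B channels, so each conversion amounts to a 3x3 matrix product applied to every pixel. The following is a minimal sketch of that idea for the YUV entry, assuming NumPy and RGB values scaled to [0, 1]; the function name `rgb_to_yuv` and the example pixels are illustrative assumptions, not part of the thesis code.

```python
import numpy as np

# Rows give the Y, U and V channels as weighted sums of R, G and B,
# using the coefficients listed in the glossary entry for YUV.
RGB_TO_YUV = np.array([
    [0.299, 0.587, 0.114],
    [-0.147, -0.288, 0.436],
    [0.615, -0.515, -0.100],
])


def rgb_to_yuv(image):
    """Convert an (..., 3) RGB array with values in [0, 1] to YUV.

    Hypothetical helper for illustration only.
    """
    image = np.asarray(image, dtype=float)
    # Multiply each RGB pixel (last axis) by the conversion matrix.
    return image @ RGB_TO_YUV.T


# Hypothetical 1x2 image: one white pixel and one pure red pixel.
pixels = np.array([[[1.0, 1.0, 1.0], [1.0, 0.0, 0.0]]])
yuv = rgb_to_yuv(pixels)
```

The same pattern would apply to the other linear color spaces in this glossary by swapping in their coefficients; spaces with an additive offset, such as YCbCr, would additionally need the constant term added after the product.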
Abbreviations
ACM Active Contour Model
Adam Adaptive Moment Estimation
AHRF Associative Hierarchical Random Fields
AI Artificial Intelligence
ANN Artificial Neural Networks
AUC Area Under the Curve
CABG Coronary Artery Bypass Grafting
CNN Convolutional Neural Network
DL Deep Learning
DT Decision Tree
FCN Fully Convolution Neural Network
FN False Negative
FOS First Order Statistics
FP False Positive
GLCM Grey Level Co-occurrence Matrix
HOG Histogram of Oriented Gradients
IoU Intersection over Union
KNN K-Nearest Neighbors
LBP Local Binary Pattern
LDA Linear Discriminant Analysis
LREG Logistic Regression
ML Machine Learning
MLP Multi-Layer Perceptron
NB Naive Bayes
PCA Principal Component Analysis
RBF Radial Basis Function
ReLu Rectified Linear unit
RF Random Forest
ROI Region of Interest
SGD Stochastic Gradient Descent
SIFT Scale-Invariant Feature Transform
SMOTE Synthetic Minority Oversampling Technique
SSI Surgical Site Infection
SVM Support Vector Machine
TN True Negative
TP True Positive
WC Chest Wound
WD Drainage Wound
WL Leg Wound
1
Introduction
1.1 Context
The incidence and prevalence of cardiothoracic diseases are increasing globally, and it is estimated that 2 million open heart surgeries are performed every year [2]. Data prior to 2017 show that Portugal ranks 10th in Europe in the number of cardiothoracic surgeries per million inhabitants [3]. This high number of surgeries reflects the fact that heart diseases are the leading cause of death in Portugal. Nevertheless, life expectancy has increased over the years with the help of medicine and technology.
A good example is post-surgical digital follow-up, which can be crucial to the recovery of cardiac surgery patients, since hospital readmission rates are 15-20% in the first 30 days and 30% in the first year. A proper follow-up after cardiothoracic surgery allows for earlier hospital discharge and improves patient experience and recovery at home. Its importance became especially noticeable with the rise of COVID-19, since the necessary interventions by the medical team can be carried out remotely, reducing the risk of transmission during hospital appointments.
This master’s dissertation is part of CardioFollow.AI, a research project funded by Fundação para a Ciência e Tecnologia (FCT) that designed and implemented a post-surgical digital telemonitoring service for cardiothoracic surgery patients. The project emerged from the need to transform patient care, and its main goals are to study the impact of daily telemonitoring on early diagnosis, to reduce hospital readmissions and to improve patient safety. Secondary objectives are to develop a computer-aided diagnosis system for the early detection of possible complications and to assess the costs of a telemonitoring service. Remote telemonitoring allows early hospital discharge, which can be very beneficial in terms of patient safety, reduces health costs and frees up hospital beds. The project is currently being developed by researchers with multi-domain expertise from Value for Health CoLAB, Fraunhofer-AICOS, Nova Medical School-UNL and Hospital de Santa Marta-CHULC. The service monitors patients from the cardiothoracic surgery service of Hospital de Santa Marta during a 30-day period after hospital discharge. This remote follow-up involves a digital health kit (DHK) for monitoring, which includes a sphygmomanometer, a scale, a smartwatch and a smartphone, allowing patients to collect data daily. The kit enables the measurement of weight, pulse rate, blood pressure and number of steps, the reporting of palpitations, edema, pain, fatigue, dyspnea and syncope, and the capture of a picture of the surgical wound. However, to be eligible for this project, patients need to know how to use a smartphone and to have some health literacy, which was further improved in an individual session simulating the daily routine.
Every day, nurses monitor the data on a web platform and can intervene whenever an unanticipated alteration occurs. As the number of patients increases, the time needed to analyze the collected data becomes enormous for healthcare professionals with busy schedules. Consequently, an easy-to-use data management system was required, including a summary of the patients’ health statuses and risk alerts for when the measured outcomes fall outside clinically acceptable values. Later, the system evolved to support data access, the insertion of notes and changes to the course of treatment. Artificial intelligence (AI) automates some of these processes by calculating the risk indicators, consequently saving healthcare professionals time. However, alerts for visible changes in the healing progress of surgical wounds are not yet automatic, as this is a challenging problem.
Motivated by this, the present master’s dissertation fits into one of the modules from the CardioFollow.AI project, regarding the automation of surgical wound analysis.
Cardiothoracic surgery patients have a substantial risk of developing surgical site infections (SSIs). These infections increase morbidity, mortality and costs, prolong hospital stays and may require additional surgical procedures [4–6]. SSIs are often detected after patients are discharged from the hospital. Hence, they require early diagnosis and treatment to prevent further complications. Given that the length of hospitalization is getting shorter, post-discharge surveillance with feedback of information has already proved to be an essential way of reducing and treating SSIs [7]. Recently, advances in machine learning (ML) and deep learning (DL) algorithms have increased the number of studies on medical image analysis, which has many applications such as segmentation, localization, classification and detection. One of the most developed areas is wound analysis, covering pressure ulcers, skin lesions and burns. However, while most studies have focused on these wound types, few have developed systems to automatically analyze post-operative wounds, and cardiothoracic surgical sites in particular. This gap in the literature is the main focus of this dissertation. Despite all the advances made in this area, the examination of post-surgical wounds remains a challenging task.
1.2 Objectives
This work aims to automate the analysis of images of post-operative wounds by developing a risk prediction model with the help of ML and DL algorithms. The goal of the proposed system is the early detection of worrying wound alterations to prevent further infections, allowing the clinical team to intervene and initiate an early response and treatment.
Furthermore, it can reduce the workload of clinicians and prevent possible human errors.
The resulting system is expected to be integrated into the previously developed web telemonitoring platform, raising an alert when an alteration arises.
To achieve the established objectives of this dissertation, the following sub-objectives were defined:
1. Labelling and annotation of photographs from the given database with an annotation software
2. Development of a wound segmentation model
3. Development of a wound alteration classification model
4. Integration of both models
The development of this work was based on the combination of DL and conventional ML algorithms to create the mentioned segmentation and classification models. Image segmentation is a crucial step prior to classification: to analyze the local signs present in the surgical wounds, the image needs to be segmented so that only the region of interest (ROI) is retained. Therefore, several approaches were considered and DL models were evaluated to select the best architecture for the problem. The classification model predicts the final output, considering extracted features based on known characteristics present in wound alterations. A comparative analysis of several conventional ML algorithms was done to evaluate their performance. Lastly, the integration of both models creates a system in which an RGB input image is segmented and classified as either a non-wound alteration or a wound alteration.
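The flow from RGB image to ROI to features to final label can be sketched as follows. This is only an illustrative skeleton under strong assumptions: `segment_wound` and `classify_alteration` are hypothetical rule-based stand-ins for the trained DL segmentation model and the trained ML classifier, the mean-color features are a toy substitute for the color and textural features used in this work, and the "no wound found" outcome and the threshold value are invented for the example.

```python
import numpy as np


def segment_wound(image):
    """Hypothetical stand-in for the DL segmentation model.

    Marks pixels whose red channel dominates as wound; the actual
    system uses a trained network rather than this rule.
    """
    red, green, blue = image[..., 0], image[..., 1], image[..., 2]
    return (red > green) & (red > blue)


def extract_features(image, mask):
    """Toy color features: mean R, G and B over the ROI pixels."""
    return image[mask].mean(axis=0)


def classify_alteration(features, redness_threshold=0.6):
    """Hypothetical rule in place of the trained ML classifier."""
    if features[0] > redness_threshold:
        return "wound alteration"
    return "non-wound alteration"


def analyse(image):
    """Segment the image, then classify the extracted ROI."""
    mask = segment_wound(image)
    if not mask.any():
        # Assumed fallback for images without a detected ROI.
        return "no wound found"
    return classify_alteration(extract_features(image, mask))


# Hypothetical 4x4 grey image with a 2x2 reddish patch as the "wound".
image = np.full((4, 4, 3), 0.5)
image[1:3, 1:3] = [0.9, 0.2, 0.2]
result = analyse(image)
```

Keeping segmentation, feature extraction and classification as separate functions mirrors the modular structure described above, so that each stage could be replaced by a trained model without changing the others.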
1.3 Document structure
This master’s dissertation document is organized into six chapters: Introduction, Theoretical concepts, Literature review, Methodology, Results and discussion, and Conclusion.
Chapter 1 gives context to the proposed problem and states the main objectives of this work.
In Chapter 2, the main theoretical concepts are briefly described to provide the fundamental basis of this work. Chapter 3 reviews the literature on wound image segmentation and classification. Chapter 4 describes the dataset and gives an in-depth analysis of the development and implementation steps of the system. Chapter 5 reports and discusses the results. Finally, Chapter 6 presents the main conclusions, along with this work’s limitations and proposals for future work.
2
Theoretical concepts
This chapter describes the theoretical concepts needed to understand this dissertation’s work. It can be divided into two groups, medical concepts and medical image analysis. In the first, cardiothoracic surgery and the biology behind infection are presented.
Then, regarding medical image analysis, segmentation and classification methods are explained alongside the theory behind these approaches.
2.1 Cardiothoracic surgery
Cardiothoracic surgery is one of the most challenging areas of surgery, especially because heart-related diseases are a major cause of illness and death. It involves the surgical treatment of diseases or traumatic injuries of the heart, lungs, and other associated structures. It can be divided into cardiovascular and pulmonary surgery and can be performed with multiple types of procedures: open, endoscopic (laparoscopic or thoracoscopic) and robotic [8].
Open heart surgery requires cutting the chest open to have direct access to the organs inside the thoracic cavity. Coronary artery bypass grafting (CABG) is the most common type of heart surgery, in which saphenous vein grafts (located on the leg) are a frequently used conduit for coronary revascularization [9]. Furthermore, open heart surgery is also performed to repair heart valves and areas of the heart, to implant medical devices or to perform heart transplantation [10]. The procedure consists of first performing a median sternotomy, which is an incision made in the thorax as illustrated in Figure 2.1.
Sternotomy is a safe and efficient technique used for the surgical treatment of all congenital and acquired heart diseases. A vertical and median incision (12 to 18 cm) between the sternal notch and the tip of the xiphoid process is performed in the thorax.
In a CABG surgery, the incision needs to be extended into the upper part of the linea alba.
After this step, the patient’s sternum is split with a sternal saw and a Finochietto retractor is introduced to separate the ribs and make the heart visible [11]. Using the fingers, the fat anterior to the pericardium and the anterior extensions of the parietal pleura are swept laterally. Lastly, after finishing the required procedure, the surgeon closes the sternum with stainless steel wires and the incision with an absorbable running suture or with clips. The sternal closure is extremely important, since the approximation of the two sternal halves has to facilitate bone healing and avoid instability, which is a risk factor for wound infection [12]. After the surgery, chest drains may be inserted between the ribs into the chest cavity to drain any accumulated blood, fluid or air. There can be as many as four chest tubes, but typically there are two or three [13].
Figure 2.1: Median sternotomy incision [14].
2.2 Infection
Infection is defined as “invasion and multiplication of microorganisms in body tissues, which may be clinically inapparent or result in local cellular injury because of competitive metabolism, toxins, intracellular replication, or antigen-antibody response” [15]. The process of wound infection is complex and entails biological pathways at molecular levels.
Bacteria, fungi or viruses enter through damaged skin or the local circulation and start proliferating, which provokes inflammation and triggers multiple physiologic cascades that protect the body and initiate the healing process [16]. Subsequently, the local signs of wound infection appear when the immune response to the microbial invasion increases. Purulent discharge, erythema, swelling, edema, elevated temperature, malodor and pain are signs of local infection.
Concerning the extension of the infection, when pathogens proliferate beyond the wound and reach high concentrations a major disturbance is caused [17]. If it spreads to other local structures, deeper tissue, surrounding tissue, muscle, fascia and local organs it may lead to systemic sepsis and ultimately death [16].
2.3 Surgical site infection
Post-operative wound infection is a common healthcare problem. A wound infection is classified as an SSI when it occurs within 30 days after surgery (or up to one year if the patient received an implant), involves the skin, subcutaneous tissues, deep layers or distant organs, and presents purulent drainage or organisms isolated from the wound site.
It has been estimated that SSIs occur in 3% to 20% of surgical procedures; however, this rate varies for each specific procedure and may increase if the number of risk factors present is high [18]. The reported incidence of SSIs after cardiac surgery also varies between studies, since the results often concern specific procedures. The prevalence of wound infections after cardiac surgery, including both sternal wound and donor site infections, ranges from 1.3% to 12.8% of patients.
These infections contribute to extra morbidity and mortality, prolong hospitalization and increase healthcare costs [7]. For this reason, the prevention of cardiothoracic surgical wound infections is a main component of quality assurance and hospital cost-containment [6]. Several studies have shown that surveillance and feedback of wound infections reduce the risk of these infections.
2.3.1 SSI classification
SSIs are classified based on the depth of the involvement of infection as follows [19]:
• Superficial incisional SSI -inflammation of the skin and subcutaneous tissue of the incision. The local signs (redness, fluid collection, disunion) are apparent and require local disinfection and oral antibiotics treatment. However, it may progress to a deep SSI. Two examples of superficial SSIs are illustrated in Figure 2.2
Figure 2.2: Example of two SSIs from the dataset, in which both wounds exhibit disunion of the suture’s borders.
• Deep incisional SSI - involving deep soft tissues of the incision, such as the fascial and muscular layers. Deep SSIs include sternal osteitis, mediastinitis and endocarditis. It must meet at least one of the following criteria [20]:
– Positive culture of tissue samples or mediastinal fluid.
– Typical presentation of mediastinitis on revision surgery or anatomopathological examination.
– Presence of one of the following elements: fever above 38 °C, thoracic pain or sternal instability, together with either pus in the mediastinum or a positive culture of preoperative samples or of a blood culture.
• Organ/space SSI - involving any part of the anatomy other than the incision.
The incisions can also be divided into primary and secondary in procedures with more than one incision. For example, in CABG the primary incision is in the chest and the secondary incision is the leg incision at the donor site.
2.3.2 Physiopathology of SSIs
Most SSIs develop due to microbial contamination of the surgical site, which may come from endogenous or exogenous flora [4]. The most common pathogens that cause postoperative infections are coagulase-negative staphylococci and Staphylococcus aureus [7].
SSIs are often due to the endogenous flora, which includes the patient's skin, mucous membranes, and hollow viscera. Exogenous flora originates from the operating room, including air, instruments, and the surgical team [21].
2.3.3 Risk factors for SSIs
A variety of patient-related and operative factors, presented in Table 2.1, increase the risk of developing a post-operative wound infection, especially when a clean and technically good operating technique is performed [22]. Hence, the most effective way to reduce the risk of an SSI is to give appropriate peri-operative antibiotics and to pay special attention to additional risk factors.
Table 2.1: Patient and operative risk factors for SSIs [7, 23].
Patient risk factors:
• Age
• Nutritional status
• Pre-existing illness
• Diabetes mellitus
• Smoking
• Obesity
• Coexistent infections at a remote site
• Colonization with microorganisms
• Immunosuppressed patients
Operative risk factors:
• Preoperative stay
• Length of operation
• Glove punctures
• Emergency procedures
• Airborne contamination
2.4 Medical image analysis
Medical image analysis has evolved considerably over the past 20 years, solving clinical problems by analyzing several types of medical images [24]. Its applications include segmentation, localization, classification and detection.
The application of ML techniques to medical image analysis has made it one of the most researched and developed areas of biomedical engineering [25]. ML algorithms help extract information effectively and efficiently, and their use has grown with the increase in digital medical records and diagnostic imaging [26, 27]. However, it was not until the advances in DL, with the appearance of convolutional neural networks (CNNs), that automatic medical image analysis became widespread. These developments allowed deep features to be learned automatically from data instead of relying on hand-crafted features [25]. The success of CNNs stems from advances in central processing units (CPUs) and graphics processing units (GPUs), the availability of large labelled datasets and developments in learning algorithms [27, 28].
Medical imaging is the process that provides visual information about the human body, allowing early detection, diagnosis and treatment of diseases, as well as surveillance [25].
There are several types of medical images, such as computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), ultrasound, X-ray, lesion photographs, histology, dermoscopy images and wound photographs [27, 28]. Photographs of wounds are a non-invasive technique that allows a qualitative assessment of the symptoms at the lesion's site. These images require medical interpretation by human experts such as radiologists and clinicians. However, even with their expertise, manual analysis has limitations that make computer-assisted interventions beneficial: the wide variation of pathologies, the potential fatigue of clinicians, the large amount of time spent on the task and the reduced number of professionals who can evaluate medical images. These problems of manual analysis can lead to wrong or delayed diagnoses, and diagnosis can become more efficient and accurate with the introduction of computer aided detection (CADx) and computer aided diagnosis (CAD) [25, 26]. On the other hand, this type of image is difficult to collect, and the data must be labeled for proper use in ML algorithms. Therefore, it is a challenge to apply DL to a limited well-labeled dataset without overfitting the model. Another crucial challenge is the high accuracy that algorithms need to achieve, because it can directly affect the clinical diagnosis and/or treatment [28].
2.5 Image analysis for segmentation
Image segmentation is an essential step of image analysis that divides an image into non-overlapping regions with similar properties such as grey level, color, texture, brightness and contrast, among others [29–33]. Segmentation simplifies an image into a form that is easier to analyze, and its quality can determine the final accuracy of object analysis. Formally, image segmentation can be defined as follows [29, 30, 34]:
If $P(\cdot)$ is a uniformity predicate defined on groups of connected pixels, segmentation is a partition of the image $S$ into disjoint non-empty subsets or regions $S_1, S_2, \ldots, S_N$ such that:
• $\bigcup_{i=1}^{N} S_i = S$ with $S_i \cap S_j = \emptyset$ for $i \neq j$ (the union of all $S_i$ is equivalent to $S$)
The first condition implies that the segmentation algorithm cannot finish until all the points are processed, because each image point has to belong to a region. In addition, two segmented regions should not intersect, meaning that a single pixel can belong to only one region.
• $P(S_i) = \text{true}$ for all regions $S_i$ ($i = 1, 2, \ldots, N$)
The pixels belonging to the same segmented region should have similar properties such as color and texture, among others.
• $P(S_i \cup S_j) = \text{false}$ for $i \neq j$ (the predicate of the union of sets $S_i$ and $S_j$ is false when $i$ is not equal to $j$)
This condition indicates that pixels from different regions should have discernibly different characteristics.
• For $i = 1, 2, \ldots, N$, $S_i$ is a connected component.
Various segmentation methods provide solutions for distinct imaging applications, since no perfect method exists. They are widely used in medical applications to segment tissues, body organs and wounds, and in other domains where segmentation is important for surveillance [35]. Segmentation algorithms can be automatic or semi-automatic, depending on the need for user interaction. User interaction can improve the quality of segmentation, especially in complex problems or when the available data is scarce and not representative [35].
Image segmentation can be performed with methods based on traditional algorithms or on DL. Both have limitations and are appropriate for different scenarios. Traditional methods are popular, but they depend on the quality of feature extraction by domain experts, who can miss features that are not apparent. DL, in turn, extracts features automatically but needs a large dataset to achieve a good result [35, 36].
Generally, traditional image segmentation approaches are categorized according to two properties, discontinuity and similarity [33, 37]:
• Edge or boundary-based methods (detect discontinuities)
• Region-based methods (detect similarities)
These approaches will be described in the subsections below.
2.5.1 Edge or boundary-based methods
Edges are a sign of discontinuity and can be described by the boundary between two distinct regions. In edge-based methods an image is divided when an abrupt local intensity change occurs, where edge detection algorithms find the pixels belonging to a region boundary, linking them together to form closed object boundaries. This type of segmentation works well on images with good contrast between the object and the background.
Edge based segmentation locates the edges where [38]:
1. The first derivative of intensity is greater in magnitude than a given threshold or
2. The second derivative of intensity has a zero crossing
In these edge-based approaches, all edges are identified and connected together to form the required object boundaries [32]. One of the most popular edge techniques is the application of edge detectors by mask convolutions, such as Roberts, Sobel, Prewitt (1st derivative type), Canny and Laplacian (2nd derivative type). Figure 2.3 represents the masks applied with the different edge operators. For the Prewitt detector, d = 1; for the Sobel operator, d = 2. The Roberts mask is represented in Figure 2.3b, and the Laplacian operator, a second-order edge detector, has its mask representation in Figure 2.3c.
Edge-based methods can sometimes detect wrong edges, which are not the transition between two regions, and miss weak edges, where the change in pixel intensity is not abrupt enough. Thus, not all edges may be linked to form the boundaries of a region, which negatively affects the segmentation results. The performance of this approach is highly sensitive to image noise, and it does not work properly on images with smooth transitions and low contrast. Hence, an additional pre-processing step for noise removal is recommended. These techniques should also be used along with region-based methods for a complete and accurate segmentation [31, 39].
Figure 2.3: Different operators for edge-based approaches: (a) edge operators, (b) Roberts operator, (c) Laplacian operators (Adapted from [39, 40]).
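As a minimal sketch of edge detection by mask convolution, the Python code below applies the two Sobel masks to a small synthetic image and marks a pixel as an edge when the gradient magnitude exceeds a threshold (the first-derivative criterion). The image, the threshold value and all function names are illustrative assumptions and are not taken from the pipeline of this dissertation. The masks are applied as a sum of element-wise products without flipping, which does not affect the gradient magnitude.

```python
# Illustrative sketch: Sobel edge detection on a tiny grayscale image.
# Image values, threshold and function names are assumptions for the example.

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def apply_mask(image, mask, row, col):
    """Sum of element-wise products between a 3x3 mask and the
    neighbourhood centred on (row, col)."""
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            total += mask[dr + 1][dc + 1] * image[row + dr][col + dc]
    return total


def sobel_edges(image, threshold):
    """Return a binary edge map; border pixels are left as 0."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_mask(image, SOBEL_X, r, c)
            gy = apply_mask(image, SOBEL_Y, r, c)
            magnitude = (gx ** 2 + gy ** 2) ** 0.5
            edges[r][c] = 1 if magnitude > threshold else 0
    return edges


# Dark left half, bright right half: a single vertical edge.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edge_map = sobel_edges(image, threshold=20)
```

In this example only the two columns on either side of the intensity step are marked in the interior rows, which illustrates why edge-based methods work best on images with good contrast.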
2.5.2 Region-based methods
In region-based algorithms, an image is partitioned into similar regions according to a specific criterion, e.g., color or intensity [41]. The techniques that use this approach are region growing, thresholding, clustering, and region splitting and merging.
2.5.2.1 Region splitting and merging
The split-and-merge technique can be divided into two main steps. In the splitting phase, the image is split repeatedly into homogeneous sub-images using a quadtree structure, until no further splitting is possible [39].
In the quadtree structure, the root of the tree corresponds to the entire image and each node to a subdivision. A region is broken into quadrants when the variance between its pixels is higher than the established criterion [31, 42]. In the merging phase, adjacent regions that are similar are merged, until no more merges can take place while satisfying the homogeneity criteria [43].
2.5.2.2 Region growing
Region growing is a method for extracting a connected region of the image based on predefined criteria regarding intensity information [40]. It starts with a manually selected seed point and keeps adding neighboring pixels based on similarity until all the pixels belong to a certain region. It is widely used to segment parts of the human body and to delineate small, simple structures such as tumors and lesions [30]. The main disadvantage of region growing is that it requires manual interaction, since a seed point is needed for each region to be segmented. It is also sensitive to noisy images and intensity variations, which cause over-segmentation, holes in the regions or disconnected regions [44, 45].
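The region growing procedure can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions: it uses 4-connectivity, compares each candidate pixel with the intensity of the seed, and uses a hypothetical tolerance parameter as the similarity criterion. None of these choices is prescribed by the text above.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a region from `seed`, adding 4-connected neighbours whose
    intensity differs from the seed intensity by at most `tolerance`."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in region:
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


# Hypothetical image: a dark area on the left, a bright area on the right,
# and one isolated dark pixel in the bottom-right corner.
image = [
    [10, 11, 50, 52],
    [12, 10, 51, 50],
    [11, 12, 49, 10],
]
region = region_grow(image, seed=(0, 0), tolerance=5)
```

The isolated dark pixel at position (2, 3) is similar to the seed but is not reached, because region growing only adds pixels that are connected to the region already grown.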
2.5.3 Clustering
Clustering is also an approach for region segmentation in which an image is divided into clusters of pixels with similar characteristics [32]. It is an unsupervised task with no training phases because it trains itself with the available data. Clustering algorithms iteratively alternate between segmenting the image and characterizing the properties of each class [30]. The formation of clusters is based on the principle of maximizing the intra-class similarity and minimizing inter-class similarity. The most basic approach is to assign each pixel to the nearest cluster mean. Clustering algorithms can be classified as hard clustering (k-means) or soft clustering (fuzzy clustering).
2.5.3.1 Clustering k-means
K-means clustering is a statistical clustering algorithm based on the similarity between pairs of data [44]. It splits an image into K clusters by iteratively computing a mean intensity for each cluster and assigning each point to the cluster whose mean is closest to it [30, 40]. This clustering technique does not always guarantee contiguous areas, which is a significant disadvantage compared to other algorithms such as splitting and merging [40].
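The two alternating steps of k-means can be illustrated with a short Python sketch on one-dimensional pixel intensities. The pixel values, the initial means and the fixed number of iterations are assumptions chosen for the example, and practical implementations usually add a convergence check.

```python
def kmeans_intensity(pixels, means, iterations=10):
    """Cluster 1-D intensities: assign each pixel to the nearest mean,
    then recompute each mean; repeat for a fixed number of iterations."""
    means = list(means)
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: index of the nearest cluster mean.
        labels = [min(range(len(means)), key=lambda k: abs(p - means[k]))
                  for p in pixels]
        # Update step: mean intensity of each cluster (unchanged if empty).
        for k in range(len(means)):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:
                means[k] = sum(members) / len(members)
    return labels, means


# Hypothetical flattened image with dark and bright pixels, K = 2.
pixels = [12, 15, 11, 200, 210, 205, 14, 198]
labels, means = kmeans_intensity(pixels, means=[0, 255])
```

Because the pixels are clustered by intensity alone, pixels with the same label are not necessarily adjacent in the image, which is the lack of contiguity mentioned above.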
2.5.3.2 Fuzzy clustering
Fuzzy clustering is a popular soft-clustering method that can outperform hard clustering [42]. Instead of assigning each data point to a single cluster, each point receives a membership grade between 0 and 1 for each cluster, calculated by a membership function, so that an object can belong to more than one cluster [44].
2.5.4 Thresholding
Thresholding is considered one of the simplest and fastest segmentation methods.
This method determines an intensity value called threshold, which creates a partitioning in an image’s intensities [42].
The histogram of an image represents the number of pixels as a function of each intensity value, and it has distinct peaks that can be used to divide the image into different parts. After finding the appropriate threshold, the segmentation is completed by grouping all pixels with values greater than or equal to the threshold into one class and all the remaining pixels into another class [30].
Thresholding is an effective method to segment images with high contrast between the object and the background (e.g., a bright object on a dark background). However, sometimes more than two intensity ranges must be separated, in which case multi-thresholding is needed [42]. Thresholds can be selected manually by inspecting an image's histogram or determined automatically with different techniques. Thresholding algorithms are classified as local or global [39].
2.5.4.1 Global thresholding
Global thresholding techniques calculate a threshold based on the global information of an image, such as the histogram of global texture properties. In this method, the threshold value is constant and is applied to the whole image [40, 45]. A major disadvantage of this approach is that it performs poorly when the background illumination is not constant [42].
2.5.4.2 Local (adaptive) thresholding
Local thresholding solves the main problem of the global technique by finding multiple threshold values. It assesses local properties of pixels, dividing the image into sub-images and calculating a threshold for each part. A local threshold is assigned to each pixel to determine whether it belongs to the background or the foreground [40]. Its main limitation is that it is more time consuming than global thresholding, but it behaves better with varying backgrounds [45].
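The difference between the two strategies can be seen in a small Python sketch. It is only an illustration under assumptions: the global threshold is set to the mean intensity of the whole image, and the local rule thresholds each window at its own mean. Both rules, the window size and the image values are choices made for the example, and other threshold selection techniques exist.

```python
def global_threshold(image, threshold):
    """Pixels >= threshold become foreground (1), the rest background (0)."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]


def local_threshold(image, block):
    """Split each row into windows of `block` columns and threshold every
    window at its own mean intensity (a simple local rule, assumed here)."""
    result = []
    for row in image:
        out = []
        for start in range(0, len(row), block):
            window = row[start:start + block]
            local_mean = sum(window) / len(window)
            out.extend(1 if p >= local_mean else 0 for p in window)
        result.append(out)
    return result


# One image row with uneven illumination: the right half is brighter overall,
# but both halves alternate between background and object pixels.
image = [[10, 40, 10, 40, 110, 140, 110, 140]]
global_mask = global_threshold(image, threshold=75)   # 75 = global mean
local_mask = local_threshold(image, block=4)
```

Here the global mask is [[0, 0, 0, 0, 1, 1, 1, 1]], which only separates the dark half from the bright half, while the local mask is [[0, 1, 0, 1, 0, 1, 0, 1]] and recovers the object pixels in both halves.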
2.6 Artificial intelligence methods
2.6.1 Machine Learning
Machine learning (ML), a branch of computer science, can be defined as the process of solving a practical problem through the application of specialized algorithms that allow the computer to learn patterns from selected datasets [46]. ML algorithms mimic the human approach to learning a task in different environments, which overcomes the limitations of traditional statistical methods that do not generalize to different situations [47].
ML models can be divided into two groups: supervised learning and unsupervised learning.
2.6.1.1 Unsupervised learning
Unsupervised learning algorithms learn patterns from unlabeled data. It is referred to as unsupervised because there is no response variable that can supervise the analysis of the data [48]. These algorithms can discover similarities, hidden patterns and the representation that best describes the data without human intervention.
Unsupervised ML models can be used for clustering, association and dimensionality reduction [49].
Principal component analysis
Principal component analysis (PCA) is an unsupervised, non-parametric statistical procedure primarily used for dimensionality reduction in ML. High-dimensional datasets are very common, but they can lead to model overfitting, which reduces the ability to generalize beyond the dataset examples. In addition, removing redundant features decreases the computational cost and makes models more efficient. PCA is usually employed to reduce the dimensions of the features and increase interpretability while minimizing information loss. It does so by extracting uncorrelated variables that maximize variance [50].
PCA transforms a set of correlated variables into another set of uncorrelated variables that successively maximize variance, called principal components. The resulting set is arranged in descending order, where the first principal component has the highest variability and therefore carries the most information [48].
As shown in equation 2.1, PC1 is an artificial variable constructed as a linear combination that captures the magnitude and the direction of the maximum variance in the dataset. PC2 is also a linear combination, which describes the largest share of the remaining variance, and so on for the rest of the components. The maximum number of components equals the dimension of the data. To reduce the dimensionality of the features, only the principal components corresponding to the highest variance in the original space are used.
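$$PC_N = w_{1,N}(\text{Feature A}) + w_{2,N}(\text{Feature B}) + \ldots + w_{N,N}(\text{Feature N}) \qquad (2.1)$$
$PC_N$ - principal component N
$w_{1,N}, \ldots, w_{N,N}$ - weights of each feature in principal component N (the elements of the corresponding eigenvector of the covariance matrix)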
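To make the linear combination in equation 2.1 concrete, the Python sketch below estimates the weights of PC1 for a small two-feature dataset. It is a simplified illustration under assumptions: the weights are obtained by power iteration on the sample covariance matrix, which is one of several ways to compute the leading eigenvector and is chosen here only to avoid external libraries. The data and the function name are hypothetical.

```python
def first_principal_component(data, iterations=100):
    """Estimate the PC1 weights by power iteration on the covariance
    matrix of mean-centred data, and return them with the PC1 scores."""
    n, dims = len(data), len(data[0])
    means = [sum(row[d] for row in data) / n for d in range(dims)]
    centred = [[row[d] - means[d] for d in range(dims)] for row in data]
    # Sample covariance matrix (dims x dims).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(dims)]
           for a in range(dims)]
    weights = [1.0] + [0.0] * (dims - 1)   # arbitrary starting direction
    for _ in range(iterations):
        product = [sum(cov[a][b] * weights[b] for b in range(dims))
                   for a in range(dims)]
        norm = sum(v * v for v in product) ** 0.5
        weights = [v / norm for v in product]
    # Project each centred sample onto PC1 (its PC1 score).
    scores = [sum(w * v for w, v in zip(weights, r)) for r in centred]
    return weights, scores


# Two perfectly correlated features: all variance lies along one direction.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
weights, scores = first_principal_component(data)
```

For this dataset Feature B is always twice Feature A, so the PC1 weights are proportional to (1, 2) and a single component is enough to describe the data.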
2.6.1.2 Supervised learning
Supervised learning classifies data or outcome predictions using labeled datasets, by generating a function that maps inputs to desired outputs [47]. The algorithms have the ability to learn over time from a training dataset with inputs and correct outputs, in order to yield the desired prediction.
Supervised learning can be separated into two types of problems, classification and regression. In classification, algorithms assign data to specific categories, while regression models the relationship between independent and dependent variables to predict outcomes [51]. In the next subsections, the supervised learning algorithms used in this dissertation are briefly described.
Linear discriminant analysis
Linear discriminant analysis (LDA) is mostly used for classification and data dimensionality reduction. This technique is widely used in multivariate statistical methods for data analysis with categorical label outcomes, because of its simplicity and robustness.
LDA searches for a linear combination of variables that best separates two classes [52].
Considering a classification problem where a random variable $X$ belongs to one of $K$ classes, with class-specific probability densities $f_i(x)$, a discriminant rule aims to divide the data space into $K$ separate regions, one for each class. Given these regions, the classification is performed by assigning $X$ to a certain class if it falls in that class's region. There are two rules to allocate each sample to a region, the maximum likelihood rule and the Bayesian rule [48].
• Maximum likelihood rule - The classes have equal probabilities, so $x$ belongs to class $h$ if equation 2.2 is verified.
$$h = \arg\max_i f_i(x) \qquad (2.2)$$
• Bayesian rule - The classes have different prior probabilities $\pi_i$, so $x$ belongs to class $h$ if equation 2.3 is verified.
$$h = \arg\max_i \pi_i f_i(x) \qquad (2.3)$$
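The two allocation rules can be compared with a small numeric sketch in Python. The class-conditional densities are assumed here to be one-dimensional Gaussians with a common standard deviation, and the class means and priors are hypothetical values chosen so that the two rules disagree.

```python
import math


def gaussian_density(x, mean, std):
    """Class-conditional density f_i(x), assumed Gaussian for this sketch."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def maximum_likelihood_rule(x, class_means, std):
    """h = argmax_i f_i(x): classes are treated as equally probable."""
    densities = [gaussian_density(x, m, std) for m in class_means]
    return densities.index(max(densities))


def bayes_rule(x, class_means, std, priors):
    """h = argmax_i pi_i * f_i(x): densities weighted by class priors."""
    scores = [p * gaussian_density(x, m, std)
              for p, m in zip(priors, class_means)]
    return scores.index(max(scores))


class_means = [0.0, 2.0]          # hypothetical class centres
x = 1.2                           # slightly closer to class 1
ml_class = maximum_likelihood_rule(x, class_means, std=1.0)
bayes_class = bayes_rule(x, class_means, std=1.0, priors=[0.9, 0.1])
```

With equal priors the sample is allocated to class 1, the closest class, whereas the strong prior for class 0 moves the decision to class 0 under the Bayesian rule.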
Logistic regression
Logistic regression (LREG) is a traditional statistical method used for binary classification problems. It can be seen as a transformation of linear regression that uses a sigmoid function (equation 2.4) to define the binary output variable. The main difference between logistic and linear regression is that LREG models output a value between 0 and 1, corresponding to the probability of a certain class.
This function is derived from the weighted transformation of the input features, which means each feature is multiplied by a weight and then added up [53]. The S-shape sigmoid curve, shown in Figure 2.5, establishes the boundaries between two categories predicting the probability of the input data belonging to a certain class [47].
$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}} \qquad (2.4)$$
Figure 2.5: Graphic representation of sigmoid function [54].
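As a brief illustration of how the sigmoid turns a weighted sum of features into a class probability, consider the Python sketch below. The weights, the bias term and the 0.5 decision threshold are hypothetical values chosen for the example and are not fitted to any data.

```python
import math


def sigmoid(x):
    """Equation 2.4: maps any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_probability(features, weights, bias):
    """Weighted sum of the input features passed through the sigmoid."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights for two features; not fitted to any real data.
weights, bias = [0.8, -0.5], 0.1
probability = predict_probability([2.0, 1.0], weights, bias)
predicted_class = 1 if probability >= 0.5 else 0
```

The weighted sum in this example is 1.2, which the sigmoid maps to a probability of about 0.77, so the sample is assigned to class 1.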
Naive Bayes
Naive Bayes (NB) is a simple classification technique based on Bayes' theorem that assumes features are independent from each other (class conditional independence). This assumption is unrealistic for most datasets but leads to a simple prediction framework.
Bayes' theorem, represented by equation 2.5, calculates the probability of an event occurring given the probability of another event that has already occurred [55].
$$P(H|X) = \frac{P(X|H)\,P(H)}{P(X)} \qquad (2.5)$$
Where:
$H$ - some hypothesis, such as that the data tuple $X$ belongs to a specified class $C$
$X$ - some evidence, described by measurements on a set of attributes
$P(H|X)$ - the posterior probability that the hypothesis $H$ holds given the evidence $X$
$P(H)$ - the prior probability of $H$, independent of $X$
$P(X|H)$ - the probability of $X$ conditioned on $H$ (the likelihood)
$P(X)$ - the prior probability of the evidence $X$
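A final decision is reached by calculating the probabilities of $X$ belonging to each of the two classes and comparing them (equation 2.6), where $X_k$ denotes the k-th attribute of $X$.
$$R = \frac{P(i|X)}{P(j|X)} = \frac{P(i)\,P(X|i)}{P(j)\,P(X|j)} = \frac{P(i)\prod_k P(X_k|i)}{P(j)\prod_k P(X_k|j)} \qquad (2.6)$$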
Comparing these two probabilities, the highest one indicates the predicted class label. Equivalently, the ratio $R$ can be checked: if $R > 1$ the predicted class label is $i$, otherwise the prediction is $j$. NB is characterized by high precision and speed when applied to large datasets. It is also a model that is very easy to construct and requires short computational time [56].
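The decision based on the ratio R can be illustrated with a short Python sketch. The priors and the per-attribute likelihoods are hypothetical numbers, and each class score is the prior multiplied by the product of the likelihoods, which relies on the class conditional independence assumption.

```python
def class_score(prior, likelihoods):
    """P(class) multiplied by the product of the per-attribute likelihoods
    P(X_k | class), using the conditional-independence assumption."""
    score = prior
    for value in likelihoods:
        score *= value
    return score


# Hypothetical priors and per-attribute likelihoods for two classes i and j.
score_i = class_score(prior=0.3, likelihoods=[0.8, 0.6])
score_j = class_score(prior=0.7, likelihoods=[0.2, 0.5])
ratio = score_i / score_j
predicted = "i" if ratio > 1 else "j"
```

Although class j has the larger prior, the likelihoods favour class i strongly enough that the ratio is greater than 1 and class i is predicted. The denominator P(X) of equation 2.5 is the same for both classes, so it cancels in the ratio and does not need to be computed.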
K-Nearest neighbors
K-Nearest neighbors (KNN), a non-parametric classification algorithm, is based on the principle that the instances within a dataset will be in close proximity to other instances that have similar characteristics. Given a testing instance, this algorithm finds the K nearest neighbors in the training set according to a distance metric, and the predicted label is the most frequent class among those K nearest neighbors [55]. In general, each training sample can be represented by a point in an n-dimensional space, where each of the n dimensions corresponds to one of the n features.
There are two important elements in KNN: the distance metric used to compute the distance between two samples and the number of nearest neighbors (K). The main purpose of the distance measure is to minimize the distance between samples of similar classes, while maximizing the distance between samples of distinct classes [57]. Multiple examples of distance metrics are listed in Table 2.2.
The performance of this algorithm also depends on the selection of a good K value, which can be computationally expensive [51]. The choice of K has to minimize the number of errors and allow the model to correctly predict unseen data. A larger K helps when noisy samples could otherwise win the majority vote and cause misclassification errors, whereas a smaller K is more appropriate when the class distribution is uneven and instances of the majority class would otherwise win the votes [55, 57].
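A compact Python sketch of the KNN vote is given below, using the Manhattan and Euclidean distances as examples of distance metrics. The training set, the query point and the value of K are hypothetical, and ties between classes are resolved simply by the order in which the labels are counted.

```python
from collections import Counter


def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def knn_predict(train, query, k, distance):
    """`train` is a list of (features, label) pairs. Return the most
    frequent label among the k training samples closest to `query`."""
    nearest = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set with classes "A" and "B".
train = [([1.0, 1.0], "A"), ([1.2, 0.8], "A"), ([0.9, 1.1], "A"),
         ([4.0, 4.0], "B"), ([4.2, 3.9], "B")]
label = knn_predict(train, [1.1, 1.0], k=3, distance=euclidean)
```

The distance function is passed as an argument, so another metric can be tested without changing the voting logic.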
Table 2.2: Approaches to define the distance between instances (x and y) in the KNN algorithm.
Distance: Equation
Manhattan: $D(x, y) = \sum_{i=1}^{m} |x_i - y_i|$
Chebychev: $D(x, y) = \max_{i} |x_i - y_i|$
Euclidean: $D(x, y) = \left( \sum_{i=1}^{m} |x_i - y_i|^2 \right)^{1/2}$
Canberra: $D(x, y) = \sum_{i=1}^{m} \frac{|x_i - y_i|}{|x_i| + |y_i|}$
Kendall's Rank Correlation: $D(x, y) = 1 - \frac{2}{m(m-1)} \sum_{i=2}^{m} \sum_{j=1}^{i-1} \mathrm{sign}(x_i - x_j)\,\mathrm{sign}(y_i - y_j)$
$m$ - number of dimensions
Decision tree
The decision tree (DT) classifier is a powerful statistical tool for complex data analysis.
It can be defined as a compilation of decision nodes, connected by branches, extending downward from the root node until the ending leaf nodes [53].
The algorithm determines the class of a sample by repeatedly partitioning a dataset into uniform subsets. The internal nodes in the tree represent a test on the attributes, where each branch represents a value that the node can assume. The samples are classified starting at the root node and are successively split based on their feature values until they reach the final leaves at the end of the tree, which give the output variables [57]. The root node is the feature that best divides the training data by decreasing the impurity of the subsequent leaves. This feature is determined by one of various measures, such as the Gini index or information gain [55].
DT is a simple and fast model that may achieve good results depending on the data.
However, this algorithm is prone to overfitting and may not generalize well to unseen data if no stopping criterion is set, since it risks generating very large trees with leaf nodes containing only one observation.
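The role of an impurity measure in choosing a split can be shown with a small Python sketch based on the Gini index. It evaluates a single numeric feature and tries every observed value as a candidate threshold. The data, the function names and the exhaustive search are assumptions made for illustration rather than a description of a specific DT implementation.

```python
def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))


def split_impurity(values, labels, threshold):
    """Weighted Gini impurity after splitting on `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try every observed value as a threshold and keep the purest split."""
    return min(sorted(set(values)),
               key=lambda t: split_impurity(values, labels, t))


# Hypothetical single feature with binary labels.
values = [1, 2, 3, 10, 11, 12]
labels = [0, 0, 0, 1, 1, 1]
threshold = best_threshold(values, labels)
```

Before splitting, the impurity of the node is 0.5. Splitting at the selected threshold of 3 produces two pure leaves with a weighted impurity of 0, which is why that split is preferred.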
Random forest
Random Forest (RF) is an ensemble method that consists of a large number of individual DTs. Each DT in the RF gives a predicted class, and the class with the majority of votes is selected as the model's final prediction [51]. Ensemble predictions produced by uncorrelated models tend to be more accurate than individual predictions. RF diversifies its DTs in two ways: random sampling of the data into bootstrap samples (bagging) and random selection of input features when generating each individual base DT.
In the bagging technique, each tree is trained on samples drawn randomly from the original dataset with replacement (the same sample may be picked more than once), so each tree sees only a random subset of the samples, which results in different trees. In the random feature selection process, the features used to build the binary rule at each node are randomly selected, resulting in greater variation among trees [58].
Due to its advantages, RF has played a relevant role in ML.
The advantages of RF classifiers are their simplicity, speed, robustness to noise, reduced overfitting, ability to evaluate variable importance and effective performance on large datasets.
However, as the number of trees increases, the algorithm becomes more time consuming [56].
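The two sources of randomness in RF and the final vote can be sketched in Python as follows. This is not a full RF implementation: tree training is omitted, the dataset is a stand-in list of sample indices, and the class labels and the fixed random seed are hypothetical choices for the example.

```python
import random
from collections import Counter


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) samples with replacement (bagging)."""
    return [rng.choice(dataset) for _ in dataset]


def random_feature_subset(n_features, subset_size, rng):
    """Pick the feature indices a tree may consider at one node."""
    return sorted(rng.sample(range(n_features), subset_size))


def majority_vote(predictions):
    """Final forest prediction: the class predicted by most trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)                      # fixed seed for repeatability
dataset = list(range(10))                   # stand-in for 10 samples
sample = bootstrap_sample(dataset, rng)
features = random_feature_subset(n_features=8, subset_size=3, rng=rng)
# Hypothetical predictions from five trees for one sample.
forest_prediction = majority_vote(["SSI", "no SSI", "SSI", "SSI", "no SSI"])
```

Because sampling is done with replacement, a bootstrap sample has the same size as the original dataset but typically contains repeated samples and leaves some out, which is what makes the individual trees differ.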
Support vector machine
Support vector machine (SVM) is a technique used for classification and regression problems. It aims to determine the location of the decision boundary, also known as the hyperplane, which produces the optimal separation between two classes [57]. The algorithm revolves around the notion of maximizing the margin that separates the two classes, by finding the data points (support vectors) that define the hyperplane and, from them, its coefficients.
SVM can also be extended to learn non-linear decision functions. To do so, the input data are mapped onto a high-dimensional feature space using kernel functions (quadratic, cubic or higher-order polynomial functions), where the classes become linearly separable. When two classes are still not linearly separable, the SVM uses a soft margin, which maximizes the distance to the hyperplane while minimizing the number of misclassification errors [55].
SVM is widely used in ML due to its robustness and accuracy when compared with other algorithms. Other advantages are that it needs fewer examples to train and has a reduced risk of overfitting. On the downside, as the number of training samples increases, SVM requires a very long computational time and a large amount of memory [55].
2.6.2 Deep learning
DL is a subset of ML built on algorithms organized in numerous layers (artificial neural networks, or ANNs) [59]. A neural network is inspired by the human brain, simulating the learning process through its ability to learn and generalize [40, 42].
It consists of many interconnected processing elements (nodes). In ANNs, the learning process is achieved by the adjustment of the weights between the interconnected layers regarding the training procedure and the input data [31].
Despite the popularity of classical methods, DL architectures have come to solve many computer vision problems, such as segmentation, through the use of CNNs [60]. Compared with other approaches, CNNs perform better in terms of accuracy and efficiency. The main advantage of DL approaches is the capability to automatically learn more compact and meaningful feature representations from images than hand-crafted methods [61].
Segmentation with DL methods can be categorized into semantic segmentation and instance segmentation. Semantic segmentation predicts class labels for regions of an image so that every pixel is labeled within a unique semantic entity [35, 62]. It is a pixel-wise classification that contributes to the identification of objects or semantically meaningful distributions and patterns in images [34]. Instance segmentation assigns separate labels to different instances of the same class. It provides more information than semantic segmentation, since it also allows the number of elements belonging to each class to be counted.
2.6.2.1 Convolutional neural networks
A CNN is a deep feed-forward architecture with the ability to generalize, inspired by the organization of the animal visual cortex. It can automatically extract highly abstract features without human supervision by using numerous blocks of layers [63, 64].
The main advantages of CNNs compared to their predecessors are the wide application range and high accuracy [26]. However, CNNs have limitations, as they need a long training process before they can perform certain tasks. They also have a high computational and memory cost and require a large amount of labeled data for learning, which can be hard to obtain because annotated medical images are difficult to collect and examples of the diseases or lesions of interest are often scarce. Another complication is overfitting, which may require several adjustments to the architecture or its learning parameters [29, 64].
A CNN model (Figure 2.6) typically consists of three types of layers: convolutional layer, pooling layer and fully connected layer. The convolutional layer extracts features from the input by computing the convolution operation between its input and a kernel.
The pooling layer performs a downsampling operation to reduce the network parameters, accelerating training. Next, the fully connected layer maps the features obtained into a final output, which varies depending on the type of analysis [59, 65].
Figure 2.6: Typical CNN architecture containing several convolutional and pooling layers followed by a fully connected layer which maps the final feature maps for classification.
Convolutional layer
Convolutional layers extract features with a linear operation called convolution, where a kernel is applied across the input. Each value of a kernel acts as a weight, which can be initialized randomly or by other initialization methods. At each training step, these weights are adjusted [4].
The convolution operation, displayed in Figure 2.7, consists of a kernel sliding over the input horizontally and vertically to generate a feature map. At each location, the product between each element of the kernel and the corresponding element of the input is calculated, and the products are summed to obtain a single value. This operation is repeated with multiple kernels to obtain a combination of feature maps, which represent distinct characteristics of the input [59].
Figure 2.7: An example of a convolution operation with a 3x3 kernel. The kernel is applied across the input, and an element-wise product between the elements of the kernel and the input is calculated and summed at each location to obtain the output value in the corresponding position, creating the output feature map. (Adapted from [66]).
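The sliding-window computation described above can be written as a short Python sketch. It assumes no padding and a stride of 1, and the input and kernel values are hypothetical. As is common in CNNs, the kernel is applied without flipping, and in a trained network the kernel values would be learned rather than fixed by hand.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the
    feature map of summed element-wise products."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    feature_map = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k):
                for j in range(k):
                    total += kernel[i][j] * image[r + i][c + j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 input and 3x3 kernel.
image = [[1, 2, 3, 0],
         [0, 1, 2, 3],
         [3, 0, 1, 2],
         [2, 3, 0, 1]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
feature_map = convolve2d(image, kernel)
```

A 3x3 kernel applied to a 4x4 input without padding yields a 2x2 feature map, in this case [[9, 6], [4, 9]].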