Figure 6.1: Image gradient, from [35]. Figure 6.2: Shortest paths from the bottom to the middle.
Figure 6.3: Shortest paths from the middle to the bottom.
Figure 6.4: Strong paths between middle and bottom.
Figure 6.5: Selected shortest paths. Figure 6.6: Strong paths between top and bot-tom.
Figure 6.7: The endpoints are the highest points of the shortest path.
Figure 6.8: Automatic breast contour detection with the shortest path algorithm.
6.3 Methodology
6.3.1 Deep Keypoint Detection Algorithm
Deep Neural Networks (DNN) offer a valuable framework to achieve this integrated learning.
However, as in biomedical applications we usually deal with small datasets, DNN tend to overfit
Figure 6.9: Nipple candidates detection [26]. Figure 6.10: Harris corner detection. [26].
severely. There are some approaches that allow to mitigate the effect of overfitting, for example, transfer learning and learning an intermediate representation. This latter idea was explored in other domains in the works done by Belagiannis et al.[13] and Caoet al.[21]. In both works, an intermediate representation consisting of confidence maps in relation to the location of the keypoints was created. An additional interesting idea also explored in those works was an iterative process of refinement.
Based on the ideas previously mentioned, we have built a DNN to automatically detect key-points in photographs of patients after being subjected to BCCT (see Figure 6.11). As shown in Figure6.11, the architecture of the proposed DNN comprises two main modules: regression and refinement of heatmaps, and regression of keypoints.
The first module is what we call Heatmap Regression and Refinement. Here the goal is to generate an intermediate representation consisting of a fuzzy localization for the keypoints we want to detect. This is done in order to help the regularization process of the DNN. Heatmaps are obtained using a fully convolutional neural network. In this work, we used the well-known segmentation model, U-Net [135]. Figure 6.12b presents an example with an image from the dataset and the respective ground truth heatmap super-imposed.
The second module has as input the multiplication of the image with the refined output of the previous module (Out put1(n)). The regression of the keypoints is composed of three blocks:
a convolutional backbone (here, the convolutional blocks of VGG16), some additional convo-lutional layers and, finally, the classifier composed of fully-connected layers. The first block, VGG16 [157], is pre-trained with ImageNet and then fine-tuned in our dataset. After VGG16, four convolutional layers are added to further increase image processing and decrease the size of feature maps before the dense layers. Finally, three dense layers are used to regress the 74 coor-dinates, corresponding to the keypoints that make up the breast contour, endpoints, nipples and supra-sternal notch (Figure6.12a).
The proposed fully supervised learning scheme requires not only the ground truth for the key-points but also a ground truth for the heatmaps, which is created considering a Gaussian centered at each keypoint and with a pre-defined standard deviation.
Regarding the learning process, we have two different terms in the loss function: heatmap regression, which works here as a regularization term, and keypoints regression, our goal. Thus,
6.3 Methodology 77
Input (image)
FCN(1)
Out put1(1) (heatmap)
FCN(n)
⊗ . . .
. . .
Out put1(n) (heatmap)
⊗
Conv Backbone
Conv Layers
Dense Layers
Output2 (coordinates) Heatmap Regression and Refinement
Keypoints Regression
Figure 6.11: Proposed iterative DNN architecture for keypoint detection. FCN stands for fully-convolutional network. Conv Backbone stands for convolutional backbone.
(a) Keypoints (b) Heatmap
Figure 6.12: Example of Ground Truth
the loss function is a linear combination between these two terms, withλh andλk weighting the importance of each term (Eq.6.1).
L =λhLheatmaps+λkLkeypoints (6.1)
In relation to the regression of the keypoints, the mean squared error (MSE) was the loss
function selected (Eq. 6.2). Nk is the number of coordinates,xtargetk the ground truth for a single coordinate and ˆxk the prediction.
Lkeypoints= 1 Nk
∑
∀k
(xtargetk −xˆk)2 (6.2)
The heatmaps were also learnt using MSE. However, the heatmaps undergo an iterative process of refinement. Thus, the complete process is defined by Eq. 6.3, where jrepresents a step in the refinement process andλjrepresents the weight given to that step.
Lheatmaps=
Nh j=1
∑
λjLheatmap(j) (6.3)
Finally, the loss for the heatmap in each step is defined as follows Lheatmap(j) = 1
Np
∑
∀p
(xtargetp −xˆp)2 (6.4)
whereNp corresponds to the number of pixels in the image, andxtargetp and ˆxp to the ground truth and prediction for the pixel values, respectively.
6.3.2 Hybrid Keypoint Detection Algorithm
Besides the fully deep learning approach, we also explored the use of a hybrid (deep + traditional) approach to the detection of keypoints (see Fig.6.13). With this hybrid approach, the endpoints and the nipples are obtained with the use of the previously presented deep learning algorithm, whereas the contour is detected with the conventional shortest path algorithm, which was described in section6.2 (Related Work). The main difference here is that the endpoints given as input to the shortest path algorithm are obtained with the Deep Keypoint Detection Algorithm, instead of being specified by the user, or detected through traditional approaches. This hybrid algorithm led to an improvement in the results when compared with using the deep learning approach to find all keypoints.
Input (Image)
Deep Keypoint Detection Algorithm
Compute Image Gradients
Detected Image Endpoints
Model Image as
Graph
Shortest Path Algorithm
Breast Contour Coordinates
Figure 6.13: Hybrid Keypoint Detection Algorithm.
6.3 Methodology 79
6.3.3 Keypoint detection through deep segmentation
Our third approach consists on combining image segmentation and the deep keypoint detection algorithm in a pipeline. Since the approach heavily relies on segmentation, the first step was train-ing a deep image segmentation model to achieve good results in breast segmentation. To train this model, it was necessary to generate ground-truth breast masks, which were obtained with the sup-port of the ground-truth keypoints and images. Using the ground-truth keypoints as delimiters of the area occupied by both breasts, the value of 255 was assigned to all pixels inside and the value of 0 to all pixels outside this area. Images were then normalized to have pixel values between 0 and 1. Taking into account previous results with other models [70], for this experiment, it was decided to use the U-Net++ architecture as the deep segmentation model. The official Keras implemen-tation by Zhouet al.1 was trained and fine-tuned (the model has an encoder which is initialized with the ImageNet [53] weights) during 300 epochs with Adadelta [184] as the optimizer. During training, binary cross-entropy was selected as the loss function. Regarding data augmentation, im-age and mask translations, rotations and flips were employed in an online fashion during training.
These hyperparameters were chosen based on model performance. The trained U-Net++ model was then used to generate segmentation masks. From these masks, contours were extracted using the marching squares algorithm (a special case of the marching cubes algorithm [103]), imple-mented in scikit-image [172]. The intuition behind this contour detection step is that the detected contours will contain the breast contour keypoints, assuming that breast segmentation masks were well generated. As such, this first step outputs a variable number of contour keypoints, some of which are not desired, because they do not belong to what is considered the breast contour. As a post-processing step, the deep keypoint detection algorithm’s predicted endpoints are projected onto the mask contours through the minimization of the Euclidean distance between the mask contour keypoint and the predicted keypoint. At the end of this processing step, there are new 34 breast contour keypoints plus the nipples and sternal notch, which were predicted by the deep keypoint detection algorithm alone (see Fig.6.14). Fig.6.17shows a chronological scheme of this algorithm.
Input (Image)
Deep Segmentation
Model
Deep Keypoint Detection Algorithm
Predicted Masks
Predicted Keypoints
Contour Detection Algorithm
Detected Contours Processing Step
Final Predicted Keypoints
Figure 6.14: A Novel Segmentation-Based Keypoint Detection Algorithm.
1See:https://github.com/MrGiovanni/UNetPlusPlus