3.2 Materials and Methods
3.2.5 Approach to Post-Processing RGB-D Images
Therefore, the transformer encoder passes the extracted information to a simpler decoder. This approach has proved to be very accurate compared with CNNs that use the encoder-decoder format. Using a transformer as the encoder is the central idea of SETR. SETR has achieved good results on well-known image datasets: on ADE20K [Zhou et al., 2017] it obtained a mean IoU (mIoU) of 50.28%, and on Pascal Context [Mottaghi et al., 2014] it obtained an mIoU of 55.83%, in addition to good results on the Cityscapes [Cordts et al., 2016] image set.
• SegFormer: SegFormer performs semantic segmentation by unifying transformers with multilayer perceptron (MLP) decoders. In this approach, the encoder is a hierarchically structured transformer that produces multiscale features. The decoder is simple and lightweight: it aggregates the information from the encoder layers, combining local and global attention efficiently. Despite this simple design, SegFormer achieves robust results in image segmentation. The SegFormer framework comprises a series of models of different sizes. The SegFormer-B4 model achieved a state-of-the-art result on the ADE20K image set, with 50.3% mIoU. SegFormer’s best model, SegFormer-B5, achieved an excellent result on the Cityscapes validation set, with 84.0% mIoU.
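Because the results above are reported as mean IoU (mIoU), a minimal sketch of the metric for a single pair of label maps may be useful. The function name mean_iou and the toy arrays are illustrative assumptions; benchmark implementations typically accumulate intersections and unions over the whole dataset rather than averaging per image.

```python
import numpy as np


def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps: leave it out of the mean
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))


# Toy 2x2 label maps (hypothetical data, two classes).
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, target, num_classes=2))
```

In this toy case, class 0 has an IoU of 1/2 and class 1 has an IoU of 2/3, so the mean is 7/12.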
Figure 3.4: Overview of the workflow of the approach. Source: The author, 2022.
3.2.5.1 Finding Connected Components in the Depth Map
The first stage of the post-processing technique is an algorithm that identifies all connected components in the depth map of RGB-D images using region growing. The depth matrices contain information on the distance of objects to the camera. The greater the value in the depth matrix at a given coordinate, the closer the object is to the camera. Algorithm 1 presents the pseudo-code that finds the connected components in the depth map and creates a new matrix filled with the labels of the connected objects, separated by class. The algorithm groups all connected objects in the depth matrix and enumerates them with individual classes by growing regions. Depth information is used because neighboring pixels at a similar depth are expected to belong to the same object. The resulting matrix of connected objects makes it possible to cast a pixel-by-pixel vote on the image segmented by the SIS networks. For this approach to work, some essential points about depth maps must be considered, such as the acceptable tolerance limit for two depth values to be assigned to the same class when creating the new matrix of connected objects.
To grow regions and define the classes of connected objects, we assign negative class values to all connected values of the depth map. Negative class values help the algorithm’s performance: because only positive values are processed during execution, a value that has already been labeled is never classified again. In the proposed algorithm, we start by defining class -1 for the background in the depth matrix. The other segments are classified in descending order from the value -2, as shown in the algorithm. Regions are grown using the flood_fill function from the Scikit-image library [van der Walt et al., 2014]. The flood_fill function fills all connected values that are close to the value at the seed point, within an acceptable tolerance. The fill value must differ from the value at the seed point; in our case, it is the negative class passed as a parameter. This process is repeated until all matrix points have been visited and enumerated with negative class values. The region-growing step ends when there are no more positive points to visit. At the end of the algorithm, we multiply the final matrix values by -1 to make the class values positive and return a new depth matrix filled with the labels of the connected components.
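As a minimal illustration of how tolerance-based filling behaves, the sketch below applies flood_fill from scikit-image to a small, made-up depth matrix. The matrix values, the seed point, and the tolerance of 2 are assumptions chosen for illustration only; the label -2 follows the convention described above.

```python
import numpy as np
from skimage.segmentation import flood_fill

# Hypothetical 3x4 depth matrix; a signed dtype is needed to hold negative labels.
depth = np.array(
    [
        [10, 11, 50, 51],
        [10, 12, 50, 52],
        [0, 0, 50, 90],
    ],
    dtype=np.int32,
)

# Fill every pixel connected to (0, 0) whose value lies within +/- 2 of the
# value at the seed point (10). The comparison is made against the seed value,
# not against each neighboring pixel.
filled = flood_fill(depth, (0, 0), -2, tolerance=2)
print(filled)
```

Only the four pixels with values between 8 and 12 that are connected to the seed receive the label -2; the input array is left unchanged because flood_fill returns a copy by default.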
Algorithm 1: Growing Region of Post-Processing Approach
Require: DepthMatrix, Tolerance
Ensure: FillImage
if DepthMatrix has no pixel != 0 then
    return DepthMatrix
end if
Index ← -2
Set the value -1 for the background in DepthMatrix
seed_point ← first point != -1 of DepthMatrix
while there are pixels to grow do
    # Grow the region around the current seed point based on the tolerance
    DepthMatrix ← flood_fill(DepthMatrix, seed_point, Index, Tolerance)
    if DepthMatrix has no pixel > 0 then
        break
    end if
    Index ← Index - 1
    seed_point ← next point > 0 of DepthMatrix
end while
FillImage ← DepthMatrix * (-1)
return FillImage
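The pseudo-code above can be sketched in Python roughly as follows. This is a minimal sketch and not the implementation used in this work: the function name grow_regions, the float working copy, and the assumption that a depth value of 0 marks the background are illustrative choices. It also assumes that all valid depth values are larger than the tolerance, so that the fill never reaches the negative labels already written.

```python
import numpy as np
from skimage.segmentation import flood_fill


def grow_regions(depth_matrix, tolerance):
    """Label connected depth regions; after the sign flip, background is class 1."""
    if not np.any(depth_matrix != 0):
        return depth_matrix.copy()  # nothing to grow

    # Signed working copy so that negative class labels fit.
    labels = depth_matrix.astype(np.float64)
    labels[labels == 0] = -1  # assumption: 0 marks background / missing depth
    index = -2

    while True:
        remaining = np.argwhere(labels > 0)  # unvisited pixels are still positive
        if remaining.size == 0:
            break
        seed_point = tuple(remaining[0])
        # Assumption: valid depths exceed the tolerance, so labels are not overwritten.
        labels = flood_fill(labels, seed_point, index, tolerance=tolerance)
        index -= 1

    return (labels * -1).astype(np.int64)


# Hypothetical depth matrix for illustration.
depth = np.array(
    [
        [10, 11, 50, 51],
        [10, 12, 50, 52],
        [0, 0, 50, 90],
    ]
)
fill_image = grow_regions(depth, tolerance=2)
print(fill_image)
```

Each pass labels at least the seed pixel with a negative value, so the number of positive pixels strictly decreases and the loop terminates.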
3.2.5.2 Algorithm for Improving Results
The last stage of our approach is an algorithm that improves the results of images segmented by SIS networks. In this method, we combine the pixels of the image segmented by the SIS networks with the values of the matrix of connected components obtained from the algorithm of the previous subsection. The algorithm casts a vote for each pixel of the image segmented by the SIS network, validating it against the connected-components matrix. This vote compares the pixel values of the segmented image with the neighboring pixels of the depth matrix to overcome problems such as segmentation failures, holes, erosions, and segmentation errors in Eucalyptus trunks. Algorithm 2 presents the pseudo-code of this process.
First, we extracted the new array of connected components. In this array, all pixels of a component share the same class. Then, we loaded the segmented image and extracted the pixels with the value 255, as they represent the segmented Eucalyptus trunks of interest. We used the x and y coordinates of these pixels to index the array of connected components, resulting in a list of objects from which we calculated the labels and frequencies for comparison. We calculated the frequency of each class to determine how many times it appeared in the indexed image. This gave us the unique objects and the number of times each object appeared in the indexed array. The next step was to find objects with a volume (measured as a pixel count) greater than 10% of the total volume of the image. For this, we divided the number of times each object appeared by the total number of pixels in the indexed image, resulting in an array with the proportion of each object. Finally, we replaced the values of objects with a volume greater than 10% of the total image volume with the value 255, representing the segmented trunks. We expected this approach to improve the outputs of current SIS networks significantly.
Algorithm 2: Improve Results of Networks Outputs
Require: ImageSegmented, DepthMatrixImage, Tolerance
Ensure: NewImageSegmented
FillDepthImage ← grown regions of DepthMatrixImage
PointsXY ← new empty array of points
for each pixel in ImageSegmented do
    if pixel is equal to 255 then
        add pixel to PointsXY
    end if
end for
Objects ← new empty array
for each pixel in FillDepthImage do
    if PointsXY contains pixel then
        add pixel to Objects
    end if
end for
Find the unique values in Objects and set bins to those values
Count the number of occurrences of each bin and set counts to those values
Divide counts by the sum of all values in counts
Find all bins with a count greater than 0.1 and set new_objects to those bins
for each obj in new_objects do
    if obj is not equal to 1 then
        Set FillDepthImage to 255 where FillDepthImage is equal to obj
    end if
end for
NewImageSegmented ← FillDepthImage
return NewImageSegmented
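The voting step above can be sketched with NumPy as follows. This is a hedged illustration, not the implementation used in this work: the function name improve_segmentation and the parameters min_share and background are hypothetical, and the connected-components matrix is passed in directly instead of being computed inside the function. Whereas the pseudo-code returns the relabeled FillDepthImage, this sketch returns a binary mask (255 for trunk, 0 elsewhere), which is an assumption about the intended output.

```python
import numpy as np


def improve_segmentation(segmented, fill_depth, min_share=0.1, background=1):
    """Refine a binary trunk mask using a matrix of connected depth components."""
    trunk = segmented == 255          # pixels the network marked as trunk
    objects = fill_depth[trunk]       # component label under each trunk pixel
    if objects.size == 0:
        return np.zeros_like(segmented)

    # Unique component labels and their share of the indexed trunk pixels.
    bins, counts = np.unique(objects, return_counts=True)
    shares = counts / counts.sum()

    # Keep components above the threshold, skipping the background class.
    selected = [b for b in bins[shares > min_share] if b != background]

    # Assumption: the output is a binary mask rather than the relabeled matrix.
    return np.where(np.isin(fill_depth, selected), 255, 0).astype(np.uint8)


# Hypothetical data: component 2 on the left half, component 3 on the right half.
fill_depth = np.repeat(np.array([[2, 2, 2, 2, 3, 3, 3, 3]]), 3, axis=0)
segmented = np.zeros((3, 8), dtype=np.uint8)
segmented[:, :4] = 255
segmented[1, 1] = 0    # hole inside the trunk
segmented[2, 3] = 0    # eroded border pixel
segmented[0, 6] = 255  # stray pixel outside the trunk

refined = improve_segmentation(segmented, fill_depth)
print(refined)
```

In this toy case, component 2 holds 10 of the 11 indexed pixels and is kept in full, which fills the hole and the eroded pixel, while component 3 holds 1 of 11 (below 10%) and is discarded together with the stray pixel.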