
Automatic Habitat Mapping using Machine Learning

André Filipe Costa Diegues

Mestrado Integrado em Engenharia de Redes e Sistemas Informáticos

Departamento de Ciência de Computadores 2017/2018

Orientador

Pedro Manuel Pinto Ribeiro, Professor Auxiliar, Faculdade de Ciências da Universidade do Porto

Coorientador

José Carlos de Queirós Pinto, Assistente de Investigação, Faculdade de Engenharia da Universidade do Porto


Todas as correções determinadas pelo júri, e só essas, foram efetuadas.

O Presidente do Júri,


Abstract

Habitat mapping is an important task for managing ecosystems. This task becomes most challenging when it comes to marine habitats, as it is hard to get good images in underwater conditions. In order to perform habitat mapping over a vast coastal region, we combine the use of sonar and camera sensors embedded in Autonomous Underwater Vehicles, which can accurately locate themselves, and therefore the data they retrieve, in underwater environments. Side Scan Sonar images are not easy to analyze, as they are distorted by the movement of the robots. In our approach, human specialists annotate a subset of the optic images and machine learning is applied to automatically identify other images. Moreover, optic images are associated with Side Scan Sonar imagery that is partially co-located. A tool (Hijack) has been specifically developed to help biologists in the annotation phase. This tool interacts with the user to quickly analyze each image and associate it with a habitat type from the European Nature Information System taxonomy. The optic images are enhanced using Computer Vision techniques, to help specialists be certain of the habitats they are observing. We then employ Convolutional Neural Networks, trained with the specialists' annotations, to automatically classify habitats by predicting the remaining imagery, both optical and sonar. With this approach, we present a method to perform marine habitat mapping, associating optical images with Side Scan Sonar images through a Machine Learning predictive model.


Resumo

O mapeamento de habitats é uma tarefa importante para gerir ecossistemas. Esta tarefa torna-se mais desafiante quando se trata de habitats marinhos, pois é difícil obter boas imagens ópticas em condições subaquáticas. Para fazer o mapeamento de habitats numa vasta região costeira, utilizámos sensores de sonar e câmara instalados em Veículos Autónomos Subaquáticos que são capazes de localizar com precisão todos os dados. As imagens do sonar de varrimento lateral não são fáceis de analisar, pois são distorcidas pelo movimento dos veículos. Na nossa abordagem, biólogos marinhos anotam apenas as imagens ópticas e estas são, posteriormente, associadas a imagens de sonar de varrimento lateral na mesma área geográfica. Uma ferramenta (Hijack) foi desenvolvida especificamente para ajudar os biólogos na fase de anotação. Esta ferramenta interage com o usuário para analisar rapidamente cada imagem e associá-la a um tipo de habitat da taxonomia European Nature Information System. As imagens ópticas são melhoradas através de técnicas de Visão Computacional, para ajudar os especialistas a ter maior certeza do habitat que estão a observar. Posteriormente, usámos Redes Neuronais de Convolução para caracterizar automaticamente as imagens restantes, tanto ópticas como adquiridas com sensores sonares. Com esta abordagem, apresentamos um método para fazer mapeamento de habitats marinhos, associando imagens ópticas com imagens de sonar de varrimento lateral com um modelo de previsão de Machine Learning.


Acknowledgements

I wish to express my sincere gratitude to my supervisors, Professors Pedro Ribeiro and José Pinto, for all the support, knowledge and freedom they gave me to develop this dissertation. Also, this dissertation would not have been possible to complete without the cooperation of Vasco Ferreira, marine biologist, who not only made himself available to annotate the optic images but was also directly involved in the development of Hijack.

I also need to thank the LSTS and its members, as they provided me with a place to work at the laboratory and the hardware on which I developed and wrote this dissertation, and let me participate directly in the OMARE missions in Esposende.

I would like to thank Professor Miguel Coimbra, for letting me attend his Computer Vision course at FCUP and giving me advice towards the image enhancement developed in this dissertation, and Professor Inês Dutra for reviewing the Machine Learning part of this dissertation.

Last, but not least, I would like to thank my family and girlfriend for their support, love and patience.


“Failure is simply the opportunity to begin again, this time more intelligently.” Henry Ford


To my grandfather, Joaquim


Contents

Abstract i

Resumo iii

Acknowledgements v

Contents xi

List of Tables xiii

List of Figures xvii

Listings xix

Acronyms xxi

1 Introduction 1

1.1 LSTS . . . 1

1.2 OMARE Project . . . 2

1.2.1 Habitat Classification Metric . . . 3

1.3 Objectives and Contributions . . . 4

1.4 Thesis Outline . . . 4

2 Background 7

2.1 Sidescan Sonar . . . 7

2.2 LSTS Toolchain: Neptus . . . 10


2.3 Computer Vision . . . 12

2.3.1 Gray Level Transformations . . . 12

2.3.2 Color Models . . . 15

2.3.3 White Balance . . . 17

2.3.4 Shannon's Entropy . . . 17

2.4 Machine Learning . . . 18

2.4.1 Deep Learning . . . 18

3 Related Work 25

4 Experiments and Tests 29

4.1 Field Missions . . . 29

4.2 OMARE Missions . . . 30

5 Implementation 33

5.1 Technical Approach . . . 33

5.2 Image Retrieval . . . 34

5.3 Image Enhancement . . . 37

5.3.1 Optic Images . . . 38

5.3.2 Measuring Information of images . . . 40

5.4 Hijack . . . 41

5.4.1 Requirements . . . 42

5.4.2 Application Development . . . 42

5.4.3 Architecture . . . 45

5.5 Associating Optic to Side Scan Sonar images . . . 47

5.6 Optic Image Classification . . . 48

5.6.1 Data Wrangling. . . 49

5.6.2 Data Analysis. . . 49

5.6.3 Predictive Modelling . . . 51


5.7 Model Testing . . . 56

5.8 Side Scan Sonar Image Classification . . . 56

5.8.1 Data Wrangling. . . 57

6 Results 61

6.1 Image Enhancement . . . 61

6.1.1 Optic Images . . . 61

6.1.2 Side Scan Sonar Images . . . 63

6.2 Image Annotation Process . . . 63

6.3 Prediction Analysis . . . 64

7 Conclusion 71

Bibliography 73


List of Tables

5.1 First records of the data contained in a log’s CSV file exported in Neptus. . . . 48

5.2 First records from a log CSV file exported by Hijack. . . 49

5.3 Comparison of the classification distribution and the median. . . 50

5.4 Example of a CSV file generated by Neptus, with respect to the SSS images. . . 57

5.5 Optic and SSS image association data set. . . 58

5.6 Data set containing the SSS images’ EUNIS classification information. . . 59

5.7 EUNIS classification distribution of SSS images. . . 59

6.1 Confusion matrix composition. . . 66

6.2 Overall accuracy percentage of the model. . . 67

6.3 Classification report on predictions of the level 2 of EUNIS. . . 68

6.4 Classification report on predictions of the level 3 of EUNIS. . . 68


List of Figures

1.1 OMARE online platform. Biogeography of sponges in the PNLN. Taken from [1] 2

1.2 EUNIS Habitat classification for two different marine habitats. Taken from [2] . 3

2.1 Bathymetry map of the coast of Madeira, Portugal. Image retrieved by LSTS. . 8

2.2 Functionality of a side scan sonar sensor. Taken from [3]. . . 9

2.3 Possible rotations and movements of the vehicle that can affect the performance of the sonar. Taken from [3] . . . 9

2.4 Side scan sonar image of a coastal area with rock and sand environments. . . 10

2.5 SSS analyzer in Neptus MRA. . . 11

2.6 Visualization of a plot in Neptus MRA. . . 11

2.7 Power-Law transformation examples for different values of γ. Taken from [4]. . . 13

2.8 Aerial image and subsequent power-law transformations with γ values of 3, 4 and 5 and c = 1. Taken from [4]. . . 13

2.9 Plot of the contrast stretching function. On the lower-right we have the contrast stretching applied to the upper-right image. Adapted from [4]. . . 14

2.10 Histogram Equalization application example. Adapted from [4]. . . 15

2.11 Example of a feed-forward Neural Network. Taken from [5]. . . 19

2.12 A Convolutional Neural Network example. Taken from [6].. . . 20

2.13 Description of a convolutional layer and its stages. Adapted from [7]. . . 21

2.14 Application of a convolution operation according to Equation 2.31. Taken from [7]. 22

2.15 Example of a max pooling operation computation. Taken from [7]. . . 22

4.1 Noptilus 2 and 3 LAUVs. . . 30


4.2 Small boat used in the OMARE missions. . . 30

4.3 Example of a plan used to retrieve data in Esposende using Noptilus 1 and 3. . . 31

4.4 Example of a plan used to retrieve data in Esposende using Noptilus 1, 2 and 3. 31

4.5 AUV navigating at ocean surface. . . 32

4.6 Operator using Neptus to download the mission’s data from AUVs. . . 32

5.1 Representation of the phases of our approach. . . 34

5.2 Hijack’s dependencies. . . 43

5.3 UI of the application when started. Allows the user to choose a folder with images to annotate. . . 43

5.4 UI of the application after loading images. . . 44

5.5 Modifications done to the first UI attempt. On the top we can see the image information (image name, location and depth), below that the image itself and on the bottom the area to annotate the habitat. . . 44

5.6 UI of the application after loading images. . . 45

5.7 System context diagram, representing the level 1 of the C4 model. . . 45

5.8 Container diagram, representing the level 2 of the C4 model. . . 46

5.9 Component diagram, representing the level 3 of the C4 model. . . 46

5.10 Class diagram, representing the level 4 of the C4 model. . . 47

5.11 Habitat classification distribution of the retrieved logs. . . 50

5.12 Plot of the distribution of the depth of optic images by EUNIS habitat type. . . 50

5.13 Habitat classification distribution of the retrieved logs after undersampling. . . . 51

5.14 Available configurations of the VGG CNN. Taken from [8].. . . 52

5.15 Architecture design of our first CNN approach with 13 layers. . . 53

5.16 Architecture design of our Configuration D VGG CNN approach with 16 layers.. 55

5.17 Plot of the SSS images’ classification distribution.. . . 59

6.1 Image enhancement to an optic image retrieved in an OMARE mission. . . 62


6.2 Histogram analysis after transforming the original image (on the left) with the Max-White algorithm (center image) and the HSV contrast stretch (rightmost image). . . 62

6.3 An example of the image enhancement performed. . . 63

6.4 Another example of the image enhancement performed. . . 63

6.5 SSS image enhancement performed in Neptus. In a) we have a SSS image captured by the AUV in an OMARE mission. In b) we have the SSS image enhanced by slant-range correction, TVG and normalization optimization. . . 64

6.6 Data augmentation example. . . 65

6.7 Plot of the training of the first CNN design. . . 66

6.8 Plot of the training of the Configuration D VGG CNN design using Dropout and Fine-Tuning. . . 66

6.9 Number of samples per level 2 class predicted of EUNIS. . . 67

6.10 Number of samples per level 3 class predicted of EUNIS. . . 67

6.11 Distribution of inaccurate predictions per class. . . 69


Listings

5.1 Video to frames exporter. . . 35

5.2 Side Scan Sonar positions exporter . . . 37

5.3 Max white algorithm . . . 39

5.4 Contrast Stretching and Power Law transformation algorithms . . . 40

5.5 RGB image enhancement algorithms . . . 41

5.6 Shannon’s Entropy programs to filter noisy images . . . 41

5.7 Great circle distance in kilometers of two pairs of (longitude, latitude) . . . 48

5.8 Data Wrangling . . . 49

5.9 One-Hot encoding and Train-Test splitting. . . 51

5.10 Adaptation of VGG configuration D with 13 layers.. . . 54

5.11 Fine-tuning a VGG configuration D. . . 56

5.12 Merge of SSS logs. . . 57

5.13 Squared subimages generator. . . 58

5.14 Optic and SSS image association. . . 58


Acronyms

AI Artificial Intelligence

ASV Autonomous Surface Vehicle

AUV Autonomous Underwater Vehicle

CNN Convolutional Neural Network

CSS Cascading Style Sheets

CSV Comma-Separated Values

CV Computer Vision

DCC Computer Science Department

DL Deep Learning

DM Data Mining

DT Decision Tree

DVL Doppler Velocity Log

EUNIS European Nature Information System

FCUP Faculty of Sciences of the University of Porto

FCL Fully-Connected Layer

FEUP Faculty of Engineering of the University of Porto

GPU Graphics Processing Unit

GUI Graphical User Interface

HCI Human-Computer Interaction

HM Habitat Mapping

IMU Inertial Measurement Unit

JVM Java Virtual Machine

LSTS Underwater Systems and Technology Laboratory

ML Machine Learning

MLC Maximum Likelihood Classifier

MRA Mission & Review Analysis

MVC Model-View-Controller

NN Neural Network

OMARE Marine Observatory of Esposende

PNLN Northern Littoral Natural Park

PO-SEUR Operational Programme for Sustainability and Efficient Use of Resources

ReLU Rectified Linear Unit

RF Random Forest

ROV Remotely Operated Vehicle

SGD Stochastic Gradient Descent

SSS Side Scan Sonar

SVM Support Vector Machine

TVG Time Variable Gain

UAV Unmanned Aerial Vehicle

UI User Interface

UUV Unmanned Underwater Vehicle

VGG Visual Geometry Group


Chapter 1

Introduction

Many natural habitats and their respective ecosystems all across the globe are being subjected to high risks of destruction due to changing weather patterns, pollution, invasive species or over-exploitation [9]. In order to understand how human activity impacts these often fragile habitats, one needs to quantify them through habitat mapping. Two consecutive habitat maps of the same region can help specialists assess the health of the region and determine a course of action to improve it. Habitat Mapping (HM) can thus provide a better understanding of the consequences of human actions and decisions and help to manage ecosystems in order to preserve them [10].

The problem of creating accurate habitat maps becomes harder when the area is inaccessible, as is the case of submerged coastal areas. In this case, habitat maps cannot be inferred from aerial photography, as the bottom of the sea is not visible from the air (except on very rare occasions of clear water).

In this dissertation we propose a novel technique for performing habitat mapping in submerged coastal areas using Autonomous Underwater Vehicles (AUVs) and Convolutional Neural Network (CNN) based interpretation of the collected data. Our approach consists of an initial quick assessment of the different natural habitats in the region, by travelling close to the bottom with a camera, followed by complete coverage of the area using a side scan sonar. While habitats in the optical images can easily be identified by marine biologists, habitats in sonar images are identified by a CNN trained using the annotated optical images.

1.1

LSTS

The Underwater Systems and Technology Laboratory (LSTS) is a research laboratory established in 1997 in the Faculty of Engineering of the University of Porto (FEUP). LSTS targets the development of technologies and Open Source software tools1 for the operation of networked vehicle systems, such as DUNE, IMC or Neptus [11].

1

https://github.com/LSTS

LSTS has also designed and developed its own vehicles and, currently, its fleet contains Unmanned Underwater Vehicles (UUVs), such as AUVs and Remotely Operated Vehicles (ROVs), Autonomous Surface Vehicles (ASVs) and Unmanned Aerial Vehicles (UAVs), which can be used for underwater, surface and aerial applications, respectively. In particular, the AUVs can carry sonar sensors, which we will describe in Section 2.1, and optic cameras that can be used to map and explore the ocean.

As a result of a project for mapping and exploring a coastal marine reserve in the north of Portugal, the City Council of Esposende partnered with LSTS to use AUVs to retrieve images of the area and to automate the habitat classification process.

1.2

OMARE Project

The Marine Observatory of Esposende (OMARE) [1], formally created on November 16, 2017, is an Operational Programme for Sustainability and Efficient Use of Resources (PO-SEUR) project led by the City Council of Esposende, consisting of an online portal that provides information about its marine reserve (see Figure 1.1). The project was created to manage the coastal area of Esposende, protect threatened species and preserve the biodiversity of the Northern Littoral Natural Park (PNLN). Mapping the habitats of the region will allow the implementation of an ecosystem-based management of the sea and an understanding of the consequences of human activities, as, with the evolution of technology, retrieving data about these habitats is much easier than before [10].

Figure 1.1: OMARE online platform. Biogeography of sponges in the PNLN. Taken from [1] .

In the context of the OMARE project, our approach is to employ multiple AUVs that use advanced location sensors, such as the optic Inertial Measurement Unit (IMU) and the acoustic Doppler Velocity Log (DVL), to collect geo-referenced optic and sonar-based images of the bottom. The employed AUVs are man-portable (weighing around 25 kg), can be used from the coast all the way down to 100 meters and carry a camera and a series of sonars (single-beam, multi-beam and side scanning).

Side Scan Sonars (SSSs) (which we will describe in Section 2.1) have a nominal coverage of 50 meters to each side of the AUV, whereas optical sensors (video camera) can only cover about 2 meters, and only when the vehicle is close to the bottom and under good (low turbidity) conditions. The two types of data (sonar and optical) can be associated by their (accurate) geographic location and, in this project, we use machine learning to propagate the habitat classification from the manually tagged optical images to the sonar-based images, which will provide a complete habitat map of the region.

1.2.1 Habitat Classification Metric

Habitats will be classified with the European Nature Information System (EUNIS)² taxonomy, which is a European reference system for the hierarchical classification of all kinds of habitats and communities, natural and artificial. It provides comparative information about the characteristics of a habitat, allowing ecological quality to be monitored and evaluated.

The EUNIS classification system has 6 levels (see Figure 1.2): levels 1, 2 and 3 represent physical criteria of the seabed (light, energy and soil), while levels 4, 5 and 6 are defined by the physical criteria plus the biological criteria (communities and species associated with their habitats) [12].

Figure 1.2: EUNIS Habitat classification for two different marine habitats. Taken from [2]


1.3

Objectives and Contributions

CNNs are currently among the best-performing models for image classification. To the best of our knowledge, no preceding work has used CNNs as a predictive model for marine HM with the EUNIS taxonomy, although there are existing works applying CNNs to marine HM for coral or seabed sediment classification (as we will present in Chapter 3).

Our primary goal is to train a CNN that can effectively classify marine habitats with the EUNIS taxonomy using SSS images. To do that, we first need to get ground truth from optic images and annotations from marine biologists. In order to accelerate the annotation process, we want to develop a tool that provides all the information needed in one screen and, afterwards, develop a CNN that can accurately classify optic images into marine habitats. We are required to predict the first 3 levels of the EUNIS taxonomy for each habitat. This classification should be associated with an area where it is valid. Associating optical images to SSS images should be possible, as the AUVs can collect geo-referenced images with the correct location of the vehicle. Lastly, we want to make the results readable and usable by their end-users (marine biologists, city hall staff).

1.4

Thesis Outline

This dissertation is structured as follows:

Chapter 1 - Introduction. We briefly describe the problem this dissertation approaches, explaining its importance and context. We also define the objectives that this dissertation should meet. Additionally, it outlines the contents of the remaining chapters.

Chapter 2 - Background. We explain the theoretical notions and concepts that are used throughout the dissertation. These concepts include algorithms, methodologies and knowledge from the SSS, Computer Vision (CV) and ML study fields.

Chapter 3 - Related Work. We introduce the work done in the past to solve the problem of performing underwater image classification or HM.

Chapter 4 - Experiments and Tests. We describe the experiments and tests that were planned and the challenges that we faced performing them.

Chapter 5 - Implementation. We introduce our approach and the various phases of development, including data wrangling, image enhancement, a software tool for manual annotation and ML modelling.

Chapter 6 - Results. We share the results of the tests performed, making an exhaustive comparison of the predictions obtained.

Chapter 7 - Conclusion. We give our opinion about the results achieved and prospects for future work.


Chapter 2

Background

In this chapter we briefly introduce the main concepts that will be talked about throughout this document. These include topics in study areas of sonar technologies, Computer Vision (CV) and Machine Learning (ML). We also introduce Neptus, a software tool designed by the Underwater Systems and Technology Laboratory (LSTS) to control, command and monitor the Autonomous Underwater Vehicles (AUVs) as well as analyze the data retrieved from them.

2.1

Sidescan Sonar

Side Scan Sonar (SSS) data is very useful for habitat mapping of vast regions, as it can quickly map the ocean floor. However, the use of sound instead of light introduces many distortions and errors into the resulting images, as described here. Sonar systems can also be easily embedded in different types of Unmanned Underwater Vehicles (UUVs), such as Remotely Operated Vehicles (ROVs) and AUVs. These systems can be divided into three categories: single-beam, multibeam and SSS [3].

Single-beam sonar systems have low-frequency sonar pulses (below 20 kHz), short pulse lengths (below 2 ms) and small seafloor coverage. Multibeam is an upgraded version of single-beam sonar which emits multiple sound beams at different angles of incidence to the bottom, allowing faster, more accurate and fine-grained bathymetry mappings (see Figure 2.1).

SSS is a system used to get high-resolution acoustic images of the seafloor and a near-visual representation of its geological characteristics [13]. It can be used alongside single-beam and multibeam systems but, typically, SSSs are the best system to map large seafloor areas in search of small targets such as shipwrecks, pipelines or cables, since their capability of using higher frequencies improves image resolution [3]. All three systems can be used in the same vehicle at the same time to improve performance.


Figure 2.1: Bathymetry map of the coast of Madeira, Portugal. Image retrieved by LSTS.

A SSS system is typically composed of two transducers (one on each side of the AUV) which can both emit and record sound pulses. The transducer (see Figure 2.2) emits acoustic pulses that propagate in the water at a certain angle (angle of incidence) until they hit the seafloor (slant range). Then the energy of the pulse reflected at the grazing angle (90° minus the angle of incidence) returns as echoes that are received by the transducers within a short time (on the order of milliseconds). These echoes are then amplified and transmitted to a recorder, which processes the signal and stores its position.
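To make the slant-range geometry concrete, the horizontal (ground) range to a target can be recovered from the recorded slant range and the vehicle altitude, assuming a flat seafloor. The short Python sketch below is purely illustrative and is not part of the Neptus exporters described later in this dissertation:

import math

def ground_range(slant_range_m, altitude_m):
    # Horizontal distance to the echo source, assuming a flat seafloor.
    if slant_range_m <= altitude_m:
        return 0.0  # echo from (or near) the nadir, directly below the transducer
    return math.sqrt(slant_range_m ** 2 - altitude_m ** 2)

# e.g. a 30 m slant return recorded at 5 m altitude corresponds to ~29.6 m of ground range
print(round(ground_range(30.0, 5.0), 1))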

Given how a SSS works, let us consider some characteristics of the imagery produced:

• Beam Pattern: the main axis of the beam is typically angled 10° to 20° downward to improve performance and noise reduction, but this leaves a gap in the image, called the nadir, for which no data is retrieved.

• Resolution: short pulses (usually along-track) have higher frequencies, which result in better resolution; long pulses (usually across-track) have lower frequencies, which travel further and are less susceptible to noise.

• Backscatter: the energy that is scattered back to the sensor is used to calculate the slant-range and the location of the data processed by the recorder, which can affect the image if it has a lot of noise.

• Distortion: is caused by the vehicle movements (yaw, roll, pitch and heave), speed variations and instabilities (see Figure 2.3).


Figure 2.2: Functionality of a side scan sonar sensor. Taken from [3].

Figure 2.3: Possible rotations and movements of the vehicle that can affect the performance of the sonar. Taken from [3]

Figure 2.4 is an example of a SSS image. The performance of the SSS can be influenced by the imaging, acoustic and transmission frequencies, the directivity of the sonar, sidelobes, spatial resolution, height/speed combinations, transmission power and the nadir [3].


Figure 2.4: Side scan sonar image of a coastal area with rock and sand environments.

2.2

LSTS Toolchain: Neptus

To control its vehicles, LSTS has produced several different software tools [11]. Neptus¹ is the Command and Control software used to control and monitor fleets of unmanned vehicles. It is a Java application, compatible with Linux and Windows systems, and can be extended by third parties through independent plug-ins. Neptus was designed to give adaptability and flexibility to the needs of an operator in field missions using one or more autonomous vehicles. In a mission, different and unexpected situations may occur and the operators need to address them. Neptus was created to support and handle these situations quickly [11].

Usually, a mission life-cycle passes through 3 different phases:

1. Planning phase: the operator analyzes the mission's objectives and possible obstacles, prepares the mission plans to be used and performs simulations of them.

2. Execution phase: the operator prepares the vehicles for mission and plan execution. The operator then monitors the execution of the plans and may or may not adapt the plans in order to achieve the mission’s objectives.

3. Review and Analysis phase: can be done after the mission is concluded. The data retrieved by the vehicles is analyzed and processed to produce the desired mission outcomes.

Neptus’ User Interface (UI) provides several consoles in which the operators can create, send or execute plans and monitor the vehicles’ plan execution in geographic maps (either web-based tiles such as Google Maps and Open Street Map or cartographic charts) and also simulate the vehicle behaviors prior to actual execution. The other available interface in Neptus is the Mission & Review Analysis (MRA).

1

The Neptus MRA is used to analyze the data retrieved by the vehicle in a mission. Its interface is organized in two parts [11]: an "explorer" on the left with the available data (charts, visualizations, tables and other data associated with the kind of mission that was deployed) and, on the right, the visualization of the option selected in the "explorer". Figures 2.5 and 2.6 are examples of data analysis in the Neptus MRA (SSS images and a plot of a mission). Besides these features, Neptus MRA contains a number of data exporters that can be used to generate reports, plots and images as external files to use outside of Neptus.

Figure 2.5: SSS analyzer in Neptus MRA.

Figure 2.6: Visualization of a plot in Neptus MRA.

In this dissertation, Neptus plays an important part, as it is the tool with which an operator interacts with the AUVs and retrieves data from habitat mapping missions.


2.3

Computer Vision

Computer Vision is the field of computer science that deals with the extraction of information from images. An important part of this thesis is to retrieve underwater optic images for marine biologists to annotate, as we need ground truth to train a ML model. As underwater optic images may not always have good quality, due to lack of light or turbidity of the water, we use CV image enhancement techniques to facilitate the annotation by the marine biologists.

This section was written based on Gonzalez and Woods' book Digital Image Processing [4], which helped us gain knowledge of commonly used CV techniques relevant to the processing and enhancement of images. In the following sections, we describe only the techniques that we used throughout this dissertation.

2.3.1 Gray Level Transformations

Gray level transformations are intensity transformations that are used to change the contrast or brightness of a gray image with gray pixel intensities in the interval [0, L − 1]. We say that a gray level transformation corresponds to:

g(x, y) = T [f (x, y)] (2.1)

where g(x, y) (which can also be designated by s) is the output intensity level of applying the transformation T with function f in the pixel positioned at (x, y) of the gray image. The value of

f (x, y) is designated the input intensity level of the pixel at the position (x, y) which we express

by r.

s = T (r) (2.2)

2.3.1.1 Power-Law Transformation

The power-law transformation applies a power function to the image. For this particular case, we adapt the transformation T in Equation 2.2 to a power function:

s = crγ (2.3)

where γ and c are positive constants. In Figure 2.7, we can observe how the different values of γ can affect the output gray level.
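As a minimal illustration (not the code used in this dissertation), the transformation can be applied to an 8-bit grayscale image by normalizing it to [0, 1], raising it to the power γ and rescaling:

import numpy as np

def power_law(gray, gamma, c=1.0):
    # s = c * r**gamma, with r normalized to [0, 1] and the result scaled back to [0, 255]
    r = gray.astype(np.float64) / 255.0
    s = c * np.power(r, gamma)
    return np.clip(s * 255.0, 0.0, 255.0).astype(np.uint8)

# gamma > 1 darkens the mid-tones, gamma < 1 brightens them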


Figure 2.7: Power-Law transformation examples for different values of γ. Taken from [4].

Figure 2.8: Aerial image and subsequent power-law transformations with γ values of 3, 4 and 5 and c = 1. Taken from [4].


2.3.1.2 Contrast Stretching

This transformation increases the dynamic range of the image, resulting in higher contrast and better perception to the human eye. In Figure 2.9, we can see how the function works: points (r1, s1) and (r2, s2) are set to the minimum and maximum gray levels, respectively, which means that (r1, s1) = (rmin, 0) and (r2, s2) = (rmax, L − 1), and the gray levels are stretched maintaining the same order, without altering information about the image [4]. The equation that performs this calculation is:

s = (L − 1) \frac{r − r_{\min}}{r_{\max} − r_{\min}} \qquad (2.4)

The only downside of this transformation is that if both rmin = 0 and rmax = L − 1 then the transformation has no effect whatsoever. In these situations the user chooses the rmin and rmax values. From Equation 2.4:

s = (L − 1) \frac{r − 0}{L − 1 − 0} = r \qquad (2.5)
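A minimal NumPy sketch of Equation 2.4 (illustrative only; the algorithms actually used are given later in Listing 5.4), where the user may supply r_min and r_max explicitly:

import numpy as np

def contrast_stretch(gray, r_min=None, r_max=None, L=256):
    # Linearly map [r_min, r_max] onto the full range [0, L-1].
    g = gray.astype(np.float64)
    r_min = g.min() if r_min is None else float(r_min)
    r_max = g.max() if r_max is None else float(r_max)
    if r_max == r_min:
        return gray.copy()  # degenerate case: nothing to stretch
    s = (L - 1) * (g - r_min) / (r_max - r_min)
    return np.clip(s, 0, L - 1).astype(np.uint8)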

Figure 2.9: Plot of the contrast stretching function. On the lower-right we have the contrast stretching applied to the upper-right image. Adapted from [4].

2.3.1.3 Histogram Equalization

The main objective of histogram equalization is to "spread" the histogram of an image, creating better contrast (see Figure 2.10). An image’s histogram contains all the gray-levels and the amount of pixels with that value. Formally, it can be written with the expression


h(rk) = nk (2.6)

where k is in the range of gray levels [0, L − 1] [4] and n_k is the number of occurrences of intensity level k in the image. For this particular technique the gray levels should be normalized to the interval [0, 1] to guarantee there are no values over 1 or under 0 after the transformation.

Figure 2.10: Histogram Equalization application example. Adapted from [4].

The probability of occurrence of a gray level k in an image with n pixels can be given by

p_r(r_k) = \frac{n_k}{n} \qquad (2.7)

where n_k is the number of pixels with gray intensity k, implying that

s_k = T(r_k) = \sum_{j=0}^{k} p_r(r_j) = \sum_{j=0}^{k} \frac{n_j}{n}, \quad 0 \le k \le L − 1 \qquad (2.8)

This procedure results in a discrete histogram that is much wider than the original.

Although this method enhances visual contrast, it does not work the same way as the previous method. With contrast stretching, a linear function is applied to all gray levels and they are mapped in the same order as in the original image. Histogram equalization does not guarantee the original gray-level order and can therefore change the information of the image. Despite that, if our goal is only to enhance the image, this method is very effective and simple to implement.
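A compact NumPy sketch of discrete histogram equalization (illustrative, not the dissertation's implementation), which scales the normalized mapping of Equation 2.8 back to [0, L − 1] for display:

import numpy as np

def histogram_equalize(gray, L=256):
    hist = np.bincount(gray.ravel(), minlength=L)   # h(r_k) = n_k
    cdf = np.cumsum(hist) / gray.size               # cumulative sum of p_r(r_j)
    lut = np.round((L - 1) * cdf).astype(np.uint8)  # s_k rescaled to [0, L-1]
    return lut[gray]                                # apply the mapping to every pixel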

2.3.2 Color Models

Color models are used to represent a pixel according to a coordinate system. The most used color model is the RGB (red, green, blue) color model, as it is embedded in monitors and other hardware devices. This is due to the human eye's strong perception of these colors.


When it comes to color description, R.C. Gonzalez and R.E. Woods [4] affirm that the HSI (hue, saturation, intensity) color model is “(...) an ideal tool for developing image processing

algorithms based on color descriptions that are natural and intuitive to humans (...)”, which

implies that converting color models from RGB to HSI and vice versa can be a good idea.

2.3.2.1 Converting between RGB and HSI

Given an RGB image with its values normalized to the interval [0, 1] the corresponding equivalent image in the HSI color model can be obtained by applying the following equations:

H = \begin{cases} \theta & \text{if } B \le G \\ 360^{\circ} − \theta & \text{if } B > G \end{cases}, \quad \text{with} \quad \theta = \cos^{−1}\left\{ \frac{\tfrac{1}{2}\,[(R − G) + (R − B)]}{[(R − G)^2 + (R − B)(G − B)]^{1/2}} \right\} \qquad (2.9)

S = 1 − \frac{3}{R + G + B}\,\min(R, G, B) \qquad (2.10)

I = \frac{1}{3}(R + G + B) \qquad (2.11)
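A vectorized sketch of the RGB-to-HSI direction (Equations 2.9 to 2.11), assuming the input is already normalized to [0, 1]; this is only an illustration and not the code used in this dissertation:

import numpy as np

def rgb_to_hsi(rgb):
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-8  # guards against division by zero for gray/black pixels
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    h = np.where(b <= g, theta, 360.0 - theta)              # Equation 2.9
    i = (r + g + b) / 3.0                                   # Equation 2.11
    s = 1.0 - np.minimum(np.minimum(r, g), b) / (i + eps)   # Equation 2.10
    return np.stack([h, s, i], axis=-1)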

Converting from HSI back to RGB is more complex, as it depends on the value of the hue. If 0° ≤ H ≤ 120° then:

B = I(1 − S) \qquad (2.12)

R = I\left[1 + \frac{S \cos H}{\cos(60^{\circ} − H)}\right] \qquad (2.13)

G = 3I − (R + B) \qquad (2.14)

If 120° ≤ H ≤ 240° then:

H = H − 120^{\circ} \qquad (2.15)

R = I(1 − S) \qquad (2.16)

G = I\left[1 + \frac{S \cos H}{\cos(60^{\circ} − H)}\right] \qquad (2.17)

B = 3I − (R + G) \qquad (2.18)

Lastly, if 240° ≤ H ≤ 360° then:

H = H − 240^{\circ} \qquad (2.19)

G = I(1 − S) \qquad (2.20)

B = I\left[1 + \frac{S \cos H}{\cos(60^{\circ} − H)}\right] \qquad (2.21)

R = 3I − (G + B) \qquad (2.22)

2.3.3 White Balance

White balance is a technique to perform color correction in images. A wide variety of automatic color correction algorithms can be found in the literature nowadays [14].

In this section, we only present the Max White algorithm, since this was the algorithm that worked best for us. The Max White algorithm considers that the color white is represented by the value (255, 255, 255) and computes the gain for each channel (red, green and blue) as follows [14]:

Rgain = 255/Rmax (2.23)

Ggain = 255/Gmax (2.24)

Bgain = 255/Bmax (2.25)

Afterwards, we multiply each pixel’s intensity level by its gain (in each respective channel) to obtain the image with color correction.
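A minimal sketch of the Max White algorithm as described above (illustrative; the version used in this dissertation appears later in Listing 5.3):

import numpy as np

def max_white(rgb):
    # gain per channel: 255 / maximum value of that channel (Equations 2.23-2.25)
    img = rgb.astype(np.float64)
    channel_max = np.maximum(img.reshape(-1, 3).max(axis=0), 1.0)  # avoid division by zero
    gains = 255.0 / channel_max
    return np.clip(img * gains, 0.0, 255.0).astype(np.uint8)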

2.3.4 Shannon’s Entropy

Even though not directly related to CV, this concept from Information Theory can help quantify the amount of information in an image. The amount of information, in units, of a random event E can be given by:

I(E) = \log \frac{1}{P(E)} = −\log P(E) \qquad (2.26)


If the logarithmic function has base 2, then the unit of information is the bit, which allows us to measure the number of bits of information that E contains. Note that if P(E) = 1, E carries no information (and events with P(E) = 0 contribute nothing to the entropy defined below).

From this principle, it is possible to measure the number of bits necessary to represent a stream of pixels, in the case of an image. Assuming there is a finite set of symbols {s1, s2, ..., sk}, which are the intensity values of the pixels of the image, the average number of bits necessary to represent the intensity of a pixel of that image can be obtained by Shannon's entropy equation:

H(X) = −\sum_{i=1}^{k} P(s_i) \log_2 P(s_i) \qquad (2.27)

From this, we can say that Shannon's entropy measures the average disorder of the pixel intensities of an image, and that an image has a higher entropy value when it contains more noise, since noise contributes to that disorder.
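As an illustration of how Equation 2.27 can be used to filter noisy frames (the actual filter is given later in Listing 5.6), the entropy of an 8-bit grayscale image can be estimated from its normalized histogram:

import numpy as np

def shannon_entropy(gray):
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()          # P(s_i) for each gray level
    p = p[p > 0]                   # levels with zero probability add no information
    return float(-np.sum(p * np.log2(p)))

# frames whose entropy exceeds a chosen threshold can be discarded as too noisy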

2.4

Machine Learning

Witten's book on Data Mining [15] gives us an idea of what ML is being used for nowadays: decision simulation, image surveillance, forecast and diagnosis prediction, marketing and sales, manufacturing processes and so on. All these applications have tremendous amounts of data to analyze, which makes ML models a good option to automate problem solving.

Due to the sheer amount of information (images) to be processed in this project, we resort to ML to automatically process information based on an initial supervised training. To explain ML we follow the book by Goodfellow et al.: Deep Learning [7].

A ML model solves problems in a formal and repeatable way, making judgments about the information present in datasets by extracting patterns and acquiring knowledge from the available data and predicting future instances according to a task performed on previous observations [16]. Problems such as face recognition [6], speech recognition [17], video recognition [18] and Natural Language Processing [19] can be solved using ML models, more specifically, using Deep Learning (DL) models.

2.4.1 Deep Learning

The DL concept started to gain popularity when the availability of large volumes of data became ubiquitous. DL models have the ability to interpret grid-based data better than classical ML models, but they require a vast amount of training data to be successful. By learning a hierarchy of concepts, where a concept derives from simpler concepts, a DL model learns about the representation of data and the relationships between concepts. Graphically, it can be seen as a directed acyclic graph in which a deeper concept in the graph is related to the previous ones.

This structure is designated an artificial neural network or, simply, a Neural Network (NN). In the scope of this dissertation, we only describe feed-forward NNs, as this was the topology we used.

2.4.1.1 Neural Networks

A NN tries to map an input x to a target y = f(x). It consists of a set of units, called neurons, organized in layers: one input layer, one output layer and zero or more hidden layers, as we can see in Figure 2.11.

Figure 2.11: Example of a feed-forward Neural Network. Taken from [5].

In each layer, neurons apply an activation function to preserve non-linearity; otherwise, the model could only solve linear problems and, therefore, could not learn nonlinear relationships between variables:

ŷ = f(x; θ) = f_3(f_2(f_1(x; θ))) \qquad (2.28)

where θ denotes the parameters used to compute the approximation and f_1, f_2 and f_3 represent the functions computed by the first, second and third layer of the network, respectively.

This function contains a non-linear transformation, represented by φ(x), and provides features of x so that the model can learn from new representations of x. The better the new representations of x, i.e. φ, the better the relations learned by the model. Adapting Equation 2.28, the model can be described as:

ŷ = f(x; θ, w) = φ(x; θ)^T w \qquad (2.29)

where φ(x; θ) defines a hidden layer and should be improved with an optimization algorithm to find the most suitable θ. Stochastic Gradient Descent (SGD) is the optimization algorithm most commonly used (or built upon) to perform DL. It uses a learning rate that decays until convergence of the SGD algorithm (when it achieves an optimal value for θ). w is the weight parameter associated with the connection of each neuron. This parameter is initially set to a random value and is then recalculated in the training phase after the computation of the back-propagation algorithm.

The back-propagation algorithm computes the gradient of the cost function, which measures the error between the computed output probabilities and the true values, and propagates the average of the errors (for each output) backwards through the network so that SGD can update the weights computed in the forward pass.

After deciding the number of layers and neurons of the NN, we need to choose an optimization algorithm (usually SGD), the activation function, the cost function and the learning rate to train the model. To diminish the chance of the model overfitting, the Dropout technique [20] randomly drops neurons in the hidden layers while the NN is training, forcing it to fit the correct ŷ with fewer neurons. Early stopping the training of the model can also prevent overfitting, by saving the parameters of the model at the point where the validation set loss was at its minimum.
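As an illustration of these ingredients (layers, activation functions, SGD, Dropout and early stopping), a small feed-forward network could be declared with a Keras-style API as sketched below; the layer sizes and hyperparameters are arbitrary and do not correspond to the models trained later in this dissertation:

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(64,)),
    keras.layers.Dropout(0.5),                     # randomly drop hidden neurons during training
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),  # one output per class
])
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                           restore_best_weights=True)
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=100, callbacks=[early_stop])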

2.4.1.2 Convolutional Neural Networks

A Convolutional Neural Network (CNN) is a DL model that contains one or more convolutional layers followed by Fully-Connected Layers (FCLs), or dense layers (where each neuron is connected to the next layer's neurons), as we can see in Figure 2.12. CNNs excel at processing grid-like data, such as images.

Figure 2.12: A Convolutional Neural Network example. Taken from [6].

A convolution layer can be described in three stages (see Figure 2.13):

1. Convolution stage: where the convolution operation (linear function) is applied;

2. Detector stage: where a nonlinearity function is applied (e.g. Rectified Linear Unit (ReLU));

3. Pooling stage: where the transformations (features) remain invariant to translations (e.g. edges of an object in an image).


Figure 2.13: Description of a convolutional layer and its stages. Adapted from [7].

The convolution operation can be simply explained as an operation on two functions, for example x and w, at a time t, providing a new function s:

s(t) = (x ∗ w)(t) (2.30)

where ∗ represents the convolution operation. x can be interpreted as the input of the convolutional layer and w as the kernel; the output is designated the feature map. Adapting Equation 2.30 to a convolution operation that takes an image I as input and a kernel K (see Figure 2.14):

S(i, j) = (K ∗ I)(i, j) = \sum_{m} \sum_{n} I(i − m, j − n)\, K(m, n) \qquad (2.31)

Convolution provides advantages in learning through sparse interactions (kernels are smaller than the input image, which reduces the size of the output while retaining important features), parameter sharing (the same kernel weights are used across several positions, which allows for neighbourhood awareness) and equivariant representations (if the input is translated, the output is translated in the same way). For more details on this operation, the work of Dumoulin and Visin [21] provides a better understanding of the mathematical arithmetic of the convolution function.
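A direct (and deliberately naive) evaluation of Equation 2.31 on a single-channel image is sketched below for illustration; note that most DL frameworks actually implement cross-correlation, i.e. they skip the kernel flip:

import numpy as np

def conv2d_valid(image, kernel):
    kh, kw = kernel.shape
    k = kernel[::-1, ::-1]  # flip the kernel: true convolution, per Equation 2.31
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out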

Figure 2.14: Application of a convolution operation according to Equation 2.31. Taken from [7].

Following the ReLU computation (detection stage), the pooling operation computes, in the case of max pooling, the maximum of the nearby outputs. This method learns to correlate positioning with the existence of a feature. If a feature is strong, it is likely to be important to learn from it; on the contrary, if a feature has a low output value, then it is not relevant to learn from it (invariance). The pooling operation improves statistical efficiency and reduces the size of the inputs for the next layer, as we can see in the examples of Figures 2.12 and 2.15; this process is called downsampling or subsampling.
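A corresponding sketch of max pooling over a 2-D feature map (illustrative only), using non-overlapping windows when the stride equals the window size:

import numpy as np

def max_pool2d(x, size=2, stride=2):
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * stride:i * stride + size,
                          j * stride:j * stride + size].max()  # keep the strongest response
    return out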

Figure 2.15: Example of a max pooling operation computation. Taken from [7].

After all the convolutional layers are computed, the input of the first FCL must be flattened into a 1-D vector before going further in the network. From this stage on, the CNN behaves like a regular NN, with each neuron of a layer connected to every neuron of the next layer. In the final (output) layer, the Softmax function is computed to calculate each classification probability (also known as confidence).
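For reference, the Softmax is the standard mapping \text{softmax}(z)_i = \frac{e^{z_i}}{\sum_j e^{z_j}}, which turns the raw outputs z of the last layer into positive values that sum to 1 and can therefore be read as class confidences.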

As we stated at the beginning of Section 2.4, ML models, specifically CNNs, have successfully solved many CV-related problems and are still being studied to find ways to increase their performance [22–25].

In this chapter, we explained how a SSS system works, described the tool LSTS uses to retrieve mission data, introduced simple CV methods and techniques for image enhancement and specified how ML models work and play a primary role in solving representation tasks, such as image classification problems. In the next chapter, we are going to analyze the work done in the past in the Habitat Mapping (HM) field and describe the current state of the art.


Chapter 3

Related Work

In this chapter we are going to present the related work and describe the evolution of Machine Learning (ML) based Habitat Mapping (HM).

In the early 2000s, work on HM that used ML algorithms mainly relied on Maximum Likelihood Classifiers (MLCs) in order to save time and reduce the cost of performing such a time-consuming (and sometimes tedious) task [26]. Many works followed this trend by experimenting with other models; for example, Kobler et al. [27] extended an already developed system with a Decision Tree (DT) algorithm to improve the system's accuracy.

Both works presented above used satellite images to map terrestrial habitats, not marine habitats. As we discussed in Section 2.1, and according to Pandian et al. [28], sonar-based images provide better quality and resolution of the underwater surroundings than optic images, for which the light conditions are often not adequate; therefore (since the quality of the data fed into a ML model can influence its accuracy), sonar systems work better for marine HM.

The work of Purser et al. [29] supports this statement. They used a Self-Organizing Map NN to classify coral habitats using the texture of video recordings, and their results were worse in comparison with a similar approach by Marsh and Brown [30], who used only sonar images.

The efforts of the Artificial Intelligence (AI) and Computer Vision (CV) communities to build models that understand texture in sonar imagery go back to the late 70s; for example, the works of Pace and Dyer [31] and Czarnecki [32] inspired others to follow and research this matter:

• Bourgeois and Walker [33] used Neural Networks (NNs) to interpret Side Scan Sonar (SSS) images;

• Shang and Brown [34] went further by creating a new NN that classified the principal feature patterns of the texture in SSS images;

• Stewart et al. [35] classified the seafloor with great success using a NN, with accuracy approximately between 80 and 91 percent (depending on the seafloor class);

• Marsh and Brown [30] specifically used multibeam sensors to classify the seabed using features on the backscatter and bathymetry of the images with a Self-Organizing Map NN.

Although they had good results and reason to affirm that NNs have great potential for texture analysis in images, they suffered from the problems of Deep Learning (DL) (lack of a large volume of data). This evolution led scientists to validate their models' accuracy with video sampling:

• Ierodiaconou et al. [36] used a DT to perform marine benthic habitat mapping. After that, they went on to compare [37] the prediction performance of a MLC and two distinct DTs to classify biological communities. The results showed that the DTs were more accurate than the MLC model;

• Hasan et al. [38] evaluated four classification models for benthic habitat mapping: DT, Support Vector Machine (SVM), Random Forest (RF) and MLC, achieving the best overall accuracy with the SVM model. The work of Wahidin et al. [39] reaches the same conclusion.

• Hasan et al. [40] used a RF model with features based on the bathymetry and backscatter data of multibeam sonar sensors to predict benthic habitats;

• Diesing et al. [41] mapped seabed sediments using object-based image analysis, RF and geostatistics approaches, with the best model (RF) achieving an overall accuracy of 75.56%;

• Henriques et al. [42] successfully mapped and classified 892 km of habitats of the Arrábida Natural Park, on the southwest coast of Portugal, with the European Nature Information System (EUNIS) taxonomy, using sediment sampling with a single-beam echo-sounder and a hierarchical agglomerative (bottom-up) cluster analysis approach.

Different approaches also used unsupervised learning algorithms to discover patterns in the habitats, such as the work of Calvert et al. [43], but did not achieve great results, unlike the very recent work (2018) of Vassallo et al. [44], which combined both learning methods to produce a single solution, achieving 89% overall accuracy when assigning 57 test samples to 5 clusters.

The work of Berthold et al. [45] on seabed sediment classification using Convolutional Neural Networks (CNNs) retrieved SSS imagery to predict the seabed sediment and used sediment sampling as ground truth. They used a patch-based approach with an already available CNN architecture (GoogLeNet [46]), cross-validation and data augmentation, a technique that generates more samples (flips and rotations, in this case) to improve the accuracy of the CNN when there are not many training samples. The results were good for coarse, mixed and sand sediments (70%, 61% and 83% accuracy, respectively), but the model struggled to classify fine sediment (11% accuracy).

Gómez-Ríos et al. [47] also designed a solution for coral classification using CNNs, achieving 98.2% accuracy. We were also inspired to adapt the solution found in the work of Guirado et al. [48] to classify marine habitats; they used a pre-trained (fine-tuned) CNN with data augmentation to detect plant species.

Overall, we analyzed the related work and the current state of the art in HM, concluding that there has been a lot of work related to marine HM using ML models, from the earlier times when HM used MLCs, DTs, SVMs or RFs to the recent DL approaches with CNNs. Comparing our approach to the ones described above, we use CNNs to classify marine habitats with our own ground truth method (annotated optic image samples retrieved in situ) and an existing architecture, similarly to [45, 47, 48]. We start by classifying optic images to get a large volume of labeled data, as CNNs require a large number of data samples to achieve good results, before classifying SSS images. Although our solution is not innovative, to the best of our knowledge no preceding work classifies habitat images using CNNs and the EUNIS taxonomy.

In the next chapter, we are going to explain how we retrieved the data from the Northern Littoral Natural Park (PNLN), using Autonomous Underwater Vehicles (AUVs) and Neptus.


Chapter 4

Experiments and Tests

In this chapter, we present each step of the missions carried out by the Underwater Systems and Technology Laboratory (LSTS) to retrieve image data in the Northern Littoral Natural Park (PNLN), in Esposende, Portugal. These missions are fundamental, as this is the process that can most affect the success of our Convolutional Neural Network (CNN) model. We also explain how the annotation process of the images was conducted by the marine biologists and how we tested the model with different images (not used in training).

4.1

Field Missions

Planning a mission for underwater exploration consists of having available operators and Autonomous Underwater Vehicles (AUVs) equipped with sensors to retrieve the desired data from the chosen location. It is also strongly advised to deploy the mission in good weather and sea conditions, as well as low water turbidity (for camera surveys). Depending on these conditions, the quality of the retrieved data may vary and, in our case, so may the accuracy of the CNN model.

Typically, the AUVs are deployed into the water from a small boat or from the shore. On the water, the AUVs are tele-operated away from obstacles to a place where they can start the autonomous plan sent by the operator. This plan is designed by the operator, who establishes Wi-Fi and/or acoustic networks and tests and monitors the plan's execution, using Neptus, until it is concluded. After the plan's conclusion, the operator can download the log from the AUV and analyze the plan's results using the Neptus Mission & Review Analysis (MRA) interface.

Even if we ensure all the conditions presented above, the mission can still be compromised by technical faults of the AUV. Examples of such faults are: noise or distortion in imagery caused by uncalibrated compasses or erratic movements of the AUV; collisions with fishing nets, tubes, rocks or ships sailing around the survey area; and broken hardware parts.


4.2

OMARE Missions

In the scope of the Marine Observatory of Esposende (OMARE) project, LSTS deployed several missions with light AUVs: Noptilus 1, 2 and 3 (see Figure 4.1). Noptilus 1 and 2 retrieved sonar data while Noptilus 3 retrieved optic data.

Figure 4.1: Noptilus 2 and 3 LAUVs.

Arrangements were made in order to explore areas of the PNLN in Esposende: a boat (see Figure 4.2), good weather conditions, low water turbidity and Neptus operators who planned the surveys (see Figures 4.3 and 4.4).

The plans in Figures 4.3 and 4.4 were designed in a Neptus console with the main objective of overlapping the mapping of the camera survey with the Side Scan Sonar (SSS) survey, to provide imagery of both types in the same location. This way, we guarantee that optic images will have one or more corresponding SSS images.

Figure 4.3: Example of a plan used to retrieve data in Esposende using Noptilus 1 and 3.

Figure 4.4: Example of a plan used to retrieve data in Esposende using Noptilus 1, 2 and 3.

After the planning phase is complete, the AUVs are introduced into the water to execute the plans previously designed (see Figure 4.5). Upon execution of all plans (or if battery runs low), the light AUVs are retrieved from the water and the retrieved data can be downloaded from the vehicles, as we can observe in Figure 4.6, to begin the mission analysis.


Figure 4.5: AUV navigating at ocean surface.

Figure 4.6: Operator using Neptus to download the mission’s data from AUVs.

In the next chapter we are going to present our approach and explain the implementations towards geographic association between SSS and optic images to have ground truth, image processing and enhancement, software tool development for manual annotation and CNN design and training.


Chapter 5

Implementation

In this chapter, we describe the implementation done in order to accomplish the objectives mentioned in Section 1.3. We start by explaining our approach, followed by our developments in data and image processing, the presentation of Hijack (a software tool developed for marine biologists to manually annotate images) and the deployment of Machine Learning (ML) prediction models.

5.1

Technical Approach

Most Autonomous Underwater Vehicles (AUVs) do not carry camera sensors, as sonar sensors are more useful for marine exploration (as we explained in Section 2.1), which makes the task of performing Habitat Mapping (HM) using optic images somewhat difficult, since ML algorithms require vast amounts of data samples to perform well.

To overcome this obstacle, we designed a solution that classifies habitats using Side Scan Sonar (SSS) images instead. This process is only possible if marine biologists provide ground truth about the habitats present in the optic images, as it is harder to perceive key features of a habitat in SSS images. In this way, we trained a Convolutional Neural Network (CNN) to predict classifications of optic images using the marine biologists' annotations as ground truth, and used the predictions of this model as ground truth for another CNN that predicts SSS images. The AUVs record the location of both image types (optic and sonar) and we can use that location to associate the habitat classifications. That said, our approach can be described in 5 phases (see Figure 5.1):

1. Image retrieval and image enhancement;

2. Photo annotation by marine biologists;

3. Model training and performance estimation:

(a) Data splitting into training, validation and test sets (training and validation sets are used for training the model and the test set is used to evaluate the model);

(58)

34 Chapter 5. Implementation

(b) Fitting of train and validation sets on a fine-tuned CNN and prediction of test set; (c) Assessment of the prediction score of the model on the test set.

(d) Repeat if score is low by adding test samples to the training set; 4. Associate the optical images annotations with the SSS in the same location; 5. Model training:

(a) Data splitting into training, validation and test sets (training and validation sets are used for training the model and the test set is used to evaluate the model);

(b) Fitting of train and validation sets on a fine-tuned CNN and prediction of test set.
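To make the data splitting step concrete, the following is a minimal sketch using scikit-learn's train_test_split; the library choice, split proportions and random seed are illustrative assumptions, not necessarily what is used in Section 5.6.

from sklearn.model_selection import train_test_split

def split_dataset(image_paths, labels, test_size=0.2, val_size=0.2, seed=42):
    # hold out a test set first, then carve a validation set out of the remaining data
    x_train, x_test, y_train, y_test = train_test_split(
        image_paths, labels, test_size=test_size, random_state=seed, stratify=labels)
    x_train, x_val, y_train, y_val = train_test_split(
        x_train, y_train, test_size=val_size, random_state=seed, stratify=y_train)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)

With a stratified split, the class proportions of the habitat labels are preserved in each subset, which matters when some habitat types are rare.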

Figure 5.1: Representation of the phases of our approach.

The following sections describe how we implemented each phase of our approach. Sections 5.2 and 5.3 correspond to phase 1, Section 5.4 to phase 2, Section 5.5 to phase 4, and Section 5.6 to phases 3 and 5.

5.2 Image Retrieval

In Section 2.2, we presented Neptus Mission & Review Analysis (MRA) and its features. When we load a log file, Neptus MRA checks the log for video footage and SSS data. If these exist, they are indexed and made available for visualization.


From the video footage it is possible to see that, sometimes, the seabed is not visible due to water turbidity or the high altitude of the AUV. Obviously, we do not want to process these images, so we developed a new plug-in in Neptus MRA that filters out frames where the AUV state contains high values of roll, pitch and altitude, since these values indicate a distorted frame: for example, when the AUV is diving, the pitch and altitude values are much higher than when the AUV is parallel to the sea bottom. The plug-in decodes the available video footage into frames and creates a Comma-Separated Values (CSV) file with the location, state of the AUV and time of each frame that complies with the roll, pitch and altitude thresholds introduced by the user (these values differ depending on the water turbidity). If the frame is valid, it is copied to a new folder (see Listing 5.1; the full source code is available on GitHub¹).



public class VideoToPhotosFilter implements MRAExporter {
    // . . .
    frameDecoder.load();
    int numberOfFrames = frameDecoder.getFrameCount();
    int numberOfPhotos = 0;
    File dir = new File(source.getFile("mra"), "FilteredPhotos/Original");
    File positions;
    String header = "filename,timestamp,latitude,longitude,altitude,roll,pitch,depth\n";
    dir.mkdirs();
    positions = new File(source.getFile("mra/FilteredPhotos/"), "positions.csv");
    positions.createNewFile();
    BufferedWriter bw = new BufferedWriter(new FileWriter(positions));
    bw.write(header);
    for (int i = 0; i < numberOfFrames; i++) {
        frameDecoder.seekToFrame(i);
        VideoFrame frame = frameDecoder.getCurrentFrame();
        double frameTimeStamp = (double) frame.getTimeStamp();
        // isValid checks the frame against the user-supplied roll, pitch and altitude thresholds
        if (isValid(frameTimeStamp)) {
            numberOfPhotos++;
            // save the valid frame as a JPG and append its position and vehicle state to positions.csv
            File outputfile = new File(dir, "frame" + numberOfPhotos + ".jpg");
            ImageIO.write((RenderedImage) frame.getImage(), "JPG", outputfile);
            bw.write("frame" + numberOfPhotos + ".jpg," + getPosition(frameTimeStamp) + ","
                    + getVehicleData(frameTimeStamp) + "\n");
        }
    }
    bw.close();
    // . . .
}

Listing 5.1: Video to frames exporter.

¹ https://github.com/andrediegues/neptus/blob/feature/hmapping/plugins-dev/mjpeg/pt/lsts/neptus/plugins/mjpeg/VideoToPhotosFilter.java



In the case of SSS images, Neptus MRA already has an SSS image exporter that extracts the images to a folder. To serve our cause, we developed a new plug-in (adapted from the existing one) that exports these images in grayscale and with slant correction (to avoid the presence of the nadir). Besides that, it creates a CSV file with the location and state of the AUV for subareas of the image, since an SSS image covers much more area than the optic frames. Note that, in this case, we do not need to filter the images by roll, pitch and depth, as we only use the SSS subimages that match (within a 5 meter range) the location of a frame. Listing 5.2 details the implementation of this exporter (the full source code is available on GitHub²).

² https://github.com/andrediegues/neptus/blob/feature/hmapping/plugins-dev/hmapping/pt/lsts/neptus/plugins/hmapping/FilteredSidescan.java



public class FilteredSidescan implements MRAExporter {
    // . . .
    @Override
    public String process(IMraLogGroup source, ProgressMonitor pmonitor) {
        // . . .
        for (SidescanLine l : lines) {
            numberOfLinesInImg++;
            applySlantCorrection(l);
            // in the middle of each sub-image, record the corrected AUV position for that sub-area
            if (numberOfLinesInImg % (subImageHeight / 2) == 0
                    && numberOfLinesInImg % subImageHeight != 0) {
                CorrectedPosition cp = new CorrectedPosition(source);
                double timestamp = l.getTimestampMillis() / 1000;
                SystemPositionAndAttitude pos = cp.getPosition(timestamp);
                writeToPositionsFile(bw, l, filename, pos);
            }
            // when the current image is full (or the log ends), write it to disk and start a new one
            if (ypos >= height || time == end) {
                ImageIO.write(img, "PNG", new File(out, filename + ".png"));
                img.getGraphics().clearRect(0, 0, img.getWidth(), img.getHeight());
                ypos = 0;
                image_num++;
            }
            // append the current sonar line to the image being built
            tmp = l.getImage();
            img.getGraphics().drawImage(tmp, 0, ypos, width - 1, ypos + 1,
                    0, 0, tmp.getWidth(), tmp.getHeight(), null);
            ypos++;
        }
        // write the last (possibly partial) image
        ImageIO.write(img, "PNG", new File(out, filename + ".png"));
        ypos = 0;
        bw.close();
        return I18n.textf("%num images were exported to %path.", image_num, out.getAbsolutePath());
    }
    // . . .
}

 

Listing 5.2: Side Scan Sonar location exporter Java program.
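The positions.csv files produced by the two exporters are what later enables the geographic association between optic frames and SSS subimages (phase 4 of Section 5.1). The following is a minimal sketch of that matching, assuming both CSV files contain filename, latitude and longitude columns in decimal degrees and using the 5 meter radius mentioned above; the column names, file paths and helper names are assumptions for illustration, not the exact implementation.

import math

import pandas as pd

def haversine_m(lat1, lon1, lat2, lon2):
    # great-circle distance between two points, in meters
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match_frames_to_sss(frames_csv, sss_csv, radius=5.0):
    # associate each optic frame with every SSS subimage located within `radius` meters
    frames = pd.read_csv(frames_csv)
    sss = pd.read_csv(sss_csv)
    matches = {}
    for _, f in frames.iterrows():
        near = [s["filename"] for _, s in sss.iterrows()
                if haversine_m(f["latitude"], f["longitude"],
                               s["latitude"], s["longitude"]) <= radius]
        matches[f["filename"]] = near
    return matches

A spatial index would avoid the quadratic scan for larger surveys, but for the number of frames of a single mission this simple loop is usually enough.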

5.3 Image Enhancement

In Section 2.3, we described Computer Vision (CV) techniques to perform image enhancement. In this section, we explain how those techniques were implemented.

To enhance the SSS images, we used the SSS image exporter described in Section 5.2, where the user can introduce values of Time Variable Gain (TVG) and normalization. These parameters redistribute the exposure and brightness across the SSS image (instead of the image being brightest where the signal's reflection was strongest).
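To illustrate the idea, the following is a simplified sketch of how a TVG-style gain and a normalization step could be applied to one line of raw sonar intensities; the gain law, the tvg_gain parameter and the helper name are assumptions for illustration only, not the Neptus exporter implementation.

import numpy as np

def apply_tvg(samples, ranges_m, tvg_gain=0.5):
    # samples: raw intensity values of one sidescan line (1-D array)
    # ranges_m: slant range of each sample in meters (same length)
    samples = np.asarray(samples, dtype=np.float64)
    ranges_m = np.maximum(np.asarray(ranges_m, dtype=np.float64), 1e-3)
    # amplify far samples more than near ones to compensate for signal loss with range
    gain_db = tvg_gain * 20.0 * np.log10(ranges_m)
    boosted = samples * (10.0 ** (gain_db / 20.0))
    # normalize the line back to the 0-255 range expected by the exported image
    normalized = 255.0 * (boosted - boosted.min()) / (boosted.max() - boosted.min() + 1e-9)
    return normalized.astype(np.uint8)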



5.3.1 Optic Images

In order to make the annotation task carried out by the marine biologists easier, we applied image enhancement techniques to the optic images.

All the implementation for the optic images was developed using the OpenCV library³ and the Python programming language. OpenCV is a very useful tool to perform CV techniques, as it has built-in methods such as histogram equalization, color model conversion, etc. However, additional image processing algorithms were studied and implemented in the context of this thesis.

Listings 5.3, 5.4 and 5.5 contain the code to perform various transformations that were not available in the OpenCV library, such as the Max White algorithm, contrast stretching, power law transformation, and histogram equalization and contrast stretching applied to RGB images through the HSV color model.

In Chapter 6 we present the results of applying these transformations to the optic images.

³ https://opencv.org/


# shared imports for the enhancement scripts
import itertools as it

import cv2
import numpy as np

def findmax(img):
    # find the maximum value of each BGR channel in the image
    maxpixel = img[0, 0]
    maxBlue = maxpixel[0]
    maxGreen = maxpixel[1]
    maxRed = maxpixel[2]
    for i, j in it.product(range(img.shape[0]), range(img.shape[1])):
        bluePixel = img[i, j, 0]
        greenPixel = img[i, j, 1]
        redPixel = img[i, j, 2]
        if bluePixel > maxBlue:
            maxBlue = img[i, j, 0]
        if greenPixel > maxGreen:
            maxGreen = img[i, j, 1]
        if redPixel > maxRed:
            maxRed = img[i, j, 2]
    return [maxBlue, maxGreen, maxRed]

def rescale(pixel, white):
    # scale a pixel value, clipping the result at 255
    val = pixel * white
    if pixel * white > 255:
        return 255
    return val

def whitebalance(imgname):
    # Max White algorithm: treat the brightest value of each channel as white
    img = cv2.imread(imgname)
    newwhite = findmax(img)
    scale = [255 / e for e in newwhite]
    b = img[:, :, 0]
    g = img[:, :, 1]
    r = img[:, :, 2]
    hb = {}
    hg = {}
    hr = {}
    # build lookup tables with the rescaled value of each distinct channel value
    for e in list(set(b.ravel())):
        hb[e] = rescale(e, scale[0])
    for e in list(set(g.ravel())):
        hg[e] = rescale(e, scale[1])
    for e in list(set(r.ravel())):
        hr[e] = rescale(e, scale[2])
    newimg = img
    for i, j in it.product(range(img.shape[0]), range(img.shape[1])):
        newimg[i, j] = np.array([hb[img[i, j][0]], hg[img[i, j][1]], hr[img[i, j][2]]])
    return newimg

Listing 5.3: Max White algorithm implementation in Python.

 





def createHash(img, bot, top):
    # lookup table that maps each gray value to its stretched value
    hashmap = {}
    for i, j in it.product(range(img.shape[0]), range(img.shape[1])):
        if img[i, j] not in hashmap:
            if img[i, j] <= bot:
                hashmap[img[i, j]] = 0
            elif img[i, j] > top:
                hashmap[img[i, j]] = 255
            else:
                hashmap[img[i, j]] = 255 * ((img[i, j] - bot) / (top - bot))
    return hashmap

def linearStretch(img):
    # contrast stretching: map the [min, max] gray range of the image to [0, 255]
    bot = min(img.ravel())
    top = max(img.ravel())
    hashmap = createHash(img, bot, top)
    img2 = img
    for i, j in it.product(range(img2.shape[0]), range(img2.shape[1])):
        img2[i, j] = hashmap[img2[i, j]]
    return img2

def powerStretch(img):
    # power law (gamma) transformation with exponent 2.5
    img2 = img
    img2 = img2 / 255.0
    im_power_law_transformation = cv2.pow(img2, 2.5)
    im_power_law_transformation *= 255
    im_power_law_transformation = im_power_law_transformation.astype('uint8')
    return im_power_law_transformation

 

Listing 5.4: Contrast stretching and power law transformation implementation in Python.
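The histogram equalization and contrast stretching of RGB images through the HSV color model mentioned above can be sketched with OpenCV built-ins roughly as follows; this is a simplified illustration of the idea, not necessarily the exact code of Listing 5.5, and the function name is an assumption.

def equalizeColorImage(imgname):
    # convert from BGR to HSV so that only the brightness (V) channel is equalized,
    # leaving hue and saturation (the colors themselves) untouched
    img = cv2.imread(imgname)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)
    v = cv2.equalizeHist(v)
    hsv = cv2.merge((h, s, v))
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

Equalizing only the V channel avoids the color shifts that appear when each RGB channel is equalized independently.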

5.3.2 Measuring Information of Images

The scikit-image library⁴ for Python already has a built-in Shannon entropy function that returns the number of bits (on average) needed to represent a pixel of an image. This was used to filter out images that gained too much noise after enhancement. Each transformation can add noise to an image, and noise corresponds to more disorder, so a noisy image has a higher Shannon entropy value than the same image without transformations. With this, we kept only the images whose entropy was at or below the 75th percentile of the entropy distribution, to avoid noisy images. We chose the 75% boundary because, upon experimentation with different values, this threshold cleaned up almost every noisy image. Listing 5.6 shows the code used in this procedure. The images with a lot of noise are also the ones that “escaped” the first filter described in Section 5.2.

⁴ https://scikit-image.org/
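A minimal sketch of this filtering procedure, assuming the enhanced images are stored in a single folder (the folder layout and helper name are illustrative, not the exact code of Listing 5.6):

import glob

import cv2
import numpy as np
from skimage.measure import shannon_entropy

def filterNoisyImages(folder):
    # compute the Shannon entropy of every enhanced image in the folder
    paths = sorted(glob.glob(folder + "/*.jpg"))
    entropies = []
    for p in paths:
        img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
        entropies.append(shannon_entropy(img))
    # keep only the images at or below the 75th percentile of the entropy distribution
    threshold = np.percentile(entropies, 75)
    return [p for p, e in zip(paths, entropies) if e <= threshold]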
