

A 2 1/2 D Visual Controller for Autonomous Underwater Vehicle

Brazil

May, 2017


A 2 1/2 D Visual Controller for Autonomous Underwater Vehicle

Presented to the Master's Program in Electrical Engineering of the Federal University of Bahia in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering.

Federal University of Bahia - UFBA
Master's Program in Electrical Engineering

Supervisor: Prof. André Gustavo Scolari Conceição
Co-supervisor: Dr. Jan Christian Albiez

Brazil

May, 2017


Cesar, Diego Brito dos Santos

A 2 1/2D Visual Controller for Autonomous Underwater Vehicle / Diego Brito dos Santos Cesar. -- Salvador, 2017. 107 f. : il

Orientador: André Gustavo Scolari Conceição. Coorientador: Jan Christian Albiez.

Dissertação (Mestrado - Programa de Pós-Graduação em Engenharia Elétrica) -- Universidade Federal da Bahia, Escola Politécnica, 2017.

1. Controle Servo Visual. 2. Marco Fiducial Artificial. 3. Veículo Autônomo Submarino. I. Conceição, André Gustavo Scolari. II. Albiez, Jan Christian. III. Título.


As this work comes to an end, I realize just how many people helped, directly or indirectly, to make it possible. I would like to first thank my parents, Rose and Cesar, who did their best to give me the conditions to go further and become a better person and a better professional. I thank my brother, Gabriel, for his company over all these years.

I give thanks to my wife and partner in all moments, good or bad, Laila, the one who heard my complaints, suffered through every setback and celebrated every small success during this journey. I will be eternally grateful.

To my supervisors André Scolari and Jan Albiez, who encouraged me to do my best and guided me with their experience and knowledge.

To the whole BIR and the FlatFish team, the people I have spent more time with than anyone else over these last years: thanks for making my days pleasant. A special thanks to my friends Gustavo, for helping me so many times, and Geovane, for supporting and encouraging me towards this goal. Marco Reis, thank you for believing in my potential and giving me the opportunity to be part of this amazing group.

Thanks to all colleagues from DFKI who contributed in some way to this work. Special thanks to Christopher Gaudig, for his goodwill and support with the experiments at DFKI's facilities.

To my friends Pedro Xavier, João Britto, Rafael Saback, Danilo Farias and Livia Assunção, for their support, encouragement, help and empathy during this whole process.

Last but not least, thanks to Shell for the financial funding of the FlatFish Project via the Brazilian Industrial Research and Innovation Corporation (EMBRAPII) and the Brazilian National Agency of Petroleum (ANP).


"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'"
Isaac Asimov


Navegação submarina é afetada pela falta de GPS, devido à atenuação de ondas eletromagnéticas. Por causa disso, os robôs submarinos baseiam-se em sistemas de navegação via odometria e sensores inerciais. Contudo, a localização via esse tipo de abordagem possui uma incerteza associada que cresce com o passar do tempo. Por isso sensores visuais e acústicos são utilizados para aumentar a precisão da navegação de veículos submarinos. Nesse contexto, a utilização de um controlador visual aumenta a precisão dos sistemas robóticos quando se locomovem em relação a um objeto alvo. Esse tipo de precisão é requerida para manipulação de objetos, inspeção, monitoramento e docagem submarina. Esse trabalho tem como objetivo projetar e avaliar um controlador visual híbrido para um veículo submarino autônomo (AUV) utilizando como referência marcos visuais artificiais. Os marcos artificiais são alvos planares projetados para serem facilmente detectados por sistemas de visão computacional, sendo capazes de fornecer meios para estimação da posição do robô em relação ao marco. As suas características de alta taxa de detecção e baixa taxa de falsos positivos são desejáveis para tarefas de controle servo visual. Este trabalho analisou, portanto, dentre os marcos mais populares e de código aberto, aquele que apresenta o melhor desempenho em ambientes submarinos, em termos de taxa de detecção, número de falsos positivos, máxima distância e ângulo para detecção. Posteriormente, o marco que apresentou melhor desempenho foi utilizado para aplicação de controle visual em um robô submarino. Os primeiros ensaios foram realizados na plataforma de simulação robótica Gazebo e, posteriormente, em um protótipo de AUV real, o FlatFish. Testes em um tanque de água salgada foram realizados visando avaliar a solução proposta utilizando um ganho estático e um ganho adaptativo para o controlador visual. Finalmente, testes no mar foram realizados utilizando o controlador que apresentou os melhores resultados no ambiente controlado, a fim de verificar seu desempenho em um ambiente real. Os testes mostraram que o controlador visual foi capaz de manter o veículo em frente aos marcos visuais artificiais e que o ganho adaptativo trouxe vantagens, principalmente por suavizar a movimentação do robô no início da missão.

Palavras-chave: Controle Servo Visual, Marco Fiducial Artificial, Veículo Autônomo Submarino


Underwater navigation is affected by the lack of GPS due to the attenuation of electromagnetic signals. Underwater robots therefore rely on dead reckoning as their main navigation system. However, localization via dead reckoning accumulates uncertainty over time. Consequently, visual and acoustic sensors have been used to increase navigation accuracy in robotic systems, especially when they move relative to a target object. This level of precision is required, for instance, for object manipulation, inspection, monitoring and docking. This work aims to develop and assess a hybrid visual controller for an autonomous underwater vehicle (AUV) using artificial fiducial markers as reference. Artificial fiducial markers are planar targets designed to be easily detected by computer vision systems, and they provide a means to estimate the robot's pose with respect to the marker. They usually have a high detection rate and a low false-positive rate, which are desirable properties for visual servoing tasks. This master's thesis therefore evaluated, among the most popular open-source marker systems, the one that performs best in underwater environments in terms of detection rate, false-positive rate, and maximum distance and angle for successful detection. Afterwards, the best-performing marker was used for visual servoing on an underwater robot. The first experiments were performed in the Gazebo robot simulation environment and, after that, on a real prototype, the FlatFish. Tests in a saltwater tank were performed to assess the controller using static and adaptive gains. Finally, sea trials were performed using the controller that performed best in the controlled environment, in order to assess its behavior in a real environment. The tests showed that the visual controller was able to keep station in front of an artificial fiducial marker, and that the adaptive gain brings improvements, mainly because it smooths the robot's motion at the beginning of the task.


Figure 1 – Mobile robots . . . 17

Figure 2 – Typical Underwater Vehicles . . . 18

Figure 3 – Different marker systems . . . 20

Figure 4 – Classification of different underwater vehicles . . . 22

Figure 5 – Remotely Operated Vehicles . . . 23

Figure 6 – Commercial Mobile robots . . . 24

Figure 7 – Images of POODLE, the first ROV . . . 25

Figure 8 – Early ROVs . . . 26

Figure 9 – SPURV . . . 27

Figure 10 – AUSS . . . 27

Figure 11 – Hugin AUV . . . 27

Figure 12 – FlatFish AUV . . . 28

Figure 13 – FlatFish thruster configuration . . . 30

Figure 14 – Representation of the three components of light received by the camera . . . 34
Figure 15 – Color absorption as a function of distance . . . 34

Figure 16 – Picture of a miniaturized parking lot taken with a camera close to the objects. . . 35

Figure 17 – Pinhole camera geometry . . . 36

Figure 18 – Diagram showing the position of the pixel in the ideal case and the effect of the radial distortion (δr) and tangential distortion (δt) . . . . 38

Figure 19 – Studied marker systems . . . 40

Figure 20 – Structure of dynamic look-and-move IBVS . . . 42

Figure 21 – Structure of dynamic look-and-move PBVS . . . 42

Figure 22 – Example of adaptive curve using λ(0) = 0.2, λ(∞) = 0.1 and λ̇(0) = 0.05 . . . 50
Figure 23 – Camera head mounted on a 3D gantry robot inside the black tank . . . 55

Figure 24 – Placement of markers on the support structure. The first row features an AprilTags and an ARToolKit marker. In the second row the ArUco marker is followed by smaller size versions of all markers. . . 56

Figure 25 – AprilTags at different turbidity levels in a deep sea lighting scenario . . 56

Figure 26 – Angle steps used for the maximum detectable angle test . . . 57

Figure 27 – Smallest detectable size (in pixels) for ArUco, AprilTags and ARToolKit libraries in a shallow sea scenario at different turbidity levels. . . 58

Figure 28 – Smallest detectable size (in pixels) for ArUco, AprilTags and ARToolKit libraries in a deep sea scenario at different turbidity levels. . . 59

Figure 29 – The largest camera-to-marker distance for successful detection at different turbidity levels for the shallow sea lighting scenario . . . 60


Figure 30 – The largest camera-to-marker distance for successful detection at different turbidity levels for the deep sea lighting scenario . . . 60

Figure 31 – Maximum detected angle for the three libraries in a shallow sea lighting scenario at different turbidity levels . . . 60

Figure 32 – Maximum detected angle for the three libraries in a deep sea lighting scenario at different turbidity levels . . . 61

Figure 33 – The detection time for ArUco, AprilTags and ARToolKit . . . 61

Figure 34 – FlatFish inspecting a shipwreck on Gazebo . . . 63

Figure 35 – Starting and ending position of the vehicle in the world [(a) and (b)] and starting and ending marker positions on the image plane [(c) and (d)] . . . 64

Figure 36 – Reference following without ocean current effects . . . 65
Figure 37 – Pixel trajectory on the image plane when ocean currents are not present . . . 66
Figure 38 – Reference following with ocean current effects . . . 66
Figure 39 – Pixel trajectory on the image plane in the presence of ocean currents . . . 67
Figure 40 – Maritime Exploration Hall - DFKI . . . 68

Figure 41 – Setup scenario on the DFKI’s big basin . . . 69

Figure 42 – Cascade PID diagram . . . 70

Figure 43 – Visual control chain diagram . . . 70

Figure 44 – Coordinate systems from FlatFish’s body, camera and marker . . . 71

Figure 45 – Vehicle’s camera detecting the Apriltag marker at two different distances on the tank . . . 73

Figure 46 – Controller performance in surge during setpoint changing in surge (case 1). . . 73

Figure 47 – Controller performance in sway during setpoint changing in surge (case 1). . . 74

Figure 48 – Controller performance in heave during setpoint changing in surge (case 1). . . 74

Figure 49 – Controller performance in yaw during setpoint changing in surge (case 1) . . . 75
Figure 50 – Pixel trajectory on the image plane during motion in surge (case 1) . . . 76

Figure 51 – Controller performance in surge during setpoint changing in surge (case 2). . . 76

Figure 52 – Controller performance in sway during setpoint changing in surge (case 2). . . 77

Figure 53 – Controller performance in heave during setpoint changing in surge (case 2). . . 78

Figure 54 – Controller performance in yaw during setpoint changing in surge (case 2) . . . 78
Figure 55 – Pixel trajectory on the image plane during motion in surge (case 2) . . . 79

Figure 56 – Controller performance in sway during setpoint changing in sway (case 3) . . . 79
Figure 57 – Controller performance in surge during setpoint changing in sway (case 3) . . . 80


Figure 58 – Controller performance in heave during setpoint changing in sway (case 3) . . . 80
Figure 59 – Controller performance in yaw during setpoint changing in sway (case 3) . . . 81
Figure 60 – Pixel trajectory on the image plane during motion in sway (case 3) . . . 81
Figure 61 – Controller performance in heave during setpoint changing in heave (case 4) . . . 82
Figure 62 – Controller performance in surge during setpoint changing in heave (case 4) . . . 83
Figure 63 – Controller performance in sway during setpoint changing in heave (case 4) . . . 83
Figure 64 – Controller performance in yaw during setpoint changing in heave (case 4) . . . 84
Figure 65 – Pixel trajectory on the image plane during motion in heave (case 4) . . . 84
Figure 66 – Controller performance in yaw during setpoint changing in yaw (case 5) . . . 85
Figure 67 – Controller performance in surge during setpoint changing in yaw (case 5) . . . 85
Figure 68 – Controller performance in sway during setpoint changing in yaw (case 5) . . . 86
Figure 69 – Displacement of sway when performing Yaw control . . . 86
Figure 70 – Controller performance in heave during setpoint changing in yaw (case 5) . . . 87
Figure 71 – Pixel trajectory on the image plane during motion in yaw (case 5) . . . 87
Figure 72 – Brazilian experiments location (Google Maps) . . . 88
Figure 73 – FlatFish AUV in Todos os Santos bay . . . 89
Figure 74 – Apriltag marker ID=5 placed on metal pedestal . . . 89
Figure 75 – Vehicle's camera detecting the Apriltag marker at two different distances . . . 91
Figure 76 – Surge controller performance in the sea during several setpoint changes . . . 91
Figure 77 – Sway controller performance in the sea during several setpoint changes . . . 92
Figure 78 – Heave controller performance in the sea during several setpoint changes . . . 93
Figure 79 – Yaw controller performance in the sea during several setpoint changes . . . 93
Figure 80 – Pixel trajectory on the image plane during the entire mission in the sea . . . 94
Figure 81 – Surge controller performance on interval [770-900] seconds during mission in the sea . . . 95
Figure 82 – Sway controller performance on interval [770-900] seconds during mission in the sea . . . 95
Figure 83 – Heave controller performance on interval [770-900] seconds during mission in the sea . . . 96
Figure 84 – Yaw controller performance on interval [770-900] seconds during mission in the sea . . . 96
Figure 85 – Pixel trajectory on the image plane on the interval [770-900] seconds during mission in the sea


Table 1 – FlatFish Specifications . . . 29

Table 2 – Thrust Configuration Matrix . . . 30

Table 3 – Model parameters of AUV FlatFish . . . 32

Table 4 – Gains for the simulated visual servoing task . . . 64

Table 5 – PID Coefficients . . . 71

Table 6 – Setpoint for the test cases scenarios on the basin . . . 72


ANP Brazilian National Agency of Petroleum
AUSS Advanced Unmanned Search System
AUV Autonomous Underwater Vehicle
CTD Conductivity, Temperature and Depth
CURV Cable-controlled Undersea Recovery Vehicles
DFKI German Research Center for Artificial Intelligence
DVL Doppler Velocity Logger
EMBRAPII Brazilian Industrial Research and Innovation Corporation
HVS Hybrid Visual Servoing
IBVS Image Based Visual Servoing
IMU Inertial Measurement Unit
LBL Long Baseline
PBVS Position Based Visual Servoing
PID Proportional–Integral–Derivative
PnP Perspective-n-Point
RIC Robotics Innovation Center
ROCK Robotic Construction Kit
ROV Remotely Operated Vehicle
SPURV Special Purpose Underwater Research Vehicle
TCM Thrust Configuration Matrix
UAV Unmanned Aerial Vehicle
USBL Ultra-short Baseline
Visp Visual Servoing Platform


λ Greek letter Lambda

τ Greek letter Tau

η Greek letter Eta

ρ Greek letter Rho

∈ Math symbol "element of"
∀ Math symbol "for all"


1 INTRODUCTION . . . 17

1.1 Motivation . . . 17

1.1.1 Vision-based control and underwater vehicles . . . 18

1.2 General Objective . . . 20

1.3 Specific Objectives . . . 20

1.4 Thesis Outline . . . 21

2 AUTONOMOUS UNDERWATER VEHICLES . . . 22

2.1 Autonomous Underwater Vehicles . . . 22

2.1.1 Underwater Vehicles . . . 22

2.1.2 Historical Context. . . 24

2.2 FlatFish . . . 28

2.2.1 Sensors and Actuators . . . 29

2.2.1.1 Navigation System . . . 29
2.2.1.2 Inspection System . . . 30
2.2.1.3 Propellers . . . 30
2.2.2 Vehicle Model . . . 31
2.2.3 Software Layer . . . 32

3 COMPUTER VISION . . . 33

3.1 Underwater Image Processing . . . 33

3.2 Perspective Transform . . . 35

3.2.1 Pinhole camera model . . . 35

3.2.2 Distortion parameters . . . 37

3.2.3 Camera Calibration . . . 38

3.2.4 Pose Estimation with Single Camera . . . 38

3.3 Artificial Fiducial Markers . . . 39

3.3.1 The evaluated marker systems . . . 40

4 VISUAL CONTROLLER . . . 41

4.1 IBVS . . . 44

4.1.1 Interaction Matrix for Points . . . 44

4.1.2 Estimation of Interaction Matrix - L̂s . . . 46

4.2 PBVS . . . 46

4.3 Hybrid Visual Servoing - 2 1/2D. . . 48


4.5 Stability Analysis . . . 50

4.5.1 Stability for PBVS . . . 52

4.5.2 Stability for IBVS . . . 52

4.5.3 Stability of 2 1/2D visual servoing . . . 53

5 EVALUATION OF FIDUCIAL MARKERS . . . 54

5.1 Experimental Setup . . . 54

5.2 Methodology . . . 55

5.3 Results and Discussion . . . 58

5.4 Conclusions . . . 62

6 GAZEBO SIMULATION . . . 63

6.1 Experimental Setup and Methodology. . . 63

6.2 Results . . . 64

6.3 Conclusions . . . 67

7 EXPERIMENTS WITH THE REAL VEHICLE . . . 68

7.1 Experiments in the big basin . . . 68

7.1.1 Experimental Setup . . . 68

7.1.1.1 Control Chain . . . 69

7.1.2 Methodology . . . 72

7.1.3 Results and Discussion . . . 72

7.1.3.1 Case 1 - Surge . . . 73
7.1.3.2 Case 2 - Surge . . . 75
7.1.3.3 Case 3 - Sway . . . 77
7.1.3.4 Case 4 - Heave . . . 82
7.1.3.5 Case 5 - Yaw . . . 85
7.1.4 Conclusions . . . 88

7.2 Experiments in the Sea . . . 88

7.2.1 Experimental Setup . . . 88

7.2.2 Methodology . . . 89

7.2.3 Results and Discussion . . . 90

7.2.3.1 Entire servoing mission . . . 90

7.2.3.2 Interval Analysis . . . 94

7.2.4 Conclusions . . . 94

8 FINAL CONSIDERATIONS . . . 98


1 Introduction

This chapter presents the importance of subsea exploration and the potential of unmanned underwater vehicles for this task. It also presents the limitations of current underwater robots and shows how computer vision and vision-based control have helped overcome them. The objectives of this work are then presented. Finally, the structure of this master's thesis is outlined.

1.1 Motivation

The introduction of robots into manufacturing industries in the 1960s revolutionized the way human beings produce goods. The ability to perform repetitive tasks with high accuracy and speed has increased industrial productivity, reduced costs and improved product quality. Nowadays, manipulator robots are widely used in many sectors of industry, to the point that it is unthinkable for companies in these segments to remain competitive and survive without them.

Although still far from the technological maturity of industrial robots, mobile robots are currently making enormous progress. Advances in sensing, computing power, signal processing and artificial intelligence have enabled practical applications of mobile robots and revealed their potential for more complex tasks. Mobile robots have been applied to replace humans in dangerous situations, to extend human perception and to operate in inhospitable areas, e.g., analysis of surface materials on Mars [1], search and rescue of victims of natural disasters [2], observation of active volcanoes [3] and inspection of power transmission lines [4]. Figure 1 shows some examples of space, terrestrial and aerial mobile robots.

Figure 1 – Mobile robots: (a) Sojourner [5]; (b) Aeryon Scout [6]; (c) Asimo [7]


Moreover, mobile robots have also been used in underwater environments to perform tasks previously done by divers, as well as to enable operations at greater depths than humans can tolerate. Underwater vehicles are mainly classified either as Remotely Operated Vehicles (ROVs) or Autonomous Underwater Vehicles (AUVs). The former is a tethered vehicle guided by a pilot, who relies on information from sensors such as sonars and cameras to control the movement of the vehicle. The latter performs missions without human intervention, using internal and external positioning sources to localize itself and navigate in the environment. Some underwater vehicles are shown in Figure 2.

Figure 2 – Typical Underwater Vehicles: (a) ROV Nexxus [8]; (b) ROV Magnum [9]; (c) AUV Leng [10]; (d) AUV Sabertooth [11]

Underwater robots have been used for the inspection of man-made submarine structures such as pipelines, cables and columns of offshore platforms [12]; for localizing and grasping objects [13]; for seafloor mapping; and for maintenance and surveillance of observatories and sampling of marine chemistry, geology and biology [14].

1.1.1 Vision-based control and underwater vehicles

The absence of GPS in underwater environments leads most underwater vehicles to rely on visual and acoustic sensors for navigation. Acoustic sensors are not affected by visibility and light conditions. However, they are expensive, operate at a low update frequency, and their weight may not be suitable for certain AUVs. On the other hand, visual sensors such as cameras are inexpensive and standard equipment in most underwater vehicles, providing rich information at a high update rate, in addition to being passive sensors [15]. Moreover, cameras have a higher resolution than acoustic sensors [16], which allows more accurate data. Thus, the visual sensor has


been applied for estimation of abundances of marine organisms [17], 3D reconstruction [18], failure detection in hydroelectric dams [19], simultaneous localization and mapping (SLAM) [20] and visual servoing [21, 22, 23].

Visual servoing, or vision-based control, is a control scheme that uses visual information to control the motion of a robot. It is a multidisciplinary field that draws on areas such as kinematics, dynamics, image processing and control theory [24]. Visual servoing mimics the way humans use their visual system to perform simple actions such as grabbing objects, positioning the body to enter through a door or sitting on a chair. In mobile robots it allows motion in a local, target-referenced frame, thus providing higher accuracy than inertial navigation and enabling closer target-relative movements.

Although visual servoing was first developed for manufacturing robots, it has been widely used in mobile robots, for instance for the automatic landing of Unmanned Aerial Vehicles (UAVs) [25], for accuracy enhancement of medical micro-robots [26] and for assisting pilots of remotely operated spacecraft in satellite maintenance [27]. In underwater environments, displacement relative to a target is desired in tasks such as docking [28], pipeline following [29] and structure inspection [30]. Moreover, underwater visual servoing has been used to help robot inspection and 3D reconstruction [18], to assist ROV pilots [31] and for valve manipulation [32].

Visual servoing controllers are classified in two forms according to their input. Image Based Visual Servoing (IBVS) receives its input in the image domain (pixel coordinates), so it is also called 2D visual servoing. Position Based Visual Servoing (PBVS) receives its input in Cartesian space and is therefore also referred to as a 3D visual controller. While IBVS is robust to camera calibration errors and is more computationally efficient, it has singularities that make the system only locally asymptotically stable. Global asymptotic stability, on the other hand, is proved for PBVS. However, PBVS requires an additional computational step to estimate the camera pose, and it demands prior knowledge of the camera calibration parameters and of a 3D model of the target object. Hybrid Visual Servoing (HVS), or the 2 1/2D visual controller, combines the advantages of both strategies while avoiding their drawbacks [33].
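To make the distinction concrete, the sketch below implements the classical velocity-based servoing law v = −λ L̂⁺ e that underlies IBVS-style controllers, using the well-known interaction matrix of a single point feature (detailed in Chapter 4). The function name, gain value and the toy feature coordinates are illustrative assumptions, not the controller developed in this work.

```python
import numpy as np

def ibvs_velocity(features, desired_features, L_hat, gain=0.2):
    """One step of a classical IBVS law: v = -gain * pinv(L_hat) @ e.

    features, desired_features: stacked image-plane coordinates (2N,)
    L_hat: estimate of the 2N x 6 interaction (image Jacobian) matrix
    Returns the commanded camera twist [vx, vy, vz, wx, wy, wz].
    """
    error = features - desired_features            # image-space error e
    return -gain * np.linalg.pinv(L_hat) @ error   # camera velocity command

# Toy usage: one point feature at normalized coordinates (x, y), depth Z = 2 m.
x, y, Z = 0.1, -0.05, 2.0
L_point = np.array([
    [-1/Z, 0.0, x/Z, x*y, -(1 + x**2), y],
    [0.0, -1/Z, y/Z, 1 + y**2, -x*y, -x],
])  # interaction matrix of a single point feature
v = ibvs_velocity(np.array([x, y]), np.array([0.0, 0.0]), L_point)
print(v)
```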

Regarding the image processing layer, visual servoing relies on the extraction of visual features such as points, lines or ellipses. These are either the input of the controller (IBVS) or used to compute the camera pose that feeds the controller (PBVS). In that sense, this work takes advantage of artificial fiducial markers for feature extraction. Artificial fiducial markers are similar to QR codes: they encode a unique ID in their pattern and are especially designed to be easily detected by computer vision systems. Some fiducial marker systems are shown in Figure 3.


Figure 3 – Different marker systems [34]

Marker systems are designed to provide high detection rates and low false-positive and false-negative rates. The use of fiducial markers is suitable when object recognition or pose determination is needed with high reliability and when the environment can be modified to affix them [34]. These characteristics are desirable for control tasks, which motivates their use in this work. Examples of possible applications are the docking station of an AUV and structures of the oil and gas industry. A few works have used planar targets for visual servoing in underwater robots [35, 22], but to the best of the author's knowledge, artificial fiducial markers have not been used for visual servoing in underwater environments.

1.2 General Objective

The main objective of this work is to develop and assess the performance of a 2 1/2D visual controller for an autonomous underwater vehicle, using the information provided by a single camera detecting a fiducial marker in a real environment.

1.3 Specific Objectives

• Assess the particularities of visual servoing tasks in autonomous underwater vehicles;

• Assess the performance of artificial fiducial marker systems in underwater environments;

• Define, among the known open-source artificial fiducial marker systems, the one that best suits underwater applications;

• Analyze the performance of a hybrid visual servoing approach in a simulated environment;

• Evaluate the performance difference between adaptive and static gains;

• Validate the results on the real vehicle in a big basin;

• Evaluate the performance of the proposed controller in the ocean;

• Investigate the challenges of the application in a real environment.

1.4 Thesis Outline

This document is composed of this chapter, which presents a brief history of robotics, the evolution of visual servoing from manipulators to mobile robots, and the motivation and goals of this work.

Chapter 2 describes the particularities of autonomous underwater vehicles and introduces the FlatFish AUV, the robot used for the experiments in this work.

Chapter 3 introduces the basic concepts of image processing, the challenges of image processing in underwater environments and a bibliographic review of artificial fiducial markers.

Chapter 4 presents the background on visual servoing and details the theory behind the different classes of visual controllers.

Chapters 5, 6 and 7 present the methodology, results and discussion of the performed experiments: Chapter 5 shows an evaluation of artificial fiducial markers in underwater environments; Chapter 6 shows the AUV performing visual servoing in a simulated environment; and Chapter 7 details the visual servoing experiments with the real vehicle, both in a basin and at sea.

Finally, Chapter 8 presents the conclusions of the work and suggestions for future work.


2 Autonomous Underwater Vehicles

This chapter focuses on the aspects of autonomous underwater vehicles. A historical panorama of mobile robots is presented aiming to contextualize the development and applications of underwater vehicles. The state of the art of this technology is also introduced. A section is reserved to explain the dynamic model of an autonomous underwater vehicle. The remainder of the chapter introduces the AUV of interest in this work, the FlatFish AUV, with hardware details and characteristics of the embedded software.

2.1 Autonomous Underwater Vehicles

2.1.1 Underwater Vehicles

Underwater vehicles have been developed in different sizes for different purposes. Figure 4 illustrates the main categories of this kind of vehicles.

Figure 4 – Classification of different underwater vehicles [36]

The main division concerns whether the vehicle is able to carry people or not. Vehicles that carry humans are classified as manned underwater vehicles, while those that do not are classified as unmanned underwater vehicles. In all manned vehicles human lives are at risk, so the construction must be very robust; in addition, periodic inspection and maintenance by trained and qualified personnel become indispensable. Thus, manned vehicles are highly costly to build, operate and maintain [36]. Unmanned underwater vehicles, on the other hand, do not put humans at risk, and therefore part of the costs is eliminated. Moreover, these vehicles permit operation at greater depths, for extended periods and with fewer people involved, which reduces operational costs. Such vehicles are either controlled remotely or operate on their own.


Remotely operated vehicles are the most common unmanned underwater vehicles. Their operation relies on a pilot who uses the vehicle's sensors, such as camera, sonar and depth, to decide how to move and to send commands to the robot via a cable, usually called tether or umbilical. ROV sizes vary from small enough to be carried by one person up to the size of a garage, weighing several tons (see Figure 5). ROV applications include scientific research, educational uses, and inspection of difficult or dangerous environments. The bigger ones are usually equipped with manipulators and are used to inspect and maintain oil rigs and pipelines [36].

Figure 5 – Remotely Operated Vehicles: (a) ROV Seabotix LBV150 [37]; (b) ROV Dock Ricketts [38]

Autonomous underwater vehicles, in turn, are tetherless robots guided by a preprogrammed computer, which uses data such as speed, depth and orientation to control the vehicle and localize it in the environment. Additional sensors may detect potential obstacles and record data for later human inspection. AUVs have their own power supply system and need to recharge it periodically, which is done by retrieving the vehicle and bringing it back to the vessel. Some AUVs, also called resident AUVs, recharge their battery at a submerged docking station. In addition, the docking station permits communication with the topside, and so it can be used to transfer inspection data and to load new missions [36].

AUV technology is viable thanks to advances in many areas, such as higher energy density batteries, which enable operation for extended periods of time and over longer ranges. The quality of sensors such as sonars and cameras enables not only navigation abilities such as obstacle avoidance, but also significantly improves the quality of the collected data. Moreover, modern packaging allows sensors such as magnetometers, accelerometers and gyroscopes to be merged into a single package that can be embedded in the vehicle, improving navigation quality in AUVs [39].

Navigation is one of the most critical areas when it comes to AUVs. Since radio signals are strongly attenuated underwater, the standard way an AUV navigates is by using dead reckoning. This is performed by combining data such as velocity samples


from a Doppler Velocity Logger (DVL), acceleration and angular velocities from an Inertial Measurement Unit (IMU), depth from a pressure sensor and velocities from the dynamic model. Moreover, AUVs can use perception sensors such as cameras and sonars to detect known landmarks and resolve their position in the environment. Additionally, some operations allow the deployment of multiple acoustic positioning transponders on the seafloor, giving the robot its position in relation to an inertial frame; these are known as Long Baseline (LBL) systems. Another option is to use an Ultra-Short Baseline (USBL) system, which provides the vehicle's position by triangulation between closely spaced antennas on the reference frame and the antenna on the vehicle [39].
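As a rough illustration of the dead-reckoning principle described above, the sketch below integrates body-frame velocities (as a DVL would provide) and a gyro-derived yaw rate into a planar position estimate. The simple Euler integration and the variable names are assumptions made for illustration only; they do not describe any particular vehicle's navigation filter.

```python
import math

def dead_reckoning_step(x, y, yaw, surge_vel, sway_vel, yaw_rate, dt):
    """Propagate a planar pose estimate from body-frame velocities.

    surge_vel/sway_vel could come from a DVL, yaw_rate from an IMU gyro.
    Uncertainty is not modeled here; in practice it grows without bound,
    which is why external fixes (LBL/USBL, visual landmarks) are needed.
    """
    x += (surge_vel * math.cos(yaw) - sway_vel * math.sin(yaw)) * dt
    y += (surge_vel * math.sin(yaw) + sway_vel * math.cos(yaw)) * dt
    yaw += yaw_rate * dt
    return x, y, yaw

# Example: 10 s moving forward at 0.5 m/s while turning slowly.
pose = (0.0, 0.0, 0.0)
for _ in range(100):
    pose = dead_reckoning_step(*pose, surge_vel=0.5, sway_vel=0.0,
                               yaw_rate=0.02, dt=0.1)
print(pose)
```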

2.1.2 Historical Context

In the last decades, robots have played an important role in the development of modern society. Day after day they become more present in the daily life of humans and more indispensable in industry. Given its multidisciplinary nature, robotics depends on progress in areas such as electronics, materials engineering, computer science and control theory. Advances in these areas have permitted the development of reliable robots for commercial purposes. Figure 6 shows the Autonomous Guided Vehicle (AGV) from Swisslog, which transports material in warehouses, the TUG from Aethon, used in hospitals for transporting medicines, and the RC3000 autonomous vacuum from Kärcher for cleaning domestic environments [40].

Figure 6 – Commercial Mobile robots: (a) AGV from Swisslog [41]; (b) TUG from Aethon [42]; (c) RC3000 from Kärcher [43]

Terrestrial robots, as well as space and aerial robots, have progressed rapidly, so that many of them are currently able to perform autonomous tasks. Underwater robots, however, did not experience the same expressive progress. The materials required for such a harsh environment, the high research costs and the complexity of the experiments did not attract researchers to this area, and therefore it developed almost exclusively for military purposes.

Although humans have used the sea and boats for traveling since history started to be recorded, the development of vehicles that can go underwater came much later. The first mention of an underwater vehicle design is credited to Leonardo Da Vinci, registered as a military underwater vehicle in his collection of documents, the Codice Atlanticus, written between 1480 and 1518. It is also said that he worked on his invention but later destroyed it because he judged it could be too dangerous [44]. The development of manned underwater vehicles such as submarines started in the early 1600s and grew concurrently with the Industrial Revolution (1776), so that by 1914, during World War I, they were already vastly used as military weapons and were a threat to navies [36].

Figure 7 – Images of POODLE, the first ROV [36]

The development of the first unmanned underwater vehicle, however, only happened in 1953, when the ROV POODLE was built (Figure 7) [44]. A nautical archaeologist called Dimitri Rebikoff improved his torpedo-shaped machine, which he had used as a propulsion aid when diving: he equipped it with a camera and controlled it remotely, using the ROV to search for shipwrecks.

In parallel, the British and United States navies began to use ROVs to locate and retrieve expensive materials that had been lost in the water. During the 1950s, a British ROV called Cutlet and the American Cable-controlled Undersea Recovery Vehicles (CURV) were used to recover mines and torpedoes [36]. Between the 1950s and 1960s, ROVs evolved mainly for military purposes. Two memorable events in ROV history are the recovery of a hydrogen bomb lost off the coast of Spain in 1966 and the rescue of the trapped crew of the submersible Pisces III in 1973; both tasks were performed by the U.S. Navy's CURV series of ROVs [36, 45]. Figure 8 shows the first ROVs and an illustration of the ROV CURV III recovering the hydrogen bomb.

During the 1970s, the growing demand for offshore oil extraction propelled investment in underwater vehicles, as they had proven to be quite useful for underwater missions. Thus, the scientific and commercial development of ROVs grew significantly. The requirements of the Oil & Gas industry pushed the development of increasingly complex robots, more efficient and capable of operating reliably at greater depths [36]. The need to operate in deep water became a problem for the use of tethers and encouraged the commercial development of tetherless robots with the ability to work autonomously, i.e., without human intervention.

Figure 8 – Early ROVs: (a) Cutlet [45]; (b) CURV II [45]; (c) illustration of CURV recovering the hydrogen bomb [46]

The first autonomous underwater vehicle was the Special Purpose Underwater Research Vehicle (SPURV), built in 1957 at the University of Washington and shown in Figure 9. Its purpose was to study underwater communication and track submarines [47]. The 1960s and 1970s were a period of experimentation in which many proofs of concept were developed; despite successes and failures alike, considerable advances were made in this emerging technology.

Only in the 1980s was major attention given to tetherless vehicles. In those years, most laboratories began to develop test platforms. Many technological advances in electronics and computing, such as power consumption efficiency, larger memory and more processing power, in addition to the adoption of software engineering practices, helped create more complex robots. Besides that, vision systems, decision-making and navigation became more popular and reliable in AUVs [49].

One highlight of the 1980s is the Advanced Unmanned Search System (AUSS), presented in Figure 10. It was developed by the Naval Ocean Systems Center, in the USA.


Figure 9 – SPURV [48]

AUSS was launched in 1983, and even during the 1990s many reports and publications about it were still being released. It could operate at 6000 meters of depth with an autonomy of 10 hours, and it was able to communicate via acoustic sensors, which were used to transmit images through the water. The vehicle weighed 907 kg. A remarkable event was its detection of an American WWII bomber while operating off San Diego [50, 51].

Figure 10 – AUSS [51]

In the 1990s, the proofs of concept became the first generation of operational systems. The highlights of this period are HUGIN I and HUGIN II, developed by Kongsberg Maritime in cooperation with the Norwegian Defense Research Establishment; the Odyssey AUV, created by the MIT Sea Grant College AUV Lab; and REMUS (Remote Environmental Monitoring Unit System), developed by the Woods Hole Oceanographic Systems Lab.

At the beginning of the 21st century, the first commercial vehicles became available. They are capable of performing tasks such as marine search and rescue, oceanographic research and environmental monitoring. One of the leaders in the AUV business, Kongsberg Maritime, has in its portfolio the HUGIN AUV, one of the most famous commercial AUVs. Figure 11 shows this vehicle.

Figure 11 – Hugin AUV [52]


The HUGIN is available in versions that can operate at 6000 meters depth, and its battery lasts 100 hours when operating at 4 knots (about 2 m/s). It is typically equipped with a side-scan sonar, a multibeam echo sounder, a sub-bottom profiler, a camera, and a Conductivity, Temperature and Depth (CTD) sensor [52]. It is used for military purposes, in mine countermeasures (MCM), intelligence, surveillance and reconnaissance (ISR) and rapid environmental assessment (REA), and for commercial purposes, in seabed mapping and imaging, geophysical inspection, pipeline and subsea structure inspection, oceanographic surveys, environmental monitoring, marine geological surveys and search operations [52].

Underwater vehicles have evolved from a purely military role to being largely used in the Oil and Gas industry. Although oil production will continue to play an important role in our lives for a while, the future of underwater vehicles is not limited to it. Deep-sea mining and offshore alternative energy production will also demand underwater vehicles and prompt their development to fulfill the requirements of new businesses [39].

2.2 FlatFish

The FlatFish project is an ongoing project for Royal Dutch Shell, funded by the Brazilian Government via the Brazilian National Agency of Petroleum (ANP) and the Brazilian Industrial Research and Innovation Corporation (EMBRAPII). It is developed by SENAI CIMATEC, at the Brazilian Institute of Robotics, in cooperation with the Robotics Innovation Center (RIC), part of the German Research Center for Artificial Intelligence (DFKI). FlatFish is shown in Figure 12.

Figure 12 – FlatFish AUV

The goal of the FlatFish project is to develop an AUV for the inspection of Oil & Gas assets, such as pipelines, manifolds, SSIVs, etc. FlatFish is a subsea-resident AUV, i.e., it has a docking station where the vehicle can park, recharge its battery, send data to shore and load new missions. This enables FlatFish to operate submerged for extended periods. Compared to ROV operations, this ability significantly reduces costs, since the robot does not depend on the weather, does not require a dedicated support vessel and permits an increase in the frequency of inspections, given that the vehicle will be available 24/7. A subsea-resident AUV therefore has a lower cost per operation and allows early failure detection.

According to the scope of the project, one vehicle was built in Germany and another in Brazil. This strategy speeds up project development since it allows the involved scientists to exchange knowledge during the project. Table 1 provides an overview of the FlatFish features.

Table 1 – FlatFish Specifications

Depth rating: 300 m
Weight (in air): 275 kg
Size (LWH): 220 cm x 105 cm x 50 cm
Propulsion: 6x 60 N Enitech ring thrusters (120 N in each direction)
Battery: Lithium-Ion, 5,8 kWh (11,6 kWh) @ 48 V
Communication (surface): Rock7mobile RockBlock Iridium satellite modem (1,6 GHz); Digi XBee-Pro-868 (868 MHz); ubiquiti PicoStation M2 HP WLAN module (2,4 GHz)
Communication (submerged): Evologics S2CR 48/78 kHz, usable as USBL transponder
Communication (tethered): 10 GBit/s optical fibre; 1 GBit/s Cat5e (max. 50 m)
Light: 4x Bowtech LED-K-3200 (3200 lumen each)
Laser line projector: 2x Picotronic LD532-20-3(20x80)45-PL line lasers, 20 mW each @ 532 nm
Sonar: BlueView MB1350-45 Multibeam Profiler (inspection sonar); Tritech Gemini 720i Multibeam Imager (navigation sonar); 2x Tritech Micron Sonar (obstacle avoidance)
Camera: 4x Basler ace acA2040-gc25, 2048x2048 at 25 frames/s, colour, Gigabit Ethernet
Depth sensor: Paroscientific 8CDP700-I
INS/AHRS: KVH 1750 IMU
DVL: Rowe SeaProfiler DualFrequency 300/1200 kHz

2.2.1 Sensors and Actuators

2.2.1.1 Navigation System

One of the problems with underwater vehicles is how to localize precisely in the environment. The absence of a global positioning system causes FlatFish to rely mostly on dead reckoning for navigation. It is performed by fusing data from the accelerometers and gyroscopes of the Inertial Navigation System (INS) with the velocity measured by the Doppler Velocity Logger (DVL). The dead-reckoning error, however, grows unboundedly over time. Thus, FlatFish also takes advantage of the visual and acoustic sensors to extract features of known submerged structures and correct the localization error. FlatFish is also equipped with an Ultra-Short Baseline (USBL), a sensor placed on the docking station that obtains the position of the vehicle from the time-of-flight of an emitted acoustic signal. The USBL is able to track the vehicle position within a 1 km radius. The vehicle is also equipped with two mechanical single-beam sonars able to detect obstacles around the vehicle.

2.2.1.2 Inspection System

The inspection sensors play an important role on FlatFish, as inspection is the core of FlatFish's objective. The vehicle's payload is composed of a mix of visual and acoustic sensors, chosen in order to acquire data even under adverse environmental conditions, such as high turbidity levels. The visual group is composed of four cameras forming two stereo camera systems that can record colour video at 25 fps at 2K resolution (2040x2040). In addition, two lasers can project green lines onto the image to extract depth information and use it, for instance, for 3D model reconstruction. Concerning the acoustic sensor for inspection, FlatFish uses a high-resolution profiling multibeam sonar, which provides the means to create a 3D model of the structure even in bad visibility.

2.2.1.3 Propellers

FlatFish uses six propellers with 60 N of force each. These are able to control the vehicle in five DOF, which characterizes the AUV as an overactuated system. The distribution of the thrusters is shown in Figure 13. This distribution results in the Thrust Configuration Matrix (TCM) shown in Table 2 [53]. The TCM is the matrix that relates the forces on the thrusters to the efforts in each degree of freedom.

Figure 13 – FlatFish thruster configuration [53]

Table 2 – Thrust Configuration Matrix [53]

        T1      T2      T3       T4      T5       T6
Surge   1       1       0        0       0        0
Sway    0       0       -1       -1      0        0
Heave   0       0       0        0       1        1
Roll    0       0       0        0       0        0
Pitch   0       0       0        0       -0.4235  0.556
Yaw     0.44    -0.4    -0.5735  0.936   0        0
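As an illustration of how the TCM of Table 2 is used, the snippet below maps a hypothetical vector of individual thruster forces to the resulting effort in each degree of freedom; the thruster force values are made up for the example.

```python
import numpy as np

# Thrust Configuration Matrix from Table 2: rows are DOFs
# (surge, sway, heave, roll, pitch, yaw), columns are thrusters T1..T6.
TCM = np.array([
    [1.0,  1.0,  0.0,     0.0,    0.0,     0.0],
    [0.0,  0.0, -1.0,    -1.0,    0.0,     0.0],
    [0.0,  0.0,  0.0,     0.0,    1.0,     1.0],
    [0.0,  0.0,  0.0,     0.0,    0.0,     0.0],
    [0.0,  0.0,  0.0,     0.0,   -0.4235,  0.556],
    [0.44, -0.4, -0.5735, 0.936,  0.0,     0.0],
])

# Hypothetical individual thruster forces in newtons (T1..T6).
f = np.array([30.0, 30.0, -10.0, -10.0, 5.0, 5.0])

# Generalized efforts per DOF: tau = TCM @ f
tau = TCM @ f
for dof, value in zip(["surge", "sway", "heave", "roll", "pitch", "yaw"], tau):
    print(f"{dof:>5}: {value:7.2f}")
```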


2.2.2 Vehicle Model

The dynamic behavior of underwater vehicles is essentially non-linear. A Newton–Euler formulation for underwater vehicles, proposed by [54], is expressed in Equation 2.1:

M\dot{v} + C(v)v + D(v)v + g(\eta) = \tau_e + \tau \quad (2.1)

where:

• M = inertia matrix, including the added mass;
• C(v) = matrix of Coriolis and centripetal terms, including added mass;
• D(v) = damping matrix;
• g(η) = vector of gravitational forces and moments;
• τ_e = vector of environmental forces;
• τ = vector of control inputs.

Note that C(v) and D(v) are non-linear, velocity-dependent components. Additionally, the velocities described in the aforementioned model are coupled, which means that the velocity in one degree of freedom contributes to the movement of the vehicle in a different DOF. Ocean currents and umbilical dynamics (in ROVs) are usually present; they are typically not considered in the controller design, but regarded as disturbances. Moreover, underwater vehicles are subject to delays and saturation of the propellers [55].

For vehicles operating at low velocities and symmetric in the three planes, a common practice is to simplify the model by assuming the added mass to be constant and disregarding the off-diagonal and coupling terms [56]. Thus, Equation 2.1 can be rewritten as:

m_i\dot{v}_i(t) + d_{Q,i}\,v_i(t)|v_i(t)| + d_{L,i}\,v_i(t) + b_i = \tau_i(t) \quad (2.2)

where each degree of freedom i has its own inertia (m, including added mass), damping terms (d_Q and d_L), buoyancy (b) and control force (τ).

The FlatFish model parameters (from Equation 2.2) were identified via the least-squares method, by collecting the poses and the thruster efforts while the vehicle performed sinusoidal movements in each degree of freedom. The results are shown in Table 3.


Table 3 – Model parameters of AUV FlatFish [56]

DOF     Inertia (m)          Quad. damping (d_Q)    Linear damping (d_L)    Buoyancy (b)
Surge   851.05 [kg]          8.62 [kg/m]            39.57 [kg/s]            -0.81 [N]
Sway    976.05 [kg]          157.52 [kg/m]          64.96 [kg/s]            3.00 [N]
Heave   1511.53 [kg]         1911.21 [kg/m]         -70.29 [kg/s]           -12.42 [N]
Yaw     301.60 [kg.m²/rad]   279.39 [kg.m²]         26.92 [kg.m²/(rad.s)]   -0.0909 [N.m]
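To give a feeling for how the decoupled model of Equation 2.2 behaves with the identified surge parameters of Table 3, the sketch below integrates the surge velocity under a constant thrust. The forward-Euler scheme and the chosen thrust value are illustrative assumptions and are not part of the identification procedure of [56].

```python
# Forward-Euler simulation of the decoupled surge model (Equation 2.2):
#   m*dv/dt + dQ*v*|v| + dL*v + b = tau
m, dQ, dL, b = 851.05, 8.62, 39.57, -0.81   # surge parameters from Table 3
tau = 60.0                                   # constant surge thrust [N] (example value)

v, dt = 0.0, 0.01
for step in range(int(30.0 / dt)):           # simulate 30 seconds
    dv = (tau - dQ * v * abs(v) - dL * v - b) / m
    v += dv * dt

print(f"surge velocity after 30 s: {v:.3f} m/s")
```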

2.2.3 Software Layer

The software layer is based on the Robotic Construction Kit (ROCK), an open-source, model-based software framework for the construction of robots. The framework provides standard development tools, such as logging and log replay, data visualization and state monitoring [57].

The Rock framework also integrates with the robot simulator Gazebo, which permits testing the software integration, the algorithms and the behavior of the system under faulty situations. The FlatFish team enabled in Gazebo a robust simulation of the physical effects of water, such as buoyancy, drag and added mass, the simulation of different types of sonar, and the simulation of water effects on cameras (color attenuation and depth of view), which has proved very useful for testing control and computer vision algorithms.

The Rock components, such as drivers, navigation and image processing algorithms, and controllers, are managed by the system coordinator Syskit, which is responsible for binding the components together into functional networks. For instance, in a visual servoing task, Syskit handles the connection of the camera driver component with the image feature extractor and the controller. Syskit also manages missions as a whole, through its action state machine, and takes care of emergency procedures in case of failures, from restarting a single component to sending the vehicle to the surface in case of a critical failure.


3 Computer Vision

This chapter introduces basic computer vision concepts related to this master thesis. It starts with the particularities of the underwater image formation and the challenges related to underwater image processing. Afterwards, the pinhole camera model is presented, together with the basic concepts of perspective geometry required for its understanding. Then, the process of pose estimation with a single camera is introduced. Finally, we introduce the artificial fiducial markers, the general concepts and specificities of the marker systems used in this work.

3.1 Underwater Image Processing

Underwater image analysis is interesting for many applications, such as inspection of man-made structures, observation of marine fauna, monitoring and seabed studies [58]. Nonetheless, unlike normal images taken in air, the physical characteristics of the water medium cause underwater images to be affected by limited visibility range, low contrast, non-uniform lighting, blurring, bright artifacts and noise [59].

Light attenuation in the water is caused by light absorption and light scattering: the former reduces the energy of the light, while the latter changes its direction. It directly affects visibility, which can range from twenty meters down to five meters or even less [60]. Light attenuation is caused not only by light propagation in the water medium, but also by dissolved and floating particles in the water, whose interactions increase the absorption and scattering effects.

The underwater image formation model is based on the assumption that the light received by the camera is the superposition of three components: the direct component (E_d), the forward-scattered component (E_f) and the backscatter component (E_b) [60], thus:

E_T = E_d + E_f + E_b \quad (3.1)

The direct component is the light directly reflected by the object. The forward-scattered component is also reflected by the object, but scattered at small angles. The backscatter component is light reflected by objects not present in the scene, for instance suspended particles, that still enters the camera [60]. Figure 14 shows these three components. The forward-scattered component contributes to the blurring of the final image, while backscattering affects its contrast.


Figure 14 – Representation of the three components of light received by the camera [60]

Figure 15 – Color absorption as a function of distance [59]

The absorption of the wavelengths corresponding to red is higher than for the other colors, so the red component quickly decreases after a few meters. The blue wavelengths are less affected than the others, which gives underwater images a bluish appearance. In the deep ocean, the absence of light requires an artificial lighting source; however, it tends to add a bright spot at the center of the image and a low luminosity level at the corners [61]. Figure 15 shows color absorption as a function of distance [62].

Groups of researchers have been making efforts to make underwater images more comprehensible to humans and even to machines, since that would permit the use of standard image processing algorithms [60]. Underwater image processing is mainly divided into two groups: image restoration and image enhancement. The former uses a model of the original image formation and a degradation model to recover the original image; it involves the knowledge of many parameters, such as attenuation and diffusion coefficients. The latter does not use any physical model and relies on techniques such as contrast stretching and color space conversion to transform the degraded image into a more legible one [60].
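As one concrete example of the enhancement family mentioned above, the sketch below applies contrast-limited adaptive histogram equalization to the lightness channel of an underwater image using OpenCV. The file names are placeholders, and this particular technique is only a common choice from the literature, not a method used or evaluated in this work.

```python
import cv2

# Load an underwater image (placeholder path).
img = cv2.imread("underwater_frame.png")

# Work on the lightness channel only, so colors are not shifted further.
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
l, a, b = cv2.split(lab)

# Contrast-Limited Adaptive Histogram Equalization (a contrast-stretching variant).
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
l_eq = clahe.apply(l)

enhanced = cv2.cvtColor(cv2.merge((l_eq, a, b)), cv2.COLOR_LAB2BGR)
cv2.imwrite("underwater_frame_enhanced.png", enhanced)
```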

3.2 Perspective Transform

3.2.1 Pinhole camera model

This section introduces the simplest and most specialized camera model, the pinhole model. A camera model is a matrix that maps the transformation of the 3D world into a 2D image [63]. In that transformation, the depth information is lost, i.e., without any previous knowledge about the object size it is impossible to discern, just by looking at the image, whether the object is small or the camera is positioned very far from the object [64]. Figure 16.a shows a picture of a parking lot; however, only with the people in the background of Figure 16.b is it possible to realize that it is a miniaturized city.

Figure 16 – Picture of a miniaturized parking lot taken with a camera close to the objects.

The pinhole model represents the phenomenon in which light passes through a tiny orifice and an inverted image is projected on the wall of a dark room [64]. The process of image formation in humans and in CCD cameras is similar to image formation in a pinhole camera: the inverted image is projected on the retina, or on the sensor, and flipped by the brain or by the microprocessor. Unlike the pinhole camera, humans and cameras have convex lenses that allow more light to pass through and, consequently, produce brighter images [64].

The pinhole is a central projection camera, that is, any line through a point in the world and its projection on the image plane ends up hitting the same point in the camera frame, namely the camera center. Figure 17.a illustrates the central projection model. The non-inverted image is projected onto a plane parallel to the x–y plane at Z = f, the image plane. As said before, the center of the coordinate system, C, is the camera center or optical center. The line from the camera center perpendicular to the image plane is the principal axis or principal ray, and the plane through C parallel to the image plane is called the principal plane. Finally, the principal axis crosses the image plane at the principal point [63].

Figure 17 – Pinhole camera geometry: (a) central projection model; (b) triangle similarities

The mapping of a world point P = (X, Y, Z) to the image-plane point p = (x, y) follows from triangle similarity, according to Figure 17.b, thus:

x = f\frac{X}{Z}, \quad y = f\frac{Y}{Z} \quad (3.2)

This projection is a mapping from R³ to R² and has the following properties [64]:

1. Straight lines in the world remain straight in the image plane.

2. Parallel lines in the world intersect in the image plane at the vanishing point (point at infinity), except for lines parallel to the image plane, which do not converge.

3. Conics (circles, ellipses, parabolas, hyperbolas) are projected as conics in the image plane.

4. For a given projected point p = (x, y), there is no unique solution for retrieving the world point P = (X, Y, Z): the point P can be anywhere along the ray CP (shown in Figure 17).

5. The internal angles are not preserved, so the shape is not preserved.

Equation 3.2 assumes that the origin of the image plane is at the principal point, which may not be the case. Letting the coordinates of the principal point be (p_x, p_y), the following map gives a more general relation:

x = f\frac{X}{Z} + p_x, \quad y = f\frac{Y}{Z} + p_y \quad (3.3)


The pinhole model assumes that image coordinates have the same scale in both directions. However, CCD cameras may have non-square pixels. Defining ρ_w and ρ_h as the pixel width and height, respectively, the point on the image plane expressed in pixel coordinates is related by:

u = \frac{x}{\rho_w} + u_0, \quad v = \frac{y}{\rho_h} + v_0 \quad (3.4)

where u_0 and v_0 are the coordinates of the principal point expressed in pixels.

Writing this in the homogeneous form:

\begin{bmatrix} u \\ v \\ w \end{bmatrix} =
\begin{bmatrix} \alpha_x & 0 & u_0 & 0 \\ 0 & \alpha_y & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \quad (3.5)

where \alpha_x = f/\rho_w and \alpha_y = f/\rho_h. In short:

p = K[I \mid 0]P \quad (3.6)

with:

K = \begin{bmatrix} \alpha_x & 0 & u_0 \\ 0 & \alpha_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \quad (3.7)

where K is the camera calibration matrix [63] and its elements are called the intrinsic parameters.
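The snippet below applies Equations 3.5–3.7 to project a world point, expressed in the camera frame, to pixel coordinates; the focal length, pixel size and principal point are arbitrary example values.

```python
import numpy as np

# Example intrinsics (assumed values): f = 8 mm, 5 um square pixels,
# principal point at (320, 240).
f, rho_w, rho_h = 8e-3, 5e-6, 5e-6
alpha_x, alpha_y = f / rho_w, f / rho_h
u0, v0 = 320.0, 240.0

K = np.array([[alpha_x, 0.0,     u0],
              [0.0,     alpha_y, v0],
              [0.0,     0.0,     1.0]])                      # Equation 3.7

P = np.array([0.2, -0.1, 3.0, 1.0])                          # homogeneous point in camera frame
p_tilde = K @ np.hstack([np.eye(3), np.zeros((3, 1))]) @ P   # Equation 3.6
u, v = p_tilde[:2] / p_tilde[2]                              # perspective division
print(u, v)
```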

3.2.2 Distortion parameters

The above equations assume perfect lenses. However, real lenses, especially low-cost ones, have imperfections. These add several distortions to the formed image, such as chromatic aberration, variation of focus across the scene and geometric distortions, which cause points to be projected at a different position from where they should be [64]. In robotic applications, geometric distortions are the most problematic, consisting of radial and tangential components. Radial distortion displaces an image point along the radial direction from the principal point, by an amount that depends on the distance between the point and the principal point. Tangential distortion occurs at right angles to this direction and is usually less significant than the radial component [64]. Figure 18 illustrates the effect of geometric distortions during the image formation process.

(39)

Figure 18 – Diagram showing the position of the pixel in the ideal case and the effect of the radial distortion (δr) and tangential distortion (δt)

The radial distortion is modeled by a polynomial approximation:

$$\delta_r = k_1 r^3 + k_2 r^5 + k_3 r^7 + \dots \tag{3.8}$$

where r is the distance to the principal point.

Therefore, the displacement $(\delta_u, \delta_v)$ of an image point $(u, v)$ due to distortion is given by:

$$\begin{bmatrix} \delta_u \\ \delta_v \end{bmatrix} = \begin{bmatrix} u(k_1 r^2 + k_2 r^4 + k_3 r^6 + \dots) \\ v(k_1 r^2 + k_2 r^4 + k_3 r^6 + \dots) \end{bmatrix} + \begin{bmatrix} 2 p_1 u v + p_2 (r^2 + 2u^2) \\ p_1 (r^2 + 2v^2) + 2 p_2 u v \end{bmatrix} \tag{3.9}$$

Three coefficients are usually sufficient to characterize the radial distortion. Thus, the distortion model is parameterized by $(k_1, k_2, k_3, p_1, p_2)$, and these parameters are considered additional intrinsic parameters [64].
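As an illustration of this model (a minimal sketch, not the implementation used in this work), the function below evaluates Equation 3.9 for a single point. The coefficient values are made up, and in practice libraries such as OpenCV apply the same model in normalized image coordinates before the intrinsic matrix.

```python
def distort(u, v, k1, k2, k3, p1, p2):
    """Apply the radial + tangential distortion model of Eq. 3.9 to one point.

    (u, v) are taken relative to the principal point; the function returns
    the distorted coordinates (u + du, v + dv).
    """
    r2 = u * u + v * v                       # squared distance to the principal point
    radial = k1 * r2 + k2 * r2**2 + k3 * r2**3
    du = u * radial + 2.0 * p1 * u * v + p2 * (r2 + 2.0 * u * u)
    dv = v * radial + p1 * (r2 + 2.0 * v * v) + 2.0 * p2 * u * v
    return u + du, v + dv

# Example with illustrative (made-up) coefficients
print(distort(0.2, -0.1, k1=-0.28, k2=0.07, k3=0.0, p1=1e-3, p2=-5e-4))
```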

3.2.3 Camera Calibration

The process of determining the intrinsic parameters is known as camera calibration. It relies on correspondences between world points and their projections on the image plane. Bouguet's calibration method is widely applied; it requires images of a planar chessboard in several positions and orientations [64]. A tool that automatically detects the chessboard and computes the calibration matrix is available in both OpenCV and Matlab.

In practice, it is common to estimate the camera intrinsic parameters through the calibration process and then undistort the image based on the estimated distortion parameters.
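For illustration, the sketch below follows this workflow with OpenCV's Python bindings: chessboard corners are detected in a set of images and cv2.calibrateCamera returns K and the distortion coefficients. The folder name, board dimensions and square size are assumptions made for the example.

```python
import glob
import cv2
import numpy as np

pattern_size = (9, 6)   # inner corners per chessboard row and column (assumed)
square_size = 0.025     # chessboard square edge length in meters (assumed)

# 3D corner coordinates in the board frame (planar target, so Z = 0)
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)
objp *= square_size

obj_points, img_points = [], []
image_size = None
for fname in glob.glob("calibration_images/*.png"):   # hypothetical image folder
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)
        image_size = gray.shape[::-1]                  # (width, height)

# K is the intrinsic matrix; dist holds (k1, k2, p1, p2, k3)
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, image_size, None, None)
print("reprojection error:", rms)
print("K =\n", K)
print("distortion coefficients:", dist.ravel())
```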

3.2.4 Pose Estimation with Single Camera

Estimating the position and orientation of a camera with respect to a known object requires the camera intrinsic parameters and the 3D geometric model of the object. In addition, it demands the correspondence of N image points $(u_i, v_i)$ with world points $(X_i, Y_i, Z_i)$, where $i \in [1, N]$. This problem is known as Perspective-n-Point (PnP).

Many methods have been proposed to address this problem; they are divided into analytical and least-squares solutions. A system with three points has a solution, but it yields multiple candidate poses. A unique solution is possible with four coplanar, non-collinear points. The correspondence of six or more points also has a unique solution and provides not only the relation between camera and object, but also the intrinsic camera parameters [24].
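The sketch below shows the four-coplanar-point case with OpenCV's cv2.solvePnP; the object geometry, detected pixel coordinates and intrinsic matrix are illustrative values only.

```python
import cv2
import numpy as np

# Corners of a planar square target with 0.15 m sides, in the target frame
# (four coplanar, non-collinear points give a unique solution)
side = 0.15
object_points = np.array([[-side / 2,  side / 2, 0.0],
                          [ side / 2,  side / 2, 0.0],
                          [ side / 2, -side / 2, 0.0],
                          [-side / 2, -side / 2, 0.0]], dtype=np.float32)

# Corresponding pixel coordinates detected in the image (illustrative values)
image_points = np.array([[310.0, 215.0],
                         [402.0, 218.0],
                         [399.0, 310.0],
                         [308.0, 306.0]], dtype=np.float32)

# Intrinsic parameters from a previous calibration (placeholder values)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)   # assume the image has already been undistorted

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, dist)
print("rotation (Rodrigues vector):", rvec.ravel())
print("translation of the target in the camera frame [m]:", tvec.ravel())
```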

3.3 Artificial Fiducial Markers

Artificial fiducial markers are landmarks designed to be easily detected by computer vision systems. They have a unique ID encoded in their structure following a pre-defined pattern system and provide a means for reliable estimation of the marker pose with respect to the camera.

Because of these features, fiducial markers have many applications in augmented reality [65] and robot navigation [66], where they can be used in addition to natural landmarks to improve navigation performance. They can also be employed to label specific structures [67], such as a docking station, or areas of interest, such as sampling points. In underwater environments, fiducial markers have also been applied to robot mapping and localization [68, 69], to the monitoring of underwater sensors [70], to assist ROV operators in robot control [71] and as an interface for human-robot communication [72, 73].

Their advantages have attracted the interest of the scientific community, and several marker systems have been proposed. Matrix [74] and ARToolKit [75] were among the first marker systems created, and their square shape has inspired other markers such as ARTag [76], ArUco [77] and AprilTags [78]. Circular shapes were proposed by Rune-Tag [79]. The encoding systems also vary between marker systems. For instance, Mono-spectrum Markers [80] and Fourier Tags [81] rely on a frequency analysis of the image to decode their patterns.

The use of fiducial markers in underwater environments is challenging because the acquired image is subject to the many degradation factors mentioned in Section 3.1. The following section details the three marker systems chosen to be evaluated in this work. ARToolKit, AprilTags and ArUco were chosen because they are released under open-source licenses and provide a ready-to-use implementation.


3.3.1 The evaluated marker systems

ARToolKit [75] is one of the most popular marker systems and has been used in several applications over the years. During its detection stage, ARToolKit uses a global threshold to create a binary image and an image correlation method where the payload is compared with a predefined database. Although this feature allows the user to create markers with intuitive patterns as shown in [67], it increases the computational effort when a large database is formed, since each potential marker has to be checked against the entire database.

AprilTags [78] uses a graph-based image segmentation with local gradients to estimate lines and introduces a quad detection method designed to handle occlusions. Once the lines are detected, AprilTags relies on a spatially-varying threshold using known values of black and white pixels to decode the payload. This makes the detection system robust against lighting variations within the observed marker. Its encoding guarantees a certain Hamming distance between the codewords and for this reason decreases the inter-marker confusion rate.

ArUco [77] proposes a method to create a dictionary with a configurable number of markers and number of bits. This method maximizes the bit transitions and the inter-marker difference to reduce the false positive and inter-marker confusion rates, respectively. The ArUco library also features a method for error correction. Its detection process consists of applying an adaptive threshold to a grayscale image and then finding the marker candidates by discarding the contours that cannot be approximated by a rectangle. Subsequently, the code extraction, marker identification and error correction stages are applied.
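As a concrete example of marker detection and pose estimation (a sketch, not the pipeline evaluated in this work), the code below uses the aruco module from opencv-contrib. The exact function names changed in OpenCV 4.7, so this follows the older interface, and the dictionary, marker size and intrinsics are assumptions.

```python
import cv2
import numpy as np

# Intrinsics from a previous calibration (placeholder values)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)
marker_length = 0.20   # marker side length in meters (assumed)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_50)
params = cv2.aruco.DetectorParameters_create()

frame = cv2.imread("frame.png")   # hypothetical camera frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
corners, ids, rejected = cv2.aruco.detectMarkers(gray, dictionary, parameters=params)

if ids is not None:
    # One rotation/translation vector per detected marker, in the camera frame
    rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
        corners, marker_length, K, dist)
    for marker_id, tvec in zip(ids.ravel(), tvecs):
        print("marker", marker_id, "translation [m]:", tvec.ravel())
```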

Figure 19 shows the presented marker systems.

Figure 19 – Examples of the evaluated marker systems: (a) ARToolKit [75]; (b) AprilTags [78]; (c) ArUco [77]

4 Visual Controller

Visual servoing is a control scheme that uses visual information to control the motion of a robot. It uses one or more cameras to extract features from an image and map them into control commands. Vision-based control has been applied to increase the accuracy and flexibility of robotic systems [24, 82, 33].

The first efforts to control robots using visual information date from 1973, when an industrial robot used camera information to place a block inside a box [83]. In that work, the difference between the current and the desired position is computed and, based on it, a motion command is generated. The robot then moves towards the goal pose without any visual feedback. When the robot stops, it takes another picture to check whether the desired position was achieved; if not, the process is repeated. The term visual servoing was first introduced in 1979 and characterizes a system with a faster image feature extractor, which provides real-time information and enables a closed-loop control chain [84]. However, despite the efforts during the 1970s, only in the 1990s did the number of publications increase significantly, mainly due to considerable advances in processing power and computer vision techniques [24].

Visual servoing can be deployed with several configurations. It can either have a camera attached to the robot’s body (eye-in-hand) or an external camera mounted on the environment (eye-to-hand). It can also be classified regarding the level of actuation, where the dynamic look-and-move approach provides a setpoint to the low level controller as velocity commands, while the direct visual servo uses the visual controller to directly command the actuators [24].

Since most mobile robots have an embedded camera, the majority of visual servoing applications are eye-in-hand. Nevertheless, eye-to-hand has potential use in collaborative robots, and a mixed approach has been used in humanoid robots [85, 86]. Concerning the control level, most implementations adopt the dynamic look-and-move approach, mainly because it requires a lower update rate than direct visual servoing. In addition, most robots already have a low-level controller implemented. Thus, the modularity added by the dynamic look-and-move scheme grants simplicity and portability to the controller, since it gives the visual controller a kinematic control character [24].
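To make the dynamic look-and-move structure explicit, the sketch below turns a marker pose estimate into a body-frame velocity setpoint for an existing low-level controller through a simple proportional law. The gain, saturation limits and the vehicle interface are assumptions for illustration and do not correspond to the controller developed in this work.

```python
import numpy as np

LAMBDA = 0.3   # proportional gain (assumed)
V_MAX = 0.4    # saturation for linear [m/s] and angular [rad/s] commands (assumed)

def velocity_setpoint(t_marker, yaw_marker, t_desired, yaw_desired):
    """Map an estimated marker pose (camera frame) into a velocity command.

    Dynamic look-and-move: the output is a setpoint for the vehicle's
    low-level velocity controller, not a direct actuator command.
    """
    error = np.append(t_marker - t_desired, yaw_marker - yaw_desired)
    command = LAMBDA * error                    # kinematic, proportional law
    return np.clip(command, -V_MAX, V_MAX)      # keep the setpoint feasible

# Hypothetical usage inside the perception/control loop:
# t, yaw = estimate_marker_pose(frame)              # e.g. from solvePnP
# vx, vy, vz, wz = velocity_setpoint(t, yaw, np.array([0.0, 0.0, 1.5]), 0.0)
# vehicle.send_velocity_command(vx, vy, vz, wz)     # hypothetical interface
```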

The classical visual servoing literature, such as tutorials and reviews [24, 87, 88, 89], classifies eye-in-hand visual servoing into two main categories according to the nature of the controller input. Image-Based Visual Servoing (IBVS) compares the pixel coordinates of the features with the pixel coordinates they will assume when the robot reaches the desired pose. Since the task is defined on the image plane, this approach is sometimes referred to as 2D visual servoing [33]. Position-Based Visual Servoing (PBVS), on the other hand, estimates the 3D camera pose from the extracted features and tries to minimize the difference to the desired pose. Given that the task function is defined in Cartesian space, PBVS is also referred to as 3D visual servoing [33]. Figure 20 and Figure 21 illustrate the structure of both approaches.

Figure 20 – Structure of dynamic look-and-move IBVS

Figure 21 – Structure of dynamic look-and-move PBVS

Both IBVS and PBVS present advantages and disadvantages. The image-based visual controller does not require any 3D model of the target, is robust to errors in the camera calibration [33] and may reduce the computational time, since the pose estimation step is not performed [87]. On the other hand, it introduces a challenge to the controller design because the process is non-linear and highly coupled [87]. The position-based scheme is highly sensitive to camera calibration errors and to uncertainties in the 3D object model [33].

Since PBVS is defined in Cartesian space, the camera follows an optimal trajectory in that space, but this can result in the loss of features, since there is no direct regulation of the features in the image space. IBVS, on the other hand, is defined in the image space; therefore there is no direct control of the camera motion in 3D space, and the IBVS controller may cross a singularity or reach a local minimum [64, 88].

A mixed approach introduced by [33] aims to compensate for the shortcomings of both strategies while taking advantage of their strengths. As it combines aspects of the 3D and 2D visual controllers, it is called 2 1/2 D visual servoing or hybrid visual servoing (HVS). The 2 1/2 D approach does not need any previous knowledge of the geometrical model of the targeted object and, unlike IBVS, it ensures convergence of the control law for the whole task space [33].
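As a rough illustration of how translational and rotational information can be mixed (a sketch inspired by the classical 2 1/2 D error vector, not the exact control law of [33] nor the controller implemented in this work), the function below combines the image coordinates of a reference point, a depth ratio and an axis-angle rotation error.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

def hybrid_error(x, y, Z, x_d, y_d, Z_d, R_cur, R_des):
    """2 1/2 D-style task error: extended image coordinates plus rotation.

    (x, y, Z)      : current normalized image coordinates and depth of a reference point
    (x_d, y_d, Z_d): their values at the desired pose
    R_cur, R_des   : current and desired camera orientations (3x3 rotation matrices)
    """
    # Translational part: image-plane error and the log of the depth ratio
    e_t = np.array([x - x_d, y - y_d, np.log(Z / Z_d)])
    # Rotational part: axis-angle (theta * u) of the relative rotation
    e_r = R.from_matrix(R_des.T @ R_cur).as_rotvec()
    return np.concatenate((e_t, e_r))

# A proportional law of the form v = -lambda * e would then drive both
# the image-space and the rotational components of the error to zero.
```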
