Universidade de Aveiro, Departamento de Eletrónica, Telecomunicações e Informática
2020
Pedro Miguel Simões Bastos Martins
Análise da Interferência entre LiDARs de Tempo de Voo
Interference Analysis in Time of Flight LiDARs
Dissertation presented to the Universidade de Aveiro in fulfilment of the requirements for the degree of Master in Electronics and Telecommunications Engineering, carried out under the scientific supervision of Doutor António Neves, Assistant Professor at the Departamento de Eletrónica, Telecomunicações e Informática of the Universidade de Aveiro, of Doutor Miguel Drummond, Assistant Researcher at the Instituto de Telecomunicações, and of Doutor André Albuquerque, Optical Designer at Bosch Car Multimédia Portugal, S.A.
the jury
president Professor Doutor Pedro Nicolau Faria da Fonseca
Assistant Professor at the Universidade de Aveiro
examiners committee Professor Doutor Vítor Manuel Ferreira dos Santos
Associate Professor with Habilitation at the Universidade de Aveiro
Doutor Miguel Vidal Drummond
acknowledgements
Thank you,
To my parents, for supporting me and trusting in me.
To my sister, for still thinking that I studied to learn how to assemble and fix computers.
To my brother, for seeming interested when I talk about robots and technology.
To Bárbara, for her patience and support, at my worst and at my best.
To Zé, Coutinho, Bernardo, Samuel and Bia, for never leaving me alone on this path.
To HART and to everyone who made it what it was, for helping me consolidate my passion for robotics and LiDAR.
To NEEET, to the ENE3 team and to everyone who helped build what is today the student group of Electronics and Telecommunications Engineering, especially Carvalho and Ricardo, for the camaraderie.
To BEST, and especially to Maria, Vala, Catarina, Jorge, Pacheco, Mariana, Dinis and Mateus, for all the support, friendship and messages of encouragement.
A special thank you to my supervisors and co-supervisors, António Neves, Miguel Drummond and André Albuquerque, for the trust and the autonomy to explore the problem, and for their availability to help me.
To Bosch Car Multimedia, thank you for the trust and for lending the equipment.
To IRIS Lab, for providing the space in which the experimental datasets were acquired.
To the Universidade de Aveiro and the Instituto de Telecomunicações, for welcoming me, making me grow and helping me throughout these 5 arduous years.
To all,
Keywords LiDAR, interference between LiDARs, camera, object detection, autonomous cars, point cloud, ROS
Summary Every 23 seconds, a person dies on the road. In 2018, 1.35 million people died in road accidents, 90% of which were due to human error: dangerous driving, distractions, fatigue and bad decisions. Autonomous vehicles are one of the solutions put forward to solve this problem, by replacing or assisting the driver. To do so, vehicles need to perceive their surroundings with great precision, and LiDAR is one of the most promising sensors for that task.
To understand their surroundings, LiDARs emit laser beams that can, theoretically, be received by another LiDAR on another car, interfering with that second LiDAR's ability to understand its surroundings. In a scenario where multiple autonomous cars equipped with LiDAR coexist, their mutual interference can compromise their ability to perceive their surroundings accurately, and with it the possibility of solving one of the problems they were initially meant to solve: accidents and deaths on the road.
In this Master's dissertation, we propose to study the behavior of the interference between two LiDARs in several interference scenarios, in which we vary their distance, height and relative position. We also tried to understand the different impact of direct and scattered interference, by obstructing the line of sight between the two LiDARs, and to verify how interference behaves in regions of interest and on objects. We built an experimental setup containing two LiDARs and a camera, calibrated them intrinsically and extrinsically, and estimated the position of the objects of interest in the point cloud through regions of interest previously detected in the image. Using this experimental setup, we collected more than 600 GB of raw data, to which we applied 4 different interference analysis techniques, all developed by us.
Our findings allow us to state that the relative number of points with interference varies between the orders of magnitude of 10⁻⁷ and 10⁻³. Our results show that direct interference predominates over scattered interference, making the relative interference value one order of magnitude higher than when the line of sight between the two LiDARs is obstructed. We are also able to identify situations in which interference behaves similarly to sensor noise, being almost indistinguishable from it, and other cases in which it is strongly present, causing distance measurement errors that exceed even the physical dimensions of the space where the experimental setup is operated. We conclude that interference does not appear to be as destructive for autonomous driving as initially expected, owing to its low order of magnitude. Nevertheless, it can still have serious effects, mainly in situations of direct interference. We can also conclude that the nature of interference is highly volatile, depending on conditions not yet fully defined, including the influence of how the experimental setup is built.
Keywords LiDAR, LiDAR interference, camera, object detection, autonomous cars, point cloud, ROS.
Abstract Every 23 seconds, someone dies on the road. In 2018, 1.35 million people died in road accidents, 90% of which were caused by human error: reckless behavior, distractions, fatigue, and bad decisions. Autonomous vehicles are one of the solutions to tackle this problem, by replacing or helping the human driver. For that, vehicles need to understand the world around them with great precision in 3D, which makes LiDAR one of the most promising sensors for the task. To sense their surroundings, LiDARs emit laser beams, which can, theoretically, be received by a LiDAR on another car, degrading its ability to map the surroundings accurately. In a scenario where multiple autonomous vehicles equipped with LiDAR coexist, their mutual interference can undermine their capability to accurately understand the world and their capability to tackle one of the problems they came to solve: road accidents.
In this Master’s thesis we propose to study the behavior of two LiDARs in several interference scenarios, varying their relative distance, height and positioning. We also attempt to understand the different impacts of direct and scattered interference, by blocking the LiDARs' line of sight, and to verify the behavior of the interference on specific regions of interest and objects. We construct an experimental setup containing two LiDARs and a camera, calibrate them intrinsically and extrinsically, and estimate the position of the objects of interest in the point cloud through regions of interest previously detected on the image. Using this experimental setup, we gathered more than 600 GB of raw data, to which we apply 4 different techniques of interference analysis.
Our findings show that the relative number of interference points lies between 10⁻⁷ and 10⁻³. The results also show that direct interference predominates over scattered interference, generating relative values of interfered points one order of magnitude higher than when the line of sight between the LiDARs is obstructed. We were able to identify cases in which interference behaves much like sensor noise, being almost indistinguishable from it, and, in contrast, cases in which it was strongly deleterious, resulting in depth measurement errors that surpass the physical dimensions of the room where the setup is operating.
We can conclude that interference seems not to be severe for autonomous driving, as few measurements are severely impaired by it. Nevertheless, it can still have ill effects, especially in situations of direct interference. We also conclude that its nature is highly volatile, depending on conditions not yet fully understood, including the influence of the experimental setup.
Contents
List of Figures
List of Tables
Glossary
1 Introduction
1.1 Scope and Motivation
1.2 Objectives
1.3 Document Structure
1.4 Contributions
2 State of the Art
2.1 Middleware Software and Message Passing
2.1.1 Middleware Software Libraries and Robotics
2.1.2 Robot Operating System
2.2 Datasets
2.2.1 Ford Campus LiDAR Dataset
2.2.2 KITTI Dataset
2.2.3 Udacity Self-Driving Car Nanodegree Dataset
2.2.4 Summary
2.3 Camera as a Sensor on Computer Vision Applications
2.3.1 Camera Geometry
2.3.2 Camera Intrinsic Calibration
2.3.3 Image Processing Libraries for Computer Vision
2.4 Automotive LiDAR
2.4.1 LiDAR Classification based on the Depth Measurement Principle
2.4.3 The Mechanical Rotating LiDAR
2.4.4 Point Cloud Processing Software
2.5 Camera and LiDAR Extrinsic Calibration
2.5.1 Calibration using Patterns
2.5.2 Calibration without Human Intervention or Patterns
2.5.3 Other Calibration Methods
2.5.4 Summary of Extrinsic Calibration Techniques
2.6 Sensor Fusion
2.6.1 Calibration + Fusion
2.6.2 Simultaneous Calibration and Fusion
2.6.3 “Direct” Sensor Fusion
2.6.4 Considerations about the Different Methods and Our Work
2.7 Object Detection
2.7.1 Deep-Learning based Approach
2.7.2 Public Image Datasets
2.7.3 YOLO
2.7.4 Cloud based Platforms
2.7.5 Object Detection using standalone LiDAR and LiDAR + Camera
2.8 LiDAR Interference
2.8.1 Autonomous Vehicles Hacking through LiDAR Interference
2.8.2 Two-Dimensional Coplanar LiDAR Interference
2.8.3 A Departing Note on LiDAR Interference
3 Camera and LiDAR Calibration
3.1 Experimental Setup
3.1.1 Camera and Lens
3.1.2 TOF LiDAR
3.1.3 Connection Setup
3.1.4 LiDAR and Camera Interference
3.2 Camera Intrinsic Calibration
3.2.1 Camera Focus and Depth of Field
3.2.2 Camera Calibration Procedure
3.2.3 Camera Calibration Results
3.3 LiDAR Intrinsic Calibration
3.4 Camera and LiDAR Extrinsic Calibration
3.4.1 Calibration Method
3.4.2 Implementation
3.4.4 Comparison with KITTI Dataset
3.5 Final Remarks
4 LiDAR and Camera Data Fusion
4.1 Combining LiDAR and Camera Data
4.1.1 Camera → LiDAR
4.1.2 LiDAR → Camera
4.2 Implementation
4.3 Results
4.3.1 Experimental Dataset
4.3.2 KITTI Dataset
4.3.3 Impact of Point Cloud Density
4.4 Final Remarks
5 Object Detection
5.1 Object Detection on Image
5.1.1 Setup Specifications
5.1.2 Integration with ROS
5.1.3 Results
5.1.4 Outcomes
5.2 Correspondences between Objects in Image and Point-Cloud
5.2.1 Darknet-ros Package Limitations and Contributions
5.2.2 Ray Casting
5.2.3 Projecting LiDAR Points to the Image
5.2.4 Comparison between the Two Methods
5.2.5 3D Bounding Boxes and Regions of Interest
5.3 Final Remarks
6 LiDAR Interference
6.1 Experimental Setup
6.1.1 Experimental Test Scenarios
6.1.2 Parameters Under Study
6.1.3 Dataset Organization
6.2 Bosch© Dataset
6.3 Room Outliers Accounting
6.3.1 Bosch Dataset
6.3.2 Our Experimental Dataset
6.4 Ground-Truth Model Generation
6.4.2 Frame Registration
6.5 Voxel-to-Voxel Analysis
6.5.1 Implementation
6.5.2 Results
6.6 Point-to-Point Analysis
6.6.1 Implementation
6.6.2 Results
6.7 Interference on ROIs
6.7.1 Implementation
6.7.2 Results
6.7.3 Comparison between Whole Point-Cloud Analysis and ROI Extraction
6.8 Comparison with the State of the Art
6.9 Final Remarks
7 Conclusions and Future Work
7.1 Summary of Work Developed
7.2 Main Outcomes
7.3 Limitations of this Work
7.4 Future Work
References
Appendices
Appendix A: RECPAD paper
Appendix B: Powering AVT Manta G-504C
Appendix C: ROS Implementation Node Diagrams
List of Figures
1.1 Social media visuals produced by the WHO for raising awareness on road safety.
1.2 Example of a multiple sensor setup on an autonomous car.
2.1 Ford 250 used to acquire the Ford Campus dataset.
2.2 Volkswagen Passat used for recording the KITTI dataset.
2.3 Pinhole effect on a small aperture.
2.4 Pinhole camera model.
2.5 Pinhole effect on a lens.
2.6 Effects of barrel and pincushion distortion caused by a non-linear lens.
2.7 Chessboard calibration pattern with the corners used for calibration.
2.8 Example of a point and mesh clouds.
2.9 Examples of state of the art sensor fusion results between camera and LiDAR data.
2.10 Spoofing obstacles on LiDAR at different angles and closer distances than the spoofer.
2.11 Experimental setups used by Kim et al. and Popko et al. for 2D LiDAR interference studies.
3.1 Perspective and top views of the experimental setup for the camera and LiDAR.
3.2 Chessboard used for the camera calibration procedures.
3.3 Subset of figures used for intrinsic camera calibration.
3.4 Camera calibration ROS node diagram.
3.5 Correspondences selected on the image and point cloud for extrinsic calibration between the two.
3.6 Experimental setup TF tree coordinate frames.
3.7 Relevant KITTI TF tree coordinate frames.
4.1 Colored point clouds computed for the datasets on IRIS Lab.
4.2 Example of preliminary data fusion on IT 2 Dark Room.
4.3 Example of data fusion on KITTI dataset.
5.1 Image object detection results on KITTI dataset.
5.3 Representation of a pyramid frustum.
5.4 Points corresponding to the image objects’ bounding boxes, using the frustum filtering correspondences algorithm.
5.5 Points corresponding to the image objects’ bounding boxes, using the 3D → 2D correspondences algorithm.
5.6 Comparison between the 3D → 2D and frustum filtering correspondences algorithm results.
5.7 Comparison between the estimated tridimensional bounding boxes’ dimensions and position and the point cloud data, object clusters and ROI.
6.1 HESAI Pandar40 LiDAR on a Velbon CX-460 tripod.
6.2 Dataset folder hierarchy for a “test day”.
6.3 Dataset file hierarchy for a single value of a parameter under test.
6.4 Scatter plot top-view of both LiDARs' interference on the dataset provided by Bosch.
6.5 Scatter plot of VLP-16 interference taken at two different instants, under the same conditions.
6.6 Relative number of outliers when the distance between the LiDARs is varied on IRIS Lab.
6.7 Relative number of outliers when the height between the LiDARs is varied on IRIS Lab.
6.8 Relative number of outliers when the LiDARs' LoS is obstructed and the distance between the LiDARs is varied, on IRIS Lab.
6.9 Relative number of outliers when the relative orientation between the LiDARs is varied on IRIS Lab.
6.10 Relative number of outliers when the distance and height between the LiDARs are varied on IT 2 Dark Room.
6.11 Step-by-step diagrams for the Frame Registration algorithm.
6.12 Voxel-to-Voxel analysis when the distance between the LiDARs is varied.
6.13 Voxel-to-Voxel analysis when the height between the LiDARs is varied.
6.14 Voxel-to-Voxel analysis when the LiDARs' LoS is obstructed and the distance between LiDARs is varied.
6.15 Point-to-Point analysis principle for outlier classification.
6.16 Point-to-Point analysis when the distance between LiDARs is varied.
6.17 Point-to-Point analysis when the relative height between the LiDARs' optical centers is varied.
6.18 Point-to-Point analysis when the LoS between the LiDARs is obstructed and their relative distance is varied.
6.19 Test scenario for analyzing the LiDAR interference in ROIs.
6.20 Point-to-Point analysis of the interference on the Human dataset without ROI extraction. Results are presented for the ground-truth bag, interference bag and the subtraction of the results between the two.
1 ROS node graph for the extrinsic calibration.
2 ROS node graph implemented for coloring the point cloud.
3 ROS node graph for the point cloud frustum filter algorithm.
4 ROS node diagram for the estimation of point cloud bounding boxes from image bounding boxes of objects of interest.
5 ROS node graph to implement the ROIs recording for later analysis.
List of Tables
2.1 Comparative analysis between the datasets.
2.2 Summary of Kim et al.'s LiDAR interference results.
3.1 Relevant specifications of the camera and its lens.
3.2 Velodyne© VLP-16 relevant specifications.
3.3 IP addresses and subnet masks for the devices connected on the experimental setup.
5.1 Relevant specifications for CPU, graphics card, CUDA and Nvidia™ driver.
5.2 PCL’s Euclidean Cluster filter parameters for KITTI dataset.
5.3 Execution time comparison between methods to compute the tridimensional bounding boxes.
6.1 HESAI Pandar40 relevant specifications.
6.2 Test values taken by the parameters under test.
6.3 Room outlier analysis on Bosch interference dataset.
6.4 Parameters used on the Frame Stitching algorithm, for the ground-truth model generation.
6.5 Person position in relation to the LiDAR coordinate frame on the Human dataset.
6.6 Euclidean cluster parameters to select only the ROI containing the person on the Human dataset.
6.7 Comparison of Kim et al.'s interference with our results for equivalent setups.
1 Electrical Operation conditions for Manta AVT G-504C.
Glossary
AABB Axis Aligned Bounding Box
ABS Anti-lock Braking System
ADAS Advanced Driver-Assistance Systems
ADC Analog-to-Digital Converter
AI Artificial Intelligence
AM Amplitude Modulation
AMCW Amplitude Modulated Continuous Wave
API Application Programming Interface
AVT Allied Vision Technologies
CAN Controller Area Network
CoC Circle of Confusion
CCD Charge-Coupled Device
CGAL Computational Geometry Algorithms Library
CMOS Complementary Metal–Oxide–Semiconductor
CNN Convolutional Neural Network
COCO Common Objects in Context
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
CV Computer Vision
CSV Comma-Separated Values
CW Continuous Wave
DARPA Defense Advanced Research Projects Agency
DETI Departamento de Eletrónica, Telecomunicações e Informática
DLT Direct Linear Transform
DoF Depth of Field
ETH Eidgenössische Technische Hochschule Zürich
FOV Field of View
FM Frequency Modulation
FPS Frames Per Second
GPS Global Positioning System
GUI Graphical User Interface
ICP Iterative Closest Point
IMU Inertial Measurement Unit
IR Infrared Radiation
IRIS Lab Intelligent Robotics and Intelligent Systems Laboratory
IP Internet Protocol
IT Instituto de Telecomunicações
KITTI Karlsruhe Institute of Technology and Toyota Technological Institute
laser light amplification by stimulated emission of radiation
LiDAR Light Detection And Ranging
LoS Line of Sight
ML Machine Learning
MATLAB® MATrix LABoratory®
LCM Lightweight Communication and Marshalling
mAP mean Average Precision
MEMS Micro-Electro-Mechanical Systems
MP MegaPixel
MRPT Mobile Robot Programming Toolkit
MSL Middle Size League
NGO Non-Governmental Organization
NN Neural Network
OBB Oriented Bounding Box
OpenGL Open Graphics Library
OpenCV Open Source Computer Vision Library
PASCAL Pattern Analysis, Statistical Modelling and Computational Learning
PASCAL-VOC Pattern Analysis, Statistical Modelling and Computational Learning Visual Object Classes
PCAP Packet Capture
PCD Point Cloud Data
PCL Point Cloud Library
PnP Perspective-n-Point
PPM Portable Pixmap
RaDAR Radio Detection And Ranging
RAM Random-Access Memory
RANSAC Random Sample Consensus
R-CNN Region Convolutional Neural Network
RECPAD Portuguese Conference on Pattern Recognition
RNN Recurrent Neural Network
RPM Rotations Per Minute
ROI Region of Interest
ROS Robot Operating System
SDK Software Development Kit
SLAM Simultaneous Localization and Mapping
SPP Spatial Pyramid Pooling
SONAR Sound Navigation and Ranging
SoTA State of the Art
SVD Singular Value Decomposition
TCP Transmission Control Protocol
TOF Time of Flight
UDP User Datagram Protocol
VLAN Virtual Local Area Network
WHO World Health Organization
XML eXtensible Markup Language
YAML YAML Ain’t Markup Language
YARP Yet Another Robot Platform
Chapter 1 - Introduction
The day is September 27, 1908. The first Ford Model T has left the factory and, with it, the history of the automobile and of transportation has changed forever [1], [2]. The Model T was neither the first automobile nor the first powered wheeled vehicle, but it was the first car mass-produced on an assembly line [2], [3]. Other vehicles preceded it, from the steam carriages and coaches of the Victorian Era of Steam and the electric cars of 1888 [4], to internal combustion engines running on hydrogen, kerosene and crude [5]; but none was so influential, memorable and widely adopted. The Model T and other affordable automobiles proliferated, allowing the middle class to own motorized vehicles [2], [3]. The massification of such vehicles, while improving the mobility of the middle class in a society on the brink of modern industrialization, started what is now one of the leading causes of mortality in the world: road accidents [6]. In the United States of America, such events triggered the foundation of the Automobile Safety League of America in 1930, which enforced the usage of seatbelts and padded dashboards. However, several decades would have to unfold before road safety became a major concern for governments, automobile manufacturers, urban planners and Non-Governmental Organizations (NGOs).
Despite all the efforts and technological advances, the number of annual global road traffic deaths is rising: it reached 1.35 million in 2018, making road accidents the leading cause of death for people aged 5 to 29 years [6]. This means that a road user dies in an accident roughly every 23 seconds [7]. Multidisciplinary partners are taking action to reduce this growing number, through road safety awareness campaigns, heavier fines, stricter regulations on road and automobile safety, inspection operations and driver assistance systems [6], [8]. However, only 40 countries in the world have legislation on road safety and vehicle manufacturing that follows the 7 most important directives on road safety issued by the World Health Organization (WHO) [7]. The list of countries with non-conforming legislation includes not only “developed” countries in Europe and the United States of America, but also “sub-developed” countries in Asia and Africa (see more statistics in Figure 1.1).
Human error is the main cause of accidents and injuries on the road [6], [9], causing up to 90% of the road accidents [6]. Distractions, fatigue, bad decisions and reckless behavior are some of the main reasons for human error. To reduce this grim number, manufacturers and tech companies are putting their efforts into developing driving systems that can aid drivers in making decisions, or even systems that make decisions on their own, such as adaptive headlamps and collision avoidance systems, respectively.
Figure 1.1: Social media visuals produced by the WHO for raising awareness on road safety. The data is available in the Global status report on road safety 2018 [6] and the source of the graphics material can be found on the WHO website [10].
Since the appearance of Advanced Driver-Assistance Systems (ADAS), consumers, experts and governments have hoped that “smarter” cars would result in safer roads. Several studies have been conducted on autonomous driving technology and ADAS [9], [11]–[15], which appears to be one of the most promising solutions to mitigate road accidents. Research shows that solving, or at least reducing the impact of, this problem could be “easy”: remove the human driver from the driving process and replace them with autonomous driving technology. However, the problem of making autonomous driving vehicles is rather complex and has gathered the attention of both technical and non-technical personnel. Newspaper articles, blog posts, podcasts and interviews have been published over the last years, revealing the interest in the topic. Artificial Intelligence (AI), computer vision and autonomous driving startups have boomed in the last years, and tech giants have revealed their plans to conquer this market segment.
The mission of the automobile industry is to improve the way humans move from one place to another, allowing faster, cheaper and more convenient transportation [3], [5]. While such a mission has been fulfilled, reshaping society and cities in the process, some of its downsides are yet to be solved, such as road safety.
1.1 Scope and Motivation
For an autonomous driving vehicle to be a solution that improves road safety, it must be capable of perceiving the world surrounding it, both accurately and in real time. The quality of the data gathered is as crucial as its diversity, which is addressed through the use of multiple sensors. A multiple sensor approach to autonomous driving allows:
• Data diversity: different aspects of the surrounding environment are measured, providing complementary data;
• Data redundancy: multiple sensors complement the data gathered by each other, due to measurements of the same physical phenomena with overlapping Field of View (FOV);
• Data robustness: since multiple sensors gather information about the vehicle surroundings in different formats, individual sensor weaknesses and limitations are circumvented, creating a more realistic and accurate model, even in scenarios where one sensor cannot operate properly or reliably.
Autonomous driving vehicles are equipped with a wide range of sensors and systems, such as parking sensors, crash detection, traffic signal detection, adaptive headlights, Anti-lock Braking System (ABS) and adaptive cruise control, among others. Regarding the perception of the environment surrounding the car, the most used sensors are detailed below, with their common positions and FOVs shown in Figure 1.2.
• Camera: Most accurate sensor to represent objects, given good visibility conditions. Enables a good semantic view of the world, allowing the car to differentiate objects and understand their actions.
• Radio Detection And Ranging (RaDAR): capable of long range object detection, RaDARs use radio waves to detect obstacle distance and velocity. They perform well in adverse weather conditions but lack the angular resolution needed to distinguish between different objects (cars, trucks, persons, etc.).
• LiDAR: medium range laser sensor that provides an accurate, but sparse, tridimensional view of the world.
• Sound Navigation and Ranging (SONAR): short range sensor that uses sound waves to precisely detect nearby obstacles. Commonly used to enable automatic parking.
Since the Defense Advanced Research Projects Agency (DARPA) Grand Challenge in the Mojave Desert, in 2004 and 2007, modern automotive LiDAR has become one of the leading technologies for autonomous driving cars [17]. Velodyne reinvented the sensor that is now considered by many automotive companies, tech enthusiasts and experts as the crucial sensor for fully autonomous driving cars (level 5 automation), due to its precision when creating a tridimensional model of the surrounding environment. Despite LiDAR technology providing
Figure 1.2: Example of a self-driving taxi (Homer) multiple sensor setup and their relative positioning. Source: Voyager [16].
on the necessity of using LiDAR for autonomous driving. For instance, while some authors believe LiDAR to be indispensable [9], [18], [19], Tesla is building an autonomous driving solution without LiDARs, relying solely on RaDAR and cameras (for more information see [9], [18]).
LiDAR sensors map their surroundings thanks to their capability to precisely measure depth, with a few centimeters of error, over distances that can reach 100 m [19], [20] or even 300 m in certain conditions. This sensor is being employed as the solution to help cars understand the world around them, but there is a caveat that is neither being addressed nor deeply considered by the scientific community: LiDARs may inadvertently produce interference.
Since LiDAR is an active sensor, it sends laser pulses (tens to a few thousand pulses per second) to the scene. The laser’s divergence and the detector’s FOV are small, but what happens when hundreds or thousands of cars with LiDARs “flood” the streets? Do they interfere mutually, i.e., does the laser emitted by a LiDAR on a car “A” interfere with the detector of the LiDAR of a car “B”? And if so, how do they interfere and what are the consequences?
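The time-of-flight principle behind these questions can be illustrated with a minimal sketch. It assumes a simplified pulsed LiDAR whose detector trusts the first pulse that arrives after emission, whatever its origin; the function names and the 40 ns arrival time of the foreign pulse are hypothetical, chosen only for illustration, and real sensors may filter returns differently.

```python
# Speed of light in vacuum, in metres per second.
C = 299_792_458.0


def tof_range(round_trip_s):
    """Range in metres for an echo that arrives round_trip_s seconds after emission."""
    return C * round_trip_s / 2.0


def first_return_range(emit_time_s, arrival_times_s):
    """Range reported by a hypothetical detector that accepts the first pulse
    arriving after its own emission, regardless of which LiDAR emitted it."""
    valid = [t for t in arrival_times_s if t >= emit_time_s]
    if not valid:
        return None
    return tof_range(min(valid) - emit_time_s)


if __name__ == "__main__":
    own_echo = 2 * 20.0 / C   # echo from a target 20 m away
    foreign_pulse = 40e-9     # assumed arrival time of a pulse from another LiDAR
    print(first_return_range(0.0, [own_echo]))
    print(first_return_range(0.0, [own_echo, foreign_pulse]))
```

Under these assumptions, a foreign pulse that arrives before the genuine echo makes the reported range shorter than the true one, which is one way an interfered measurement could appear in a point cloud.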
In a society where the scientific community and automotive manufacturers are concerned about deploying autonomous driving technologies and cars to the streets as soon as possible, and governments lack scientific expertise to legislate on the topic, this research work focuses on understanding what happens if multiple LiDARs interact.
Before autonomous driving cars can replace the human driver and therefore reverse the growing trend in road accidents and deaths, we need to understand whether the sensors we are using for the vehicles to perceive their surroundings with better accuracy than us are scalable to a scenario where the majority (or even the totality) of cars are autonomous. Failing to do so will not only delay the exciting advent of autonomous driving technology, but also undermine one of the solutions to tackle the ever-growing problem that started such an endeavour in the first place: reducing the number of deaths on the road.
about multiple LiDAR interference by simply providing an answer to the question: What happens if two LiDARs that coexist in the same space are switched on simultaneously?
1.2 Objectives
The main objective of this Master’s thesis is to study the behaviour and impact of interference between LiDARs. This study also aims to describe the impact of interference on objects of interest in the LiDAR FOV. The position of those objects will be estimated by an algorithm to be developed, based on the results of object detection on image.
To do this, and before drawing any conclusions on LiDAR interference, we must attain several intermediate objectives:
1. Implement an algorithm for performing extrinsic calibration between the camera and the LiDAR of the experimental setup;
2. Merge the information obtained with the camera and the LiDAR to construct a more realistic scenario of the environment;
3. Detect objects of interest in camera images and estimate their position on the LiDAR point cloud;
4. Devise experiments with different metrics being varied and create a rich data set for assessing LiDAR interference.
Fulfilling objective #1 indicates that we are able to exchange information between different sensors, by converting data between the sensors’ coordinate frames, which is crucial on a multi-sensor setup. This allows the information to be represented in whichever coordinate frame is more convenient, either for processing or for visualization.
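As an illustration of what converting data between coordinate frames involves, the sketch below applies a rigid transform (a rotation followed by a translation) to LiDAR points. The rotation matrix, the translation and the axis conventions are assumptions made up for this example; they are not the calibration values obtained in this work.

```python
def transform_points(points, rotation, translation):
    """Apply p_cam = R * p_lidar + t to every 3D point, using plain Python lists."""
    transformed = []
    for x, y, z in points:
        transformed.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z + translation[i]
            for i in range(3)
        ))
    return transformed


# Assumed axis conventions: LiDAR with x forward, y left, z up;
# camera with x right, y down, z forward.
R_LIDAR_TO_CAM = [
    [0, -1, 0],
    [0, 0, -1],
    [1, 0, 0],
]
# Hypothetical offset between the two sensors, in metres.
T_LIDAR_TO_CAM = [0.0, -0.1, 0.0]

if __name__ == "__main__":
    lidar_points = [(5.0, 1.0, 0.5)]
    print(transform_points(lidar_points, R_LIDAR_TO_CAM, T_LIDAR_TO_CAM))
```

Extrinsic calibration amounts to estimating the rotation and translation used here; once they are known, the same function moves any LiDAR measurement into the camera frame.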
Objective #2 allows merging information between different sensors, which broadens the way we can represent, process and understand the data. More representative model of the environment can be made, through methods that merge the different nature of the sensory data used, which, in our case, has depth and color information.
Objective #3 allows us to take advantage of the data conversion between the sensors, enabled by objectives #1 and #2. Since detecting objects of interest on an image is faster and more reliable than detecting objects on LiDAR data, the objects are detected on image and their position on the LiDAR coordinate frame is estimated, from 2D to 3D. The fulfilment of objectives #1 to #3 ensures that we are able to select Region of Interest (ROI) on the image, obtain their correspondent ROI on the LiDAR point cloud and then analyze the impact of interference on this specific ROI.
Objective #4 requires the creation of an extensive and rich data set, which enables us to perform a comprehensive study of the LiDAR interference behavior.
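The correspondence between an image ROI and its LiDAR counterpart, described for objectives #1 to #3, can be illustrated with a minimal sketch. It assumes that the LiDAR points have already been converted to the camera coordinate frame and that the camera follows an ideal pinhole model without distortion. The function name, the intrinsic matrix and the bounding box below are hypothetical values chosen for illustration; they are not taken from the software developed in this thesis.

```python
import numpy as np


def points_in_image_roi(points_cam, K, roi):
    """Return the 3D points whose pinhole projection falls inside a 2D box.

    points_cam: (N, 3) array of points already expressed in the camera frame.
    K: (3, 3) camera intrinsic matrix.
    roi: (u_min, v_min, u_max, v_max) bounding box in pixels.
    """
    points_cam = np.asarray(points_cam, dtype=float)
    u_min, v_min, u_max, v_max = roi

    # Points behind the camera (Z <= 0) can never be seen in the image.
    in_front = points_cam[:, 2] > 0.0
    candidates = points_cam[in_front]

    # Pinhole projection: multiply by K, then divide by the depth.
    pixels_h = candidates @ K.T
    pixels = pixels_h[:, :2] / pixels_h[:, 2:3]

    inside = (
        (pixels[:, 0] >= u_min) & (pixels[:, 0] <= u_max)
        & (pixels[:, 1] >= v_min) & (pixels[:, 1] <= v_max)
    )
    return candidates[inside]


# Hypothetical intrinsics and detection box, for illustration only.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
cloud = np.array([[0.0, 0.0, 5.0],    # projects to the image center
                  [4.0, 0.0, 5.0],    # projects far to the right of the box
                  [0.0, 0.0, -5.0]])  # behind the camera
roi_points = points_in_image_roi(cloud, K, roi=(300, 220, 340, 260))
```

Only the first point survives the selection, since it is the only one in front of the camera whose projection lies inside the box. The interference analysis can then be restricted to such a subset of the point cloud.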
1.3 Document Structure
Chapter 1 - Introduction: contextualization of the topic and scope of this document, the motivation for this research and what it attempts to clarify. Briefly describes how the document is organized and the contributions associated with this research;
Chapter 2 - State of the Art: LiDAR technology and the available research on LiDAR interference are presented in this chapter as the foundation for this research. Since a camera and image object detection are also used, an overview of camera principles, object detection in images, camera and LiDAR calibration, and sensory data merging between camera and LiDAR is also given. This chapter also presents an overview of the online datasets available for this research;
Chapter 3 - Camera and LiDAR Calibration: a multi-sensor approach to LiDAR interference requires not only a good intrinsic calibration of each sensor, but also a good calibration between the two. This chapter explains how the camera and LiDAR are calibrated, both intrinsically and extrinsically;
Chapter 4 - LiDAR and Camera Data Fusion: merging the information from multiple sources (sensors) generates more realistic world models. This chapter describes how calibrated online datasets and the experimental data obtained can be used to provide color information to LiDAR depth data;
Chapter 5 - Object Detection: performing object detection on images allows the detection of Regions of Interest. Using the LiDAR and camera extrinsic calibration from Chapter 3 and the image detection techniques detailed in Chapter 2, this chapter explains the method used to create correspondences between the objects in the image and their 3D counterparts in the LiDAR data;
Chapter 6 - LiDAR Interference: the study of LiDAR interference is detailed in this chapter, from developing algorithms for estimating ground truth models to techniques for interference analysis. The methodology used to obtain the experimental dataset and its organization are detailed. The Bosch© dataset is also described and analyzed. The outcomes of the previous chapters are used to assess the interference on objects of interest;
Chapter 7 - Conclusions and Future Work: a summary of the results and outcomes presented and discussed across the document and a global view of this research, along with some topics for future work.
1.4 Contributions
The outcomes of this thesis contributed to the State of the Art of the topic through the publication of a paper entitled “Impact of Interference on Time-Of-Flight LiDAR” in the proceedings of the “Portuguese Conference on Pattern Recognition”, accompanied by a poster presentation at the same conference. The paper was written by Pedro Martins, António Neves, Miguel Drummond and André Albuquerque. A copy of this article is presented in Appendix A: RECPAD paper.
The LiDAR and camera sensory data fusion algorithms were also presented in poster format at Students DETI, a fair held at the Departamento de Eletrónica, Telecomunicações e Informática to showcase the research and projects developed by students, professors, researchers and companies.
The software developed for this thesis can be accessed on the author’s personal GitHub page, at https://github.com/martinspedro/lidar-interference-analysis. It contains all the software developed during this thesis, which can be used alongside the Robot Operating System (ROS). Several software packages were developed to meet the requirements of the thesis, such as: extrinsic camera calibration, LiDAR and camera data merging, ROI estimation on the point cloud from the camera image and point cloud bounding box estimation, LiDAR interference analysis and image object detection. Libraries to manage the experimental dataset, generate ground truth models from LiDAR data, visualize and interact with camera and LiDAR data, and statistically analyze the interference were also developed. Python scripts for generating result plots are also provided, along with an extensive set of bash scripts to automate data analysis, run packages against the whole dataset, generate plots and perform other utilities. All the code is documented using Doxygen.
The dataset generated for the LiDAR interference analysis contains more than 600 GB of pre-processed raw data, ready to be analyzed, which corresponds to more than 4 hours and 20 minutes of playable data. The dataset is also thoroughly documented, with log and README files. Camera calibration data and the rigid body transformation between sensors are also provided. This dataset, to the best of our knowledge, is the only dataset described in academic work which contains interference data from a multi-line (tridimensional) LiDAR scanner and a camera feed.
Furthermore, during the development of the software for this thesis, a contribution was made to the open-source package darknet_ros [21], which implements a neural network for real-time detection of objects in images. This package is maintained by the Robotic Systems Lab of Eidgenössische Technische Hochschule Zürich, and the contribution improves the synchronization between messages, which is detailed further in sub-Section 5.2.1.
Chapter 2 - State of the Art
In this chapter we detail the foundation for this work by conducting a thorough analysis of the current state of the art of the several topics our work requires, with some considerations about which approaches are followed and why.
This chapter is organized as follows. Section 2.1 contains a brief analysis of the current possibilities for middleware and message passing software between programs and systems, allowing a better code workflow and modularity. Section 2.2 analyzes the currently available online datasets for autonomous driving and compares their key features. Section 2.3 details the usage of the camera in computer vision applications, its geometry, intrinsic calibration and image processing libraries. Section 2.4 gives considerations on automotive LiDAR regarding its construction and technology, accompanied by an analysis of point cloud manipulation software. Having detailed the camera and the LiDAR, Section 2.5 gives an in-depth analysis of their extrinsic calibration procedures. Section 2.6 details three different techniques of sensor fusion between camera and LiDAR. Section 2.7 details object detection in images, presenting some techniques and commonly used public image datasets for training. Lastly, Section 2.8 gathers and compares all the work on LiDAR interference, both from the standpoint of an attacker wanting to purposely interfere with a LiDAR and from that of academic research seeking to understand the phenomenon.
2.1 Middleware Software and Message Passing
Analyzing LiDAR interference is our primary goal, but several other objectives have been established for this work in Section 1.2. Fulfilling them requires a multi-sensor approach, using a LiDAR and a camera, which enables us to analyze interference both as a pure interference phenomenon between sensors and in terms of its impact on ROIs.
Developing such software, however, requires more than standalone program files, due to the complex interactions between sensors and data. It is crucial for the success of this work that standalone programs, each performing its own set of tasks, can communicate with
With those considerations in mind, we require a middleware: a software layer that communicates directly with the operating system kernel and hardware devices, abstracting their behavior through an Application Programming Interface (API) that allows programs to communicate with the hardware devices, with the kernel and among themselves [22], [23]. Such software also implements a message passing infrastructure for peer-to-peer and/or client-server communication, handling data exchange and marshalling [22].
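The message passing pattern described above can be illustrated with a toy, in-process publisher/subscriber sketch. It is not the API of any of the libraries discussed in this chapter: the `MessageBus` class and the topic names are hypothetical, and real middleware additionally handles serialization (marshalling), transport between processes and discovery.

```python
from collections import defaultdict


class MessageBus:
    """Toy in-process message bus (hypothetical, for illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to be invoked for every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to all callbacks subscribed to the topic."""
        for callback in self._subscribers[topic]:
            callback(message)


bus = MessageBus()
received = []

# A "consumer" program only needs to know the topic name, not the producer.
bus.subscribe("/lidar/points", received.append)

# A "producer" program publishes without knowing who is listening.
bus.publish("/lidar/points", {"stamp": 0.1, "num_points": 3})
bus.publish("/camera/image", {"stamp": 0.1})  # no subscriber: nothing happens
```

The decoupling shown here, where producers and consumers share only a topic name and a message format, is what allows standalone programs to be developed, tested and replaced independently.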
2.1.1 Middleware Software Libraries and Robotics
Several middleware libraries, written in many languages, are available online, capable of message passing between programs, from which the following are feasible for our work:
1. PocoLibs: an Open-Source library developed by OpenRobots that allows for communication between processes on a computer system, with support for real-time [24];
2. Lightweight Communication and Marshalling (LCM) library: focused on low latency and high bandwidth, this library is developed with real-time systems in mind, providing data marshalling and message passing between a publisher and a subscriber [23];
3. ZeroMQ: a framework with bindings for many languages that can be used to transport messages between processes, over Transmission Control Protocol (TCP) or multicast, for example. It allows a series of different connection types [25];
4. Yet Another Robot Platform (YARP): a collection of programs, written in C++, focused on enabling peer-to-peer communication, supporting TCP, User Datagram Protocol (UDP) and multicast, among many other protocols [26];
5. ROS: a complex and extensive framework for developing and testing robot software. It was developed with the goal of simplifying the task of building complex and robust robot behavior, independently of the hardware and mechanical platform. It offers a collection of tools, libraries and conventions [27];
6. ROS 2: a more recent iteration of ROS, focused on extending ROS capabilities to new use cases, such as multiple robot teams, small embedded platforms and real-time systems [28];
7. Mobile Robot Programming Toolkit (MRPT): a mobile robot library, written in C++, which aims to provide a set of tools and libraries for computer vision, motion planning and Simultaneous Localization and Mapping (SLAM). It also directly supports hardware and implements message passing middleware [29].
While the first four alternatives listed are focused on generic communication between processes¹ and on common communication protocols, the last three are oriented towards robotics, providing a comprehensive set of tools for robotics development besides message passing. The latter group is preferred, since it has the potential to unburden us from developing generic debugging, data visualization and logging tools. Therefore, we exclude PocoLibs [24], the LCM library [23], ZeroMQ [25] and YARP [26].
¹ Despite its name, Yet Another Robot Platform (YARP) is not a robot operating system or framework, but rather a message passing framework for robots.
Since our work is not oriented towards real-time applications either, we will not consider ROS 2 [28]. Comparing the robotics platforms, MRPT is focused on robot mobility, SLAM, computer vision and motion planning, while ROS is more generic and widely used, containing all the capabilities of MRPT. ROS is also more widely accepted and has larger community support.
Therefore, for the middleware layer and message passing we will use ROS, taking also advantage of all the other tools it provides and its Open-Source third-party libraries, such as multi-sensor data visualization software, network logger, data recorder and player, debuggers and other utilities.
2.1.2 Robot Operating System
ROS has long-term support versions whose release schedule is synchronized with the Ubuntu operating system [27]. At the time of this work, two such versions are available: ROS Kinetic Kame and ROS Melodic Morenia.
ROS Kinetic was released in May 2016 and its support ends in April 2021. ROS Melodic was released in May 2018 and its end of life is in May 2023. At the time of this work, the most recent version, ROS Melodic, is stable and free of release bugs, and most of the ROS Kinetic libraries have already been ported to it. Therefore, we chose ROS Melodic Morenia, more specifically version 1.14.3. To work with this version of ROS, the C++ standard used is C++14.
2.2 Datasets
Training an autonomous vehicle to be capable of driving itself is a complex challenge with multiple parts and problems to solve: recognizing other road users (people, vehicles, cyclists); understanding traffic signs; tracking and estimating the movement of road users; planning the path; accurately modelling the environment; among many others. Developing models that can describe and solve these problems, as well as testing systems and other software, requires huge amounts of data.
To accelerate this progress, a collaborative effort is being made to release free datasets online for public usage under permissive licenses. These datasets vary in their objective, the sensory data available, the conditions under which the data has been acquired, the driving conditions and the format in which they are provided, among other aspects.
Although this work does not intend to study autonomous driving, the algorithms developed for calibration, sensor fusion and computing the correspondence between objects of interest in the image and in the point cloud are not exclusive to datasets with interference. Therefore, using publicly available datasets (which do not contain interference) not only allows the development of those algorithms before experimental data can be gathered, but also widens the applicability of this work, without losing its applicability to the particular situation of LiDAR interference.
During the months in which this research was carried out, several new datasets were made available online, such as nuScenes [30] and Waymo [31]. Since no new contributions to this work were expected from their usage, no further research on them was carried out.
Other older datasets were considered and tested. On the next sections, a brief summary is given about the ones that have sensory data from camera and LiDAR. Those were the candidates to be used on this work.
2.2.1 Ford Campus LiDAR Dataset
Gathered in 2009, in Michigan, the Ford Campus LiDAR dataset contains camera, LiDAR, Inertial Measurement Unit (IMU) and Global Positioning System (GPS) data [32]. The dataset consists of two test scenarios, one inside the Ford Research campus and the other in downtown Dearborn. A small subset of the former test scenario is also provided.
The modified Ford F-250 pickup truck, which can be seen in Figure 2.1, uses three types of sensors for navigation [32]: one 3D LiDAR, a Velodyne HDL-64E [33]; one camera, a Point Grey Ladybug3 omnidirectional camera; and two 2D LiDARs, Riegl LMS-Q120. For more details about the sensors, their relative positioning, data formats and files, see [32].
Figure 2.1: Ford F-250 pickup equipped with some sensors described in the Section 1.1. On the top, the Ladybug omnidirectional camera; on the back, the IMU and GPS unit.
The data is provided in raw format, accompanied by log files with timestamps, GPS and LCM² logs with all raw data. The images from the omnidirectional camera are stored in Portable Pixmap (PPM) format and the LiDAR data in Packet Capture (PCAP) format, from the TCP connection socket.
Rectified and synced data is stored in MATrix LABoratory® (MATLAB®) .mat files. Along with the raw data and the synced and rectified .mat files, source code is also provided for parsing the raw data and for visualizing the LCM logs and the pre-processed data in MATLAB®. Software that can render textured point clouds, implemented in C and Open Graphics Library (OpenGL), is also provided.
² For more information on the Lightweight Communication and Marshalling (LCM) protocol for estimating delays between sensory data registration on master-slave systems, see [23].
2.2.2 KITTI Dataset
A well-known dataset among researchers in computer vision, autonomous driving and Machine Learning (ML), the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) dataset was recorded in 2011 and released to the public in 2013. This dataset contains various driving scenarios (suburban, highway, residential and campus areas) with trucks, cars, cyclists and pedestrians. Alongside the data for testing, calibration measures are provided for all sensors.
The test car, a Volkswagen Passat equipped with two stereo pairs (one with color and one with grey cameras), a LiDAR, an IMU and a GPS sensor, is shown in Figure 2.2. Data from all four cameras is stored in Portable Network Graphics (PNG) format, LiDAR measurements are stored as a binary float matrix, and GPS and IMU data are saved as text. Additionally, the raw data logs containing the timestamps and the transformations between the sensors are also provided. Labeled data is also available for some test scenarios in eXtensible Markup Language (XML) files.
Figure 2.2: Volkswagen Passat equipped with the sensors described in Section 1.1, for the KITTI dataset. On the top, the Velodyne LiDAR and, below it, the 2 stereo pairs (color and grey). On the back, the IMU and GPS systems are present.
Along with the data and calibration parameters, several tools written in C++ or MATLAB® are also provided. The dataset offers two categories of data: (1) unsynced and unrectified data or (2) synced and rectified data. The latter is the one relevant for sensor calibration purposes.
The sensory apparatus contains two greyscale and two color Point Grey Flea2 cameras and a Velodyne HDL-64E LiDAR [33], among other, less relevant sensors [34].
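As an illustration of how LiDAR measurements stored as a binary float matrix can be read, the following sketch assumes the commonly documented layout of four 32-bit floats per point (x, y, z and reflectance). This layout is an assumption that should be checked against the dataset's development kit. The function and file names are hypothetical, and a small synthetic scan is written to a temporary file so that the example runs without the dataset.

```python
import os
import tempfile

import numpy as np


def load_lidar_scan(path):
    """Load a scan stored as consecutive float32 (x, y, z, reflectance) rows.

    The four-column float32 layout is an assumption of this sketch.
    """
    scan = np.fromfile(path, dtype=np.float32)
    if scan.size % 4 != 0:
        raise ValueError("file size is not a multiple of four floats")
    return scan.reshape(-1, 4)


# Round trip on a small synthetic scan so the example runs without the dataset.
synthetic = np.array([[1.0, 2.0, 3.0, 0.5],
                      [4.0, 5.0, 6.0, 0.9]], dtype=np.float32)
with tempfile.TemporaryDirectory() as tmp_dir:
    scan_path = os.path.join(tmp_dir, "scan.bin")  # hypothetical file name
    synthetic.tofile(scan_path)
    loaded = load_lidar_scan(scan_path)

xyz = loaded[:, :3]          # Cartesian coordinates in the LiDAR frame
reflectance = loaded[:, 3]   # per-point return intensity
```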
2.2.3 Udacity Self-Driving Car Nanodegree Dataset
The data from the Udacity online course on self-driving cars is also publicly available [35]. This dataset dates from 2016 and was gathered to develop vehicles with level 4 autonomy, containing much more data diversity than the previous two. The data available contains images from 3 color cameras, LiDAR, IMU and GPS, among other sensory data, such as speed and braking.
This information is not available in raw format, only structured in ROS .bag files. Some tools for visualizing and interacting with their data are provided for ROS [35].
Not much information about the sensors and their positioning is publicly available, and the dataset does not contain information for calibration.
2.2.4 Summary
Table 2.1 summarizes all the relevant data from the datasets [32], [34], [35]. While there are few differences between the relevant types of data gathered, major differences can be observed on the format in which they are provided.
The diversity of scenarios and the size of the dataset are also important aspects to consider. On this matter, the KITTI and Udacity datasets are superior to the Ford dataset, with KITTI providing the largest amount of data, already rectified and synced.
Characteristics                        Ford Campus   KITTI   Udacity
Sensors and Data
  LiDAR                                X             X       X
  Color Camera                         X             X       X
  Grey Cameras                         -             X       -
  Stereo Camera                        -             X       X
  Omnidirectional Camera               X             -       -
  IMU                                  X             X       X
  GPS                                  X             X       X
  Driving data³                        -             -       X
Data Formats and Tools
  Raw data available                   X             X       -
  Data parsing tools                   X             X       X
  Rectified data                       X             X       X
  Synced data                          X             X       X
  Calibration parameters               X             X       -
  Raw data for sensors calibration     X             X       -
  ROS data compatibility               -             X⁴      X
Date of acquisition                    2009          2011    2016
Table 2.1: Comparison between the datasets most appropriate to the objectives of this thesis.
Comparing the 3 datasets using the information in Table 2.1 and the previous sections, Udacity and KITTI are better suited to the purposes of this work. They provide a larger dataset than Ford, have ROS compatibility, and the tools provided are open-source and not developed to be used with proprietary software.
Since the Udacity dataset integrates more easily with ROS than KITTI, and also provides some tools for ROS, preliminary tests, learning and earlier development stages were based on this dataset. Later, due to its lack of calibration parameters and raw data for sensor calibration⁵, the KITTI dataset was used instead.
³ Other driving data includes, but is not limited to: vehicle speed, joint states, twist, brakes, suspension, fuel level, Controller Area Network (CAN) bus data, steering and tire pressure, among many others.
⁴ While the Udacity dataset is the newest and provides direct off-the-shelf integration with ROS, that type of integration can also be achieved for KITTI by using other tools, such as kitti2bag [36].
2.3 Camera as a Sensor on Computer Vision Applications
Vision is the human sense most relevant to how we perceive the world and how we navigate in it [37], [38]. Replicating this ability in machines, through the usage of cameras, is a widely researched topic in computer vision and instrumentation [38]. The most common cameras, like our eyes, take advantage of the pinhole effect: a small hole (or pin) is used to spatially filter the non-focused light beams through an aperture or lens [38]–[40], producing a mirrored but focused image.
In the following sub-sections, basic notions of the current status of camera models and camera calibration are given. Since this research is focused strictly on the application of a camera as a sensor in computer vision, the overview provided abstains from describing the extensive research on camera technologies and camera models in the field of computer vision. Considering only the working principle and technology, several camera types can be named, such as catadioptric, plenoptic and biprism, among others [40]. For further research, the reader might be interested in [39]–[43].
2.3.1 Camera Geometry
A world point in a 3-dimensional Euclidean space can be represented by a vector with 3 real coordinates, (X, Y, Z), and a pixel of a digital image is typically represented as an element of a 2D matrix with integer coordinates (u, v) [44]. From a mathematical standpoint, a camera can be considered a mapping tool from world objects in three dimensions to a 2D image plane, as depicted by Equation (2.1), which leads to the different mathematical models used to describe the camera [40], [44].
$$
(X, Y, Z) \xrightarrow{\text{transformation}} (u, v) \tag{2.1}
$$
When describing camera models, several considerations should be made [40]:
• Central Camera Models vs Non-Central Camera Models: quantifies the number of optical centers of a camera. A central camera has only one optical center, through which all light rays pass to reach the film, and a non-central camera has two or more optical centers;
• Global vs Local models: indicates whether the parameters of the model are the same for the whole image/FOV of the camera or whether different regions of the image have different parameters, respectively.
For the application currently envisioned, global central camera models are considered. For a more detailed explanation of the topics above or an extensive overview of non-classical camera models, different types of projections and other aspects of modelling cameras, see [39], [40].
⁵ Note that, despite the Udacity dataset not providing data to ease the calibration of its sensors, such as the camera intrinsic calibration or the extrinsic calibration between LiDAR and camera, such calibration parameters can be obtained from this data. Those parameters are not as accurate as those obtained using a proper calibration setup and are outside the scope of this work (being a research topic in itself), and therefore no effort will be dedicated to this topic.
Pinhole Camera Model
Based on the pinhole effect, represented in Figure 2.3, the pinhole camera model is the most common camera model [39]. The pinhole effect consists of using a small hole (or aperture) to spatially filter the light rays by angle of incidence, reducing the number of light rays from different incident angles that overlap after passing through the pinhole, which would otherwise “blur” the image.
Figure 2.3: Pinhole effect on a small aperture. Source: [39].
The pinhole camera model describes the perspective transformation in Equation (2.1) as a central projection where all light rays meet at the camera center, Fc. A diagram of the pinhole camera is shown in Figure 2.4.
In this model, the z-axis is collinear with the optical axis and aligned with the direction the camera is facing. The image plane is located at z = f, where f is the focal length, and is intersected by the optical axis at the principal point, with coordinates (cx, cy) in the image plane axes. Note that the principal point is not the origin of the image plane reference frame but its middle point; the origin is located in the top left corner, with a downward y-axis.
Due to the nature of the pinhole camera model, it is more convenient to use a Projective Space instead of the more common Euclidean Space [39], [40], [44]. In a Projective Space, as in a central camera model, all points are oriented through a single point [44]. Therefore, using a Projective Space instead of a Euclidean one to address the transformations of a pinhole camera model is more intuitive and relates more closely to the actual geometry of the model. Although projective spaces use homogeneous coordinates and non-Euclidean geometry, homogeneous coordinates can be converted to Cartesian coordinates without loss, provided the homogeneous points can be represented in a Euclidean coordinate frame [39], [44].
Figure 2.4: Pinhole camera model. Source: [45]
Projective Geometry
A tridimensional Euclidean Space can be represented in a Projective Space using 4 coefficients. Therefore, the previous tridimensional vector can be rewritten in homogeneous coordinates as (wX, wY, wZ, w) [44]. If w ≠ 0, we can transform from the Projective to the Euclidean Space, obtaining the world representation of such points by dividing the homogeneous point by w. For the cases in which w = 0, we have purely projective points at infinity, which only exist in Projective Space [44].
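The conversion between Euclidean and homogeneous coordinates described above can be sketched as follows. The function names are hypothetical and the example is only an illustration of the division by w, including the rejection of points at infinity, which have no Euclidean counterpart.

```python
import numpy as np


def to_homogeneous(points, w=1.0):
    """Append a scale coefficient w and scale the Euclidean coordinates by it."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    scale = np.full((points.shape[0], 1), w)
    return np.hstack([points * w, scale])


def from_homogeneous(points_h):
    """Divide by the last coefficient; points at infinity (w = 0) are rejected."""
    points_h = np.atleast_2d(np.asarray(points_h, dtype=float))
    w = points_h[:, -1:]
    if np.any(w == 0.0):
        raise ValueError("points with w = 0 have no Euclidean representation")
    return points_h[:, :-1] / w


p = np.array([2.0, 4.0, 6.0])
p_h = to_homogeneous(p, w=2.0)    # (wX, wY, wZ, w) = (4, 8, 12, 2)
p_back = from_homogeneous(p_h)    # recovers (2, 4, 6)
```

Any non-zero w represents the same Euclidean point, which is why the homogeneous representation is only defined up to scale.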
The relation depicted in Equation (2.1) can be expressed in projective geometry as in Equation (2.2), where P represents the projection matrix that performs the transform from world points to image pixels.
$$
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = P \times \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \quad \text{if } w = 1. \tag{2.2}
$$
The projection matrix in a pinhole camera model is the result of the multiplication of two matrices: the camera matrix (or matrix of the camera’s intrinsic parameters), K; and a joint rotation and translation matrix, [R|t], where R is the rotation matrix and t the translation vector. The combination of these matrices is given in Equation (2.3), and the full camera transform is expanded in Equation (2.4) by substituting Equation (2.3) into Equation (2.2).
$$
P = K \, [R|t] \tag{2.3}
$$
$$
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\overbrace{
\underbrace{\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{K}
\underbrace{\begin{bmatrix} r_{xx} & r_{yx} & r_{zx} & t_x \\ r_{xy} & r_{yy} & r_{zy} & t_y \\ r_{xz} & r_{yz} & r_{zz} & t_z \end{bmatrix}}_{[R|t]}
}^{P}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{2.4}
$$
The matrix of intrinsic parameters represents the calibration parameters of the camera: the focal lengths for each axis, fx and fy, which scale the x and y axes; and the principal point offsets to the axis origin, cx and cy, for the x and y axes respectively, which translate the image center relative to the origin of the camera reference frame.
The joint rotation-translation matrix, [R|t], also known as the matrix of extrinsic camera parameters, transforms the coordinates of a point into the Cartesian coordinate system fixed to the camera.
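The complete transform from world points to pixels, with P built as the product of K and [R|t], can be sketched numerically as below. The calibration values are hypothetical and chosen only for illustration. Note that the matrix product generally yields a homogeneous pixel (su, sv, s); the sketch makes the division by the scale s explicit, which corresponds to the conversion from projective to Euclidean coordinates.

```python
import numpy as np


def project_points(points_world, K, R, t):
    """Project (N, 3) world points to (N, 2) pixel coordinates with P = K [R|t]."""
    points_world = np.asarray(points_world, dtype=float)
    Rt = np.hstack([R, t.reshape(3, 1)])        # 3x4 extrinsic matrix [R|t]
    P = K @ Rt                                  # 3x4 projection matrix
    ones = np.ones((points_world.shape[0], 1))
    points_h = np.hstack([points_world, ones])  # homogeneous (X, Y, Z, 1)
    pixels_h = points_h @ P.T                   # rows are (s*u, s*v, s)
    return pixels_h[:, :2] / pixels_h[:, 2:3]   # divide by the scale s


# Hypothetical calibration values, for illustration only.
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                    # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 2.0])    # world origin lies 2 units in front of the camera

pixels = project_points(np.array([[0.0, 0.0, 0.0],
                                  [0.5, -0.5, 0.0]]), K, R, t)
```

With these values the world origin projects onto the principal point (320, 240), and the second point projects to (470, 90), to the right of and above the image center, as expected for an image frame with a downward y-axis.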
Cameras, Lenses and Distortion
Using a simple pinhole for filtering light rays reduces the amount of light reaching the camera sensor. Therefore, a lens is used to collect the light and focus it onto the image plane, providing the same effect as the hole [40] while allowing more light to be captured. The paraxial refraction model is represented in Figure 2.5, from which we can recover the Pinhole Camera model by letting z0 = 0, which implies z′ = f, so that the image is formed at the focal point.
Figure 2.5: Pinhole effect on a lens. Source: [39].
Since lenses introduce non-linear distortion, the pinhole camera model needs to be extended to include radial and tangential distortion parameters [46]–[48]. Such parameters indicate how the non-linear distortion of the camera lens affects the image projection on the camera plane [39], [40] and how this can be corrected [45], [46], [48]. The equations behind this model are beyond the scope of this introduction and therefore will not be detailed here, but the effect of lens aberration can be seen in Figure 2.6.
For more information on the pinhole camera model and lens distortion, see Hartley and Zisserman’s Multiple View Geometry in Computer Vision [44], the Hata and Savarese course notes [39] or the official Open Source Computer Vision Library (OpenCV) documentation [45].
Figure 2.6: Barrel (a) and Pincushion (b) distortion caused by the non-linear behavior of the camera lens. The effect of an ideal lens, with no distortion, is shown in (c). Source: [39].
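For readers who want a concrete feeling for the radial and tangential distortion parameters mentioned above, the sketch below applies the widely used polynomial model with three radial coefficients (k1, k2, k3) and two tangential coefficients (p1, p2) to ideal normalized image coordinates. This is a simplified illustration, not necessarily the exact formulation used in the cited references, and the coefficient values are hypothetical.

```python
import numpy as np


def distort_normalized(xy, k1=0.0, k2=0.0, p1=0.0, p2=0.0, k3=0.0):
    """Apply radial (k1, k2, k3) and tangential (p1, p2) distortion.

    xy: (N, 2) ideal normalized image coordinates (x = X/Z, y = Y/Z).
    """
    xy = np.asarray(xy, dtype=float)
    x, y = xy[:, 0], xy[:, 1]
    r2 = x**2 + y**2
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x**2)
    y_d = y * radial + p1 * (r2 + 2.0 * y**2) + 2.0 * p2 * x * y
    return np.stack([x_d, y_d], axis=1)


# Hypothetical coefficients chosen only to make the effect visible.
grid = np.array([[0.0, 0.0], [0.5, 0.0], [0.5, 0.5]])
barrel = distort_normalized(grid, k1=-0.2)      # points pulled towards the center
pincushion = distort_normalized(grid, k1=0.2)   # points pushed away from it
```

In this formulation, a negative k1 shrinks the radius of points away from the center, giving a barrel-like effect, while a positive k1 enlarges it, giving a pincushion-like effect. The image center is left unchanged in both cases.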
2.3.2 Camera Intrinsic Calibration
Calibrating a camera is the process in which the parameters of the model that describes its behavior are determined. For the extended pinhole camera model, this means determining the parameters of its intrinsic matrix, K, but also the distortion coefficients of the lens used [39], [44], [46], [48]. These parameters are called the intrinsic parameters and are independent of the scenario being viewed, if the focal length is kept constant. On the other hand, extrinsic camera parameters are different for every situation and therefore scenario dependent [45], [46], [48].
In 2000, Zhang proposed a method for calibrating a camera that, unlike previous work, does not require a special setup, expensive experimental apparatus or complicated calibration patterns: it uses a planar object with a known pattern, which can be freely moved in front of the camera [49]. Only two images are necessary for camera calibration, as long as the pattern movement between them is not a pure translation. The camera intrinsic parameters and the distortion caused by the lens can be estimated by solving Equation (2.4), once several 2D ↔ 3D correspondences have been established.
Zhang’s algorithm differs from other algorithms at the time that attempted to calibrate a camera by rotating it on a static environment. Other alternatives include Tsai’s algorithm [44], [50], a 2-stage technique for camera calibration first presented in 1987, that does not determine the camera center. A Direct Linear Transform (DLT) algorithm can also be used to determine the camera calibration parameters (see [44]).
Following Zhang’s algorithm, the correspondences between the 2D and 3D representations of the pattern can be made by finding the corners or circles on planar patterns [44], [51]. Chessboards are the most common patterns used today [51], allowing corner detection for each cell (Figure 2.7). Since the dimensions of the cell and their arrangement are known previously, it is possible to compute the orientation of the chessboard [44], [45], [49]. On those conditions, revisiting Equation (2.4), only the K matrix is left to be determined.
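To make the role of the 2D ↔ 3D correspondences more concrete, the sketch below generates the planar coordinates of chessboard corners and estimates the plane-to-image homography with a basic DLT, which is the kind of per-view estimate that planar calibration methods build upon. It is a simplified illustration under stated assumptions: the corners are synthetic (generated from a hypothetical homography rather than detected in an image), no coordinate normalization or noise handling is performed, and the function names are hypothetical.

```python
import numpy as np


def estimate_homography(plane_pts, image_pts):
    """Estimate H (3x3) such that image ~ H @ plane, using the basic DLT."""
    plane_pts = np.asarray(plane_pts, dtype=float)
    image_pts = np.asarray(image_pts, dtype=float)
    rows = []
    for (X, Y), (u, v) in zip(plane_pts, image_pts):
        rows.append([-X, -Y, -1.0, 0.0, 0.0, 0.0, u * X, u * Y, u])
        rows.append([0.0, 0.0, 0.0, -X, -Y, -1.0, v * X, v * Y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def chessboard_corners(cols, rows, square_size):
    """Planar (X, Y) coordinates of the corners, with Z = 0 on the board."""
    grid = np.mgrid[0:cols, 0:rows].T.reshape(-1, 2)
    return grid.astype(float) * square_size


# Synthetic check with a hypothetical homography instead of real detections.
H_true = np.array([[1.2, 0.1, 100.0],
                   [-0.05, 1.1, 50.0],
                   [1e-4, 2e-4, 1.0]])
board = chessboard_corners(cols=4, rows=3, square_size=25.0)
board_h = np.hstack([board, np.ones((len(board), 1))])
projected = board_h @ H_true.T
detected = projected[:, :2] / projected[:, 2:3]

H_est = estimate_homography(board, detected)

# Reproject the board with the estimate to check the fit.
reprojected_h = board_h @ H_est.T
reprojected = reprojected_h[:, :2] / reprojected_h[:, 2:3]
max_error = np.abs(reprojected - detected).max()
```

Because the cell dimensions and arrangement are known beforehand, the board coordinates are fixed and only the detected corners change from image to image. With real detections, the estimate would be affected by noise and would typically be refined with a non-linear optimization.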
The intrinsic calibration matrix coefficients can then be refined using a maximum-likelihood estimation that minimizes the error with a non-linear method, such as Levenberg-Marquardt optimization [53]. Other optimizations are possible, such as Powell’s dog-leg non-linear least squares technique [54]. Further reading can be found in [39], [40], [44], [55].
In 2010, Bouguet provided a Camera Calibration Toolbox for MATLAB®, which serves as the basis for many camera calibration toolboxes nowadays, such as the one in OpenCV, which is also based on the work of Zhang [51].
Figure 2.7: Chessboard calibration pattern with the detected corners overlapped. Source: [52].
2.3.3 Image Processing Libraries for Computer Vision
In the field of computer vision, several tools and libraries are available, capable of performing low-level image operations, detection and filtering, among others. This sub-section lists and briefly describes these tools.
• BoofCV [56]: written in Java, this library is oriented to real-time image operations, such as low-level image processing, camera calibration and feature/object detection, tracking and recognition;
• Dlib [57]: modern C++ toolkit containing Machine Learning algorithms and tools, some in the field of computer vision;
• MATLAB® Computer Vision Toolbox™ [58]: toolbox for computer vision algorithms, 3D vision, and video processing systems. Can perform object detection and tracking. Detects, extracts and matches features;
• Open Source Computer Vision Library (OpenCV) [51]: open source library for C++, Python and C that implements state-of-the-art algorithms in computer vision;
• Vlfeat [59]: open source library for C that implements many state-of-the-art algorithms in computer vision;
• SimpleCV [60]: open source framework, written in Python, that can be used to implement computer vision software using other libraries.
One goal of this work is to use open source tools as far as possible, instead of closed source tools or code that is harder to integrate with other libraries. This practice is common in the automotive industry in the field of autonomous vehicles, and it rules out the MATLAB® Computer Vision Toolbox™. The code of this work is mainly developed in C++, so Python and Java libraries will not be considered, leaving only OpenCV and Dlib, since Vlfeat uses plain C.
OpenCV has a large and active community and is considered by many researchers and industry leaders to be the de facto standard library for computer vision. Dlib is a robust library, but no significant differences can be seen when comparing it with OpenCV. Therefore, the final choice for a computer vision library is OpenCV, also because of its existing compatibility with ROS.
2.4 Automotive LiDAR
LiDAR sensors map their surroundings thanks to their high spatial resolution and capability of precisely measuring depth. Already used in topography, spectrography and air pollution studies, LiDAR has gained importance in the automotive industry as one of the key sensors of autonomous cars and ADAS [19].
The maps produced by LiDAR, commonly represented as point cloud or mesh cloud models (Figure 2.8), are among the preferred inputs for SLAM algorithms, which allow a vehicle without previous knowledge of its surroundings to navigate them autonomously, a crucial task for ADAS in self-driving vehicles.
Point clouds represent the spatial data measured by the LiDAR as points in three-dimensional space. The objects represented in a point cloud are formless, as each point has no relation with its “adjacent” points, as depicted in sub-Figure 2.8a. The mesh cloud in sub-Figure 2.8b, produced from the point cloud in sub-Figure 2.8a, represents the same points united by lines, forming a mesh. A mesh cloud conveys form, as can be verified by comparing the point cloud with the mesh representation of the same data. A mesh cloud, contrary to a point cloud, can also represent a closed form and therefore have a volume.
Figure 2.8: Stanford bunny [61] represented as (a) a point cloud and (b) a mesh cloud.
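The difference between the two representations can be sketched with a minimal example. The data below is hypothetical (a unit tetrahedron, not LiDAR data), and the function name is chosen only for illustration. The point cloud is a plain list of points, while the mesh adds triangles that index into that list. Because the mesh is closed, its volume can be computed, here by summing the signed volumes of the tetrahedra formed by each triangle and the origin.

```python
# Point cloud: a list of 3D points with no connectivity between them.
point_cloud = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
]

# Mesh: the same points plus triangles that index into the point list.
# The vertex order is chosen so that every face normal points outwards.
triangles = [
    (0, 2, 1),
    (0, 1, 3),
    (0, 3, 2),
    (1, 2, 3),
]


def closed_mesh_volume(vertices, faces):
    """Volume of a closed, consistently oriented triangle mesh."""
    total = 0.0
    for i, j, k in faces:
        ax, ay, az = vertices[i]
        bx, by, bz = vertices[j]
        cx, cy, cz = vertices[k]
        # The scalar triple product a . (b x c) is six times the signed
        # volume of the tetrahedron (origin, a, b, c).
        total += (
            ax * (by * cz - bz * cy)
            - ay * (bx * cz - bz * cx)
            + az * (bx * cy - by * cx)
        )
    return abs(total) / 6.0


volume = closed_mesh_volume(point_cloud, triangles)
```

The point list alone carries no such information: without the triangles there is no surface, and therefore no enclosed volume to compute.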