Fault Injection, Detection and Handling in Autonomous Vehicles


Daniel Luís Gonçalves Garrido

Mestrado Integrado em Engenharia Informática e Computação

Supervisor: Prof. Doutor Daniel Augusto Gama de Castro Silva

Co-Supervisor: Mestre Leonardo Silva Ferreira


Autonomous vehicles have received considerable attention in the last few years because they can execute tasks in previously unreachable places or tasks that are too dangerous for a human to perform. While a high level of autonomy brings advantages to the users of these vehicles, it also means that a human operator is not always present to monitor their status. A serious fault in the vehicle can cause a malfunction that endangers human lives. The field of fault detection and recovery has been responding to these problems as these vehicles gained popularity. However, the existing contributions in this field are designed for one specific autonomous vehicle, and these systems are so specialised that integrating another vehicle into them would be comparable to starting over. This project presents a more open design that can accommodate new vehicles, new faults to be detected and treated, and additional detectors much faster and more easily than was previously possible, to facilitate and encourage the development of these systems in new and old autonomous vehicles. With this idea in mind, a fault detection and recovery system was implemented and integrated into an existing multi-agent platform capable of coordinating teams of multiple autonomous vehicles of varying types in high-level missions.

To achieve this, several tools, modules and systems were developed for this simulation platform, which uses Flight Simulator X at its core. A comprehensive test of the capabilities of the simulation platform revealed that it is not as feature rich for the project requirements as previously thought, but that it contained enough to implement the solution. A fault categorization for unmanned autonomous vehicles was created, which classifies potential faults in terms of severity and recommends which treatment approach should be used, even when multiple faults are present. A fault injector for the simulation platform was also developed. It can inject faults in many different configurations, all of which can be fine-tuned by the user through a graphical user interface that greatly simplifies this process.

This was followed by the implementation of the fault detection and treatment modules. The fault detection module was developed to detect engine, brake and communications related faults, each handled by a separate detector. Each detector used a different data-driven approach centred around trend and limit checkers that analysed data sent from the simulator. The fault treatment module then listens to the output of the detectors and, when a fault is detected, uses the fault categorization system to ascertain its severity and pick the best action to handle it.

With the development of the system complete, the developed solution was tested. The tests focused on the two modules mentioned above and were split into two phases. The first phase tested the fault detection module with the fault treatment module disabled. The second phase focused on the fault treatment module, repeating some of the tests from the first phase to verify whether this module was beneficial to safety. The detection module detected all the injected faults in the tests with good to excellent detection times. The tests from the second phase also show that the treatment module was effective in preventing dangerous situations caused by the injected faults, while consuming very few computational resources.


Autonomous vehicles have received much attention and prominence in recent years due to their ability to execute tasks in previously inaccessible places or tasks that are simply too dangerous for a human being to perform. While a high level of autonomy brings advantages to the users of these vehicles, it means that a human operator is not always present to monitor their state. If a serious fault arises in the vehicle, it can cause a malfunction that may put human lives at risk. As these vehicles grew in popularity, the field of fault detection and recovery has been responding to these problems. However, the existing contributions in this area are designed specifically for a single autonomous vehicle. These systems are so specific that integrating another vehicle would be like starting over. This project presents a more open design that can easily accommodate new vehicles, new faults to be detected and treated, and additional detectors more quickly and easily than was possible before. With this idea in mind, a fault detection and recovery system was implemented and integrated into an existing multi-agent platform capable of coordinating teams of multiple autonomous vehicles of varied types in high-level missions.

To achieve this, several tools, modules and systems were developed for this platform, which uses Flight Simulator X at its core. A comprehensive set of tests of the capabilities of the simulation platform revealed that it does not include as many of the features needed for the project as expected, but that it contained enough to implement the solution. A fault categorization system for autonomous vehicles was created, capable of classifying potential faults in terms of severity and recommending the best approach to treat them, even when several faults are present in the vehicle. A fault injector for the simulation platform was also developed. It can inject faults with diverse configurations, which can be set by the user through a graphical interface that simplifies the process.

This was followed by the implementation of the fault detection module and the fault treatment module. The detection module was developed to detect faults in the engines, brakes and communication systems, which were separated into three different detectors. Each of these uses a different data-based approach, centred around limit and trend checkers that analyse data sent by the simulator to the detector. The fault treatment module analyses the results of the detectors and, when a fault is registered, uses the fault categorization system to determine its severity and choose the best action to deal with the fault.

With the development of the system complete, tests were performed on the developed solution. These focused on the two modules mentioned and were divided into two phases. The first focused on testing the detection module with the treatment module disabled, while the second focused on the treatment module, repeating some of the tests from the first phase to verify whether the applied treatment was beneficial to safety. The detection module was able to detect all the faults injected in the tests with detection times that range between


Contents

1 Introduction
1.1 Context and Motivation
1.2 Objectives
1.3 Methodology and Expected Results
1.4 Document Structure

2 Contextualization
2.1 Faults
2.2 Autonomous Vehicles
2.2.1 Levels of Autonomy
2.2.2 Types of Autonomous Vehicles
2.3 UAV Reliability
2.4 Aircraft Accident Severity

3 Literature Review
3.1 Fault Detection Methods
3.1.1 Model Free Methods
3.1.2 Model-Based Methods
3.1.3 Residuals
3.1.4 Auxiliary Methods to Fault Detection
3.2 Model-Based Diagnosis Tools
3.3 Fault Detection in Unmanned/Autonomous Vehicles
3.4 Conclusion

4 Proposed Solution
4.1 Technology Requirements
4.1.1 Simulation Platform
4.1.2 Flight Simulators
4.1.3 Flight Simulator X
4.2 Problem Approach
4.3 Task Division
4.4 Risk Analysis

5 Implementation
5.1 Flight Simulator X Capabilities
5.1.1 Communicating with Flight Simulator
5.1.2 Injecting Faults Through SimConnect Events
5.1.4 Conclusion
5.2 Fault Categorization System
5.3 Fault Injection Tool
5.3.1 Mission Description Language Changes
5.3.2 Mission Fault Configurator
5.3.3 Control Panel Fault Injector
5.4 Fault Detection Module
5.4.1 Detecting Engine Related Fault
5.4.2 Detecting Brake Related Faults
5.4.3 Detecting Communications Faults
5.5 Fault Treatment Module
5.5.1 Emergency Situations
5.5.2 Monitoring Emergency State
5.5.3 Mayday Situations

6 Experimental Results
6.1 Test Configuration and Scenarios
6.2 Test Results
6.2.1 Control Test
6.2.2 Total Brakes
6.2.3 Communications
6.2.4 Engines
6.2.5 Detection and Treatment Modules Performance Analysis
6.2.6 Resource Utilization Impact
6.2.7 Comparison with Previous Works

7 Conclusions and Future Work
7.1 Conclusion
7.2 Future Work

List of Figures

2.1 Concepts associated to faults and fault handling process
2.2 Categorization of faults
2.3 Scales of autonomy
2.4 Examples of AUVs and USV
2.5 Examples of UAVs
2.6 Examples of UGVs
2.7 Mishap rate with cumulative flight hours comparison
3.1 Isermann’s fault detection methods division
4.1 Original platform architecture
4.2 Organization of available failures in Flight Simulator X
4.3 Gantt chart with project tasks and schedule
5.1 SimConnect Architecture
5.2 MDL file structure
5.3 Created fault schema
5.4 Fault Random Picker schema
5.5 Activation Condition schema
5.6 Fault Ending schema
5.7 Fault Behaviour schema
5.8 Propagation schema
5.9 Fault Configurator Window
5.10 Fault Manager Injector State Machine
5.11 Communications fault detector: normal operation (top) vs fault detected (bottom)
6.1 KCLE airport diagram
6.2 Beechcraft Baron 58 taxied at KCLE airport in Flight Simulator X
6.3 Test Flight Scenario
6.4 Speed comparison after touchdown with and without brake fault
6.5 Total brakes test scenario with treatment
6.6 Communications fault treatment flight path
6.7 Propeller speed after ramping fault injection in tests #11 and #12
6.8 Recovery path after engine failure during takeoff

List of Tables

2.1 Sources of system failures for several UAV
2.2 Description of included systems for each failure type used in Table 2.1
2.3 Results of the various engine faults
2.4 Number of small UAV mishaps by cause and consequent number of fatalities
2.5 Accidents and Fatalities by Type of Equipment Failure
2.6 Number of accidents and fatalities by system and component failure
3.1 Summary of reviewed Fault Detection Methods
3.2 Summary of most relevant fault detection literature in AVs
4.1 Comparison of commercial flight simulators
4.2 Project’s risk factors with probability and impact
5.1 Summary of toggle failures through SimConnect and their impact in user and AI aircraft
5.2 Usability of SimConnect events to inject faults
5.3 Readable data from SimConnect for user and AI aircraft
5.4 Fault severity scale
5.5 Fault Classification Table
5.6 Implemented faults in injection tool and respective injection methods
5.7 Non implemented faults
6.1 Details of tests to be performed to the fault detection module
6.2 Results of intermittent communications fault
6.3 Results of the various engine faults
6.4 Resume of the results obtained in the tests

Abbreviations

ATC Air Traffic Control
API Application Programming Interface
AUV Autonomous Underwater Vehicle
AV Autonomous Vehicle
CPU Central Processing Unit
DAME Drilling Automation for Mars Environment
DDL Disturbances Description Language
EM Expectation-Maximization
FLA Fast Lightweight Autonomy
FDIR Fault Detection Isolation and Reconfiguration
FIPA Foundation for Intelligent Physical Agents
FSX Flight Simulator X
FFS Full Flight Simulator
GUI Graphical User Interface
HAUV Hovering Autonomous Underwater Vehicle
HyDE Hybrid Diagnostic Engine
LYDIA Language for sYstem DIAgnosis
MDL Mission Description Language
MBD Model Based Diagnosis
MMAE Multi Model Adaptive Estimator
NASA National Aeronautics and Space Administration
PCA Principal Component Analysis
RSTA Reconnaissance, Surveillance, and Target Acquisition
ROV Remotely Operated underwater Vehicle
SDK Software Development Kit
SDL Scenario Description Language
TDL Team Description Language
TIC Theil Inequality Coefficient
UAV Unmanned Aerial Vehicle
UGV Unmanned Ground Vehicle


1 Introduction

This chapter presents general information about the project and the document itself. It starts with the context and motivation, which lead to the objectives of the project and of this report. A brief explanation of the methodology and expected results follows, along with an overview of this document’s structure.

1.1 Context and Motivation

The technology that we use every day to perform even simple tasks is becoming more and more autonomous. Increasing autonomy has been at the centre of research and advancement for centuries, and each year innovations in autonomy tackle increasingly difficult problems. As the autonomy of the technology we use increases, more responsibility is transferred from the user to the machines and systems [Parasuraman et al., 2000].

Road vehicles are a clear example of this. Over the last few decades, several systems have been developed that increase autonomy in cars, such as automatic transmission, adaptive cruise control and automatic emergency braking, and more recently fully autonomous driving [Rödel et al., 2014]. The same type of progression was observed in all types of vehicles.

Unmanned vehicles, both remote controlled and especially autonomous ones, have received a lot of attention in the last few years thanks to their promising capabilities in performing tasks in places that humans cannot reach or that are too dangerous for human safety to be guaranteed. They are particularly useful in information gathering and in object transportation and manipulation [Schoenwald, 2000].

Autonomous vehicles currently exist that are capable of exploring the depths of the ocean and performing airborne missions without the direct control of a human being. Like any other vehicle or piece of technology, autonomous vehicles are susceptible to failures and, with no human operator to diagnose the system in real time, automated fault detection and recovery systems must be integrated in them. These systems have the potential to prevent failures of varying degrees of severity, ranging from damage to the vehicle (soft failures) to injuries and even death to people in case of a crash (catastrophic failures) [Petritoli et al., 2018].

Detection and diagnosis of failures can be done rather simply using redundant systems, as is currently done in commercial aircraft, where in the case of failure the remaining nominal redundant systems can diagnose the fault and take over [Chen and Patton, 1999]. However, most autonomous vehicles cannot afford redundant systems for every critical component due to weight, space and autonomy constraints. An alternative is to analyse the data from the vehicle sensors to detect faults. This approach is commonly known as analytical redundancy [Chen and Patton, 1999].
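As a minimal sketch of the idea behind analytical redundancy, the snippet below compares a measured value against the value predicted by a simple model and flags a fault when the difference (the residual) exceeds a threshold. The linear model, the function names `expected_rpm` and `residual_exceeds`, and all numeric values are illustrative assumptions and are not taken from the project.

```python
def expected_rpm(throttle: float) -> float:
    """Hypothetical nominal model: propeller speed as a linear function of throttle (0..1)."""
    return 600.0 + 2100.0 * throttle


def residual_exceeds(measured_rpm: float, throttle: float, threshold: float = 150.0) -> bool:
    """Return True when the measurement deviates from the model by more than the threshold."""
    residual = measured_rpm - expected_rpm(throttle)
    return abs(residual) > threshold


# A healthy reading stays close to the model prediction; a degraded engine does not.
healthy = residual_exceeds(2250.0, 0.8)   # small residual, no fault flagged
degraded = residual_exceeds(1500.0, 0.8)  # large residual, fault flagged
print(healthy, degraded)
```

The model replaces the second physical sensor or component that hardware redundancy would require, which is why this approach suits vehicles with tight weight and space budgets.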

Researching fault detection systems in autonomous vehicles carries some challenges. Operating real vehicles can be expensive and time consuming, and deliberately triggering faults in real vehicles can lead to crashes and potential loss of the vehicle. To avoid these scenarios, researchers have been using computer simulations to study fault handling systems in autonomous vehicles, as demonstrated in [Freeman et al., 2013] [Baskaya et al., 2017b] [Heredia and Ollero, 2009] [Cork et al., 2005] [Purvis et al., 2015]. Using simulations makes testing the developed systems faster, safer and more systematic.

Using simulation environments to perform scientific research is common in many fields of science and has been evolving alongside the increase of computational power in the last few decades [Knuuttila et al., 2006]. More recently, the use of computer game engines in research has been gaining popularity. The pursuit of realism has driven game developers to base their engines on real-world physics, and their modular and online nature makes integrating external systems and implementing distributed systems very easy [Lewis and Jacobson, 2002].

This project continues the development of a multi-agent system platform for the coordination of autonomous vehicles based on Flight Simulator X (FSX) [Silva, 2011], a game developed by Microsoft that is now being used in scientific research. Currently the platform has no fault handling system for the autonomous vehicles that perform the missions, which, as stated above, is crucial to prevent vehicle crashes or losses and to keep human lives safe.

Many approaches and contributions exist for fault detection and treatment in real and simulated aircraft, but none of them was designed and developed to work with several autonomous vehicles of very diverse types at the same time.

1.2 Objectives

The objective of this project is to develop a fault detection and treatment system for autonomous vehicles simulated in Flight Simulator X and to integrate it into an existing multi-agent platform that simulates high-level missions with multiple autonomous vehicles. This system must be capable of working with several AVs of different types and must offer a high level of user customization in its main components. To facilitate the comprehension and development of the project, this main objective was divided into smaller sequential goals, as detailed below.


The first goal is to develop a fault categorization system based on the reviewed literature and capabilities/limitations of the failure system of Flight Simulator X. It will be responsible for evaluating the detected faults in order to facilitate the fault treatment process.

Then a fault injection tool needs to be developed. This tool will help the user set up which faults should be activated for each vehicle in the simulation, as well as define the timing of each fault. It will simplify testing by systematizing the process of injecting faults. Additionally, a random fault mode must be implemented to simulate the real-world nature of faults, where it is not known when and which faults will manifest in the vehicles.

Afterwards come the main objectives, starting with the development of the fault detection module. It will be responsible for using analytical methods to detect and identify the faults injected by the tool. It is essential that the module detects most of the faults as fast as possible and is impervious to false positives. This leads to the development of the fault treatment module, which must be capable of reconfiguring the behaviour of the affected vehicle to mitigate the negative effects of the fault. As these modules will run inside the agent responsible for controlling the vehicle, they must use as few computational resources as possible while maintaining a good level of detection and reconfiguration performance.

Finally, a test suite should be developed to evaluate the performance of the developed system and validate that it functions as expected. It must be possible to run the test suite in two modes, with and without the developed modules operational, in order to compare the results and determine whether the system increased the reliability of the autonomous vehicles. Several metrics must be analysed to assess the detection, treatment and computational expense of the system.

1.3 Methodology and Expected Results

Before starting the development effort, it is important to study the project and present an approach to the problem alongside a work plan. This includes performing a full literature review on the project topics, studying the already developed framework that serves as the development base, and detailing how the problem will be tackled and how the work is going to be divided. The end result should provide a strong basis to proceed to the development phase with a clear idea of what to do and why. To devise the work plan, an approach to the problem was conceived. The work was divided into several tasks and scheduled with the help of a Gantt chart.

The fault detection method used must be carefully selected to be the most adequate for the purposes of this project. To help in this decision, all the discussed methods will be compared, evaluating their complexity and computational cost qualitatively based on the existing literature on this topic.

To evaluate the success of the developed system, several performance metrics will be analysed during testing. These include the percentage of correctly detected faults (true positives), the rate of false positives, the time elapsed between fault injection and detection, and the amount of resources used by the vehicle agent to perform these tasks.
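The detection metrics listed above could be computed from a set of test runs roughly as follows. This is only a sketch under stated assumptions: the `RunRecord` structure, its field names and the sample values are hypothetical, and a run counts as a false positive only when the detector fires although no fault was injected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunRecord:
    """One simulated run (hypothetical structure); times are in seconds."""
    fault_injected_at: Optional[float]  # None if no fault was injected
    fault_detected_at: Optional[float]  # None if the detector stayed silent


def summarize(runs: List[RunRecord]) -> dict:
    """Compute detection rate, false-positive rate and mean detection delay."""
    faulty = [r for r in runs if r.fault_injected_at is not None]
    healthy = [r for r in runs if r.fault_injected_at is None]
    detected = [r for r in faulty if r.fault_detected_at is not None]
    false_alarms = [r for r in healthy if r.fault_detected_at is not None]
    delays = [r.fault_detected_at - r.fault_injected_at for r in detected]
    return {
        "detection_rate": len(detected) / len(faulty) if faulty else 0.0,
        "false_positive_rate": len(false_alarms) / len(healthy) if healthy else 0.0,
        "mean_delay_s": sum(delays) / len(delays) if delays else None,
    }


# Illustrative data: three faulty runs (one missed) and two healthy runs (one false alarm).
runs = [
    RunRecord(10.0, 12.5),
    RunRecord(20.0, 21.5),
    RunRecord(30.0, None),
    RunRecord(None, None),
    RunRecord(None, 5.0),
]
stats = summarize(runs)
print(stats)
```

Resource usage is left out of the sketch because it depends on how the vehicle agent is instrumented.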


It is expected that a fully functional fault injection tool and fault detection and treatment modules will be developed and integrated into the existing platform. The system should be able to detect faults accurately, with a high rate of true positives and a low rate of false negatives, and fast enough to mitigate the effects of the fault. It should also positively impact the outcome of a failure situation, with minimal damage to the vehicle and minimal impact on the vehicle’s mission performance. Overall, the reliability of the autonomous vehicles should increase with a minimal increase in the usage of computational resources.

1.4 Document Structure

The remainder of this document is organized as follows:

Chapter 2 provides some definitions and clarifications about the topics discussed in this document to familiarize the reader with the field of study. It focuses on introducing the concepts of faults and autonomous vehicles, the main areas of research of this project.

Chapter 3 is dedicated to the literature review performed in the context of this project. It starts by presenting the main methods used for analytical fault detection and then presents some of the latest and most relevant literature about the application of these methods in autonomous vehicles, as well as contributions in the field using simulators.

Afterwards, Chapter 4 focuses on the planning of the work to be developed. An overview of the necessary technologies is given, with special attention to the previously developed platform that this project is based on. The approach to the problem is then presented, followed by a description of the planned tasks accompanied by a Gantt chart. An analysis of the risks associated with the development is also detailed.

After the literature review and planning, Chapter 5 gives a detailed look at all the developed systems that together help achieve the proposed objectives. It starts by describing how to work with the existing platform, in particular with SimConnect. The fault categorization system is presented next, followed by the description of the fault injection tool, which details the modifications made to the platform to accommodate faults and how they are injected in the vehicles. Then, the focus switches to the fault detection and treatment modules and the various algorithms used by them.

Chapter 6 contains all the testing related information. It begins with a description of the created test scenario and the several configurations that were tested. Then the results are presented, followed by a critical analysis in which the performance of the presented system is compared with that of similar systems that were studied.

Finally, Chapter 7 evaluates the documented work against the outlined objectives. A brief comment about future work is also included.


2 Contextualization

This chapter introduces the main topics discussed in the document that are essential for the reader to know and understand beforehand. It examines the definitions and types of faults and autonomous vehicles and explores the levels of reliability of some UAVs.

2.1 Faults

Isermann describes a fault as “an unpermitted deviation of at least one characteristic property (feature) of the system from the acceptable, usual, standard condition.” [Isermann, 2006]. He further elaborates that a fault is a state of the system and that the unpermitted deviation is the difference between the value attributed to the fault and the established threshold. He also points out that faults that are directly caused by a human, such as design or software faults, are called errors. Gertler’s definition is similar: “(. . . ) faults are deviations from the normal behaviour in the plant or its instrumentation” [Gertler, 1998]. As cited by [Venkatasubramanian et al., 2003c], Himmelblau in 1978 defined a fault as “a departure from an acceptable range of an observed variable or a calculated parameter associated with a process”.

All these definitions agree that a fault is a divergence from a predicted, regular behaviour. In the same book [Isermann, 2006], Isermann defines a failure as “a permanent interruption of a system’s ability to perform a required function under specified operating conditions” and explains that a failure is an event that results from one or more faults. He adds that a malfunction “is an intermittent irregularity in the fulfilment of a system’s desired function” and that malfunctions are also caused by one or more faults. This relationship between fault/error and failure/malfunction is depicted in Fig. 2.1a.

Several steps must be taken to properly handle faults, including, in order, detection, isolation and identification. Fault detection is the task of determining if a fault is currently present in the system. Fault isolation is the determination of the type, location and timing of the detected fault.


(a) Relationship of fault related terms (b) Fault handling process

Figure 2.1: Concepts associated to faults and fault handling process

Finally, fault identification consists of determining the scale and extent of the fault [Gertler, 1998] [Isermann, 2006]. These last two tasks, fault isolation and identification, are often grouped together in a task referred to as fault diagnosis [Gertler, 1998]. In some literature the fault handling process is called FDIR (Fault Detection, Isolation and Reconfiguration). The reconfiguration step, also known as recovery, consists of adapting the system to the fault so it can maintain its functional integrity. Figure 2.1b summarizes the fault handling process.

A fault in one subsystem can often cause faults in other subsystems, in a process known as fault propagation. A fault propagates from one system to another through a connection. When this connection is compromised by the faulty behaviour of one system, the other connected system may develop a fault of its own [Kong et al., 2017]. Understanding how faults propagate within a system is important for fault detection and diagnosis, as it helps to identify the origin of a fault and to determine the potential risk a fault poses to the system.
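Fault propagation through connections can be pictured as a traversal of a directed graph. The sketch below is an illustration only: the subsystem names and the `CONNECTIONS` map are invented for the example, and a breadth-first search is just one simple way of listing which subsystems a fault could reach.

```python
from collections import deque

# Hypothetical connection graph: each subsystem lists the subsystems it feeds.
CONNECTIONS = {
    "battery": ["flight_computer", "radio"],
    "flight_computer": ["actuators"],
    "radio": [],
    "actuators": [],
    "engine": ["actuators"],
}


def reachable_from(origin, connections):
    """Breadth-first search: subsystems that a fault in `origin` may propagate to."""
    seen = {origin}
    queue = deque([origin])
    order = []
    while queue:
        current = queue.popleft()
        for neighbour in connections.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


print(reachable_from("battery", CONNECTIONS))
# ['flight_computer', 'radio', 'actuators']
```

Running the same search on the reversed graph would list the candidate origins of an observed fault, which is the use for diagnosis mentioned above.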

As can be observed in Fig. 2.2, a fault can be categorized by several attributes, such as its form, time behaviour, extent and type of component [Isermann, 2006]. The form of a fault can be either random or systematic. Random faults occur unpredictably and are usually associated with the wear and tear of mechanical components and with electronic faults caused by, for example, temperature and radiation spikes, electric discharges, variations in power delivery and proximity to strong magnetic fields [Malaiya and Su, 1979]. Systematic faults occur in a deterministic way and result from design and implementation flaws. The time behaviour refers to how the fault manifests itself in the system through time. It can be classified as permanent, transient, intermittent, noise and drift. Permanent faults, once present in the system, never cease to manifest themselves. Transient faults appear for a time and then disappear, and are often linked with temporary environment changes such as a strong gust of wind. Intermittent faults come and go, commonly at random intervals, and are usually hard to isolate. Drift and noise are associated with loss of sensor accuracy and precision, respectively. The extent of a fault relates to how much of the system is affected by it and can be either local or global.
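The time behaviours described above can be illustrated with a few small functions. This is a hedged sketch rather than the injector developed in this project: the function names and parameters are assumptions, and the intermittent fault uses fixed on/off intervals to keep the example deterministic, whereas real intermittent faults commonly appear at random intervals.

```python
import random


def permanent(t, onset):
    """Active from the onset time onwards."""
    return t >= onset


def transient(t, onset, duration):
    """Active only during [onset, onset + duration)."""
    return onset <= t < onset + duration


def intermittent(t, onset, on_time, off_time):
    """Repeatedly active for on_time, then inactive for off_time (fixed intervals)."""
    if t < onset:
        return False
    return (t - onset) % (on_time + off_time) < on_time


def drift(value, t, onset, rate):
    """Sensor reading whose offset grows linearly after the onset time."""
    return value + rate * max(0.0, t - onset)


def noise(value, sigma, rng):
    """Sensor reading with zero-mean Gaussian noise added."""
    return value + rng.gauss(0.0, sigma)


# Activity of an intermittent fault starting at t=2, on for 1 step and off for 2 steps.
pattern = [intermittent(t, 2, 1, 2) for t in range(8)]
print(pattern)
print(drift(100.0, 10, 4, 0.5))                # offset of 0.5 per step for 6 steps
print(noise(100.0, 2.0, random.Random(42)))    # seeded generator for repeatability
```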

Further differentiation between faults can be made by classifying the type of component where the fault occurs and its influence on the measured variables [Isermann, 2005]. The affected components can be sensors (e.g. gyroscopes, fluid pressure sensors, temperature sensors), actuators (e.g. motors, valves, hydraulics) and process components (e.g. pipes, structures, gears). Additionally, faults can originate in electrical components (e.g. capacitors, batteries, heating elements), electronic hardware (e.g. micro-controllers) and software [Isermann, 2006], with the most common software faults originating from requirement faults, coding faults and data problems [Hamill and Goseva-Popstojanova, 2009]. A fault can affect the measured variables in two ways: by addition or by multiplication. Additive faults affect a variable by adding the value of the fault to it, while multiplicative faults add the product of that value with another variable [Isermann, 2006].
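The difference between the two kinds of influence can be made concrete with a short numeric sketch. The function names and the numbers are illustrative assumptions; the example treats a sensor with gain 2 and input 3, so the nominal output is 6.

```python
def additive_fault(nominal: float, fault: float) -> float:
    """Additive fault: the fault value is added directly to the variable."""
    return nominal + fault


def multiplicative_fault(nominal: float, fault: float, other: float) -> float:
    """Multiplicative fault: the fault value is multiplied by another variable, then added."""
    return nominal + fault * other


gain, u = 2.0, 3.0
nominal = gain * u                                   # 6.0

offset = additive_fault(nominal, 0.5)                # constant offset of 0.5
gain_change = multiplicative_fault(nominal, 0.5, u)  # gain changed by 0.5

print(offset)       # 6.5
print(gain_change)  # 7.5, the same as (gain + 0.5) * u
```

An additive fault shifts the output by the same amount whatever the input is, while the effect of a multiplicative fault scales with the other variable, here the input `u`.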

Figure 2.2: Categorization of faults

2.2 Autonomous Vehicles

According to the Oxford Dictionary, autonomy is “the ability to act and make decisions without being controlled by anyone” and a vehicle is “a thing that is used for transporting people or goods from one place to another” [Hornby, 2005]. From this it would be natural to define an Autonomous Vehicle (AV) as something that transports things from place to place without the control of a human being.

Cox and Wilfong defined autonomous vehicles as “(. . . ) vehicles that are capable of intelligent motion and action without requiring either a guide to follow or a teleoperator control.” [Cox and Wilfong, 1990]. Following this definition, a vehicle is classified as autonomous if it moves and acts intelligently without the help of a predefined path or plan and without the input of a human operator.

While these two definitions are close, Cox and Wilfong specify that the movement must be of an intelligent nature and not reactive, random or dictated by some predefined set of instructions. They also mention that being called a vehicle does not mean that it cannot interact with the environment or perform tasks other than transporting.

2.2.1 Levels of Autonomy

Autonomy is not a black and white concept: a machine can’t be classified as simply autonomous or not. Several levels of autonomy exist between the two extremes [Parasuraman et al., 2000], which can be seen in Fig. 2.3a. In the lowest levels (1 to 5) the system always depends on the decision of a human to act, with the difference being how many suggestions it provides. In the higher levels (7 to 10) the system is capable of acting on its own, with each increment changing the amount of information the human receives. Level 6 acts as a cross between these two: the human is given a limited time, imposed by the computer, to make a decision, after which the machine acts autonomously.

In the context of Autonomous Vehicles, a scale of Autonomous Control Levels was proposed by the U.S. DoD [United States Department of Defence, 2002] to measure the progress of UAVs’ autonomous capabilities and can be found in Fig. 2.3b. The first 4 levels consider only a single UAV while the rest consider the control of groups of UAVs. This suggests that the future of autonomous UAVs lies in the control of groups.

(a) Levels of Automation [Parasuraman et al., 2000] (b) Autonomous Control Levels [United States Department of Defence, 2002]


2.2.2 Types of Autonomous Vehicles

Three main types of Autonomous Vehicles can be distinguished by the environment in which they operate: water (Autonomous Underwater Vehicles (AUVs) and Unmanned Surface Vehicles (USVs)), land (Unmanned Ground Vehicles (UGVs)) and air (Unmanned Aerial Vehicles/Systems (UAVs/UASs)).

AUVs and USVs

AUVs or Autonomous Underwater Vehicles are submersible vehicles that can operate in a fully autonomous way. They can be considered an evolution of the ROV (Remotely Operated underwater Vehicle). They can be controlled via internal logic and decision making or follow a preprogrammed path. They are usually classified by their size, propulsion power and depth ratings. These vehicles have been used for research, commercial and military purposes such as sensor data and sample gathering, inspections, oil and mineral search, mine hunting, etc. [Christ and Wernli, 2014] [Seto and Bashir, 2017].

The Bluefin HAUV¹ (Hovering AUV) (Fig. 2.4a) is a small AUV designed to perform autonomous underwater ship inspections, but it can also be used for mine ordnance at a maximum depth of 100 m. An example of a large AUV is the Autosub6000 (Fig. 2.4b), which is used for deep ocean scientific research and can reach depths of 6000 m [McPhail, 2009].

(a) Bluefin’s HAUV¹ (b) Autosub6000² (c) StingRay³

Figure 2.4: Examples of AUVs (a, b) and USV (c)

However, not all sea vehicles are designed to submerge. A separate category exists for vehicles that move on the surface of the water, the Unmanned Surface Vehicles. This type of vehicle is known to possess limited autonomy [Bertram, 2008], as its operation must always be supervised to guarantee the safety of others and prevent loss of property [Yan et al., 2010]. These vehicles are used primarily by countries’ navies to perform missions such as surveillance, mine hunting and anti-submarine warfare in places deemed too dangerous for humans [Yan et al., 2010]. One of the most capable small USVs is the Stingray (Fig. 2.4c), developed in Israel and capable of performing missions such as reconnaissance and surveillance, coastal object identification, underwater searches, etc. [Yan et al., 2010].

¹ Source and more information at: https://gdmissionsystems.com/products/underwater-vehicles/bluefin-hauv

² Source: https://noc.ac.uk/files/logos/Autosub6000DSC_01350786.jpg


UAVs

Like AUVs, UAVs or Unmanned Aerial Vehicles are an evolution of previous aircraft technology. They are usually classified by their means of propulsion and their size. UAVs can use multiple rotors, like quadcopters; a single rotor, like a helicopter; a fixed wing, like an airplane; or a hybrid mix of rotors and fixed wings. Rotor-based designs are capable of hovering over a specific location but are less efficient and slower than fixed-wing designs. The latter are not capable of hovering but can cover large distances very quickly, making them suitable for reconnaissance missions. Hybrid systems combine the benefits of both by having rotors for stable hovering and a fixed wing for long-distance flight. Additionally, UAVs are also classified by their weight and size [Ferreira, 2018] [Vergouw et al., 2016]. The US Navy X-47B⁴ is a fixed-wing UAV developed as a proof of concept that autonomous aerial vehicles can be integrated in fast and demanding mission scenarios. It is capable of taking off from and landing on an aircraft carrier autonomously and of refuelling in mid-air. DARPA’s Fast Lightweight Autonomy⁵ (FLA) program developed a multi-rotor UAV capable of navigating indoors while moving at speeds of 20 m/s.

(a) US Navy X-47B⁴ (b) DARPA’s FLA⁵

Figure 2.5: Examples of UAVs

⁴ More information at: http://www.northropgrumman.com/Capabilities/X47BUCAS/Pages/default.aspx/a


UGVs

Unmanned Ground Vehicles or UGVs are any mechanical equipment that has the ability to move along the surface of the ground as well as transport something, except human beings [Nguyen-Huu and Titus, 2009]. They are mainly distinguished by their traction method, which is usually tank-like tracks, wheels or legs. These vehicles have been used in various missions such as explosive ordnance disposal, RSTA (Reconnaissance, Surveillance, and Target Acquisition), planetary exploration, object transportation, etc. [Gage, 1995]. NASA’s Mars Exploration Rovers⁶ are a prime example of autonomous UGVs, being able to operate in the harsh conditions of Mars on their own. Boston Dynamics’ Spot⁷ is a four-legged robot capable of autonomously navigating both indoors and outdoors and of walking up and down stairs as well as over inclined, rough terrain.

(a) NASA’s Spirit⁸ (b) Boston Dynamics’ Spot⁹

Figure 2.6: Examples of UGVs

2.3 UAV Reliability

Comprehensive records of the reliability of AVs that include a significant sample size of vehicles are not readily available. This may be due to the limited commercial use of these systems, which tends to be mostly restricted to research and military applications. Research usually deploys few systems due to budget constraints and small team sizes. On the other hand, the militaries of countries like the United States have large budgets and great incentives to innovate and research in the field of autonomous vehicles.

The USA has released several reports about the use of UAVs in its military throughout the 2000s, which included reliability studies of the UAVs with the most recorded flight hours [United States Department of Defence, 2002]. Figure 2.7 shows a graph that relates the cumulative flight hours to the mishap rate per 100000 hours of several UAVs. It shows that the more flight hours a UAV accumulates, the less accident prone it becomes [Austin, 2010]. The Global Hawk, Hunter and Predator UAVs all reached mishap rates similar to that of the manned F-16 fighter jet, also represented in the graph. It should also be considered that the Pioneer and Shadow UAVs are used in more hostile environments than the rest of the aircraft, so some failures may be caused not by problems with the UAV but by deployment in dangerous zones where the more expensive UAVs would not be positioned [United States Department of Defence, 2002] [Austin, 2010].

⁶ More information at: https://mars.nasa.gov/programmissions/missions/past/2003/

⁷ More information at: https://www.bostondynamics.com/spot

⁸ Source: https://bit.ly/2HXfNic

Figure 2.7: Mishap rate with cumulative flight hours comparison [Austin, 2010]

These reports also present some statistics about the failure modes in some of the UAVs that lead to mission abortion or cancellation. One report in particular is dedicated to the reliability progress and statistics of some of the earlier models of UAVs [United States Department of Defence, 2003]. Table 2.1 splits the aircraft failure modes into the categories power-plant, flight controls, communications, human errors and miscellaneous, explained in detail in Table 2.2 according to [United States Department of Defence, 2003]. It is important to note that the Predator B and the Pioneer 2B are iterations of the Predator A and the Pioneer 2A, respectively.

             Power Plant   Flight Control   Communications   Human Errors   Misc.
Predator A   23            39               11               16             11
Predator B   53            23               10               2              12
Pioneer 2A   29            29               19               18             5
Pioneer 2B   51            15               13               19             2
Hunter 5A    29            21               4                29             17
Shadow       50            0                19               6              25
Average      39            21               13               15             12

Table 2.1: Sources of system failures for several UAVs (in percentage) [United States Department of Defence, 2003]


Category          Included Systems
Power-Plant       Engine, fuel supply, gearing, propeller, related electronics
Flight Controls   Avionics, actuators, control surfaces, flight software, navigation, aerodynamics
Communications    Communications between the aircraft and the ground control
Human Errors      Human errors and maintenance problems
Miscellaneous     Other problems not included above, excluding weather related failures

Table 2.2: Description of included systems for each failure type used in Table 2.1

From the data it might seem that, for both the Predator and the Pioneer, the power plant reliability decreased with the B iteration of these vehicles. This is not due to an increase in power plant failures, but to a decrease in failures in the other categories while the power plant failures stayed constant, resulting in a misleading increase of the power plant failure percentage.
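A short worked example makes this effect concrete. The failure counts below are hypothetical and chosen only for illustration; they are not taken from the reports. Suppose the power plant causes 20 failures in both iterations, while the failures in all other categories drop from 60 to 20:

```latex
% Share of power plant failures = power plant failures / total failures
\text{Iteration A: } \frac{20}{20 + 60} = 25\%
\qquad
\text{Iteration B: } \frac{20}{20 + 20} = 50\%
```

The power plant share doubles even though the absolute number of power plant failures is unchanged.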

Focusing on the 3 main categories available in Table 2.1, the power-plant is responsible for the most failures, followed by flight controls, with communications at the bottom. These numbers vary greatly between the studied UAVs even though they are all propeller-driven, fixed-wing, single-engine UAVs (the Hunter has two engines) operated by the same institution. This discrepancy shows that each system has its own particular problems and that development should be tailored to each system’s weaknesses. These 3 categories account for most of the failures (about 3 in every 4), and as such they should receive the most effort when it comes to fault prevention.

When studying the reliability of aircraft, several metrics are used to help understand the raw data, including the MTBF (Mean Time Between Failures), availability, reliability and the mishap rate per 100000 hours. The MTBF is a prediction of the time between failures, calculated by averaging all the times between recorded failures. The availability is the ratio between the number of times an aircraft was ready to be used when called upon and the total number of times it was summoned, and it indicates how often an aircraft is ready to be deployed. Reliability, determined by subtracting from 100 the percentage of times a mission was cancelled or aborted due to failures, expresses how much the aircraft can be trusted to perform its mission from start to end.
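The definitions above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function names and the sample log are hypothetical, and the reports may compute these metrics with additional rules that are not described here.

```python
from statistics import mean


def mtbf(hours_between_failures):
    """Mean Time Between Failures: average of the recorded intervals, in hours."""
    return mean(hours_between_failures)


def availability(times_ready, times_called):
    """Percentage of call-ups for which the aircraft was ready to be used."""
    return 100.0 * times_ready / times_called


def reliability(missions_cancelled_or_aborted, missions_planned):
    """100 minus the percentage of missions cancelled or aborted due to failures."""
    return 100.0 - 100.0 * missions_cancelled_or_aborted / missions_planned


def mishap_rate(mishaps, flight_hours):
    """Number of mishaps normalised to 100000 flight hours."""
    return mishaps * 100_000 / flight_hours


# Hypothetical log, for illustration only
print(mtbf([30.0, 42.0, 24.0]))   # 32.0
print(availability(37, 50))       # 74.0
print(reliability(5, 50))         # 90.0
print(mishap_rate(3, 12_000))     # 25.0
```

Normalising the mishap count to 100000 hours is what allows fleets with very different accumulated flight hours to be compared on the same scale.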

Table 2.3 shows the evolution of these metrics for the Predator, Pioneer and Hunter UAVs. The effects of testing and accumulating flight hours on reliability are again visible, with every metric improving in the later version of the represented systems [United States Department of Defence, 2003].

2.4 Aircraft Accident Severity

                                        MTBF (hrs)   Availability   Reliability   Mishap Rate per 100,000 hrs
RQ-1A/Predator            Requirement   n/a          n/a            n/a           n/a
                          Actual        32.0         40%            74%           43
RQ-1B/Predator            Requirement   40           80%            70%           n/a
                          Actual        55.1         93%            89%           31
RQ-2A/Pioneer             Requirement   25           93%            84%           n/a
                          Actual        9.1          74%            80%           363
RQ-2B/Pioneer             Requirement   25           93%            84%           n/a
                          Actual        28.6         78%            91%           139
RQ-5/Hunter (pre-1996)    Requirement   10           85%            74%           n/a
                          Actual        n/a          n/a            n/a           255
RQ-5/Hunter (post-1996)   Requirement   10           85%            74%           n/a
                          Actual        11.3         98%            82%           16

Table 2.3: Evolution of the reliability metrics of the Predator, Pioneer and Hunter UAVs [United States Department of Defence, 2003]

The topic of fault-related incidents with UAVs is currently under-investigated. Only one source is available with data that relates mishaps to the cause and severity of the accident, shown in Table 2.4 [Belcastro et al., 2017a]. It includes the number of fatal accidents with small UAV systems by primary cause of the mishap. The dataset, based on United States government accident reports and media reports, is limited to 100 entries, and only 2 fatal accidents were registered. In the years to come more data should be available to further investigate this topic. Just like for the UAVs used by the United States military, the main failure categories are flight controls, propulsion and communications. The only fatal accident with a known cause was due to human error, which also accounts for a large share of the mishaps in relative terms.

Primary Cause     Incidents   Accidents   Fatal Accidents   Total
Flight Controls   15          -           -                 15
Flight Crew       11          2           1                 14
Propulsion        9           -           -                 9
Lost Link         8           -           -                 8
Software          6           -           -                 6
Sensors           2           -           -                 2
Remote Control    2           -           -                 2
Wind Shear        2           -           -                 2
Other             10          -           -                 10
Undetermined      31          -           1                 32
Total             96          2           2                 100

Table 2.4: Number of small UAV mishaps by cause and consequent number of fatalities [Belcastro et al., 2017a]

On the other hand, this type of data has been studied at great length for traditional aviation. In [Strong et al., 2010], a comprehensive study of 700 fatal aviation accidents that span from 1990 to 2006 is presented. These mishaps were grouped in all sorts of categories, including the type of failure for those cases where a component failure led to the accident.

Another study with a similar scope can be found in [Belcastro et al., 2014]. It presents a detailed analysis of 275 cases of aircraft loss of control, with the aim of finding the worst case combinations and temporal sequences of factors in terms of resulting fatalities. The goal of this study was to include these loss of control scenarios in future aircraft tests to detect potentially fatal problems and improve safety.

Table 2.5 and Table 2.6 were extracted from the studies mentioned above. Both tables show that the accidents that result from control surface, engine and sensor/instrument failure are the most responsible for fatalities in conventional aviation. While these numbers can’t be directly applied to UAV systems, they can indicate which types of faults are more likely to cause catastrophic failures.

Table 2.5: Accidents and Fatalities by Type of Equipment Failure [Strong et al., 2010]

Table 2.6: Number of accidents and fatalities by system and component failure [Belcastro et al., 2014]


Literature Review

This chapter covers the literature review performed for the subject of the document. The literature review is divided in three sequential parts. The first covers the most relevant fault detection methods from a theoretical perspective. Next, a review of the scientific papers that applied those fault detection techniques to autonomous vehicles is presented. Finally, a section is dedicated to previous work on using flight simulators to develop fault detection systems in AVs.

3.1 Fault Detection Methods

Over the years there has been an increasing demand for better safety and reliability of all types of machines. As technology advances so does the complexity of mechanical systems along with the demand for better performing fault detection algorithms. With the increase of computational power more complex algorithms have been developed and studied.

Several surveys documenting the available technologies and advancements in the field of fault detection have been published in the last few decades.

In [Inseok et al., 2010] a survey about the recently developed methods for FDIR is presented. For detection and isolation several model-based techniques are examined as well as statistical methods to apply to the generated residuals. Additionally, two methods of fault reconfiguration are explored. In the end a comparison of all the different methods presented is made comparing performance and trade-offs.

The survey in [Miljković, 2011] presents a more general view of the different methods of fault detection, from the early basics to the modern complex methods. These were grouped in three groups: Data Methods and Signal Models, which utilize real-time data and statistical methods; Process Model Based Methods, which use a model of the system and compare its values to the real system; and Knowledge Based Methods, which utilize previously gathered information to discover patterns to apply in the system in real-time. Some real-world examples are given but no comparisons are made between them.


Figure 3.1: Isermann’s fault detection methods division [Isermann, 2006]

More specifically, several surveys targeting model-based fault detection were published in the 1980s and 1990s. In these decades the research on these types of methods was growing, with observer and parameter estimation methods being used in nearly 70% of the contributions in this area [Isermann and Ballé, 1997]. These surveys can be found in [Isermann, 1984] [Gertler, 1988] [Frank and Ding, 1997].

Venkatasubramanian et al. published three papers reviewing fault detection methods, each one focusing on a different group. The first [Venkatasubramanian et al., 2003c] focuses on quantitative model-based methods. The second [Venkatasubramanian et al., 2003a] reviews qualitative models and search strategies. The last one [Venkatasubramanian et al., 2003b] discusses process history-based methods.

Fault detection methods can be categorized in different groups. Gertler divided them into methods that don’t make use of a model and methods that do [Gertler, 1988]. Miljković had a similar approach. He grouped the methods in three categories: data methods and signal models; process model-based methods; and knowledge-based methods [Miljković, 2011]. The first two groups are comparable to Gertler’s groups in the presented order. Venkatasubramanian et al. divided them into: quantitative model-based methods; qualitative models and search strategies; and process history-based methods [Venkatasubramanian et al., 2003c]. Isermann developed a more comprehensive categorization system that can be found in [Isermann, 2006] and visualized in Fig. 3.1.

Generally, model-based approaches require more resources to implement, as developing an accurate model of the system can be time consuming. Relying on the simulation of a fully detailed model also requires more computational power to obtain results in real time, meaning that these methods tend to scale poorly when several concurrent systems are being analysed. Model free methods avoid these problems by relying on data analysis instead of a model and are generally less resource intensive. Where this category usually can’t compete is in the quality of the results, since having a full model of the system is a great advantage when the objective is to detect anomalies.

Below are presented multiple methods for fault detection, which were divided in two categories following Gertler’s classification: model free and model-based methods.


3.1.1 Model Free Methods

Model free methods utilize the data retrieved from the sensors and make use of statistical methods to process it and verify if a fault is currently present or not. Below are presented the main model free methods used for fault diagnosis.

Limit and Trend Checking. Limit checking is the simplest and most used method for fault detection. It consists of monitoring a variable and comparing its value to an upper and/or lower limit. If it exceeds one of these previously established values then a fault has occurred [Isermann, 2006]. Similar to limit checking, trend checking utilizes the first derivative of the variable to understand its rate of change. It is then checked whether this value lies between the established thresholds [Isermann, 2006]. Some systems that employ these methods have two levels per limit, the first serving as a caution and the second as the fault warning [Gertler, 1988].
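Limit checking, trend checking and the two-level variant can be sketched as follows. This is a minimal illustration, not an implementation from the cited works: the function names, the finite-difference approximation of the derivative and all numeric limits are assumptions made for the example.

```python
def limit_check(value, lower, upper):
    """Return True when the value lies outside the interval [lower, upper]."""
    return value < lower or value > upper


def trend_check(samples, dt, max_rate):
    """Return True when any rate of change exceeds max_rate in magnitude.

    The first derivative is approximated by finite differences between
    consecutive samples taken dt seconds apart.
    """
    rates = [(b - a) / dt for a, b in zip(samples, samples[1:])]
    return any(abs(rate) > max_rate for rate in rates)


def two_level_check(value, caution, fault):
    """Two upper limits per variable: a caution level below the fault level."""
    if value > fault:
        return "fault"
    if value > caution:
        return "caution"
    return "ok"


# Hypothetical engine temperature readings in degrees Celsius, sampled at 1 Hz
temperatures = [90.0, 91.0, 92.5, 99.0]
print(limit_check(temperatures[-1], lower=60.0, upper=110.0))  # False
print(trend_check(temperatures, dt=1.0, max_rate=5.0))         # True
print(two_level_check(104.0, caution=100.0, fault=110.0))      # caution
```

In this example the last reading is still inside its limits, but the jump of 6.5 degrees in one second is caught by the trend check, which is the kind of early warning that limit checking alone does not give.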

Change Detection. The broad idea of change detection is similar to limit checking, but it utilizes additional statistical methods to accomplish its goal. In stochastic systems, one of the ways to perform change detection is with an estimation of the mean and variance of the variable [Isermann, 2006]. The estimations of two time frames are then compared. A fault is detected when the result of this comparison exceeds a certain threshold. More complex statistical tests can be applied to achieve better results when the change in the mean is small compared to the change in the variance. These test methods include RunSum tests, the t-test and the F-test [Isermann, 2006]. For cases where a fault happens gradually and not abruptly, fuzzy thresholds can be used to simulate more realistic change detection [Isermann, 2006].
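The comparison of two time frames can be sketched with a t-test on the window means. The version below uses Welch's form of the t statistic, which is one possible choice and is not claimed to be the variant used in the cited works; the threshold of 3.0 and the sample data are likewise assumptions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(reference, current):
    """Welch's t statistic for the difference between two window means.

    Both windows need at least two samples and must not both have
    zero variance, otherwise the denominator is zero.
    """
    m_ref, m_cur = mean(reference), mean(current)
    v_ref, v_cur = variance(reference), variance(current)
    return (m_cur - m_ref) / sqrt(v_ref / len(reference) + v_cur / len(current))


def change_detected(reference, current, threshold=3.0):
    """Flag a change when |t| exceeds the (assumed) threshold."""
    return abs(welch_t(reference, current)) > threshold


# Hypothetical sensor windows: the second one is shifted by +1.0
reference = [10.1, 9.9, 10.0, 10.2, 9.8]
shifted = [11.1, 10.9, 11.0, 11.2, 10.8]
print(change_detected(reference, reference))  # False
print(change_detected(reference, shifted))    # True
```

Dividing the difference of the means by the pooled spread is what lets the test separate a real shift from ordinary noise: the same shift of 1.0 would not be flagged if the windows were much noisier.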

Neural Networks / Clustering. Data mining has been used for the purpose of fault detection for some years now. There are two main types of learning algorithms: supervised (e.g. Neural Networks) and unsupervised (e.g. Clustering). Supervised methods utilize data that has been previously labelled as indicating a fault or not. Unsupervised learning utilizes unlabelled data and groups it in clusters based on similarity. Isermann and Ballé found that more and more published works about fault detection were using Neural Networks [Isermann and Ballé, 1997]. These have been primarily used for the classification of the generated residuals, but in some applications neural nets are responsible for the residual generation [Isermann and Ballé, 1997]. When dealing with unlabelled data, clustering is a popular choice, with k-means and Kohonen feature maps being the most used algorithms [Venkatasubramanian et al., 2003b] [Isermann, 2006] [Miljković, 2011].
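As an illustration of the unsupervised case, the sketch below runs plain k-means on scalar samples. It is a simplified, hypothetical example: the function name, the initial centres and the data are assumptions, and a practical system would cluster multi-dimensional feature vectors. Note that the algorithm only separates the groups; deciding which cluster corresponds to faulty behaviour still requires additional knowledge.

```python
def kmeans_1d(values, centres, iterations=10):
    """Plain k-means on scalar samples; 'centres' holds the initial guesses."""
    centres = list(centres)
    clusters = [[] for _ in centres]
    for _ in range(iterations):
        # Assignment step: each sample joins the cluster of its nearest centre
        clusters = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its previous centre)
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters


# Hypothetical vibration amplitudes forming two distinct groups
samples = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8]
centres, clusters = kmeans_1d(samples, centres=[0.0, 6.0])
print(clusters)  # [[1.0, 1.2, 0.9, 1.1], [5.0, 5.2, 4.8]]
```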

3.1.2 Model-Based Methods

Model-based methods use a model of the system being analysed and feed the data retrieved from the sensors into the model to estimate the state of the system. This estimated state is usually compared to the sensed state and the difference between them is used to generate residuals. The most common model-based methods are presented and explained below.

Parity Equations. Parity equations (also called parity relations) are a straightforward way to apply fault detection based on a model. The method revolves around feeding the system inputs through a model that describes the normal system behaviour and comparing the generated outputs with the measured outputs. This comparison leads to the generation of residuals [Isermann, 2006].
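The idea can be sketched with a simple output-error residual. The first-order model y(k+1) = a·y(k) + b·u(k), its parameters and the injected sensor offset are all hypothetical choices made for this example; they are not taken from the cited works.

```python
def parity_residuals(inputs, measured, a=0.9, b=0.1, y0=0.0):
    """Output-error residuals r(k) = y_measured(k) - y_model(k).

    The model y(k+1) = a*y(k) + b*u(k) is a hypothetical first-order
    process standing in for the normal system behaviour.
    """
    y_model = y0
    residuals = []
    for u, y_meas in zip(inputs, measured):
        residuals.append(y_meas - y_model)
        y_model = a * y_model + b * u  # run the model in parallel with the process
    return residuals


# Simulate a healthy process and one with a +0.5 sensor offset from step 5 onwards
inputs = [1.0] * 10
healthy, y = [], 0.0
for u in inputs:
    healthy.append(y)
    y = 0.9 * y + 0.1 * u
faulty = [v + (0.5 if k >= 5 else 0.0) for k, v in enumerate(healthy)]

print(max(abs(r) for r in parity_residuals(inputs, healthy)))   # 0.0
print([round(r, 3) for r in parity_residuals(inputs, faulty)])  # zeros, then 0.5
```

Because the model runs on the inputs alone, the residual stays at zero while the process behaves as modelled and reflects the additive sensor fault directly once it appears.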

Parameter Estimation. When a process model can’t be obtained or changes with time, an estimation of the model must be made [Venkatasubramanian et al., 2003c] [Miljković, 2011]. The model is obtained from the input and output of the system and then used to calculate residuals. However, this is a computationally heavy process that scales poorly with the number of variables, making it unsuitable for real-time fault detection when a high model accuracy is required [Venkatasubramanian et al., 2003c] [Cimpoesu et al., 2013].
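A minimal sketch of the estimation step is given below, assuming a hypothetical first-order model y(k+1) = a·y(k) + b·u(k) whose two parameters are fitted by ordinary least squares. The function name, the model structure and the data are assumptions made for illustration; a deviation of the estimated parameters from their nominal values could then serve as a fault symptom.

```python
def estimate_first_order(inputs, outputs):
    """Least-squares estimate of (a, b) in y(k+1) = a*y(k) + b*u(k).

    Solves the 2x2 normal equations directly. The input must vary enough
    for the determinant to be non-zero.
    """
    s_yy = s_yu = s_uu = s_ny = s_nu = 0.0
    for k in range(len(outputs) - 1):
        y, u, y_next = outputs[k], inputs[k], outputs[k + 1]
        s_yy += y * y
        s_yu += y * u
        s_uu += u * u
        s_ny += y_next * y
        s_nu += y_next * u
    det = s_yy * s_uu - s_yu * s_yu
    a = (s_ny * s_uu - s_nu * s_yu) / det
    b = (s_nu * s_yy - s_ny * s_yu) / det
    return a, b


# Noise-free data generated with the hypothetical nominal parameters a=0.9, b=0.1
inputs = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0]
outputs, y = [], 0.0
for u in inputs:
    outputs.append(y)
    y = 0.9 * y + 0.1 * u

a_hat, b_hat = estimate_first_order(inputs, outputs)
print(round(a_hat, 6), round(b_hat, 6))
```

Even in this two-parameter case the estimator has to accumulate five sums over the whole data window, which hints at why the cost grows quickly with the number of variables.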

State Observer. A state observer works by utilizing the measured inputs and outputs to calculate an internal state. The outputs are calculated using a model of the system and the inputs. Since no model is perfect, there is a discrepancy between the measured output and the output estimated by the model, which in turn affects the state estimation negatively. A negative feedback loop is then utilized to reduce the difference between the outputs and, consequently, the estimated state becomes more accurate. After the calculated state stabilizes, the calculated output and state become sensitive solely to faults. The difference between the real state and the calculated state can be used as residuals for fault detection [Isermann, 2006] [Inseok et al., 2010].
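The feedback mechanism can be sketched with a scalar observer. The model x(k+1) = a·x(k) + b·u(k), y(k) = c·x(k), the observer gain and the initial conditions are all hypothetical values chosen for this example, which only shows the stabilisation phase in a fault-free run.

```python
def observer_residuals(inputs, measured, a=0.9, b=0.1, c=1.0, gain=0.5, x0_hat=0.0):
    """Scalar state observer; returns the output residuals y(k) - c*x_hat(k).

    With these hypothetical values the estimation error shrinks by the
    factor (a - gain*c) = 0.4 at every step.
    """
    x_hat = x0_hat
    residuals = []
    for u, y in zip(inputs, measured):
        r = y - c * x_hat                     # difference between measured and estimated output
        residuals.append(r)
        x_hat = a * x_hat + b * u + gain * r  # model prediction corrected by the feedback term
    return residuals


# Fault-free process that starts at x = 1.0 while the observer starts at 0.0
inputs = [1.0] * 25
measured, x = [], 1.0
for u in inputs:
    measured.append(1.0 * x)
    x = 0.9 * x + 0.1 * u

residuals = observer_residuals(inputs, measured)
print(residuals[0], abs(residuals[-1]) < 1e-6)  # 1.0 True
```

Once the initial mismatch has decayed, the residual stays near zero as long as the process follows the model, so a later deviation can be attributed to a fault rather than to the unknown initial state.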

Output Observer. Output Observers, also called Diagnostic Observers or Unknown Input Observers, generate their residuals only from the outputs, making them independent from the inputs. This reduces the effect of input noise and modelling uncertainties on the generated residuals [Venkatasubramanian et al., 2003c] [Inseok et al., 2010]. The method works by building multiple observers, each responsible for generating residuals that detect a subset of faults. When a fault occurs, the residuals from the observers sensitive to that fault cross the threshold. The observers are built in such a way that the pattern of observers with high residuals can only correspond to a single fault [Venkatasubramanian et al., 2003c] [Isermann, 2006].
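The isolation step described above, where a pattern of triggered observers maps to a single fault, can be sketched as a signature lookup. The signature table, the fault names and the threshold are hypothetical; the design of the observers themselves is outside the scope of this sketch.

```python
# Hypothetical signature table: 1 means the observer reacts to the fault
SIGNATURES = {
    "fault_A": (1, 0, 1),
    "fault_B": (0, 1, 1),
    "fault_C": (1, 1, 0),
}


def isolate(residuals, threshold):
    """Map the pattern of residuals above the threshold to a single fault.

    Returns None when no residual crosses the threshold and "unknown"
    when the pattern matches no entry of the signature table.
    """
    pattern = tuple(int(abs(r) > threshold) for r in residuals)
    if not any(pattern):
        return None
    for fault, signature in SIGNATURES.items():
        if pattern == signature:
            return fault
    return "unknown"


print(isolate([0.9, 0.05, 0.7], threshold=0.5))  # fault_A
print(isolate([0.1, 0.1, 0.1], threshold=0.5))   # None
```

Every signature in the table is distinct, which is the property that allows one pattern to point to exactly one fault.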

3.1.3 Residuals

Isermann and Ballé define residuals as "A fault indicator, based on a deviation between measurements and model-equation-based computations" [Isermann and Ballé, 1997]. In other words, a residual is a measure of faultiness that is calculated based on the difference between estimated values and measured values. A residual close to zero indicates that the system is fault free, while a large residual indicates a fault. In perfect conditions the residual would always be zero, but in real scenarios noise in the signals, small model inaccuracies and other causes make the residuals non-zero even if no fault is present. The residual threshold must therefore be adjusted so that faults are detected but no alarm is triggered when no fault is present [Isermann, 2006] [Venkatasubramanian et al., 2003c].

However, small faults can remain undetected if the thresholds are set too high. Isermann lists several ways to counter this, such as enhancing the residuals for specific faults, filtering noise in the signals, increasing the sensitivity of the residuals to faults, increasing the model robustness and utilizing adaptive thresholds [Isermann, 2006].

Venkatasubramanian et al. mention that, for the fault isolation process to be successful, the residuals must be generated in a way that makes them orthogonal to different faults. This means that residuals used to identify one fault can’t be correlated with the residuals used to identify a different fault [Venkatasubramanian et al., 2003c].

3.1.4 Auxiliary Methods to Fault Detection

Methods from other scientific areas can be applied to fault detection methods to improve their accuracy, robustness and computational performance. Below some of these methods are presented.

Principal Component Analysis (PCA). When there are multiple variables that are highly correlated with each other, it might be difficult to apply the methods previously presented, as too much effort would be spent to detect a fault. Principal component analysis can significantly reduce a large dataset with many correlated values into a new set of variables called principal components. These principal components retain most of the original data variation [Isermann, 2006]. In other words, PCA tries to find a reduced number of factors that describe the main characteristics of the original data [Venkatasubramanian et al., 2003b]. The new data can then be interpreted with the model free methods mentioned above. The reduced number of variables decreases the computational cost compared to using the entire dataset.
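The reduction can be sketched in pure Python by extracting only the first principal component with power iteration on the covariance matrix. This is a simplified illustration under stated assumptions: the function name and data are hypothetical, only one component is computed, and a production system would normally rely on a numerical library.

```python
from math import sqrt


def first_principal_component(rows, iterations=100):
    """Return (direction, scores) of the first principal component.

    Power iteration on the sample covariance matrix; it assumes the start
    vector is not orthogonal to the dominant direction.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project the centred samples onto the principal direction
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores


# Two hypothetical, perfectly correlated sensor readings (second = 2 * first)
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(rows)
print([round(x, 4) for x in direction])  # [0.4472, 0.8944]
```

Here two correlated variables collapse into a single score per sample without losing any variation, so a model free method such as limit checking only has to monitor one value instead of two.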

Kalman Filters. In stochastic systems, the constant noise present in the measurements prevents the feedback loop of the state observer from stabilizing the output and therefore prevents a good estimation of the state. To address this, Kalman Filters can be used [Inseok et al., 2010] [Isermann, 2006] [Venkatasubramanian et al., 2003c]. Kalman Filters follow the same principle as State Observers but combine the measured and predicted values into a single, more accurate estimate. When these values are combined, more weight is given to the more accurate one, based on their variances [Faragher, 2012].
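The variance-based weighting can be sketched with one predict/update cycle of a scalar Kalman filter. The model x(k+1) = a·x(k) + b·u(k) with direct measurement of x, and the noise variances q and r, are hypothetical values chosen for this example.

```python
def kalman_step(x_est, p_est, u, z, a=0.9, b=0.1, q=0.01, r=0.25):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its variance
    u, z: current input and measurement
    q, r: process and measurement noise variances (assumed known)
    """
    # Predict the state and its variance with the model
    x_pred = a * x_est + b * u
    p_pred = a * p_est * a + q
    # Update: the gain weighs the prediction against the measurement by variance
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain


# A very accurate measurement dominates the estimate, a very noisy one is almost ignored
x_trusted, _, k_trusted = kalman_step(0.0, 1.0, u=0.0, z=1.0, r=1e-9)
x_ignored, _, k_ignored = kalman_step(0.0, 1.0, u=0.0, z=1.0, r=1e9)
print(round(x_trusted, 3), round(x_ignored, 3))  # 1.0 0.0
```

The term z - x_pred plays the same role as the output error in the state observer, with the difference that the feedback gain is recomputed at every step from the variances instead of being fixed.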

Residual Filtering. The fault sensitivity of the residuals can be improved by filtering them. To do this, the higher frequencies related to noise are removed, while the lower ones (where faults can be detected) are kept [Gertler, 1988]. It is important to note that this method works well with static residual equations, but with dynamic ones it only works if there is a proportional relation between the faults and the residuals. In other cases the filtering might reduce the sensitivity of the residuals to failures, which can be solved by utilizing multiple parallel filters with varying parameters [Gertler, 1988].
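A first-order low-pass filter (exponential smoothing) is one simple way to remove the high-frequency content of a residual. It is used here only as an illustrative choice, since the filter type is not specified in the text; the function name and the smoothing factor alpha are assumptions.

```python
def low_pass(residuals, alpha=0.2):
    """First-order low-pass filter (exponential smoothing) of a residual sequence.

    A smaller alpha removes more high-frequency content but reacts more slowly.
    """
    filtered = []
    state = residuals[0]
    for r in residuals:
        state = alpha * r + (1.0 - alpha) * state
        filtered.append(state)
    return filtered


# Hypothetical residuals: pure high-frequency noise versus a constant fault offset
noise = [1.0 if k % 2 == 0 else -1.0 for k in range(40)]
offset = [0.5] * 40
print(round(abs(low_pass(noise)[-1]), 3))   # close to 0.111, noise is attenuated
print(round(low_pass(offset)[-1], 3))       # 0.5, the offset is preserved
```

Running several such filters in parallel with different values of alpha corresponds to the multiple parallel filters with varying parameters mentioned above.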

3.2 Model-Based Diagnosis Tools

Model-based Diagnosis (MBD) applies model-based reasoning to the detection and diagnosis of faults in complex systems. According to Mislevy et al., model-based reasoning “(. . . ) consists of cycles of proposing, instantiating, checking, revising to find an apt model for a given purpose in a given situation, and reasoning about the situation through the model.” [Mislevy et al., 2017]. In other words, it is an expert system where inference methods are used to achieve something, based on a model of a real system. As the name and definition indicate, these types of methods fall in the model-based category of fault detection.

Diagnosis systems based on this method have been used in a wide variety of systems such as satellites [Adams et al., 2011], UAVs [Memela, 2016], large telescopes [Feldman et al., 2006], etc. Currently two freely available tools for MBD exist: LYDIA and HyDE.

LYDIA (Language for sYstem DIAgnosis) [Feldman et al., 2006] was developed following the increase in the successful use of model-based diagnosis, to address the main challenges of the systems in use at the time: expressing the system being diagnosed in a formal model and inferring the diagnosis from the observations. It also tried to solve the high computational cost that accompanied this type of system by exploiting hierarchical information in the model and compiling selected parts of it, effectively transforming some of the real-time computation into a pre-processing task. In [Feldman et al., 2006] the system is modelled to detect and diagnose faults in the fuel system of a Piper light aircraft.

HyDE (HYbrid Diagnostic Engine) [Narasimhan and Brownston, 2007] was developed by NASA and used in some of their projects such as DAME (Drilling Automation for Mars Environment) and the International Space Station electrical power system. The objective of this framework was to create a flexible MBD tool that could easily be integrated in any system and perform any fault diagnosis task. To achieve this, the framework includes modelling paradigms and reasoning algorithms for a variety of strategies. This allows the diagnosis system designer to try multiple techniques without having to recreate the whole system from the ground up. If a suitable one is not available, the framework also supports the creation of new, user-created models and reasoning algorithms. HyDE is the successor of Livingstone 2, a discrete-only MBD tool also developed by NASA, which doesn’t include all the algorithms that HyDE does.

While these inference based frameworks are very powerful, they are also the most difficult to implement. First, the framework software of the chosen MBD tool would have to be integrated with the already existing platform, which could result in compatibility issues that would take even more resources to fix. Then, all the systems of the aircraft would have to be modelled using the description language of the framework. This would require knowledge of how the several components of the system are interconnected, which is not feasible when dealing with simulated aircraft that use a model created by a third party.


3.3 Fault Detection in Unmanned/Autonomous Vehicles

Several examples of literature about fault detection in unmanned vehicles can be found. Various methods have been applied to very different vehicles such as aircraft, helicopters and ROVs. In many instances simulations were utilized to aid the process of creating or testing the fault detection systems.

Fixed-wing drones have been one of the most common platforms for implementing fault detection in the past years. A very detailed implementation can be found in [Hansen, 2012], where a Danish military drone called Banshee was the subject. The author targeted three common faults that had been observed while the drone was operational. In all of them analytical redundancy methods were implemented instead of hardware ones to preserve the aircraft endurance. For each fault a different method was utilized. For the pitot tube¹ failure two redundancies were created, one by comparing air speed with GPS speed and another by creating an observer that estimated the aircraft speed with an engine and thrust model. For loss of control surface action, a simple model-based approach with change detectors was used. Finally, for loss of GPS signal, a redundant system was created utilizing the available positioning sensors such as the gyroscope, accelerometer, altimeter, pitot tube, etc. Their application was successful except for the GPS redundant system, because the estimates quickly drifted from realistic values.

Another fault detection system for a fixed-wing UAV is found in [Freeman et al., 2013]. In this case, two approaches were considered and developed: a model-based one, which generated residuals via an observer, and a data-driven one, which used change detection with a z-test. The two were then compared using results from real flight and simulation testing. To evaluate the model-based method, a metric called TIC (Theil Inequality Coefficient) was used. TIC measures the overall difference between the model estimation and real data and outputs a score from 0 to 1, in which a lower score is better. The data-driven method was evaluated through its detection time. Both methods could detect various aileron faults, with the model-based method achieving an average TIC of 0.143 and the data-driven one an average detection time of 0.8 s. The model-based approach was nevertheless significantly better in some cases, because its design incorporates information about the system. It is noted, though, that the downside of the model-based approach is the effort required to develop a model good enough to achieve this performance level.
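The two evaluation tools mentioned above can be sketched as follows. The TIC is written in its common textbook form (RMS error divided by the sum of the RMS values of both series), and the z-test is a plain test on the mean of a window of samples. Whether [Freeman et al., 2013] used exactly these variants is an assumption, and the function names and the critical value are illustrative.

```python
import math


def theil_inequality_coefficient(measured, estimated):
    """TIC in [0, 1]; 0 means the estimate matches the data exactly."""
    n = len(measured)
    rms_error = math.sqrt(
        sum((m - e) ** 2 for m, e in zip(measured, estimated)) / n)
    rms_measured = math.sqrt(sum(m * m for m in measured) / n)
    rms_estimated = math.sqrt(sum(e * e for e in estimated) / n)
    denominator = rms_measured + rms_estimated
    # Both series identically zero: treat as a perfect match.
    return 0.0 if denominator == 0 else rms_error / denominator


def z_test_change(window, nominal_mean, nominal_std, z_critical=3.0):
    """Flag a change when the window mean deviates from the nominal mean.

    `nominal_mean` and `nominal_std` are assumed to come from fault-free
    data; `z_critical` is a hypothetical tuning parameter.
    """
    n = len(window)
    sample_mean = sum(window) / n
    z = (sample_mean - nominal_mean) / (nominal_std / math.sqrt(n))
    return abs(z) > z_critical, z


perfect = theil_inequality_coefficient([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
opposite = theil_inequality_coefficient([1.0, 1.0], [-1.0, -1.0])
changed, z_score = z_test_change([1.0] * 4, nominal_mean=0.0, nominal_std=0.5)
```

In this sketch a perfect estimate gives a TIC of 0 and a sign-inverted estimate gives 1, which matches the 0 to 1 range described in the text.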

One of the most recent contributions exploring fault detection in a fixed-wing drone is found in [Baskaya et al., 2017a]. A MAKO UAV was used as the basis for a model that generated nominal and faulty flight data, with PCA being used for easy visualization of the data [Baskaya et al., 2017b]. Then, a Support Vector Machine, a machine learning algorithm, was used to classify the data as faulty or nominal. It is claimed that the accuracy and response time of the fault detection system were very good in simulated testing.

¹ The pitot tube is a device that measures the air speed by comparing the air pressure in two different points of the


A noteworthy approach to fault detection and accommodation for a fixed-wing UAV’s elevator is found in [Panitsrisit and Ruangwiset, 2011]. Instead of relying on the sensors included in the UAV to maintain stable flight, additional light-weight sensors were added to specific parts of the aircraft to monitor the elevator control system. A current sensor was used to evaluate the elevator servo motor activity, a flex sensor to measure the angle of the elevator and a pressure sensor to measure the force being applied to the elevator. By monitoring the output of these sensors, failures in the servo motor, linkage and surface of the elevator can be detected and identified. When a fault happens, the aircraft controller system is adjusted to perform the pitch adjustments with the ailerons in place of the faulty elevator.

Although helicopters are not as popular a subject as fixed-wing drones, a fault detection system for an autonomous helicopter called MARVIN can be found in [Heredia et al., 2008]. A model-based system with observer-generated residuals was used to detect faults in the vehicle’s sensors. The system was tested using simulations and real-world experiments. It was concluded that faults that left a sensor stuck at a constant value were easy to detect, but additive or multiplicative errors were difficult to detect if the deviations were too small. Overall, the average fault detection time was 0.545 s, with 6 false negatives and 9 false positives registered. A year later, the same system was further improved, this time using Kalman filters [Heredia and Ollero, 2009]. These performed slightly better and were more robust than the previous approach.
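The difference between stuck and small additive or multiplicative sensor faults can be illustrated with a residual check against a predicted signal. The sketch below is not the MARVIN system: the sine-wave prediction stands in for an observer output, and the fault magnitudes and threshold are assumed values.

```python
import math


def apply_fault(signal, kind, start, magnitude=0.0):
    """Return a copy of `signal` with a sensor fault injected from `start`."""
    faulty = list(signal)
    for k in range(start, len(faulty)):
        if kind == "stuck":
            # The sensor freezes at the value it had when the fault began.
            faulty[k] = signal[start]
        elif kind == "additive":
            faulty[k] = signal[k] + magnitude
        elif kind == "multiplicative":
            faulty[k] = signal[k] * magnitude
    return faulty


def first_detection(measured, predicted, threshold):
    """Index of the first sample whose residual exceeds `threshold`."""
    for k, (m, p) in enumerate(zip(measured, predicted)):
        if abs(m - p) > threshold:
            return k
    return None


# Hypothetical fault-free signal, standing in for the observer prediction.
predicted = [math.sin(0.1 * k) for k in range(100)]
threshold = 0.3

stuck = first_detection(apply_fault(predicted, "stuck", 10), predicted, threshold)
offset = first_detection(
    apply_fault(predicted, "additive", 10, 0.05), predicted, threshold)
gain = first_detection(
    apply_fault(predicted, "multiplicative", 10, 1.05), predicted, threshold)
```

With these assumed values, the stuck sensor is detected once the true signal has moved far enough away from the frozen reading, while the small offset and gain errors never push the residual above the threshold and go unnoticed.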

With the increase in commercial use of multi-rotor UAVs, it is expected that research would focus on those types of drones. Recently, several contributions have used the so-called “Thau Observer” for fault diagnosis in quadcopters [Freddi and Longhi, 2012] [Cen et al., 2014] [Hasan and Johansen, 2018]. Thau developed an observer capable of achieving higher stability for special non-linear systems that conform to a specific set of rules [Thau, 1973]. Quadrotor systems are one of these special cases [Chelly et al., 2016]. This method has been adapted to diagnose faults in sensors [Freddi and Longhi, 2012] as well as actuators, and is also capable of estimating the severity of the fault [Cen et al., 2014]. The Thau observer was also paired with a Kalman filter to detect actuator faults [Hasan and Johansen, 2018]. The combination extracted the best of both methods: the global stability of the Thau observer and the optimal filtering of the Kalman filter.

An example of the implementation of an MBD system in a UAV can be found in [Memela, 2016]. A model of the Meraka Modular UAV was built and integrated in the HyDE MBD framework with the purpose of identifying faults in the UAV's critical systems, such as the control surfaces, battery and receivers. The system achieved an average fault detection rate of 95.7%, a false-positive rate of 0.17% (which was credited to windy test conditions) and an average detection delay of 2.3 s for the UAV control surfaces. These results were then compared to a previous fault detection study on the same control surfaces of a model of a Boeing 747 [Hallouzi, 2008]. That study used an MMAE (Multi Model Adaptive Estimator) with Kalman filters, which only managed to achieve an average fault detection rate of 86% with similar detection times.
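The three figures used in this comparison (detection rate, false-positive rate and detection delay) can be computed from test outcomes as sketched below. The definitions are the usual ones and are assumed here, since the exact definitions used in [Memela, 2016] are not stated. The counts and timestamps are hypothetical and are not data from either study.

```python
def detection_metrics(true_pos, false_neg, false_pos, true_neg):
    """Return (detection rate, false-positive rate) from outcome counts."""
    # Fraction of injected faults that were detected.
    detection_rate = true_pos / (true_pos + false_neg)
    # Fraction of fault-free cases that raised an alarm.
    false_positive_rate = false_pos / (false_pos + true_neg)
    return detection_rate, false_positive_rate


def mean_detection_delay(events):
    """Average of (detection time - injection time) over detected faults."""
    delays = [detected - injected for injected, detected in events]
    return sum(delays) / len(delays)


# Hypothetical outcome counts and (injection, detection) times in seconds.
rate, fpr = detection_metrics(true_pos=9, false_neg=1, false_pos=1, true_neg=99)
delay = mean_detection_delay([(10.0, 12.0), (20.0, 23.0)])
```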

But the literature is not all about UAVs. In [Corradini et al., 2011] a fault detection system was proposed for an underwater ROV. Residuals were generated using the vehicle model and a sliding