Automating ISO 26262 Hardware Evaluation Methodologies


Faculdade de Engenharia da Universidade do Porto

João Manuel Cardoso Tavares

Dissertation Developed under the Scope of the

Integrated Master in Electrical and Computers Engineering

Major Automation

Supervisor: David Parker (PhD), University of Hull

Co-Supervisor: Rui Esteves Araújo (PhD)


Abstract

The constant evolution of vehicles, incorporating more commodities and functionalities, has been made possible by electric, electronic and programmable electronic components. To assure the reliability of the safety-related systems that use those components, functional safety standards are created to define processes and guidelines for their development. Some of those systems control important functions such as steering and braking in road vehicles. For that purpose, the Automotive Functional Safety Standard, ISO 26262, was launched.

Manual Safety Analysis for complex electronic systems can be a difficult and time-demanding task, with no assurance of success, as it is error prone. With that in mind, the University of Hull developed a safety and reliability tool, to automate some of the safety analysis steps. Hierarchically-Performed Hazard Origin and Propagation Studies (HiP-HOPS) is that state-of-the-art tool.

For the development of hardware systems, ISO 26262 proposes a set of evaluation metrics, which can be used to compare system designs. Furthermore, it guides the designer through the process of choosing safety mechanisms to ensure that the targets for those metrics are achieved. Currently, the safety analysis performed with HiP-HOPS neither considers those safety mechanisms nor calculates the metrics.

The aim of this thesis is to extend HiP-HOPS capabilities, so it can incorporate those safety mechanisms and perform metrics calculations automatically, to allow the user to test different hardware implementations and point to the best solution.

The topics stated here, namely safety analysis, HiP-HOPS and the ISO 26262 standard, are explored first, in the Literature Review. Then, the changes made to HiP-HOPS are explained and validated through the analysis of a chosen hardware system. The last part is dedicated to the analysis of results, final considerations and further developments of the metrics software.


Summary

The constant evolution of vehicles, incorporating more commodities and functionalities, has been made possible by electric, electronic and programmable electronic components. To assure the reliability of the safety-related systems that use those components, functional safety standards have been created to define processes and methodologies for their development. Some of those systems control important functions such as steering and braking in automobiles. For that purpose, the automotive functional safety standard, ISO 26262, was launched.

Performing safety analysis manually on complex electronic systems can be a difficult and time-consuming task, with no guarantee of success, since it is susceptible to human error. With this problem in mind, the University of Hull developed a safety and reliability tool to automate some of the steps of the safety analysis. That tool is called Hierarchically-Performed Hazard Origin and Propagation Studies (HiP-HOPS).

For the development of hardware systems, ISO 26262 proposes a set of evaluation metrics that are used to compare different implementations. In addition, it guides the user through a process of choosing safety mechanisms that make it possible to meet the quantitative targets of the metrics. At present, the safety analysis performed with HiP-HOPS does not take those safety mechanisms into account, nor does it calculate the metrics.

The aim of this thesis is to extend the capabilities of HiP-HOPS, so that it incorporates the safety mechanisms in its analysis and calculates the metrics automatically, allowing the user to test different hardware implementations and pointing the user towards the best solution.

The topics mentioned here, namely safety analysis, the HiP-HOPS tool and the ISO 26262 standard, will be explored in the literature review. Next, the changes made to HiP-HOPS will be explained and validated through the analysis of the chosen hardware system. The last part will be dedicated to the analysis of the results obtained, the final considerations and future developments of the metrics software.


Acknowledgments

First, I would like to express my gratitude towards Rui Esteves, PhD, my supervisor at Faculdade de Engenharia da Universidade do Porto. He granted me the opportunity to work on this thesis and was, always, a fantastic guiding voice and valuable help throughout this adventure.

I am equally grateful to David Parker, PhD, my supervisor at the University of Hull. From the first day he was incredibly supportive and gave me critical advice and help, which were key to successfully completing this thesis.

My very special thanks go to Luís Azevedo, PhD student at the University of Hull. He was the force that carried me, every day, through the difficulties of the project. I will never forget his fantastic guidance and friendship.

I would also like to acknowledge my friends Carina Treffurth and Sara Teter for their amazing work as editors on this thesis.

A special cheers to all the friends I met in Hull. Distinctively to my Family at 80-82 Cottingham Road (and Guests) and the Portuguese Crew. Rémi, Lucas, Carina, Katie, Simone, Sara, Sarah, Till, Simon, Marie, Anke, João, Franciscos, Hélène, Bernardo, Pedro, Tiago, you were the ones that made this semester abroad an unforgettable experience. And thanks for giving me infinite reasons not to work on my thesis. “Come on…Make good choices”.

To all my friends in Portugal, I thank you for accompanying me throughout my life. You are the reason why it has been amazing.

To my beloved family, I am infinitely grateful. My father, mother, sisters, grandparents and remaining family are the most important people in my life and the ones that shaped me into the person I am today.

To my dearest friend and companion, Manuel Tavares, my grandfather, I dedicate my most special thanks. Thank you for all the advice, support, patience, confidence and friendship. I just hope that one day I become half the man you were.

This thesis is dedicated to all of you, João Tavares


List of figures

Figure 2-1: Overview of ISO 26262
Figure 2-2: Dominated and non-dominated solutions
Figure 2-3: HiP-HOPS phases’ overview
Figure 2-4: MATLAB component failure editor
Figure 2-5: Inserting a component’s basic event
Figure 2-6: Inserting a component’s output deviation
Figure 2-7: Creation of a hazard for the system
Figure 2-8: Conversion of fault trees to FMEA [2]
Figure 2-9: Fault tree analysis summary
Figure 2-10: Fault tree results
Figure 2-11: Cut-set results
Figure 2-12: FMEA table
Figure 3-1: Steps of product development at hardware level
Figure 3-2: Generic embedded system [1]
Figure 3-3: Metamodel for the safety mechanisms implementation proposed in [10]
Figure 3-4: Failure modes of an HW element
Figure 3-5: Flow diagram for failure mode classification [1]
Figure 3-6: Single-point fault metric [1]
Figure 3-7: Latent fault metric [1]
Figure 3-8: Evaluation procedure for single-point and residual faults [1]
Figure 3-9: Evaluation procedure for dual-point failures [1]
Figure 3-10: Safety mechanism element
Figure 3-11: Coverage of a component
Figure 3-12: Fault tree of a covered element
Figure 3-13: Fault tree for the failure of a covered safety mechanism
Figure 3-14: Fault tree that replaces the basic event of a covered component
Figure 4-1: Function 1 electronic circuit
Figure 4-2: Failure propagation model
Figure 4-3: System model – Temperature sensor
Figure 4-4: System model – Signal acquisition circuit
Figure 4-5: System model – Microcontroller and watchdog
Figure 4-8: Fault tree – Single-point failures
Figure 4-9: Fault tree – Combination of failures (multiple-point)
Figure 4-10: Fault tree – Covered failure
Figure 4-11: HiP-HOPS – Fault tree
Figure 4-12: HiP-HOPS – Fault tree
Figure 4-13: HiP-HOPS – Fault tree
Figure 4-14: HiP-HOPS – Fault tree
Figure 4-15: HiP-HOPS – Cut sets
Figure 4-16: HiP-HOPS – Cut sets (continuation)
Figure 4-17: Example of a component (C13) in the XML file
Figure 4-18: Extract of the function to know the safety mechanisms – Part 1
Figure 4-19: Extract of the function to know the safety mechanisms – Part 2
Figure 4-20: Extract of the setup for the document
Figure 4-21: Extract of the workbook style setup
Figure 4-22: Creation of the table and the column for the components names
Figure 4-23: Components loop and writing of the component name
Figure 4-24: Calculations for residual faults
Figure 4-25: Calculations for single-point faults
Figure 4-26: Calculations for multiple-point faults (no safety mechanism)
Figure 4-27: Calculation of the metrics


List of tables

Table 2.1: Levels of severity of potential harm [1]
Table 2.2: Levels of probability of exposure [1]
Table 2.3: Levels of controllability [1]
Table 2.4: ASIL assignment, function of severity, exposure and controllability [1]
Table 3.1: Extract of Table D.1 in ISO 26262 – Analysed faults or failure modes in the derivation of diagnostic coverage [1]
Table 3.2: Table D.7 in ISO 26262 – Recommended safety mechanisms for analogue and digital I/O [1]
Table 3.3: Extract of Figure E.3 in ISO 26262 – Part 5, Annex E
Table 3.4: Possible source for the derivation of the target “single-point fault metric” value [1]
Table 3.5: Possible source for the derivation of the target “latent-fault metric” value [1]
Table 3.6: Possible source for the derivation of the random hardware failure target values
Table 3.7: Targets of failure rate classes of hardware parts regarding single-point faults [1]
Table 3.8: Maximum failure rate classes for a given diagnostic coverage of the hardware part – residual faults [1]
Table 3.9: Targets of failure rate class and coverage of hardware part regarding dual-point faults [1]
Table 4.1: Failure information for the system
Table 4.2: Model function failures and assigned ASILs
Table 4.3: Fault annotations – I71
Table 4.4: Fault annotations – Temperature sensor
Table 4.5: Failure annotations – C13
Table 4.6: Failure annotations – R23
Table 4.7: Failure annotations – R13
Table 4.8: Failure annotations – Microcontroller
Table 4.9: Diagnostic coverage – Microcontroller
Table 4.10: Failure annotations – Watchdog
Table 4.11: Failure annotations – Transistor T71
Table 4.12: Diagnostic coverage – T71
Table 4.13: Failure annotations – R72
Table 4.14: Failure annotations – R71
Table 4.15: Failure annotations – C71
Table 4.16: Failure annotations – R74
Table 4.17: Failure propagation example – I71
Table 4.18: Minimal cut-sets of the system failures


Symbols and Abbreviations

List of Abbreviations

HiP-HOPS – Hierarchically-Performed Hazard Origin and Propagation Studies
FMEA – Failure Modes and Effects Analysis
FTA – Fault Tree Analysis
SILs – Safety Integrity Levels
PMHF – Probabilistic Metric for Random Hardware Failures
PLD – Programmable Logical Device
FPGA – Field Programmable Gate Array
ECU – Electronic Control Unit
E/E/EP – Electric and/or Electronic and/or Programmable Electronic system
CENELEC – European Committee for Electrotechnical Standardization
ISO – International Organization for Standardization
IEC – International Electrotechnical Commission
FPTC – Fault Propagation and Transformation Calculus
SEFT – State-Event Fault Tree
CFT – Component Fault Tree
CSA – Compositional Safety Analysis
DSPN – Deterministic Stochastic Petri Net
NSGA – Non-Dominated Sorting Genetic Algorithm
ADC – Analog-to-Digital Converter
LED – Light-Emitting Diode
HW – Hardware


Chapter 1 Introduction

This initial chapter describes the work presented in this document. It begins with the motivation of the thesis, then the aim of the project and the different steps required to achieve it and finally, the organization of the document.

1.1. Problem

As engineering systems become more complex, new and more elaborate system failure modes are introduced due to systematic and random hardware failures. Classic manual safety and reliability analysis techniques become more difficult and error prone because of that complexity. To solve this problem, computerized tools that simplify the analysis process need to be developed. One of those tools is HiP-HOPS (Hierarchically-Performed Hazard Origin and Propagation Studies), which is capable of automating the synthesis and analysis of Fault Trees and FMEAs (Failure Modes and Effects Analyses). It only requires the initial component failure data to be provided, and that data can be reused. Furthermore, HiP-HOPS can be used to optimize system models with respect to reliability and cost. In this case, genetic algorithms are used to evolve initial non-optimal systems.

Some of the most complex electric/electronic systems nowadays are found inside everyday vehicles, which can contain dozens or even hundreds of ECUs. To address the automotive industry’s critical need to protect these systems, a standard was launched in 2011. ISO 26262 is an adaptation of the generic safety standard IEC 61508 for road vehicles and contains guidance for the development of safety-related systems with E/E/EP components. Currently there is little guidance and tool support regarding the standard and its hardware evaluation metrics, so it is a sensible choice to incorporate the procedures of the standard into HiP-HOPS, extending its capabilities and providing the automotive industry with an even more useful tool.


The aim of this thesis is to propose ways to extend HiP-HOPS so that it better supports the hardware architecture verification procedures of ISO 26262.

Firstly, there is the need to understand how the most widely used manual safety analysis techniques, FTA and FMEA, work, as they are used by HiP-HOPS.

In the following stage, the HiP-HOPS methodology must be explored. HiP-HOPS uses a failure model, which is different from the functional one.

The next step will be to study ISO 26262, especially Part 5 - Product development at the hardware level, to understand how ISO 26262 hardware architecture verification procedures work, and then figure out how it is possible to include these procedures in HiP-HOPS.

To validate the verification procedures a model will be necessary, so a simple electronic system will be chosen, the failure model of that system will be designed and the failure behavior information of its components inserted.

Finally, a program will be written to apply the evaluation metrics to the model, and safety mechanisms will be incorporated in HiP-HOPS. Safety mechanisms are components or procedures implemented to prevent the violation of a safety goal.

The required major tasks to be performed during this dissertation are the following:

• Establishment of the differences between FTA in HiP-HOPS and FTA in ISO 26262;

• Establishment of how to map Safety Mechanisms in FTA;

• Analysis of ISO 26262 hardware evaluation techniques;

• Development of a failure model for a simple electronic system;

• Establishment of the failure behavior of the model and its components;

• Manual creation of a fault tree, to verify that the hardware evaluation techniques work;

• Development of a program and/or functions in HiP-HOPS that read information from the model and apply the hardware evaluation techniques.

1.3. Document Outline

The structure of this document is as follows:

• The current chapter introduces the content of this document and the motivation of the project.

• Chapter 2 is devoted to the Literature Review: Functional Safety and a brief introduction to SILs, safety and reliability analysis techniques, an introduction to HiP-HOPS and an overview of the ISO 26262 standard.

• Chapter 3 focuses on ISO 26262 Part 5 – Product development at the hardware level, specifically on what are considered safety mechanisms and how the hardware architectural evaluation techniques are implemented. Methods to implement these concepts in HiP-HOPS are then proposed.

• Chapter 4 explains the changes made to HiP-HOPS and validates, through a case study, the concepts found in ISO 26262 Part 5 that were analyzed in Chapter 3 and implemented in HiP-HOPS.

• Chapter 5 is a general review of what has been achieved and presents the final conclusions.


Chapter 2 Literature and Concepts review

This chapter reviews the main concepts of the thesis. Safety Analysis and the concept of SILs are introduced. The main techniques used to perform Safety Analysis are presented, with special emphasis on FTA. The methodology of the state-of-the-art safety tool HiP-HOPS is explored and its capabilities reviewed and, finally, the safety standard ISO 26262 is briefly introduced.

2.1. Functional Safety and SILs

Functional Safety standards aid system designers with guidelines for the development of E/E/EP systems with the ability to perform safety functions. SILs are used to allocate safety requirements to the system’s components in order to prevent unacceptable risk. This section clarifies what safety and Functional Safety are and introduces the concept of SILs.

2.1.1. Safety, Functional Safety and other important concepts

According to safety standards such as IEC 61508 and those of CENELEC, safety is defined as: “freedom from unacceptable risk of physical injury or of damage to the health of people, either directly or indirectly as a result of damage to property or to environment”.

Functional Safety is defined as: “absence of unreasonable risk due to hazards caused by malfunction behavior of E/E/EP systems” [1].

Other important concepts connected to Functional Safety are presented below [1]:

• Harm: “Physical injury or damage to the health of persons”;

• Hazard: “A potential source of Harm”;

• Risk: “Risk can be described as a function of a frequency of occurrence of hazardous events, the ability to avoid specific harm or damage through timely reactions of the persons involved (controllability) and the potential severity of the resulting harm or damage”;

• Residual Risk: “Risk remaining after protective measures have been taken”;

• Unreasonable Risk: “Risk judged to be unacceptable in a certain context, according to valid societal moral concepts”;

• Failure: “Termination of the ability of an element to perform a function as required”;

• Failure Mode: “Manner in which an element or an item fails”;

• Systematic Failure: “Failure related in a deterministic way to a certain cause, that can only be eliminated by a change of the design or of the manufacturing process, operational procedures, documentation or other relevant factors”;

• Random Hardware Failure: “Failure that can occur unpredictably during the lifetime of a hardware element and that follows a probability distribution”;

• Safety-Related Systems: “Systems that perform a function or functions that ensure that risks are kept at an acceptable level”;

• Safety Integrity: “The probability of a safety-related system satisfactorily performing the required safety functions under all the stated conditions within a stated period of time”;

• Safety Goal: “Top-level safety requirement as a result of the hazard analysis and risk assessment”;

• Safety Measure: “Activity or technical solution to avoid or control Systematic Failures and to detect Random Hardware Failures or control Random Hardware Failures or mitigate their harmful effects”.

2.1.2. Functional Safety Standards

The international standard of Functional Safety, applicable to all industries, is IEC 61508, entitled “Functional Safety of Electrical/Electronic/Programmable Electronic Safety-Related systems”. It provides requirements and guidelines that enable the development of safety-related systems with a safe framework.

However, the need for different standards, applicable to specific industries, led to the creation of different IEC 61508 branches. Examples of these adaptations are presented below:

• ISO 26262 – Automotive Industry;

• EN 5012X – Railway Industry;

• IEC 61511 – Process Industry.

This thesis is intended to use the concepts in the ISO 26262 – “Road Vehicles – Functional Safety” adaptation, since it is a recent standard (2011) and there is little guidance and tool support.

2.1.3. SILs

Safety-related systems are used to make sure that risks are kept at an acceptable level. The E/E/EP systems in which they are applied have Safety Goals and Safety Requirements, which state what that acceptable level of risk is.

Safety requirements are derived through a process which uses functional hazard assessment and risk analysis techniques. This process aims to determine: (1) critical system functions, (2) safety requirements for hazards that cannot be avoided, related to those functions and (3) demands for additional safety functions, to achieve acceptable levels of risk.

Safety Integrity Levels (SILs) are classification levels used in safety-critical systems. They were developed in the UK as Health & Safety guidelines and adopted by IEC 61508 and other standards. Five levels of SILs are used: SIL0 (no additional safety requirements) to SIL4 (high safety requirements demand). In SIL decomposition, architectural elements are assigned lower or equal SILs, which combined should fulfil the SIL of their parent functions. This is a manual and time-demanding process and, as networked architectures with multiple safety functions become more complex, an automated solution is needed to keep up with the increased difficulty.

SILs are used to allocate functional safety requirements to a system and give requirements for the implementation of critical functions. A function with a lower required failure probability has to have a higher SIL. The distribution of the integrity levels among subsystem components can be arranged in different ways. For example, in a subsystem with SIL 4, the requirement can be met by three components at SIL 2 or by a single component at SIL 4. Elimination of common cause failures is required when fault-tolerant architectures are employed to achieve the required SIL. Evidence must be given that the software and hardware elements of each function meet their allocated SILs. The standards specify that failure rate prediction, used to assess whether SILs have been met, is only possible for random hardware failures. The techniques used are probabilistic assessment using failure rates and reliability prediction models (FMEA, FTA and Markov). Standards also consider the systematic faults in software and hardware introduced by humans in specification, design, manufacturing and installation, although these faults cannot be quantified.

That said, this section describing SILs is merely introductory: the concept is important for safety analysis, but it is not of major relevance for this thesis.

2.2. ISO 26262

This section introduces the recent Functional Safety standard for the automotive industry, ISO 26262, on which this thesis focuses. Automotive SILs (ASILs) are also introduced.

2.2.1. ISO 26262 Introduction

ISO 26262 is the adaptation of IEC 61508 to electric, electronic and software systems for road vehicles. Safety is extremely important in automotive development. Functions such as driver assistance, propulsion, vehicle dynamic control and passive/active safety systems are under continued development. The need for safe system development processes, and for the ability to verify that the system safety objectives are satisfied, is increasingly important.

ISO 26262 consists of the following parts, under the general title Road vehicles – Functional safety [1]:

• Part 1: Vocabulary – terms and definitions used throughout the standard;

• Part 2: Management of functional safety – specifies requirements for the management of functional safety for automotive applications;

• Part 3: Concept phase – specifies requirements for risk analysis and assessment and for the definition of a functional safety concept;

• Part 4: Product development at the system level – specifies the requirements for product development at system level;

• Part 5: Product development at the hardware level – specifies the requirements for product development at hardware level;

• Part 6: Product development at the software level – specifies the requirements for product development at software level;

• Part 7: Production and operation – specifies the requirements for production, operation and decommissioning services;

• Part 8: Supporting processes – specifies the requirements for support processes, such as documentation and qualification tools;

• Part 9: Automotive Safety Integrity Level (ASIL)-oriented and safety-oriented analyses – specifies the requirements for ASIL-oriented analysis and introduces the ASIL decomposition approach;

• Part 10: Guideline on ISO 26262 – overview of ISO 26262 as well as insights into the other parts of the standard.

Technological complexity is also on the rise, so there are increased risks from systematic failures and random hardware failures. ISO 26262 provides guidelines to avoid these risks, through appropriate requirements and processes.

System safety is achieved through safety measures, implemented in several technologies (mechanical, hydraulic, electrical...) and used at different levels of the development process. ISO 26262:

a) Provides an automotive safety lifecycle (management, development, production, operation, service, decommissioning) and supports the development of the necessary activities in these phases;

b) Provides an automotive risk-based approach to determine ASILs;

c) Uses ASILs to specify requirements of ISO 26262 to avoid unreasonable residual risk;

d) Provides requirements for validation and confirmation measures to ensure an acceptable level of safety;

e) Provides requirements for relations with suppliers.

Functional safety is influenced by the development process (activities: requirements, specification, design, implementation, integration, verification, validation and configuration), the production, service and management processes.

In Figure 2-1, the structure of ISO 26262 is shown. Phases of product development are based on a V-model.


Figure 2-1: Overview of ISO 26262 [1]

2.2.2. ASILs

ASILs are the integrity levels in ISO 26262, similar to SILs. There are 5 ASILs: QM (no integrity requirements), A, B, C and D (the highest safety requirement).

ASILs are assigned to hazards that may contribute to the violation of the safety goals, according to the following parameters:

• Severity of potential harm (S) – degree of physical damage caused to each endangered individual, including the driver, passengers and other traffic participants.

Table 2.1: Levels of Severity of Potential Harm [1]

• Probability of exposure (E) – how frequently and for how long individuals are exposed to the hazardous event.

Table 2.2: Levels of Probability of Exposure [1]

• Controllability (C) – the ability of the driver or other traffic participants to avoid the specific harm or damage through timely reactions.

Table 2.3: Levels of Controllability [1]

The ASIL assigned to each hazard is then inferred with the rules presented in Table 2.4.

Table 2.4: ASIL assignment as a function of Severity, Exposure and Controllability [1]
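As an illustration of how the three parameters combine, the sketch below encodes the ASIL assignment as a small Python function. It assumes the class numbering S0 to S3, E0 to E4 and C0 to C3 used by ISO 26262 Part 3, and it relies on a commonly used shorthand (summing the three class indices) rather than on a copy of the table itself; the function name `determine_asil` is hypothetical. The table in the standard remains the normative source.

```python
def determine_asil(severity: int, exposure: int, controllability: int) -> str:
    """Return 'QM', 'A', 'B', 'C' or 'D' for classes S0-S3, E0-E4, C0-C3.

    Assumption: the sum of the class indices is used as a shorthand for
    the ASIL assignment table; it is not a copy of the normative table.
    """
    if not (0 <= severity <= 3 and 0 <= exposure <= 4 and 0 <= controllability <= 3):
        raise ValueError("expected S in 0..3, E in 0..4 and C in 0..3")
    # Class 0 in any parameter means that no ASIL is required.
    if 0 in (severity, exposure, controllability):
        return "QM"
    total = severity + exposure + controllability
    # Sums below 7 map to QM; 7, 8, 9 and 10 map to A, B, C and D.
    return {7: "A", 8: "B", 9: "C", 10: "D"}.get(total, "QM")


# Hypothetical hazard: severe harm, high exposure, difficult to control.
worst_case = determine_asil(severity=3, exposure=4, controllability=3)
```

Lowering any one parameter by one class lowers the resulting ASIL by one level, which is why the shorthand is convenient for quick checks during hazard analysis.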

These ASILs will be used to figure out the target values for the hardware evaluation metrics that are going to be presented in chapter 3.

The events considered in safety analysis are assigned ASILs. The next section (2.3) introduces the two most important safety analysis techniques.

2.3. Classic safety analysis techniques: FTA and FMEA

FTA (Fault Tree Analysis) is a top-down (deductive) graphical representation of logical combinations of failures. It consists of a top event (system failure), connected to basic events using logical gates, such as AND or OR. Basic events are generally component failures or events expected to happen during normal operation of the system. The analysis is composed of two parts: qualitative (logical) and quantitative (probabilistic) analysis. The objective of the first is to reduce the logical expressions represented in the Fault Tree to a set of minimal cut sets, which are the smallest combinations of failures required to cause the top event. Quantitative analysis is used to calculate the probability of the top event from the probability of each of the basic events. This is only possible if the failure rates of the basic components are known [6].
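The two parts of the analysis can be sketched in a few lines of Python. The example below is only an illustration and not the HiP-HOPS algorithm: the tree encoding, the function names and the failure rates are hypothetical, basic events are assumed to be independent with constant failure rates, and the top-event probability uses the rare-event approximation (the sum of the cut-set probabilities).

```python
from itertools import product
from math import exp, prod


def minimise(sets):
    """Drop every cut set that strictly contains another cut set."""
    return {s for s in sets if not any(other < s for other in sets)}


def cut_sets(node):
    """Qualitative analysis: return the minimal cut sets of a fault tree.

    A node is either a basic event name (str) or a tuple (gate, children)
    where gate is "AND" or "OR".
    """
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        combined = set().union(*child_sets)
    elif gate == "AND":
        combined = {frozenset().union(*combo) for combo in product(*child_sets)}
    else:
        raise ValueError(f"unknown gate: {gate}")
    return minimise(combined)


def basic_event_probability(failure_rate, hours):
    """Probability of failure within `hours`, assuming a constant failure rate."""
    return 1.0 - exp(-failure_rate * hours)


def top_event_probability(minimal_cut_sets, probabilities):
    """Quantitative analysis using the rare-event approximation."""
    return sum(prod(probabilities[e] for e in cs) for cs in minimal_cut_sets)


# Hypothetical tree: TOP = A OR (B AND C) OR (A AND B)
tree = ("OR", ["A", ("AND", ["B", "C"]), ("AND", ["A", "B"])])
mcs = cut_sets(tree)

# Hypothetical failure rates (per hour) and mission time.
rates = {"A": 1e-6, "B": 1e-5, "C": 2e-5}
p = {name: basic_event_probability(rate, hours=10_000) for name, rate in rates.items()}
top = top_event_probability(mcs, p)
```

In this example the cut set {A, B} is discarded because {A} alone already causes the top event, leaving {A} and {B, C} as the minimal cut sets.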

FMEA is a bottom-up (inductive) technique, in which lists are compiled of all the possible failure modes, using failures of the various parts of the system to infer the effects on the rest of the system. These effects are usually evaluated according to a number of criteria, such as severity, probability and detectability, which are then combined into an overall estimate of risk. All the information is gathered in a table, which makes it possible to see quickly what the effects of each failure mode are [6].
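A minimal sketch of such a table is shown below. It assumes one common way of combining the criteria, the Risk Priority Number (the product of severity, occurrence and detection ratings on a 1 to 10 scale), which is not prescribed by the text above; the components, failure modes and ratings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    component: str
    failure_mode: str
    effect: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (very unlikely) to 10 (very likely)
    detection: int   # 1 (almost always detected) to 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: one common overall estimate of risk."""
        return self.severity * self.occurrence * self.detection


# Hypothetical rows for illustration only.
table = [
    FmeaRow("Sensor", "stuck at low value", "no over-temperature warning", 8, 3, 4),
    FmeaRow("Resistor", "open circuit", "signal lost", 6, 2, 2),
    FmeaRow("Watchdog", "no reset issued", "latent loss of monitoring", 7, 2, 8),
]

# Rank failure modes so that the highest estimated risk comes first.
ranked = sorted(table, key=lambda row: row.rpn, reverse=True)
```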

Both techniques are useful and provide important information about the systems, but both suffer from the same flaw: they are manual techniques, and the process of performing them can be difficult and time-demanding, especially for large and complex systems. This makes the whole process error prone or the results too numerous to interpret efficiently. Although the manual process is still predominant, tools that support the analysis by automating certain aspects of it have existed for some time [6].

2.4. Modern Safety Analysis Tools and Techniques

Modern tools and techniques appeared as the need to automate part or all of the FTA and FMEA process increased. These tools fall into two categories. The first is the compositional safety analysis approach, which consists of the development of formal and semi-formal languages to enable the composition, specification and analysis of system failure behavior, based on safety information about the components. The majority of these tools are semi-automated, as they need manual entry of the components’ failure information, and can be useful in a model-based design process (e.g.: HiP-HOPS). The second category involves more rigorous models that enable the analysis of the effects of failures by simulating them and checking whether the system meets the safety goals. To achieve this, the tools adapt formal verification methods to support safety analysis [6].

2.4.1. Compositional Safety Analysis Techniques (CSA)

The first technique of this type is the Failure Propagation and Transformation Notation (FPTN), a graphical description of the failure behavior of the system. It uses component modules connected by inputs and outputs to other modules, allowing the combination and propagation of failures; modules can also be decomposed into subsystems. Its purpose is to create a connection between deductive FTA and inductive FMEA, in order to study cause and effect. The drawback is the need to create an error model separate from the system model, which can become desynchronized from the original system. As a result, this technique is not used for system analysis or design optimization, but only for describing failure specifications.

Next comes Fault Propagation and Transformation Calculus (FPTC), which links the failure model to the system model. It defines failure classes (e.g.: omission, commission and value errors) that are specified as annotations on the components of the system model. Failure information is then transmitted to the rest of the system by a set of expressions that detail how failures are transformed and propagated from input to output (mitigation is possible by transforming a failure into normal behavior). The model thus acts as a “token-passing network” that determines the effects of component failures on the entire system. It also allows quantitative analysis by including probabilities in the expressions. The advantage over FPTN is the use of system models, so small changes to the system do not require a new failure model, only an update of some failure expressions. The disadvantage is that FPTC is inductive, because it relies on injecting failures into the system, so it is difficult to obtain the information given by an FTA; it is also prone to combinatorial explosion.
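The token-passing idea can be illustrated with a minimal Python sketch. It is a strong simplification and not the FPTC notation itself: it assumes a linear chain of three hypothetical components (`sensor`, `filter`, `actuator`), one failure token at a time, and invented transformation rules. The `filter` rule that turns a value error into normal behavior illustrates mitigation.

```python
# Hypothetical three-component chain: sensor -> filter -> actuator.
# "normal" stands for the absence of a failure. All rules are illustrative.
RULES = {
    "sensor":   {"normal": "normal"},
    "filter":   {"normal": "normal", "value": "normal", "omission": "omission"},
    "actuator": {"normal": "normal", "value": "value", "omission": "omission"},
}
CHAIN = ["sensor", "filter", "actuator"]


def propagate(injected):
    """Pass a failure token down the chain.

    `injected` maps a component name to the failure class that its
    internal fault produces at its output.
    """
    token = "normal"
    trace = []
    for name in CHAIN:
        # Transform the incoming token with the component's rule...
        token = RULES[name].get(token, token)
        # ...then let an injected internal fault override the output.
        token = injected.get(name, token)
        trace.append((name, token))
    return trace


value_trace = propagate({"sensor": "value"})
omission_trace = propagate({"sensor": "omission"})
```

In this sketch a value error injected at the sensor is masked by the filter, whereas an omission reaches the actuator output. Each injected failure requires its own run, which is why the approach is inductive.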

Other techniques based on FPTN are State-Event Fault Trees (SEFTs) and Component Fault Trees (CFTs). CFTs define the failure logic of components graphically as interconnected fault trees; the HiP-HOPS tool is similar to this approach. They are not affected by combinatorial explosion as much as FPTC. SEFTs are based on CFTs and allow the analysis of dynamic systems, as they distinguish between a state (which holds over a period of time) and an event that triggers a state transition (which is instantaneous). The failure behavior is modelled at component level, but SEFTs enable the representation of sequences and allow negation (e.g.: an event that has not happened yet) with the NOT gate. Being more complex, SEFTs cannot be analyzed with standard FTA algorithms, so they are converted to Deterministic Stochastic Petri Nets (DSPNs) for analysis.

2.5. Automatic optimization of system reliability

As said before, manual safety analysis of different designs is highly time-consuming and restricts the number of design candidates that can be evaluated. The process needs to be automated so that as many candidate designs as possible can be examined and the one best suited to the objectives (cost, time, quality...) chosen. Even with modern computers, it is not possible to evaluate the entire design space, particularly if multiple forms of variability are taken into account (e.g.: swapping one system architecture for another and substituting one component with an alternative architecture). Another problem is the main concern of system designers: cost. A balance between the two conflicting goals of increasing reliability and reducing cost is required.

This section is only introductory. It is included because HiP-HOPS supports the automatic optimization of system designs, but the topic is not developed further, as it is not central to this thesis.

2.5.1. Different optimization approaches

The goal of the optimization is to find optimal solutions without the problem of getting stuck in a local optimum. To reach this goal, there are different algorithms and almost all use meta-heuristics.

One of these algorithms is tabu search, which is based on evaluation functions (e.g.: if a solution has better characteristics, it is used for the next iteration) that can have multiple objectives. The algorithm keeps a memory of recently visited solutions (tabu solutions) to prevent looping or getting stuck in a local optimum [6].

Another technique is to use genetic algorithms. In this approach, a population of different candidates is evaluated and the best individuals are chosen to reproduce and form the basis of the next generation of candidates. Each candidate has a genetic encoding that captures its characteristics, and new candidates are produced by crossover of these encodings. Random mutation is also applied to cover a greater portion of the search space and to avoid getting stuck in a local optimum. There are different forms of genetic algorithms, including the penalty-based approach, in which the multiple objectives are combined into one function: one objective is optimized, while the others are imposed as constraints and, if a constraint is violated, a penalty is subtracted from the fitness score of the candidate. Another approach is the Non-Dominated Sorting Genetic Algorithm (NSGA-II), a multi-objective technique that works by constructing a multi-dimensional graph on which the current solutions within a population are plotted (Figure 2-2). Given a solution B, if a solution A is better in at least one objective and no worse in any of the others, then solution B is dominated by solution A. The non-dominated solutions form the Pareto front and are the optimal solutions. These solutions represent trade-offs between all the objectives [6].

Figure 2-2: Dominated and Non-Dominated Solutions [6]
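The dominance relation described above can be sketched in a few lines of Python. This is only the dominance test and the extraction of the non-dominated set, not NSGA-II itself (which adds ranking, crowding distance, selection, crossover and mutation). The candidate values are hypothetical (cost, unavailability) pairs, and both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if solution a dominates b (all objectives are minimized):
    a is no worse in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(population):
    """Return the solutions that are not dominated by any other solution."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q != p)]


# Hypothetical candidates as (cost, unavailability) pairs.
candidates = [(10, 0.05), (12, 0.02), (15, 0.02), (11, 0.06), (20, 0.01)]
front = pareto_front(candidates)
```

Here `(15, 0.02)` is dominated by `(12, 0.02)` and `(11, 0.06)` by `(10, 0.05)`, so the front keeps the three remaining trade-offs between cost and unavailability.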

HiP-HOPS uses a modified version of the NSGA-II genetic algorithm technique.

2.6. HiP-HOPS and Safety Analysis

HiP-HOPS is a model-based, semi-automatic compositional safety and reliability analysis tool that was created, and continues to be developed, by the Dependable Systems group at the University of Hull. The following sections discuss the capabilities of the tool and give a brief example of its operation in combination with Matlab Simulink.

2.6.1. HiP-HOPS as a state-of-the-art tool

HiP-HOPS requires a set of local component failure data in a system model, describing how combinations of internal failure modes and deviations at a component’s inputs generate output failures. Using that information, HiP-HOPS automatically synthesizes a network of interconnected Fault Trees, which shows the propagation of failures through the system, and a Failure Mode and Effects Analysis (FMEA), which shows the component failure modes and their effects on high-level system functions.

The capability of obtaining those results automatically is extremely important as the electronic systems tend to get more complex. The classical manual techniques tend to be performed only once, either after the system has been designed to check if it is reliable, or after the system is in operation and fails, to find out the error. This approach does not use the full potential of FTA or FMEA. The ideal scenario is to use FTA and FMEA during the design process itself, so that a system is designed with safety and reliability in mind. By executing safety analysis as part of an iterative process, it is possible to find and remedy potential flaws much earlier, saving time, effort and money and producing a better product.

HiP-HOPS can be used as soon as a system’s concept can be turned into a model that identifies components and material, energy or data transactions amongst them. Models can be arranged hierarchically to deal with complexity. The tool can be used to analyse a variety of systems, such as fluidic, electrical, electronic or mechanical [3].

Other capabilities of HiP-HOPS that contribute to the field of dependability include:

 Temporal Fault Trees Generation:

It has become apparent that standard FTA and FMEA are poorly suited to analyse systems in which time plays an important role [11]. Those systems may have multiple states of operation, or a specific sequence of events may need to occur to originate a fault. To analyse the failure behaviour of those systems, multiple fault trees are usually created to account for each of the system’s states. Another solution is to enclose the temporal constraints within event description. The two solutions can be unsatisfactory; the first can lead to complex and fragmented analysis and the second can hide important temporal relations [11]. To overcome the limitations of fault trees and FMEA concerning the referred systems, HiP-HOPS assesses sequences of failures via synthesis of temporal fault trees and FMEAs. This technique is a result of the integration of HiP-HOPS and Pandora. The latter is a method that enables modelling and analysis of dynamic failure behaviour systems via extensions to Boolean Logic [11].

 Multi-Objective optimization:

When there are various alternatives for a system’s architecture, they must be assessed in accordance with the specific developer’s interests. This is a difficult, multi-objective problem that can only be undertaken with the aid of computerized algorithms that allow effectively optimal solutions to be found in large design spaces [3]. HiP-HOPS is capable of evaluating a set of possible design alternatives according to user defined parameters such as cost and dependability. The assessment of the different solutions is performed using a multi-objective genetic algorithm that exploits the automated fault tree and FMEA synthesis algorithms in order to find Pareto optimal architectures. These represent the optimal trade-offs between the objective parameters considered by the system’s designer [3].

HiP-HOPS can also take preventive maintenance into account when evaluating dependability and availability and, again using a genetic algorithm, can perform multi-objective optimization of preventive maintenance schedules [5].

 Linguistic concepts for representation and reuse of component failure patterns:

Well-established failure data can be stored in a library and reused later. This saves time and effort, and gives analysts who are unfamiliar with a system access to useful information. Recurring patterns, such as the behavior of fail-silent components, are the most promising candidates for this technique [9]. The limitation faced by failure behavior reuse is the lack of a robust and machine-readable method to specify failure data [9]. HiP-HOPS addresses this problem using Generalized Failure Logic (GFL). Through a combination of Boolean Logic and generalized references to component ports and failure classes, GFL allows writing expressions that are applicable to multiple components with the same failure behavior, even if their interfaces differ [9].

2.6.2. Safety Analysis with HiP-HOPS

HiP-HOPS safety analysis captures a system’s failure behavior through the generation (and subsequent analysis) of fault trees and FMEAs. The process is divided into three stages [6]:

 1st, Modelling: system modelling and failure annotation;

 2nd, Synthesis: fault tree generation;

 3rd, Analysis: FTA and FMEA synthesis.

The three phases are described below.

Figure 2-3: HiP-HOPS phases’ overview [2]

2.6.2.1. Modelling Phase

This is the only phase of the HiP-HOPS methodology that remains manual. To build fault trees, HiP-HOPS must be given information about the relationships between components and the ways in which they can fail [6]. The user has to provide this information in the system model. Currently, HiP-HOPS is capable of working with modelling tools such as Matlab Simulink, Eclipse-based UML tools and ITI SimulationX [6]. In the model, the sub-systems are represented as conceptual blocks and data flow diagrams. Developing these models in a modular manner is important to allow easy modification and re-use in the future.

The failure data of a component or subsystem is a set of logical expressions that describe how its outputs can deviate. These logical failure expressions should include information about the component’s internal failures (basic events in the fault trees) and about the inputs that carry any type of deviation. Deviations can be of different types: a component may fail to provide an output when requested (omission) or provide one when it was not requested (commission); there are also value deviations, where the output is higher or lower than the correct value, and timing deviations, where the output is late or early. As said before, the set of logical expressions used to describe the component’s failure behavior can be stored and re-used. A generalized example of a component’s commission output deviation is presented below:

Commission-Out1 = (Commission-In1 AND Commission-In2) OR InternalFailure1

The Commission of output1 (Out1) is caused either by the combination of the Commission of input1 and input2 or by an internal failure of the component.

Generalizing the HiP-HOPS terminology rules:

 Inputs and outputs: “type of deviation” – “name of the I/O in the model”;

 Internal failures can assume the form of valid identifiers. For example: “short circuit”.
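As a quick check of how such an expression behaves, the commission example above can be written as a Boolean function in Python. The function and parameter names are illustrative only and are not part of HiP-HOPS; each argument is `True` when the corresponding deviation or internal failure is present.

```python
from itertools import product


def commission_out1(commission_in1, commission_in2, internal_failure1):
    """Commission-Out1 = (Commission-In1 AND Commission-In2) OR InternalFailure1"""
    return (commission_in1 and commission_in2) or internal_failure1


# Enumerate the truth table: one row per combination of causes.
truth_table = {
    inputs: commission_out1(*inputs)
    for inputs in product([False, True], repeat=3)
}
```

The table shows that a commission on a single input is not enough to deviate the output, whereas the internal failure alone is.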

Figure 2-4: Matlab Component Failure Editor [2]

So, to begin component failure annotation, the user must select the component and open the failure editor, shown in Figure 2-4. The component’s internal failures, or basic events, can then be added. In this example, the component has one basic event, named EMI (Electromagnetic Interference, Figure 2-5).

Figure 2-5: Inserting a component’s basic event [2]

The following step is to insert the failure expressions for the component’s output deviations. In this case, the output deviation of the component is an omission, caused solely by its internal failure (Figure 2-6).

Figure 2-6: Inserting a component’s output deviation [2]

This is also the phase in which the system’s output ports should be identified, as they relate to the top events (hazards) of the fault trees generated in the synthesis phase. If SILs are used, hazards must be identified and assigned SILs. These can be associated with one or several of the system’s output ports. Figure 2-7 shows an example of hazard annotation: the hazard occurs with the combination of the two system output ports and has a SIL of 4.

Figure 2-7: Creation of a Hazard for the system [2]

2.6.2.2. Synthesis Phase

The second phase is the fault tree synthesis. By combining the failure data of individual components and sub-systems, HiP-HOPS automatically creates a set of fault trees whose top events are the system’s output port deviations or, in the case of SIL allocation, the system’s hazards. The result is a network of interconnected fault trees, which define the relationships between failures of system outputs and their causes in the failure of components. This is a deductive process, working backwards from the system outputs to determine which component failures cause those output failures, and in which logical combinations (expressed with gates such as AND and OR).

The generated component fault trees are then combined into one fault tree per deviated output or hazard. These fault trees can then be read by the software Isograph Fault Tree++, to make the visualization clearer.
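The backward traversal can be sketched as follows. This is a minimal illustration and not the HiP-HOPS synthesis algorithm: the component names, the tuple-based expression format and the `CONNECTIONS` table are all assumptions made for the example, and the model is assumed to contain no loops.

```python
# Local failure expressions, keyed by "<deviation>-<component>.<port>".
# ("basic", x) is an internal failure; ("dev", x) is a deviation at an input.
EXPRESSIONS = {
    "Omission-Valve.out": ("OR", ("basic", "Valve.stuck"),
                           ("dev", "Omission-Valve.in")),
    "Omission-Ctrl.out": ("OR", ("basic", "Ctrl.crash"),
                          ("AND", ("dev", "Omission-Ctrl.in1"),
                           ("dev", "Omission-Ctrl.in2"))),
    "Omission-S1.out": ("basic", "S1.fail"),
    "Omission-S2.out": ("basic", "S2.fail"),
}

# Model connections: a deviation at an input port is caused by the same
# deviation at the output port connected to it.
CONNECTIONS = {
    "Omission-Valve.in": "Omission-Ctrl.out",
    "Omission-Ctrl.in1": "Omission-S1.out",
    "Omission-Ctrl.in2": "Omission-S2.out",
}


def synthesize(deviation):
    """Work backwards from an output deviation to a tree of basic events."""
    return _expand(EXPRESSIONS[deviation])


def _expand(node):
    kind = node[0]
    if kind == "basic":
        return node
    if kind == "dev":
        # Follow the connection upstream and expand that component's logic.
        return synthesize(CONNECTIONS[node[1]])
    # AND / OR gate: expand every child.
    return (kind,) + tuple(_expand(child) for child in node[1:])


tree = synthesize("Omission-Valve.out")
```

The resulting `tree` contains only gates and basic events: the valve output is omitted if the valve is stuck, the controller crashes, or both sensors fail.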

2.6.2.3. Analysis Phase

The result of the synthesis phase is one or more interconnected Fault Trees, so the next stage is to analyse them using FTA. Fault Trees tend to be large and complex; reducing them to their minimal cut-sets yields the minimal combinations of basic events that lead to the top-level failure, without the intermediate propagation paths [2].

The cut-set generating algorithm used by HiP-HOPS is a modified version of MICSUP (Minimal Cut Sets Upwards), a bottom-up algorithm [2].

The following Boolean laws are applied to obtain the minimal cut-sets.

NOTE: “E1” and “E2” represent basic event 1 and basic event 2.

 Law of absorption:

E1 + E1.E2 = E1

The cut set “E1.E2” is removed, as the event “E1” alone is sufficient to cause the top event [2].

 Laws of idempotence:

E1.E1 = E1

Repeated events in the same cut-set are removed [2].

E1 + E1 = E1

Repeated cut-sets are removed [2].
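The effect of these laws can be sketched in Python by representing each cut set as a `frozenset` of basic events. This is a naive recursive evaluation written for illustration, not the modified MICSUP algorithm used by HiP-HOPS; the tuple-based tree format is an assumption of the example.

```python
from itertools import product


def cut_sets(node):
    """Return the cut sets of a fault tree as a set of frozensets."""
    kind = node[0]
    if kind == "basic":
        return {frozenset([node[1]])}
    child_sets = [cut_sets(child) for child in node[1:]]
    if kind == "OR":
        # E1 + E1 = E1: the set union drops repeated cut sets.
        return set().union(*child_sets)
    if kind == "AND":
        # E1.E1 = E1: the frozenset union drops repeated events.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {kind}")


def minimal_cut_sets(node):
    """Apply absorption: drop any cut set that strictly contains another."""
    sets = cut_sets(node)
    return {cs for cs in sets if not any(other < cs for other in sets)}


# Top = E1 + E1.E2 + E2.E3.E3
tree = ("OR",
        ("basic", "E1"),
        ("AND", ("basic", "E1"), ("basic", "E2")),
        ("AND", ("basic", "E2"), ("basic", "E3"), ("basic", "E3")))
minimal = minimal_cut_sets(tree)
```

For this tree the cut set E1.E2 is absorbed by E1, and E2.E3.E3 collapses to E2.E3, leaving two minimal cut sets.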

Once the minimal cut-sets are identified, HiP-HOPS uses them to allocate safety integrity levels; a qualitative analysis can also be applied to generate FMEAs. The minimal cut sets capture the non-redundant propagation of failures in the fault tree, and an algorithm is used to catalogue each component failure mode and note which system failures it causes [2]. The FMEA shows the direct relationship between component failures and system failures, so it is possible to quickly verify how a failure of a given component affects everything else in the system and how likely that is. A classic FMEA only shows the direct effects of single failure modes, but because the HiP-HOPS FMEA is generated from Fault Trees, the further effects of a failure are also shown. This concept is illustrated in Figure 2-8 [2].

Figure 2-8: Conversion of Fault Trees to FMEA [2]

In Figure 2-8, “F1” and “F2” are system failures and “C1” to “C9” are component failures. “C3”, “C4”, “C6” and “C7” have no direct effect on the system when only one of them fails; however, if, for example, “C3” and “C4” both occur, “F1” will occur.
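The conversion from minimal cut sets to an FMEA table can be sketched as below. The cut sets used here are hypothetical and only loosely follow the naming of Figure 2-8 (the figure's actual content is not reproduced); the split into "direct" and "further" effects is an assumed table layout, not the HiP-HOPS output format.

```python
from collections import defaultdict


def fmea_from_cut_sets(cut_sets_by_failure):
    """Map each component failure to its direct and further effects.

    Direct effect: the failure alone forms a minimal cut set.
    Further effect: the failure only contributes together with others.
    """
    table = defaultdict(lambda: {"direct": set(), "further": {}})
    for system_failure, cut_sets in cut_sets_by_failure.items():
        for cs in cut_sets:
            for event in cs:
                if len(cs) == 1:
                    table[event]["direct"].add(system_failure)
                else:
                    partners = tuple(sorted(cs - {event}))
                    table[event]["further"].setdefault(
                        system_failure, []).append(partners)
    return dict(table)


# Hypothetical minimal cut sets for two system failures.
cut_sets_by_failure = {
    "F1": [frozenset({"C1"}), frozenset({"C3", "C4"})],
    "F2": [frozenset({"C5"}), frozenset({"C6", "C7"})],
}
fmea = fmea_from_cut_sets(cut_sets_by_failure)
```

In this sketch `fmea["C3"]` has no direct effect but records that, together with `C4`, it leads to `F1`, which is the kind of further effect a classic FMEA would miss.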

A quantitative analysis can also be performed to calculate the system unavailability, $Q_S$, if the basic events have quantitative data [2]:

$$Q_S = 1 - \prod_{i=1}^{n} \left(1 - Q_{CS_i}\right)$$

where $n$ is the number of independent cut sets and $Q_{CS_i}$ is the unavailability of cut set $i$.
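The system unavailability formula above can be evaluated with a short Python sketch. The numeric values are hypothetical, and the unavailability of each cut set is taken as the product of the unavailabilities of its basic events, which assumes the events are independent (an assumption of this example, not a statement from the source).

```python
from math import prod


def cut_set_unavailability(event_unavailabilities):
    """All events of a cut set must occur together (independence assumed)."""
    return prod(event_unavailabilities)


def system_unavailability(cut_set_values):
    """Q_S = 1 - prod(1 - Q_CSi) over the independent cut sets."""
    return 1.0 - prod(1.0 - q for q in cut_set_values)


# Two hypothetical cut sets: {E1} and {E2, E3}.
q_cs = [cut_set_unavailability([0.01]),
        cut_set_unavailability([0.1, 0.2])]
q_s = system_unavailability(q_cs)
```

With cut set unavailabilities of 0.01 and 0.02, the system unavailability is 1 − 0.99 × 0.98 = 0.0298, slightly less than the simple sum 0.03.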

The resultant fault trees, cut sets, FMEAs and SIL allocations are presented in html format. Currently, the files generated can only be opened with Internet Explorer [16].

Figure 2-9: Fault tree analysis summary [2]

Each fault tree has a name, constructed from the name of the output deviation it comes from. Clicking on a fault tree name opens the results for that fault tree (Figure 2-10) [2].

Figure 2-10: Fault tree results [2]

The user can also see the cut-sets of basic events that cause the hazard or output deviation, and the FMEA results. Figure 2-11 shows an example of cut-set results and Figure 2-12 an example of an FMEA table.

Figure 2-11: Cut-set results [2]

2.7. Summary

In this chapter it became clear that safety analysis plays a major role in hardware development, and that with the increasing complexity of E/E/EP systems, automatic safety analysis tools like HiP-HOPS are increasingly important to produce more reliable products. It was also evident how easy it is to use HiP-HOPS.

HiP-HOPS lacks some of the concepts and rules of ISO 26262 for hardware architecture evaluation. The next chapter is an in-depth analysis of ISO 26262 – part 5, in particular how safety mechanisms are modelled according to the standard and how to use the hardware verification metrics it defines.

Chapter 3 Hardware evaluation techniques

This chapter is a full analysis of ISO 26262 – part 5: product development at hardware level. The intention is to understand how safety mechanisms should be modelled according to the standard and what hardware evaluation metrics are. Then, changes to HiP-HOPS will be proposed, to accommodate these concepts.

3.1. Introduction

This part of the standard specifies requirements for product development at hardware level for automotive applications, such as [1]:

 Requirements for the initiation of product development;

 Specification of the hardware safety requirements;

 Hardware design;

 Hardware architectural metrics;

 Evaluation of violation of the safety goal due to random hardware failures.

These requirements are applicable to non-programmable and programmable elements, such as FPGAs and PLDs [1].

The steps required for the product development process, according to ISO 26262, are summarized in Figure 3-1. The phases explored in particular in this thesis, “evaluation of the hardware architectural metrics” and “evaluation of safety goal violations due to random hardware failures”, are highlighted. Two sections of ISO 26262 – part 5, sections 8 and 9, are dedicated to those phases and describe two alternatives to evaluate whether the residual risk of safety goal violations is sufficiently low: either a global probabilistic approach, or a cut-set analysis that studies the impact of each identified fault of a hardware element upon the violation of the safety goals [1].

Figure 3-1: Steps of product development at hardware level

3.2. Safety Mechanisms

Following the ISO 26262 product development method, the designer has to specify hardware safety requirements, which are no more than design constraints. These safety requirements shall include:

 Safety mechanisms to control relevant internal failures of the hardware element;

 Safety mechanisms to ensure the element is tolerant to external failures;

 Safety Mechanisms to detect and signal internal or external failures, to prevent faults from being latent;

 Other requirements not related to safety mechanisms that assure the avoidance of dangerous behaviour.

3.2.1. Safety Mechanisms Definition

According to ISO 26262, a Safety Mechanism is a technical solution implemented by E/E functions or elements, or by other technologies, to detect faults or control failures in order to achieve or maintain a safe state [1]. Safety mechanisms are implemented to prevent faults from leading to single-point failures or to reduce residual failures and to prevent faults from being latent [1]. Random hardware failures (relevant in this case) result from aging/wear-out, aggressive environment and manufacturing process variations.

Currently, few studies and tools quantify the effect of safety mechanisms and make use of the ISO 26262 architectural metrics. In the design phase, design choices should be compared with the safety mechanisms included, as these contribute to the final failure rate of the system. ISO 26262 – part 5 recommends safety mechanisms for certain component faults.

Other important concepts, related to Safety Mechanisms are presented below:

 Diagnostic Coverage: “proportion of the hardware element failure rate that is detected or controlled by the implemented safety mechanisms [1]”;

3.2.2. Choice of Safety Mechanisms

ISO 26262 offers a guideline to choose appropriate safety mechanisms to be implemented in the E/E architecture to detect failures of elements [1].

The Annex D, in ISO 26262 part 5 is intended to be used as [1]:

a) An evaluation of the Diagnostic Coverage to produce a rationale for:

1) The compliance with the single-point and latent faults metrics;

2) The compliance with the evaluation of the safety goal violations due to random hardware failures;

b) A guideline in order to choose appropriate safety mechanisms to be implemented in the E/E architecture to detect failures of elements. [1]

Table 3.1 is an extract of Table D.1 in ISO 26262 part 5, which shows typical faults of the hardware elements of a generic embedded system (Figure 3-2) and provides guidelines which are adapted based on analysis of the system elements.

Three levels of achievable diagnostic coverage are considered: low (60%), medium (90%) and high (99%). For each element, Table D.1 then indicates the table (Tables D.2 to D.14) where guidelines for safety mechanisms are given. Other techniques can be used, provided that evidence is available to support the claimed diagnostic coverage [1].
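To make the three coverage levels concrete, the sketch below computes the portion of a failure rate that a safety mechanism leaves uncovered, following the definition of diagnostic coverage as the proportion of the failure rate that is detected or controlled. The function name and the 100 FIT input are hypothetical.

```python
# Diagnostic coverage levels considered in Annex D, as fractions.
DC_LEVELS = {"low": 0.60, "medium": 0.90, "high": 0.99}


def uncovered_failure_rate(failure_rate, dc_level):
    """Portion of a failure rate that is neither detected nor controlled
    by a safety mechanism with the given diagnostic coverage level."""
    return failure_rate * (1.0 - DC_LEVELS[dc_level])


# Hypothetical failure mode with a failure rate of 100 FIT.
uncovered = {level: uncovered_failure_rate(100.0, level)
             for level in DC_LEVELS}
```

For a 100 FIT failure mode, roughly 40, 10 and 1 FIT remain uncovered at low, medium and high coverage respectively, which shows how strongly the choice of safety mechanism affects the remaining failure rate.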

Table 3.1: Extract of table D.1 in ISO 26262- Analysed faults or failures modes in the derivation of diagnostic coverage [1]

Table 3.2: Table D.7 in ISO 26262– Recommended Safety Mechanisms for Analogue and Digital I/O [1]

After choosing the Safety Mechanism technique for the indicated elements, ISO 26262 provides an overview of that technique. The next few lines contain an extract of those overviews, for different types of elements.

For electrical elements, such as relays or sensors, the common safety mechanism selected for this study is the Comparator [1]:

 Aim: To detect (non-simultaneous) failures in independent hardware or software.

 Description: The output signals of independent hardware are compared cyclically or continuously by a comparator. For example, two processing units exchange data reciprocally. A comparison is carried out using software in each unit and detected differences lead to a failure message.

Processing units are among the most complex elements in a hardware architecture; they can have hardware safety mechanisms, software safety mechanisms, or both. I have selected the self-test supported by hardware [1]:

 Description: additional special hardware to support self-test functions to detect failures in the processing unit and other sub-elements (e.g.: EDC coder/decoder) at a gate level. Typically it only runs at the initialization or power-down of the processing unit due to its intrusive nature. It is usually used for multipoint fault detection.

 Example: For sub-elements like EDC coders/decoders, a special HW mechanism (e.g.: logic BIST) can be added to generate inputs to the coder-decoder and check the results. These inputs are usually generated by pattern generators (e.g. MISR).

NOTE: Logic BIST - Logic built-in self-test (or LBIST) is a form of built-in self-test (BIST) in which hardware and/or software is built into integrated circuits allowing them to test their own operation, as opposed to reliance on external automated test equipment.

For actuators, the example is a technique named Monitoring [1]:

 Aim: detect incorrect operation of an actuator

 Description: The operation of the actuator is monitored. Can be done at the actuator level by physical parameter measurements but also at system level regarding the actuator failure effect.

 Example: A cooling fan, monitoring at system level uses a temperature sensor to detect failure. Monitoring of physical parameters measures the voltage, current or both on the inputs of the cooling fan.

3.2.3. How to Model Safety Mechanisms

In the sections above, safety mechanisms were introduced, along with examples of guidelines for selecting them according to the E/E element in question. The remaining issue is how a designer can model a safety mechanism in a system model.

ISO 26262 - part 5, annex E is an example of metrics calculation. That example can help understand, along with other parts of the standard, the constraints of safety mechanism design.

The following conclusions were reached concerning safety mechanism insertion in system’s models:

 One Safety Mechanism can cover several failure modes of different components, considered in the safety analysis for the same safety goal.

This point is evident in Table 3.3. SM2 (safety mechanism 2) is a comparator that compares the values of two inputs. The components that are covered by that safety mechanism influence, somehow, the values checked by it;

Table 3.3: Extract of Figure E.3 in ISO 26262 – part 5, annex E

 Safety Mechanisms should have a flexible description regarding their particular attributes, as they can assume different forms (software, hardware, different algorithms…) [10];

 The applicability of specific safety mechanisms to a certain component has to be resolved by component class hierarchy [10];

 Architectural evolution must be considered by safety mechanisms [10];

 A safety mechanism can be a component or a part of a component. For example, a microcontroller can have self-check hardware in its architecture or an external watchdog.

In [10] the authors propose an interesting way to model safety mechanisms. They consider a model represented by a set of computing nodes, where functional networks are responsible for the execution of each item, together with the network or bus connections between those nodes [10]. A node is modelled by a list of components and a list of safety mechanism instances [{C_j}, {SM_k}]. A component C_j is a source of a set of failure effects {FE_Cj,k} that can lead to the violation of the safety goal at the top level. These failure effects are included in the component’s faultHypothesis. A failure effect FE can be caused by a number of different failure modes {FM_i}, which are associated by FaultHypothesis with a fraction K_FMi,Cj, the percentage of the failure rate for an effect caused by a specific failure mode [10].

A safety mechanism is modelled as possessing one or more detection capabilities. An object of the class DetectionCapability characterizes mechanism’s coverage DC of specific failure modes, of specific component classes CC, that can be represented by [{CCi, FMi, DCi}]. When a safety mechanism is instantiated in a model, it is applied to one or more components by adding it to the mechanismsApplied reference list, or to the implicitMechanisms reference list, if the SM is implicit to that component [10].
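The structure described above can be sketched with Python dataclasses. This is a loose illustration rather than the metamodel of [10]: the class and attribute names are adapted from the text, while the `coverage_for` helpers, the rule of taking the highest available coverage, and the example values are assumptions made here.

```python
from dataclasses import dataclass, field


@dataclass
class DetectionCapability:
    component_class: str  # CC
    failure_mode: str     # FM
    coverage: float       # DC, as a fraction between 0 and 1


@dataclass
class SafetyMechanism:
    name: str
    capabilities: list = field(default_factory=list)

    def coverage_for(self, component_class, failure_mode):
        """Best coverage this mechanism offers for the given failure mode."""
        return max((c.coverage for c in self.capabilities
                    if c.component_class == component_class
                    and c.failure_mode == failure_mode), default=0.0)


@dataclass
class Component:
    name: str
    component_class: str
    mechanisms_applied: list = field(default_factory=list)
    implicit_mechanisms: list = field(default_factory=list)

    def coverage_for(self, failure_mode):
        """Assumption: the highest coverage among all mechanisms applies."""
        mechanisms = self.mechanisms_applied + self.implicit_mechanisms
        return max((m.coverage_for(self.component_class, failure_mode)
                    for m in mechanisms), default=0.0)


# Hypothetical instances: a comparator covering sensor drift at 90%.
comparator = SafetyMechanism(
    "Comparator", [DetectionCapability("Sensor", "drift", 0.90)])
sensor = Component("S1", "Sensor", mechanisms_applied=[comparator])
```

Because the detection capability is keyed by component class rather than by instance, the same `comparator` object could be added to the `mechanisms_applied` list of any other component of class `Sensor`.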

The proposed metamodel, described in the paragraphs above, is shown in Figure 3-3.

Figure 3-3: MetaModel for the safety mechanisms implementation proposed in [10]

The next chapter explains in detail how safety mechanisms were modelled in the system model to account for the hardware evaluation metrics and for fault tree analysis with HiP-HOPS.

3.3. Fault classification of a hardware element

Before being able to evaluate hardware architectures through the metrics or safety goal violations by the other methods, the user must know how to classify hardware failure modes.

There are six different types of faults in a component:

 Safe Fault: “fault whose occurrence will not significantly increase the probability of violation of a safety goal [10]”;

 Single Point Fault (SPF): “fault in an element that is not covered by a safety mechanism and that leads directly to the violation of a safety goal [1]”;

 Residual Fault: “portion of a fault that by itself leads to the violation of a safety goal, occurring in a hardware element, where that portion of the fault is not covered by safety mechanisms [1]”;

 Multiple-Point Fault (MPF): “individual fault that, in combination with other independent faults, leads to a multiple-point failure [1]”;

o Detected MPF: “Multiple-Point Fault that is detected, within a prescribed time, by a safety mechanism that prevents it from being Latent”;

o Perceived MPF: “Multiple-Point Fault whose presence is deduced by the driver within a prescribed time interval”;

o Latent MPF: “Multiple-Point Fault whose presence is not detected by a safety mechanism nor perceived by the driver within the multiple-point fault detection interval [1]”;

NOTE 1: Multiple-Point fault detection interval: “time span to detect a multiple-point fault before it can contribute to a multiple-point failure [1]”.

NOTE 2: Multiple-Point faults with more than two elements are not considered in the analysis, except if it can be proven that they are relevant.
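The definitions above can be turned into a simple decision function. This is a sketch derived only from the quoted definitions, not a transcription of the classification flow diagram in the standard; the Boolean parameters are hypothetical inputs that an analyst would supply for each failure mode. When a safety mechanism covers a fault that would otherwise violate the safety goal, the function returns the class of the uncovered portion of that fault.

```python
from enum import Enum


class FaultClass(Enum):
    SAFE = "safe fault"
    SINGLE_POINT = "single-point fault"
    RESIDUAL = "residual fault"
    MPF_DETECTED = "detected multiple-point fault"
    MPF_PERCEIVED = "perceived multiple-point fault"
    MPF_LATENT = "latent multiple-point fault"


def classify(violates_goal_alone, covered_by_sm, contributes_to_mpf=False,
             detected=False, perceived=False):
    """Classify a failure mode from the analyst's yes/no answers."""
    if violates_goal_alone:
        # No safety mechanism: single-point fault. With one, the portion
        # that the mechanism does not cover is a residual fault.
        if covered_by_sm:
            return FaultClass.RESIDUAL
        return FaultClass.SINGLE_POINT
    if contributes_to_mpf:
        if detected:
            return FaultClass.MPF_DETECTED
        if perceived:
            return FaultClass.MPF_PERCEIVED
        return FaultClass.MPF_LATENT
    return FaultClass.SAFE
```

For example, `classify(False, False, contributes_to_mpf=True)` yields a latent multiple-point fault, since the fault is neither detected by a safety mechanism nor perceived by the driver.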

These different types of fault classifications are demonstrated in figure 3-4.

Figure 3-4: Failure Modes of an HW element

Knowing the different types of fault classifications, the user has to understand how to attribute them to a failure mode. Figure 3-5 is a very useful flow diagram, which points the user in the right direction for failure mode classification.

3.4. ISO 26262 Hardware Metrics

As stated earlier, Part 5 of ISO 26262 presents two metrics to evaluate the effectiveness of hardware architectures in coping with random hardware failures [1]. The results are compared with target values; if those targets are not achieved, the architecture must be improved by changing components or safety mechanisms. The hardware architectural metrics can be applied iteratively during the design phase and depend on the whole hardware of the item [1].

There are also two other evaluations, that check the safety goal violations due to random hardware failures [1]. These are complementary to the metrics and are also going to be described.

To proceed with the metric evaluation, the designer has to estimate the failure rate of the components in the architecture that are relevant to the analysis. The estimated failure rate for the hardware parts of the item shall be determined:

a) Using hardware failure rate data from a recognized industry source (e.g.: IEC/TR 62380, IEC 61709, MIL HDBK 217 F notice 2,…), or [1]

b) Using statistics based on field tests. The estimated failure rate should have an adequate confidence level, or [1]

c) Using expert judgment founded on an engineering approach based on quantitative or qualitative arguments. Structured criteria should be the basis for this judgment [1].

3.4.1 Architectural Metrics

There are two metrics described in ISO 26262: the Single-Point Fault Metric and the Latent Fault Metric. These evaluate the system’s robustness in terms of the coverage provided by safety mechanisms. Higher metric values reflect safer systems, that is, systems with a lower proportion of critical faults (single-point, residual and latent faults).

3.4.1.1. Calculating failure rates

It is also important to understand how the failure rates used by the metrics (single-point, residual and latent) are calculated. The overall failure rate of a safety-related hardware element or component is:

$$\lambda = \lambda_{SPF} + \lambda_{RF} + \lambda_{MPF} + \lambda_{S}$$

where:

$\lambda_{SPF}$ is the total failure rate for single-point faults in the component;

$\lambda_{RF}$ is the total failure rate for residual faults;

$\lambda_{MPF}$ is the total failure rate for multiple-point faults;

$\lambda_{S}$ is the total failure rate for safe faults.
