Data-Driven approach to parametric configuration of industrial alarms


UNIVERSIDADE FEDERAL DO RIO GRANDE DO NORTE

CENTRO DE TECNOLOGIA

PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA ELÉTRICA E DE COMPUTAÇÃO

Data-Driven Approach to Parametric Configuration of Industrial Alarms

Yuri Thomas Pinheiro Nunes

Advisor: Prof. D.Sc. Luiz Affonso Henderson Guedes de Oliveira

Master’s Dissertation presented to the Programa de Pós-Graduação em Engenharia Elétrica e de Computação of UFRN (focus area: Computer Engineering) as part of the requirements to obtain the title of Master of Science.

Order Number of PPgEEC: D546

Natal, RN, January 2019


Nunes, Yuri Thomas Pinheiro.

Data-Driven approach to parametric configuration of industrial alarms / Yuri Thomas Pinheiro Nunes. - Natal, 2019. 76 leaves: ill.

Dissertation (master's) - Universidade Federal do Rio Grande do Norte, Centro de Tecnologia, Programa de Pós-Graduação em Engenharia Elétrica e de Computação.

Advisor: Prof. Dr. Luiz Affonso Henderson Guedes de Oliveira.

1. Alarm configuration - Dissertation. 2. Industrial alarms - Dissertation. 3. Alarm tuning - Dissertation. 4. Industrial automation - Dissertation. 5. Alarm management systems - Dissertation. I. Oliveira, Luiz Affonso Henderson Guedes de. II. Title.

RN/UF/BCZM CDU 681.5

Universidade Federal do Rio Grande do Norte - UFRN, Library System - SISBI

Cataloging-in-Publication. UFRN - Biblioteca Central Zila Mamede


Abstract

Industrial plants are composed of processes that add up to thousands of variables. To ensure safety and quality of operation, these processes are monitored and alarms are configured to indicate a possible malfunction. The most common problems associated with industrial alarms include a high occurrence of false alarms, missed alarms and chattering alarms, operator overload, and alarm flooding. These problems are related to the selection of the monitored variables and the techniques for activating and deactivating alarms, among other characteristics of the process and the alarm system. This work focuses on defining an approach to configure alarms that are efficient and meaningful for the operator. The approach proposed here is inspired by the workflow of a data scientist, who first needs to identify the characteristics of the databases used and then applies transformations that make the data more suitable for extracting valuable information. Often the scientist is interested in creating a model that describes the data or makes predictions possible. This task is very similar to alarm configuration, where it is necessary to select the relevant variables and to configure the settings of each alarm in order to classify the operation of the process as appropriate or not and to help identify the fault. The approach consists of four parts: description of data, selection of variables, tuning, and performance evaluation. During the description step, relevant information about the data is obtained, such as the presence of events, the number of different events, and the duration of events. In the selection step, the variables relevant for the detection of abnormalities are defined. The tuning of alarms is similar to a training process, where a model is built to describe the behavior of the data. Finally, during the evaluation, the settings found are applied to a process history to assess whether they behave in a way that meets safety and quality constraints. In order to validate the proposal, a case study for industrial alarm configuration was carried out using the Tennessee Eastman Process, which is a benchmark simulator widely used by the academic community.

Keywords: Alarm configuration, Industrial alarms, Alarm tuning, Industrial automation, Alarm management system.




List of Figures

2.1 General alarm classifier

2.2 Alarm signal definitions

3.1 Probability distribution of x in normal conditions, p(x), and under disturbance, q(x)

4.1 Illustration of approach steps

4.2 Process data with small abnormal to normal ratio

4.3 Process data with large abnormal to normal ratio

4.4 Process state clustering with simulated data

4.5 Simulated example of process data with two typical abnormal states

4.6 Data sections for normal and abnormal states in simulated data

4.7 Simulated process with alarm flood

4.8 Data sections for chattering regions from simulated process

4.9 Example of overlapping data sections

4.10 Simulated example of run length distributions

5.1 Process trends under disturbance

5.2 Full data from all process variables from process data

5.3 Constrained process variables

5.4 Process profile for abnormality analysis

5.5 Split point on abnormality correction

5.6 Split point on abnormality correction

5.7 Process variable XMEAS01 distribution according to each abnormality

5.8 Process variable XMEAS10 distribution according to each abnormality

5.9 Process variable XMEAS20 distribution according to each abnormality

5.10 Process variable XMEAS21 distribution according to each abnormality

5.11 Process variable XMEAS22 distribution according to each abnormality

5.12 Alarm per day using obtained alarm configuration

5.13 Alarm per day using new alarm configuration

5.14 Alarm per day using refereed alarm configuration

A.1 Tennessee Eastman Process Diagram


List of Tables

4.1 Permutation importance simulated results

4.2 ISA standard performance metrics

5.1 Tennessee Eastman Process variables

5.2 Tennessee Eastman Process operating constraints

5.3 Tennessee Eastman Process disturbances

5.4 Duration State Distribution

5.5 Normal sections

5.6 Abnormal sections

5.7 Active disturbance and returning to normal sections

5.8 Alarm configurations

5.9 Standard metrics for alarm configuration

5.10 Academic metrics for alarm configuration

5.11 Final alarm configuration

5.12 Standard metrics for new alarm configuration

5.13 Academic metrics for new alarm configuration


List of Symbols and Abbreviations

AAD Average Alarm Delay

FAR False Alarm Rate

MAR Missed Alarm Rate

D_O Overall Average Alarm Delay

OFAR Overall False Alarm Rate

OMAR Overall Missed Alarm Rate

µ_X Sample mean of process variable x(t)

ψ Chattering Index

σ_X Sample standard deviation of process variable x(t)

t_0 The instant of onset of an abnormality

t_a The instant of the announcing of the alarm

x_H High threshold of normal region of process variable x(t)

x_L Low threshold of normal region of process variable x(t)

x_Hdb High deadband of normal region of process variable x(t)

x_Ldb Low deadband of normal region of process variable x(t)

ANR Normal Time to Abnormal Time Ratio

TEP Tennessee Eastman Process


Chapter 1 Introduction

Industrial processes constantly generate data about their operation. These data demand adequate monitoring and storage policies. In this context, industrial alarm data assist the operation in maintaining safety and efficiency. Industrial alarms are used to inform a human operator, in real time, about the occurrence of any abnormality in the current state of process operation. However, many factors can make it difficult for the operator to respond to these occurrences, such as improper configuration, alarm floods, nuisance alarms, and false alarms. One of the critical aspects of a good alarm configuration is precisely the determination of its attributes.

1.1 Contextualization

Large industrial processes are subject to failures in their thousands of components at any time during operation, which can lead to unscheduled stops, impaired product quality, equipment damage, or accidents. It is estimated that, in the United States of America alone, accidents cost around 20 billion dollars per year in production losses, equipment damage, and environmental penalties. Hence, alarm management systems that monitor the operation of processes are extremely important: they enable rapid identification of an abnormality, making it possible to avoid or mitigate the consequences of these failures (Consortium n.d.).

In this context, the alarm plays a central role, since its function is to notify the industrial process operators when an abnormality occurs. An alarm is defined as an announcement to the operator, initiated by an equipment malfunction, process deviation, or abnormal condition, that requires an operational action (ANSI/ISA 2009).

Nevertheless, in recent decades, with the adoption of digital control and supervision systems and the lack of a formal methodology to manage alarms, alarm systems have rapidly lost credibility because of the large number of false and unnecessary alarms reported to control rooms (EEMUA 2007, Habibi & Hollifield 2006). This problem became more evident in the early 1990s. At that time, some industrial accidents occurred and the inefficiency of alarm systems in diagnosing abnormal situations became clear (Bransby 1998). One of the main motivations was the Texaco Refinery explosion in Milford Haven, Wales, in 1994. Investigations pointed to several deficiencies in the alarm system. After this episode, alarm management began to gain relevance in the industrial and scientific communities, generating diverse standards and good-practice guides such as EEMUA 191, ANSI/ISA 18.2 and IEC 62682 (Bransby & Jenkinson 1998).

Among the several deficiencies of these systems and the scientific problems still open, the correct configuration of alarm attributes is one of the most relevant (Wang et al. 2016). Industrial alarms have parameters that must be well configured, but this is not a trivial task. Historically, these parameters have been defined by engineering teams based on experience and knowledge of the plant operation. However, plants are increasingly complex and have more restrictive operating regions. Thus, the empirical method is inefficient and highly dependent on the experience and sensitivity of the team.

Consequently, the lack of a correct adjustment of the alarm attributes is one of the main causes of alarm overload for operators (Wang et al. 2016). Several approaches have been proposed in recent years to find a theoretically grounded way of properly defining thresholds for industrial alarms.

1.2 Dissertation Objectives

The literature offers methods to define alarm attributes based on process data. These methods define the attributes of each alarm individually; in other words, the algorithm defines the attributes of an alarm in isolation from the others. However, industrial plants have thousands of configured alarms. This dissertation proposes a data-based approach to define the attributes of all alarms in the alarm system. The alarms are configured systematically: an attribute is defined only if it improves performance; otherwise, it is left undefined.

The proposed approach is in essence data driven; therefore, prior knowledge of the process is not mandatory. If the data themselves express the real nature of the process, any prior information will simply speed up the procedure. However, there are situations where prior knowledge is essential, for example, to indicate which variables must be considered in order to ensure safety.

The data are selected by relevance in two steps. The first describes the data by sectioning the process data according to the events present. The second selects the variables that can be used to describe the process state. These two steps provide the basis for the next one.

A step that defines the alarm attributes themselves, using well-known algorithms from the literature, is essential to the approach. During this step, the process variables are analyzed to decide which attributes should be defined and which algorithms should be used. The resulting alarm configurations should be validated by a process expert.

Ensuring the quality of the alarm system is the main goal of any method to define alarm attributes. Therefore, the approach includes a step to measure performance according to standard and academic metrics. This is the final step of the approach.

In order to validate the proposed approach, a case study is carried out using the Tennessee Eastman Process (TEP) simulator, which is a well-established benchmark for evaluating alarm system performance.


1.3 Document Structure

An overview of the industrial alarm subject is presented in Chapter 2, which discusses the terms and definitions that form the foundation of this work. The chapter is based on international standards and academic literature.

Related works on methods to specify alarm attributes are discussed in Chapter 3. The methods that will be used to define the alarm attributes are detailed.

A data-driven approach is proposed in Chapter 4, building on the foundation and related work. Initially, the value of prior information is presented. The following sections describe each step and its main goals.

A case study is carried out in Chapter 5 to validate the steps of the approach. The benchmark used and the characteristics of the case study are discussed. The results obtained in each step of the proposed approach are compared with a previous work.

Finally, the work is concluded in Chapter 6. The main characteristics of the proposed approach and the principal results are discussed. The principal contributions are highlighted as evidence of its value. Future work related to the proposed approach is discussed in the last section.


Chapter 2 Industrial Alarms

The relevance of alarm management has led to the creation of good practices and standards for the standardization of alarm systems. These practices and standards serve as a guide for defining what an alarm, a threshold, a deadband, and a chattering alarm are, among others. Examples of these standards and manuals are ANSI/ISA (2009), EEMUA (2007) and Petrobras N-2900.

The process of setting up industrial alarms has been gaining a place in the academic scenario and relevance within the industry. This chapter will define what an alarm is and ways to manipulate it. In the end, performance metrics of the alarm systems will be discussed to support the analysis process.

2.1 Alarms

An alarm is an audible and/or visual means of indicating to the operator an equipment malfunction, process deviation, or abnormal condition that requires a response, in adequate time, to act on and subsequently reverse the situation (ANSI/ISA 2009). Hence, the alarm signal can be considered a continuous-time binary signal where state 0 is associated with normal operation and state 1 with an abnormal state. For simplicity, in this work the alarm signal is often treated as a digital signal.

An alarm is a classifier that determines whether a monitored industrial process is in an abnormal situation or not. Figure 2.1 shows a general diagram of this problem. The alarm is the box in the center; its inputs are the process data and the last values of the alarm state, and its only output is the current alarm state.

A simplified model considers a single instant of x(t). Another common simplification is a classifier of the type “greater (less) than”. With these two simplifications, the alarm is activated using the threshold attribute only.

2.1.1 Mathematical Definitions

Figure 2.1: General alarm classifier (block diagram; labels: Alarm, z⁻¹, z⁻ⁿ, x(t), x(t−n), x_a(t))

As already mentioned, an alarm signal may be viewed as a digital signal. Considering a high alarm, the alarm signal can be defined by Equation 2.1, where $x_a(t)$ is the alarm signal, whose instantaneous value depends on the current value of the process variable, $x(t)$, and the alarm threshold, $x_{tp}$. This work will refer to this type of signal as an alarm signal.

$$x_a(t) = \begin{cases} 1, & x(t) \ge x_{tp} \\ 0, & x(t) < x_{tp} \end{cases} \tag{2.1}$$

However, in some cases it is useful to keep precise track of the instants at which the system changes from an operating mode considered normal to an abnormal one. In this case, Equation 2.2 is used. Here, state 1 means that the alarm was triggered. When the process variable, $x(t)$, crosses the alarm threshold, $x_{tp}$, the signal moves from state 0 to state 1. At the next instant, the signal returns to state 0, even if the process remains in an abnormal condition. In this work, this signal is referred to as the alarm activation signal.

$$x_a(t) = \begin{cases} 1, & x(t) \ge x_{tp} \text{ and } x(t-1) < x_{tp} \\ 0, & \text{otherwise} \end{cases} \tag{2.2}$$

In the previous model, the instants at which the process returns to normal operation are disregarded. Thus, it is not possible to extract relevant information such as the duration of alarms, the period in abnormal operation, and the period in normal operation. Taking this into account, a ternary signal can be defined to represent activation (1), deactivation (-1) and maintenance of the previous state (0). This signal is given by Equation 2.3 and is referred to as the alarm transition signal.

$$x_a(t) = \begin{cases} 1, & x(t) \ge x_{tp} \text{ and } x(t-1) < x_{tp} \\ -1, & x(t) \le x_{tp} \text{ and } x(t-1) > x_{tp} \\ 0, & \text{otherwise} \end{cases} \tag{2.3}$$

Figure 2.2 presents an example of a process variable and of the alarm signals generated from the definitions in Equations 2.1, 2.2 and 2.3. Figure 2.2a contains a sample of a process variable in blue and the threshold for a high alarm in green. Figures 2.2b, 2.2c and 2.2d show the signals generated by applying Equations 2.1, 2.2 and 2.3, respectively, to the process variable of Figure 2.2a. It is important to note that the alarm transition signal and the alarm signal can easily be converted into each other. This is not true of the alarm activation signal, since it cannot represent the alarm clearance.

Figure 2.2: Alarm signal definitions. (a) Process variable. (b) Alarm signal. (c) Alarm activation signal. (d) Alarm transition signal.

Generally the alarm signals will be treated according to the definition of Equation 2.1 during processing and at certain stages of the generation process.
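The three signal definitions in Equations 2.1, 2.2 and 2.3 can be sketched in a few lines of Python. This is only an illustrative transcription for a high alarm on a sampled variable: the function names and the sample values are hypothetical, and the first sample is assumed to produce no activation or transition because it has no predecessor.

```python
from typing import List, Sequence


def alarm_signal(x: Sequence[float], x_tp: float) -> List[int]:
    """Equation 2.1: 1 while x(t) >= x_tp, 0 otherwise."""
    return [1 if v >= x_tp else 0 for v in x]


def activation_signal(x: Sequence[float], x_tp: float) -> List[int]:
    """Equation 2.2: 1 only at the sample where x crosses x_tp upward."""
    if not x:
        return []
    out = [0]  # assumption: t = 0 has no previous sample, so no activation
    for prev, cur in zip(x, x[1:]):
        out.append(1 if cur >= x_tp and prev < x_tp else 0)
    return out


def transition_signal(x: Sequence[float], x_tp: float) -> List[int]:
    """Equation 2.3: 1 on activation, -1 on deactivation, 0 otherwise."""
    if not x:
        return []
    out = [0]  # assumption: t = 0 has no previous sample, so no transition
    for prev, cur in zip(x, x[1:]):
        if cur >= x_tp and prev < x_tp:
            out.append(1)
        elif cur <= x_tp and prev > x_tp:
            out.append(-1)
        else:
            out.append(0)
    return out


# Made-up samples of a process variable and a made-up high threshold.
samples = [1.0, 2.0, 6.0, 7.0, 3.0, 8.0, 2.0]
threshold = 5.0
print(alarm_signal(samples, threshold))       # [0, 0, 1, 1, 0, 1, 0]
print(activation_signal(samples, threshold))  # [0, 0, 1, 0, 0, 1, 0]
print(transition_signal(samples, threshold))  # [0, 0, 1, 0, -1, 1, -1]
```

In this example the cumulative sum of the transition signal reproduces the alarm signal, which illustrates why the two representations are interchangeable, whereas the activation signal alone does not record when the alarm clears.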

2.1.2 Standard Alarm definitions

This section presents definitions related to alarms according to ANSI/ISA (2009). These definitions support the remainder of the work.

Alarm System

The collection of hardware and software that detects an alarm state, communicates the indication of that state to the operator, and records changes in the alarm state.


Alarm management

The processes and practices used for determining, documenting, designing, operating, monitoring, and maintaining alarm systems.

Allowable Response Time

The maximum time between the annunciation of the alarm and the time the operator must take corrective action to avoid the consequence.

Alarm Clear

An alternate description of the state of an alarm that has transitioned to the normal state.

Alarm Message

A text string displayed with the alarm indication that provides additional information to the operator (e.g., operator action).

Chattering Alarm

A chattering alarm is an alarm that repeatedly transitions between the alarm state and the normal state in a short period of time. Fleeting alarms are short-duration alarms that do not immediately repeat. In both cases, the transition is not the result of operator action.

Alarm Suppression

Any mechanism to prevent the indication of the alarm to the operator when the base alarm condition is present (i.e., shelving, suppressed by design, out-of-service).


Chapter 3 Related Works

Tuning or configuration of alarms is the procedure of setting parameters for detecting abnormalities in the process in order to notify operators. In recent years, several data-based techniques have been proposed for defining the attributes of industrial alarms. The simplest approach is to detect outliers of process variables; the outlier limits are used to delimit regions of normal and abnormal operation. An example of this approach is the 3σ method. Probabilistic methods are more sophisticated approaches that aim to define regions of normality and abnormality mathematically, based on estimates of probability density functions.

Analyzing the process variables in isolation may lead the alarm system to perform poorly, because it ignores the relationships between process variables that arise from conservation laws and from the nature or physical connections of the industrial plant. For this reason, a group of alarm configuration algorithms considers properties found in sets of variables, for example, correlation and delay of occurrence.

An important tool for several of these methods is optimization. Using information extracted from the data, it is possible to optimize the alarm attributes. The Correlation Consistency and Performance Optimization designs use this information to set alarm thresholds by solving an optimization problem.

3.1 Implemented Algorithms

This section presents the methods implemented in this dissertation to define alarm attributes: outlier detection, performance optimization and correlation consistency. All these methods define the attributes of one alarm per process variable; therefore, they are considered univariate algorithms. However, Correlation Consistency uses the correlation information of all the variables to define the alarm threshold.

Given the characteristics of the algorithms, Outlier Detection and Performance Optimization can define both thresholds and deadbands, while Correlation Consistency is limited to defining the threshold. These methods are used in the approach to define the desired attributes.


3.1.1 Outlier Detection

To determine alarm thresholds, it is necessary to identify limit values for the process variable that represent a deviation from the normal operation of the industrial plant. For this purpose, a statistical method based on a history of normal process operation is often used. This normal operating sample is used to compute a sample mean, $\mu_X$, and a sample standard deviation, $\sigma_X$. Assuming a normal distribution for the values of the process variable under analysis, $x(t)$, within the selected history, the thresholds can be defined by the region containing approximately 99.7% of the possible values in normal operation. Thus, these thresholds can be defined by

$$l = \mu_X \pm 3\sigma_X \tag{3.1}$$

where $\mu_X$ is the sample mean and $\sigma_X$ is the sample standard deviation of the process variable $x(t)$ in the normal operating region.

From Equation (3.1), a region of normality can be defined by inequality (3.2). Thus, the limits for a process variable $x(t)$ can be defined by the equations in (3.3), where $x_L$ is the lower limit and $x_H$ is the upper limit.

$$\mu_X - 3\sigma_X < x(t) < \mu_X + 3\sigma_X \tag{3.2}$$

$$\begin{aligned} x_L &= \mu_X - 3\sigma_X \\ x_H &= \mu_X + 3\sigma_X \end{aligned} \tag{3.3}$$

In order to define a deadband, a stricter threshold should be selected to limit the normal region. When the process variable is within this stricter threshold, the process can be considered normal with higher confidence. Assuming a normal distribution, the suggested deadband is 2σ, which covers about 95% of the values of the process variable in normal operation. The set of equations (3.4) defines the deadband in terms of $\mu_X$ and $\sigma_X$.

$$\begin{aligned} x_{Ldb} &= \mu_X - 2\sigma_X \\ x_{Hdb} &= \mu_X + 2\sigma_X \end{aligned} \tag{3.4}$$
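As a minimal sketch of Equations (3.3) and (3.4), the thresholds and deadbands can be computed directly from a history of normal operation. The function name, the dictionary keys and the sample history below are hypothetical, and the normality assumption stated above is taken for granted rather than checked.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def sigma_limits(normal_data: Sequence[float]) -> Dict[str, float]:
    """Thresholds (3 sigma) and deadbands (2 sigma) from normal-operation data."""
    mu = mean(normal_data)
    sigma = stdev(normal_data)  # sample standard deviation (n - 1 denominator)
    return {
        "x_L": mu - 3 * sigma,    # low threshold, Equation (3.3)
        "x_H": mu + 3 * sigma,    # high threshold, Equation (3.3)
        "x_Ldb": mu - 2 * sigma,  # low deadband, Equation (3.4)
        "x_Hdb": mu + 2 * sigma,  # high deadband, Equation (3.4)
    }


# Made-up history of a process variable recorded during normal operation.
history = [8.0, 10.0, 12.0]
limits = sigma_limits(history)
for name, value in limits.items():
    print(f"{name} = {value:.2f}")
```

Because the deadband limits always lie inside the thresholds, an alarm raised at x_H would only clear once the variable falls back below x_Hdb, which is how the deadband reduces chattering around the threshold.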

3.1.2 Performance Optimization

The main goal of Performance Optimization is to guarantee that the system achieves a specified performance. For this reason, a set of estimated performance metrics is calculated from the process variable and hypothetical alarm attributes. For example, the false alarm probability (FAP), missed alarm probability (MAP) and average alarm delay (AAD), which are discussed later, are important metrics for evaluating the performance of alarm systems. The alarm thresholds directly affect these metrics, so it is possible to set the thresholds to meet FAP, MAP, and AAD constraints (Xu et al. 2012).

However, to calculate these metrics it is necessary to know the probability density functions for the normal operating region, $p(x)$, and for the abnormal region, $q(x)$. With the pdfs, it is possible to estimate the values of the metrics from the alarm attributes, so there is no need to process all the alarm data to obtain these metrics.

From the density functions it is possible to extract boundaries for the thresholds through Equations (3.7), (3.8), (3.11) and (3.12) in order to meet performance constraints. The problem becomes a minimization within the constrained region. For the minimization problem, a weighted-sum loss function is defined in Equation (3.5) (Xu et al. 2012). This equation is an example of a cost function for the optimization problem.

$$J = \omega_1 \frac{\mathrm{FAP}}{\mathrm{RFAP}} + \omega_2 \frac{\mathrm{MAP}}{\mathrm{RMAP}} + \omega_3 \frac{\mathrm{AAD}}{\mathrm{RAAD}} \tag{3.5}$$

where RFAP, RMAP and RAAD are the desired values for FAP, MAP and AAD; FAP, MAP and AAD are the estimated performance values; and $\omega_1$, $\omega_2$ and $\omega_3$ are the weights of the sum.

A simpler objective function, based on ROC analysis, considers only the FAP and MAP (Izadi et al. 2009):

$$J = \sqrt{\mathrm{FAP}^2 + \mathrm{MAP}^2} \tag{3.6}$$

Performance Metrics

When creating an alarm, it is necessary to define a limit (threshold). This limit aims to create a separation between the normal and abnormal regions of process operation (Naghoosi et al. 2011). Thus, consider a process variable x, its probability distribution in normal operation, p(x), and its distribution during a period of disturbance, q(x).

Figure 3.1: Probability distribution of x in normal conditions, p(x), and under disturbance, q(x) (horizontal axis: x; vertical axis: p(x) and q(x); the threshold x_tp is marked)

From Figure 3.1 it can be seen that the regions of disturbance and normality can overlap. Hence, there is a probability that the alarm is triggered within the normal region, as well as a probability that it does not sound even under disturbance. The FAP and MAP indexes are intended to measure the probability that such unwanted or missing alarms occur. These indexes are defined in the following equations, considering only the threshold:

$$\mathrm{FAP}(x_{tp}) = \int_{x_{tp}}^{\infty} p(x)\,dx \tag{3.7}$$

$$\mathrm{MAP}(x_{tp}) = \int_{-\infty}^{x_{tp}} q(x)\,dx \tag{3.8}$$

Considering the deadband attribute, the estimates of FAP and MAP are described in Equations (3.9) and (3.10) (Izadi et al. 2009). Both definitions depend on the threshold, $x_{tp}$, and the deadband, $x_{db}$.

$$\mathrm{FAP}(x_{tp}, x_{db}) = \frac{\int_{x_{tp}}^{\infty} p(x)\,dx}{\int_{x_{tp}}^{\infty} p(x)\,dx + \int_{-\infty}^{x_{db}} p(x)\,dx} \tag{3.9}$$

$$\mathrm{MAP}(x_{tp}, x_{db}) = \frac{\int_{-\infty}^{x_{tp}} q(x)\,dx}{\int_{-\infty}^{x_{tp}} q(x)\,dx + \int_{x_{db}}^{\infty} q(x)\,dx} \tag{3.10}$$
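To make the threshold-only definitions in Equations (3.7) and (3.8) concrete, the sketch below evaluates them for a high alarm when both densities are assumed to be Gaussian, so that the integrals reduce to values of the cumulative distribution function. The Gaussian parameters, the function names and the threshold values are illustrative assumptions, not values from the case study.

```python
from statistics import NormalDist


def fap(p: NormalDist, x_tp: float) -> float:
    """Equation (3.7): probability mass of p(x) above the threshold."""
    return 1.0 - p.cdf(x_tp)


def map_(q: NormalDist, x_tp: float) -> float:
    """Equation (3.8): probability mass of q(x) below the threshold."""
    return q.cdf(x_tp)


# Assumed densities: p(x) for normal operation, q(x) under disturbance.
p = NormalDist(mu=0.0, sigma=1.0)
q = NormalDist(mu=3.0, sigma=1.0)

for x_tp in (1.0, 1.5, 2.0):
    print(f"x_tp={x_tp:.1f}  FAP={fap(p, x_tp):.4f}  MAP={map_(q, x_tp):.4f}")
```

Raising the threshold lowers the FAP and raises the MAP, which is the trade-off that the cost functions in Equations (3.5) and (3.6) are meant to balance.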

In addition to the FAP and MAP indexes, the Average Alarm Delay (AAD) is another important performance index of alarm systems (Xu et al. 2012). It is designed to measure the delay between the onset of an abnormality and its presentation by the alarm system. For its calculation, consider the time instant $t_0$ as the exact instant at which the process begins to behave abnormally, and $t_a$ as the instant at which the alarm was actually triggered. The difference between $t_a$ and $t_0$ is called the alarm delay ($T_d$), that is:

$$T_d = t_a - t_0 \tag{3.11}$$

As a consequence, the AAD can be defined as the expected value of $T_d$, that is:

$$\mathrm{AAD} = \overline{T_d} = E(T_d) \tag{3.12}$$

An estimate based on the abnormal probability density function is presented in Equation (3.13) (Xu et al. 2012):

$$\mathrm{AAD}(x_{tp}) = h\,\frac{\mathrm{MAP}(x_{tp})}{1 - \mathrm{MAP}(x_{tp})} \tag{3.13}$$

where $h$ is the sampling period for the process variable $x(t)$.

An estimate of the AAD that considers the deadband is not defined in the literature. Therefore, the cost function from Equation (3.6) is used when defining threshold and deadband together.
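The following sketch shows one possible way to use these estimates: a plain grid search for the threshold that minimizes the ROC-based cost of Equation (3.6), followed by the AAD estimate of Equation (3.13) at the selected threshold. The Gaussian densities, the search interval, the grid resolution and the function names are all assumptions made for illustration; the grid search stands in for whatever optimizer is actually used, and only the threshold (not the deadband) is searched.

```python
import math
from statistics import NormalDist
from typing import Tuple


def roc_cost(p: NormalDist, q: NormalDist, x_tp: float) -> float:
    """Equation (3.6) with FAP and MAP from Equations (3.7) and (3.8)."""
    fap = 1.0 - p.cdf(x_tp)
    map_ = q.cdf(x_tp)
    return math.hypot(fap, map_)  # sqrt(FAP^2 + MAP^2)


def aad(q: NormalDist, x_tp: float, h: float) -> float:
    """Equation (3.13): AAD estimate for sampling period h."""
    m = q.cdf(x_tp)
    return h * m / (1.0 - m)


def best_threshold(p: NormalDist, q: NormalDist,
                   lo: float, hi: float, steps: int = 1000) -> Tuple[float, float]:
    """Grid search over [lo, hi]; returns (threshold, cost)."""
    candidates = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    x_best = min(candidates, key=lambda x: roc_cost(p, q, x))
    return x_best, roc_cost(p, q, x_best)


# Assumed densities: p(x) for normal operation, q(x) under disturbance.
p = NormalDist(mu=0.0, sigma=1.0)
q = NormalDist(mu=3.0, sigma=1.0)

x_tp, cost = best_threshold(p, q, lo=0.0, hi=3.0)
print(f"threshold = {x_tp:.3f}, cost J = {cost:.4f}")
print(f"estimated AAD = {aad(q, x_tp, h=1.0):.4f} sampling periods")
```

With two Gaussians of equal variance, the selected threshold falls midway between the two means, where FAP and MAP are equal; with unequal variances or a weighted cost such as Equation (3.5), the optimum shifts accordingly.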

3.1.3 Correlation Consistency

Within an industrial plant, thousands of components are physically interconnected and related by plant process properties. This can be observed when correlation measures are extracted from process data. In a configuration model where each process variable is analyzed individually, the alarm data may lose this information. In order to preserve this relationship, an approach for optimizing alarm thresholds based on consistency between correlations was explored in Han et al. (2016), Yang et al. (2010), Gao et al. (2016) and Gao et al. (2015).

This method aims to generate alarm data that are consistent with the process data with respect to correlation. To that end, an optimization problem is defined for the objective function in Equation (3.14), which minimizes the difference between the correlation of the alarm data and the correlation of the process data. Here, $x_{tp}$ is the threshold vector for the process variables, $r_{x_{ia},x_{ja}}$ is the correlation between the alarms of variables $x_i$ and $x_j$, and $r_{x_i,x_j}$ is the correlation between the process variables $x_i$ and $x_j$.

$$J(x_{tp}) = \sum_{i=1}^{m-1} \sum_{j=i+1}^{m} \left| r_{x_{ia},x_{ja}} - r_{x_i,x_j} \right| \tag{3.14}$$

The correlation between the process variables is calculated through the Pearson correlation, via Equation (3.15). The process data was used with a sampling approach.

$r_{x,y} = \dfrac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \dfrac{E(xy) - E(x)E(y)}{\sigma_x \sigma_y}$ (3.15)

For the correlation of the alarm variables, an analytic approach is more appropriate. The calculation is performed using Equation (3.16), which uses the probabilities of alarm occurrence given a threshold.

$r_{x^a, y^a} = \dfrac{P(x > x_0, y > y_0) - P_{x_0} P_{y_0}}{\sqrt{P_{x_0} - P_{x_0}^2}\,\sqrt{P_{y_0} - P_{y_0}^2}}$ (3.16)

The terms $P_{x_0}$ and $P_{y_0}$ are the probabilities that the variables $x$ and $y$ are in an abnormal state. This probability is calculated by integrating the probability density function of the process variable, as in Equation (3.17).

$P_{x_0} = P(x > x_0) = \int_{x_0}^{\infty} f(x)\,dx$ (3.17)

The probability density function, $f(x)$, is estimated using the process variable data $x$, and $x_0$ divides the $x$ domain into normal when $x < x_0$ and abnormal when $x > x_0$, considering the tuning of a high alarm.

The joint probability density function is used to calculate the probability that the variables $x$ and $y$ are simultaneously in an abnormal state, as in Equation (3.18):

$P(x > x_0, y > y_0) = \int_{x_0}^{\infty} \int_{y_0}^{\infty} f(x, y)\,dy\,dx$ (3.18)

All the density functions can be estimated from the process variables and enable the correlation calculation of the alarm data, without having to be recomputed for each new set of thresholds in the optimization process.
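The objective in Equation (3.14) can be sketched as follows. This is a simplified illustration, not the implementation used in the cited works: the probabilities in Equations (3.16) to (3.18) are replaced by empirical frequencies computed directly from the samples instead of integrals of estimated density functions, all alarms are assumed to be high alarms, and the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equally sized samples, Equation (3.15)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def alarm_correlation(x, y, x0, y0):
    """Correlation of two high alarms, Equation (3.16), with the
    probabilities estimated as sample frequencies (an assumption)."""
    n = len(x)
    px = sum(v > x0 for v in x) / n
    py = sum(v > y0 for v in y) / n
    pxy = sum(a > x0 and b > y0 for a, b in zip(x, y)) / n
    # p * (1 - p) is the same quantity as P - P^2 in Equation (3.16).
    den = math.sqrt(px * (1 - px)) * math.sqrt(py * (1 - py))
    if den == 0.0:
        raise ValueError("a threshold leaves an alarm always on or always off")
    return (pxy - px * py) / den


def consistency_objective(data, thresholds):
    """J(x_tp) of Equation (3.14); data holds one sample list per variable."""
    total = 0.0
    for i, j in combinations(range(len(data)), 2):
        r_alarm = alarm_correlation(data[i], data[j], thresholds[i], thresholds[j])
        total += abs(r_alarm - pearson(data[i], data[j]))
    return total
```

An optimizer would then search the threshold vector that minimizes `consistency_objective`; the choice of optimizer is left open here.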

This method has the limitation of optimizing only one alarm threshold for each process variable. To set the alarm type, the threshold is compared to the expected value of the normal operating region: when the threshold is higher than the expected value, the alarm is of type high; otherwise, it is of type low.

3.2 Comparison

In this chapter, three methods for setting alarm attributes were described: 3σ, correlation consistency and performance optimization. Although these methods are all data-driven, each has its own approach. The 3σ method evaluates a single variable at a time. In contrast, maintaining consistency between the correlation in the process data and the correlation in the alarm data requires the analysis of at least two variables, which makes it a multivariate analysis. Unlike these two approaches, the performance optimization method is a probabilistic approach that defines the attributes considering each variable separately.

In order to analyze the performance of the alarm tuning methods, a comparison was made considering the definition of the alarm threshold (Nunes et al. 2018). The Tennessee Eastman Process, a benchmark in the area of control and fault detection for industrial processes, was used. The performance indicators for this comparison are the FAR, MAR and AAD measurements, which are calculated from generated alarm histories and discussed later.

The comparison indicated that each method improves a specific metric. The 3σ method significantly reduced the false alarm rate of the generated alarms. The correlation consistency method achieved a lower missed alarm rate as a consequence of producing higher correlation similarity. Finally, the performance optimization method guaranteed the lowest average alarm delay. Although each method presented superior performance mainly on one metric, the remaining metrics of the performance optimization method are not far from the best values.

In this work, Performance Optimization indicates that an alarm threshold and a deadband will be configured for a given process variable. When 3σ is chosen, only the threshold is configured. Any different use of these methods will be explicitly stated.


Chapter 4 Proposed Approach

The problems associated with alarm configuration are many and can be related to any part of alarm management. An incorrect setting for an alarm threshold may produce false alarms or even fail to detect an abnormality, causing a missed alarm. Not applying an alarm deadband or delay timer to a noisy process variable produces several unwanted alarms in the form of chattering and fleeting alarms. In an industrial process where thousands of variables are monitored continuously, it is important to correctly select the relevant variables for alarming. Failing to select a good set of process variables for alarming may, for example, result in alarm floods, overload the operator and complicate the diagnosis process.

Currently, alarm configuration is carried out by engineering teams. This work proposes a data-driven approach to configure the alarms. To configure the alarms adequately, it is important to know the data and the needs of the process. It is important to know whether a process variable is suitable to have an alarm, and then whether the alarm needs a deadband or delay timer. Later, it is important to guarantee that the alarms are not redundant, that the configuration assists the operator in making decisions, etc. This approach aims to assist experts in managing the alarms.

In order to progress in the approach, it is important to know as much as possible about the data before doing any kind of alarm configuration or reconfiguration. However, it is common not to have the needed information beforehand. At this stage some questions arise: What is the ratio of normal to abnormal time? How many abnormalities occurred? At which points must the data be split to separate normality from abnormality? A Data Description step is important to address questions like these. With the answers to these questions, and possibly new insights, the further steps are easier. For example, knowing the split points between normality and abnormality makes it possible to analyze which variables are significantly affected by the present disturbances.

Once the data is well known, it is important to select the relevant pieces. The relevant data is related to the variables that are affected by the disturbances, the windows where the disturbances are present, some samples of normal operation, etc. These are not the only concerns. It is also important that the resulting alarm configuration be clear and help the operator to make decisions. In that regard, selecting the variables that can be used to distinguish the different abnormalities is also very important during the Variable Selection step.

[...] industrial process. In order to do this, information on the data was gathered and the relevant data selected. Then, the Alarm Strategy Setting defines which types of alarm will be assigned to each variable and their corresponding configuration. In this step, it is decided whether to use a threshold only or a threshold along with a deadband, and whether the alarm is of type high or low. Then the parameters of each alarm are defined and the alarm is implemented in the alarm system.

Up to this point, the data were described, the variables selected and the strategies tuned. But how well do these settings perform? It is important to measure the performance of the configuration based on relevant metrics. The new alarm has to guarantee enough time for response, detect abnormalities as soon as possible, not raise false alarms, not overload operators, be meaningful for diagnosis and capture the effects of operator actions. All these characteristics must be measured to define the quality of the alarm configuration and guarantee the proper working of the alarm system.

Finally, the alarm configuration can be implemented. Using a data-driven approach, it is possible to reduce the need for expert knowledge on the process and to make the process of obtaining information more efficient. The steps of the proposed approach are represented in Figure 4.1 by the rounded dashed squares, with indications of the information to be obtained in each one. At the final step of the approach, it is possible to return to any previous step depending on the resulting performance.

[Diagram: Process Data → Data Description (Data Sections, Disturbance Information, Process Statistics) → Variable Selection (Selected Variables, Priority Levels) → Alarm Configuration (Alarm Configurations) → Performance Evaluation (Standard metrics, Academic metrics)]

Figure 4.1: Illustration of approach steps

The following sections describe what is done in each step of the proposed approach and how its outputs affect the following step. First, Section 4.1 discusses the influence and importance of previous information on the data. Section 4.2 presents the Data Description, which focuses on gathering information about the process and the data itself. Section 4.3 discusses how to select the variables and the data to be used for alarm configuration. The strategy and configuration of the alarms are discussed in Section 4.4 and the performance evaluation in Section 4.5. Finally, the final considerations on the proposed approach are presented in Section 4.6.

4.1 Previous Information

The proposed approach aims to define the alarm configurations for an alarm system in a data-driven manner. In each step the data is analyzed and new insights are obtained. However, there is information about the process that is not present in the data, e.g., the operation limits. This knowledge should be provided along with the process data history. Other information that improves the alarm configuration process is listed below:

• Process variable target operation limits
• Process variable trip point
• Process variable normal operation limits
• Previous alarm configuration
• Alarm data history
• Previous performance evaluation

This previous information on the process and alarm system is useful to the proposed approach. The limits help to define the data sections that are discussed during the Data Description. With the alarm data history it is possible to define data sections of high alarm occurrence, alarm flood, etc. The Alarm Configuration also uses the limits to define the regions where the alarm attributes may lie. The previous alarm configuration is useful as a starting point for the Alarm Configuration.

With previous information on the process, it is possible to define new configurations more accurately. However, the previous information is not essential and some of it can be estimated. For example, the target point could be estimated from a full history of the process as the most common value of each variable.

There are some situations where previous information is indispensable. In processes where the trip conditions are close to the target operating points, this information is needed to ensure safety, even when an expert supervises the implementation of the alarm configurations.

The previous information is also mandatory when the data do not express the events in a way that makes them distinguishable, for example, when the process has multiple operating points that change to meet a specific demand. From the data alone, there is no way to clearly separate the different operating modes from the different abnormalities.

In the next section the Data Description will be discussed with simulated examples. The following sections will not consider the previous information, unless explicitly mentioned.

4.2 Data Description

The Data Description step provides information, and possibly new insights, to help in alarm configuration. Going through this step makes the further steps easier. For example, knowing the split points between normality and abnormality makes it possible to analyze which variables are significantly affected by the present disturbances.

As introduced earlier, the Data Description step is mainly focused on gathering information that characterizes the data. Several kinds of information can be gathered, like the ratio of normal to abnormal time or the number of abnormalities. Using algorithms to find the information is the ideal scenario, but the information can also be gathered manually by inspection.

4.2.1 Abnormal to Normal Ratio

It is expected that the process has an unbalanced history with regard to the abnormal to normal ratio (ANR). A good process shows a history with short periods of abnormality and long periods of normality. Equation (4.1) describes the ANR, where $Ab_{time}$ is the portion of time that the process is in abnormal condition and $N_{time}$ is the portion of time that the process is in normal condition.

$\mathrm{ANR} = \dfrac{Ab_{time}}{N_{time}}$ (4.1)

However, it is also possible to have a cut of the history with a short period of normality in order to analyze only the selected abnormality. Figure 4.2 illustrates a process run of 100 hours with roughly 5 hours of abnormality. The ANR for this process data is approximately 0.05 (5 hours of abnormality over 95 hours of normality).
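The ANR of Equation (4.1) can be computed directly from a labeled history. The sketch below assumes uniformly sampled data with one boolean label per sample (True for abnormal); the function name and the hourly sampling are hypothetical choices for the illustration.

```python
def abnormal_to_normal_ratio(is_abnormal):
    """ANR of Equation (4.1) for uniformly sampled boolean labels."""
    abnormal = sum(1 for flag in is_abnormal if flag)
    normal = len(is_abnormal) - abnormal
    if normal == 0:
        raise ValueError("the history contains no normal samples")
    return abnormal / normal


# 100-hour history sampled hourly, with 5 hours of abnormality.
full_history = [False] * 95 + [True] * 5
# 10-hour cut with 5 hours of abnormality and 5 hours of normality.
balanced_cut = [True] * 5 + [False] * 5

print(round(abnormal_to_normal_ratio(full_history), 3))
print(abnormal_to_normal_ratio(balanced_cut))
```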

[Plot: process variable x(t) versus time (HH:MM), 00:00 to 100:00; series: Process Data]

Figure 4.2: Process data with small abnormal to normal ratio

Figure 4.3 presents a cut of 10 hours from the process in Figure 4.2, in which the whole abnormality is captured along with an equal amount of normal operation. An ANR of 1 is obtained for this cut of the process data.

The ANR is relevant to build confidence about the behaviour of the process. While both data sets show information on the abnormality, the data in Figure 4.2 carries much more information on normality, which may not be really useful, and its unbalanced nature may interfere with some algorithms. On the other hand, the process data in Figure 4.3 may not have enough information on normality to fully characterize that region or to classify it as normality without previous information. The ANR helps to define how much data of normality and abnormality may be relevant and to evaluate whether the alarm configuration will produce a similar ANR considering the alarm occurrences.

[Plot: process variable x(t) versus time (HH:MM), 47:30 to 57:30]

Figure 4.3: Process data with large abnormal to normal ratio

4.2.2 Abnormality Description

The data may have many occurrences of distinct abnormalities with distinct properties. For this reason an important point in the Data Description is to properly describe the present abnormalities. Some relevant information on the abnormalities is listed below.

• Number of abnormality occurrences
• Number of distinct abnormalities
• Average time of each abnormality
• Set of variables affected

The number of distinct abnormalities assists the evaluation process with regard to the meaningfulness of the proposed configuration, because the sets of alarms that are raised must give information on the kind of abnormality that occurs. Since the approach requires no expert knowledge to define the root cause or the nature of the disturbance, only the number is important. This information is hard to detect by algorithms, but in simple situations a clustering may help to create a visualization like the one in Figure 4.4. The algorithm used for this example was DBSCAN from Scikit-learn (Pedregosa et al. 2011). In most cases, the number of distinct abnormalities is obtained by visual inspection.

It is possible to estimate some aspects of the process using the number of occurrences of abnormalities. For example, the probability of occurrence of an abnormality can be calculated from the number of occurrences and can help to prioritize the variables to be selected. The probability follows Equation (4.2).

$P(a_i \mid AB) = \dfrac{|a_i|}{|AB|}$ (4.2)

where $a_i$ is the event that abnormality $i$ occurs and $AB$ is the event that any abnormality occurs. The number of occurrences of abnormality $i$ in the data is denoted $|a_i|$, and $|AB|$ is the number of occurrences of any abnormality in the data.
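Equation (4.2) amounts to a relative frequency, as the following sketch shows. It assumes the abnormal sections have already been labeled with an abnormality identifier; the labels `"a1"` and `"a2"` are hypothetical.

```python
from collections import Counter


def abnormality_probabilities(occurrences):
    """P(a_i | AB) of Equation (4.2) from a list of occurrence labels."""
    counts = Counter(occurrences)
    total = sum(counts.values())  # |AB|: occurrences of any abnormality
    return {label: count / total for label, count in counts.items()}


# Hypothetical history with four abnormal sections.
history = ["a1", "a2", "a1", "a1"]
print(abnormality_probabilities(history))
```

Sorting the resulting dictionary by value gives one possible priority order for the abnormalities.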

The average duration can exclude an abnormality from the alarm system when it does not give the operator enough time to respond. For example, a variable that does not give the operator enough allowable response time should not have an alarm, because the operator would not have enough time to take corrective action. In this situation another safety measure must be put in place.

An abnormality that the control loop can compensate for also has a low average duration. For that reason, the variables affected by that abnormality should not have an alarm for it.

[Plot: three panels, x1(t), x2(t) and x3(t), versus time (HH:MM), 00:00 to 54:00; samples grouped into Cluster 1 to Cluster 5]

Figure 4.4: Process state clustering with simulated data

Each abnormality is distinguished by the way it affects the process. The effects of an abnormality on a process are reflected in the behaviour of the variables, and it is expected that the detectable effect on the variables distinguishes each abnormality. The set of affected variables for each abnormality is valuable to help classify the abnormalities. It may also influence the variable selection, where the set of variables must be as small as possible without causing the system to fail to detect any disturbance.

A simulated example is provided to illustrate the behaviour of a process in the presence of different abnormalities. The set of equations (4.3) defines a fictional process, where the notation $X \sim \mathcal{N}(\mu, \sigma^2)$ indicates random generation from a normal distribution $\mathcal{N}$ with mean $\mu$ and standard deviation $\sigma$. There are two abnormalities in the simulated example: abnormality 1 and abnormality 2. In abnormality 1, $x_1$ has a mean change and the change affects $x_3$. In abnormality 2, the variables $x_1$ and $x_2$ have a mean change, but $x_2$ also has a variance change. The effects of $x_1$ and $x_2$ on $x_3$ have lags of 6 minutes, $l_1$, and 12 minutes, $l_2$, respectively, simulating a transport delay.

$x_1(t) = \begin{cases} X \sim \mathcal{N}(0, 1), & \text{if normal} \\ X \sim \mathcal{N}(8, 1), & \text{if abnormality 1} \\ X \sim \mathcal{N}(18, 1), & \text{if abnormality 2} \end{cases}$

$x_2(t) = \begin{cases} X \sim \mathcal{N}(0, 1), & \text{if normal} \\ X \sim \mathcal{N}(12, 3), & \text{if abnormality 2} \end{cases}$

$x_3(t) = 0.8\,x_1(t - l_1) - 0.9\,x_2(t - l_2)$ (4.3)
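The process in Equation (4.3) can be simulated with a few lines of code. The sketch below is an illustration under stated assumptions, not the generator used for the figures: it assumes one sample per minute (so the lags are 6 and 12 samples), reads the second parameter of $\mathcal{N}(12, 3)$ as a variance, keeps $x_2$ at its normal distribution during abnormality 1, and repeats the first sample for instants earlier than the lag. The function name `simulate_process` is hypothetical.

```python
import random

X1_MEAN = {"normal": 0.0, "abnormality 1": 8.0, "abnormality 2": 18.0}


def simulate_process(states, lag1=6, lag2=12, seed=0):
    """Generate x1, x2 and x3 for a list of per-minute process states."""
    rng = random.Random(seed)
    x1, x2 = [], []
    for state in states:
        x1.append(rng.gauss(X1_MEAN[state], 1.0))
        if state == "abnormality 2":
            # N(12, 3) read as mean 12 and variance 3 (assumption).
            x2.append(rng.gauss(12.0, 3.0 ** 0.5))
        else:
            # Assumption: x2 is unaffected by abnormality 1.
            x2.append(rng.gauss(0.0, 1.0))
    # Transport delay: x3 depends on lagged values of x1 and x2.
    x3 = [
        0.8 * x1[max(t - lag1, 0)] - 0.9 * x2[max(t - lag2, 0)]
        for t in range(len(states))
    ]
    return x1, x2, x3


states = ["normal"] * 200 + ["abnormality 1"] * 200
x1, x2, x3 = simulate_process(states)
print(len(x1), len(x2), len(x3))
```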


In Figure 4.5 it is possible to see that the variables behave differently when different abnormalities occur. The effects of the two abnormalities are clearly distinct and can easily be classified as two different abnormalities when the context of the whole process is taken into account.

[Plot: three panels, x1(t), x2(t) and x3(t), versus time (HH:MM), 00:00 to 24:00]

Figure 4.5: Simulated example of process data with two typical abnormal states

4.2.3 Data Sections

Considering the ANR value and the abnormality description, it is important to define data sections that represent the discussed information. For example, many tuning algorithms use the notion of normal and abnormal states of the process. Three aspects are needed to define a data section:

• Begin
• Duration
• Feature

The begin indicates when a specific feature becomes present in the process data. Every data section has a length, or duration, that indicates for how long the feature was captured in the data. A data section must also have a feature, which defines what is expected to be found in it.

For a section to be relevant, its data must present a feature that assists the alarm configuration. The most basic data sectioning separates the data into normal and abnormal sections. Figure 4.6 presents the data of a process split into a normal section, in green, and an abnormal section, in red. The data is the same as in Figure 4.3: both data sections have a duration of 5 hours, with the abnormal section starting at 47:30 (forty-seven hours and thirty minutes) and the normal section starting at 52:30 (fifty-two hours and thirty minutes).
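A data section with its three aspects (begin, duration, feature) can be represented by a small record type. The sketch below is one possible representation; the class name `DataSection`, the use of time offsets from the start of the history and the `overlaps` helper are assumptions made for the illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DataSection:
    begin: timedelta     # offset from the start of the process history
    duration: timedelta  # how long the feature is present
    feature: str         # e.g. "normal", "abnormal", "chattering", "flood"

    @property
    def end(self) -> timedelta:
        return self.begin + self.duration

    def overlaps(self, other: "DataSection") -> bool:
        """True when the two sections share some interval of time."""
        return self.begin < other.end and other.begin < self.end


# The two sections described for Figure 4.6.
abnormal = DataSection(timedelta(hours=47, minutes=30), timedelta(hours=5), "abnormal")
normal = DataSection(timedelta(hours=52, minutes=30), timedelta(hours=5), "normal")
print(abnormal.end == normal.begin, abnormal.overlaps(normal))
```

Keeping the feature as a free-form label lets the same interval appear in several sections, which matches the overlapping sections discussed later in this chapter.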

[Plot: process variable x(t) versus time (HH:MM), 47:30 to 57:30; series: Process Data, Normal Section, Abnormal Section]

Figure 4.6: Data sections for normal and abnormal states in simulated data

Sections that contain probable regions of alarm floods, chattering alarms and any other phenomenon that deteriorates the alarm system performance are also valuable. A section with a probable alarm flood can be identified by the number of variables that deviate from the expected behaviour. To fully describe the possible flood, however, it is important to select the data from the beginning of the deviations, that is, from the first variable deviation until the corrective action takes effect. Figure 4.7 presents a hypothetical alarm flood: the red area is where most variables deviate and the orange area complements the flood scenario.

The selection of chattering alarm sections is associated with alarm triggering and clearing. An alarm is expected to be activated soon after the start of an abnormality, where noise is present. It is also expected to be deactivated shortly before the process returns to target operation. For that reason, a subsection of an abnormal section can be defined as a chattering alarm section. In Figure 4.8, the region between the red dashed lines is an abnormal section and the red area is a subsection selected as a chattering alarm section.

Many sections may overlap each other, mainly in abnormal sections of the data. In some cases the section may be the same but have more than one relevant feature. An abnormal section of data may also be a section of possible alarm flood and contain several subsections of alarm chattering, as presented in Figure 4.9. The normal sections, in blue, complement the abnormal section, in orange. The chattering region, in green, may overlap both normal and abnormal regions, since an alarm for the normal operating range accepts small deviations. Finally, the flood section, in red, mainly overlaps the middle of the abnormal section.

[Plot: process variables (among them x10(t)) versus time (HH:MM); series: Process Data, Flood Section]

Figure 4.7: Simulated process with alarm flood

[Plot for Figure 4.8: process variable x(t) versus time (HH:MM), 00:00 to 40:00; series: Process Data, Abnormal limits, Chattering Section]

[Plot: section timeline versus time (HH:MM), 00:00 to 16:00, with rows Normal / Abnormal, Chattering and Flood; legend: Normal, Abnormal, Chattering, Flood]

Figure 4.9: Example of overlapping data sections

4.3 Variable Selection

The alarm system must help the operator to identify an abnormality and take corrective action when it is detected. In this context, an alarm must be meaningful enough to clearly indicate what is happening and what may happen in the future. Besides the indications of possible causes and the alarm messages carried by individual alarms, the combination of different sets of alarms may indicate different abnormalities.

A small set of alarms may clearly indicate to the operator the current state of specific points of the process but fail to give a good overview. A huge set of alarms will probably overload the operator and easily produce alarm floods. Defining the smallest set of alarms that brings meaningful information to the operator is therefore an important step.

Several works use the alarm data to create a set of previous alarm flood patterns. Then, when a new flood begins, the occurring alarms are compared to the patterns in order to detect which action must be taken (Yang & Guo 2017).

Considering the importance of the choice of process variables, the Variable Selection step is proposed, in which the information from the Data Description assists in choosing a proper set of process variables.

Using Equation (4.2), it is possible to prioritize the most common abnormalities in the variable selection, so that the alarm configuration provides meaningful information to the operator. The priority can be used to indicate which affected sets are more important.

Using the affected variables from the abnormality description, it is possible to discard any undisturbed process variable. The union of the affected variables represents all the process variables that are candidates for alarm configuration. A subset that can represent the abnormalities must be selected. An initial choice for the selected variables is the set of variables that belong to only one affected set.

The previous information, if provided, is useful for variable selection. The constrained variables must stay within the provided limits to guarantee the safety and quality of the process. If an abnormality exceeds the provided limits, the affected set can be used to select the variables that deviate before the constrained variable, in order to reduce the time before the alarm triggers. The sets obtained from the operating constraints, among the sets of affected variables, help to find the most meaningful variables.

A classifier model can be obtained and a feature importance calculated to obtain a degree of relevance for the process variables that are important to classify the process state. Each model has an intrinsic relevance for each feature, and there are methods to estimate the feature importance based only on the data, e.g., permutation importance.

The permutation importance is calculated by shuffling the data of one variable and then using the model to predict the state of the data. The procedure is repeated for each variable and the performance metric of choice is calculated. The difference between the performance with the original data and the performance with the shuffled data is the permutation importance.

A classifier was obtained from the data of Figure 4.5, with the data sections labeled as normal, abnormal 1 and abnormal 2. Then a permutation importance was obtained, revealing that variable $x_3$ affects the prediction results the most. Table 4.1 presents these values.

Table 4.1: Permutation importance simulated results

Variable    Weight
x1          0.2092 ± 0.0113
x2          0.1497 ± 0.0096
x3          0.2486 ± 0.0050
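The permutation importance procedure described above can be sketched in plain Python. This is a generic illustration and not the tool that produced Table 4.1: the model is any callable that maps a row of variable values to a predicted state, accuracy is used as the performance metric, and the names `accuracy` and `permutation_importance` are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows whose predicted state matches the label."""
    return sum(model(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=10, seed=0):
    """Mean accuracy drop after shuffling each variable, one at a time."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            # Replace only variable j, keeping the other variables intact.
            shuffled = [
                list(row[:j]) + [value] + list(row[j + 1:])
                for row, value in zip(rows, column)
            ]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that only looks at the first variable.
rows = [[i % 2, 5] for i in range(20)]
labels = [row[0] > 0.5 for row in rows]
print(permutation_importance(lambda row: row[0] > 0.5, rows, labels))
```

A variable that the model ignores gets an importance of zero, while a variable that drives the prediction shows a positive accuracy drop.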

4.4 Alarm Configuration

At this stage of the proposed approach, the alarm attributes and types are defined. The process data is selected according to the data sections and the selected variables. Then the type of each alarm is determined and its attributes are defined using a specific algorithm. As discussed in Chapter 3, several works propose methods to define the basic alarm attributes.

Each selected variable needs at least one alarm. The type of the alarm is defined using the normal and abnormal data sections. When the values of the abnormal data section are higher than those of the normal section, a high alarm is set. Analogously, a low alarm is defined when the abnormal section has lower values than the normal section.

Using the chattering sections, it is possible to determine which variables will probably produce chattering. In chattering sections, the variables drift from normal operation to abnormal operation. When the drift is noisy, the variable will produce chattering alarms if no deadband or delay timer is implemented. These variables must use an algorithm that also specifies a deadband or delay timer.
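The two decisions described in this section can be sketched as follows. The sketch makes two assumptions that are not fixed by the text: the sections are compared through their mean values, and the deadband of a high alarm is applied below the threshold, so the alarm triggers above the threshold and clears only below the threshold minus the deadband. All function names are hypothetical.

```python
from statistics import mean


def alarm_type(normal_values, abnormal_values):
    """Choose 'high' or 'low' by comparing the section means (assumption)."""
    return "high" if mean(abnormal_values) > mean(normal_values) else "low"


def high_alarm_states(values, threshold, deadband=0.0):
    """Alarm state per sample for a high alarm with an optional deadband."""
    active = False
    states = []
    for x in values:
        if not active and x > threshold:
            active = True
        elif active and x < threshold - deadband:
            active = False
        states.append(active)
    return states


def count_activations(states):
    """Number of transitions from cleared to active."""
    previous = [False] + states[:-1]
    return sum(1 for before, now in zip(previous, states) if now and not before)


# Noisy drift around a threshold of 5.
signal = [0, 5.2, 4.9, 5.1, 4.8, 5.3, 0]
print(count_activations(high_alarm_states(signal, threshold=5)))
print(count_activations(high_alarm_states(signal, threshold=5, deadband=1)))
```

Comparing the activation counts with and without the deadband gives a quick indication of whether a variable needs one.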

4.5 Performance Evaluation

The final step, Performance Evaluation, tests the configuration of the system against the process data history. To assess the quality of the system, there are several metrics in standards and in academic research. If the system presents adequate performance, it may be implemented with the supervision of a process expert. Subsection 4.5.1 discusses the standard metrics presented in ANSI/ISA (2009). The metrics proposed in academic works are discussed in Subsection 4.5.2.

4.5.1 Standard Metrics

This dissertation uses the metrics in ANSI/ISA (2009), which Table 4.2 summarizes. The values in the target columns indicate the maximum values for acceptability and manageability, and the real values are expected to be lower than the targets. Manageability concerns the capacity of the operator to manage the alarm occurrences, while acceptability indicates values that are very likely to be accepted as good for the performance of the system in general. These values are indications, and each process may have more or less rigid target values.

In ANSI/ISA (2009), the metrics regarding operator actions are Unauthorized Alarm Suppression and Unauthorized Alarm Attribute Changes. These metrics are not directly affected by the alarm configuration; therefore, they are not considered in the performance evaluation in this thesis. In addition, alarm priority is not defined in the proposed approach, so the Alarm Priority Distribution is not considered either.

Finally, the metrics from the standard that will be used to evaluate the obtained alarm performance are listed below:

• Alarms per day per operating position
• Alarms per hour per operating position
• Alarms per 10 minutes per operating position
• Percentage of hours containing more than 30 alarms
• Percentage of 10 minute periods containing more than 10 alarms
• Maximum alarm count in a 10 minute period
• Percentage of time the alarm system is in a flood condition
• Percentage contribution of the top 10 most frequent alarms to the overall alarm load
• Quantity of chattering and fleeting alarms
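Some of the listed metrics reduce to counting alarm activations in fixed time windows. The sketch below illustrates the three 10-minute metrics under simplifying assumptions: a single operating position, activation times given in minutes from the start of the history (each smaller than the total duration), and fixed, non-overlapping windows. The function names are hypothetical and the standard itself should be consulted for the exact definitions.

```python
def ten_minute_counts(activation_minutes, total_minutes):
    """Number of alarm activations in each fixed 10-minute window."""
    n_windows = -(-total_minutes // 10)  # ceiling division
    counts = [0] * n_windows
    for t in activation_minutes:
        counts[int(t // 10)] += 1
    return counts


def ten_minute_metrics(activation_minutes, total_minutes):
    counts = ten_minute_counts(activation_minutes, total_minutes)
    return {
        "alarms_per_10_min": sum(counts) / len(counts),
        "max_count_in_10_min": max(counts),
        "pct_periods_over_10_alarms": 100 * sum(c > 10 for c in counts) / len(counts),
    }


# Hypothetical one-hour history: a few isolated alarms and one burst.
activations = [1, 2, 3, 15] + [20 + 0.5 * k for k in range(11)]
print(ten_minute_metrics(activations, total_minutes=60))
```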

4.5.2 Academic Metrics

There are several scientific works addressing the tuning of industrial alarm attributes, as discussed in Chapter 3. A significant number of works aim to define algorithms for alarm configuration. In order to evaluate the performance of the algorithms, some metrics are proposed to measure the undesired effects of a bad configuration. For the evaluation in this thesis, the false alarm rate and the missed alarm rate are used to measure correctness. The average alarm delay is used to measure how fast the alarm system detects a present abnormality. Finally, alarm chattering is addressed by using a chattering index.

Table 4.2: ISA standard performance metrics

Metric                                                     Target (1)
                                                           Acceptable      Manageable
Alarms per Day per Operating Position                      150 alarms      300 alarms
Alarms per Hour per Operating Position                     6 (2)           12 (2)
Alarms per 10 Minutes per Operating Position               1 (2)           2 (2)
Percentage of hours containing more than 30 alarms         1%
Percentage of 10-minute periods containing more than
10 alarms                                                  1%
Maximum alarm count in a 10 minute period                  10
Percentage of time the alarm system is in a flood
condition                                                  1%
Percentage contribution of the top 10 most frequent
alarms to the overall alarm load                           5%
Quantity of chattering and fleeting alarms                 0
Alarm Priority Distribution                                3 priorities: 80% Low, 15% Med, 5% High
                                                           4 priorities: 80% Low, 15% Med, 5% High, <1% "highest"
Unauthorized Alarm Suppression                             0

(1) The real values are expected not to exceed the values present in the table.
(2) Average value.

False Alarm Rate

The false alarm rate (FAR) measures the amount of time an alarm system has an alarm raised in normal condition. More specifically, it measures the amount of time that an alarm is active when it should not be. The false alarm rate can be calculated from alarm data as the sum of the durations of each alarm occurrence in the normal data sections. Equation (4.4) defines the FAR, where $t_{a,i}$ is the trigger moment of the $i$-th occurrence of a specific alarm and $t_{c,i}$ is the clear moment of the same occurrence.

$\mathrm{FAR} = \sum_{i=1}^{N} \left( t_{c,i} - t_{a,i} \right)$ (4.4)

It is also possible to define an overall false alarm rate (OFAR) for the entire system. It takes the moment any alarm is raised as the trigger moment, $T_a$, and the moment when all alarms are clear as the clear moment, $T_c$. Equation (4.5) defines the overall false alarm rate in the same form as equation (4.4).

$$\mathrm{OFAR} = \sum_{i=1}^{N} \left( T_{c,i} - T_{a,i} \right) \tag{4.5}$$
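One way to obtain the system-level moments $T_a$ and $T_c$ is to merge the occurrences of all alarms into intervals during which at least one alarm is active. The sketch below does this and then applies equation (4.5). It assumes the occurrences lie in the normal data section; the alarm names and the helper `merge_intervals` are hypothetical.

```python
from typing import Dict, List, Tuple

Occurrence = Tuple[float, float]  # (trigger moment, clear moment), in seconds


def merge_intervals(intervals: List[Occurrence]) -> List[Occurrence]:
    """Merge overlapping intervals into system-level (T_a, T_c) pairs."""
    merged: List[Occurrence] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Some alarm is still active: extend the current system-level interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # All alarms were clear: a new system-level interval starts here.
            merged.append((start, end))
    return merged


def overall_false_alarm_time(alarms: Dict[str, List[Occurrence]]) -> float:
    """Sum of (T_c - T_a) over the merged intervals, as in equation (4.5)."""
    all_occurrences = [occ for occs in alarms.values() for occ in occs]
    return sum(T_c - T_a for T_a, T_c in merge_intervals(all_occurrences))


# Hypothetical alarms; the first occurrences of the two alarms overlap.
alarms = {
    "high_pressure": [(10.0, 25.0), (90.0, 120.0)],
    "high_temperature": [(20.0, 30.0), (40.0, 42.0)],
}
ofar = overall_false_alarm_time(alarms)
print(ofar)  # merged intervals (10, 30), (40, 42), (90, 120) give 52.0 seconds
```

Merging first means that overlapping occurrences are counted once, so the overall value is not simply the sum of the per-alarm values.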

Missed Alarm Rate

Analogously to the false alarm rate, the missed alarm rate (MAR) measures the amount of time an alarm system should have an alarm raised but does not. It can be calculated from alarm data as the sum of the lengths of the intervals in which the alarm is cleared within abnormal data sections. Equation (4.6) defines the MAR, where $t_{a,i+1}$ is the trigger moment of the $(i+1)$-th occurrence of a specific alarm and $t_{c,i}$ is the clear moment of the $i$-th occurrence of the same alarm.

$$\mathrm{MAR} = \sum_{i=1}^{N} \left( t_{a,i+1} - t_{c,i} \right) \tag{4.6}$$
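A minimal sketch of equation (4.6), assuming the occurrences are (trigger, clear) pairs in seconds that all lie inside one abnormal data section. It sums the gaps between consecutive occurrences, so a list of N occurrences contributes N - 1 gaps; time before the first trigger or after the last clear is not counted here. The function name `missed_alarm_time` and the sample values are illustrative.

```python
from typing import List, Tuple

Occurrence = Tuple[float, float]  # (trigger moment t_a, clear moment t_c), in seconds


def missed_alarm_time(occurrences: List[Occurrence]) -> float:
    """Sum of (t_a of the next occurrence - t_c of the current one), as in equation (4.6)."""
    ordered = sorted(occurrences)
    return sum(
        next_t_a - t_c
        for (_, t_c), (next_t_a, _) in zip(ordered, ordered[1:])
    )


# Hypothetical occurrences of one alarm during an abnormal section.
occurrences = [(100.0, 130.0), (150.0, 160.0), (175.0, 200.0)]
mar = missed_alarm_time(occurrences)
print(mar)  # gaps of 20 and 15 seconds give 35.0
```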

The missed alarm rate can also be calculated for the overall system. The overall missed alarm rate (OMAR) takes the moment all alarms are cleared as the clear moment, $T_c$, and the moment when any alarm is triggered as the trigger moment, $T_a$. Equation (4.7) defines the overall missed alarm rate in the same form as equation (4.6).

$$\mathrm{OMAR} = \sum_{i=1}^{N} \left( T_{a,i+1} - T_{c,i} \right) \tag{4.7}$$
