UNIVERSIDADE FEDERAL DE SANTA CATARINA
PROGRAMA DE PÓS-GRADUAÇÃO EM ENGENHARIA DE AUTOMAÇÃO E SISTEMAS

Karila Palma Silva

CONTRIBUTIONS TO THE ESTIMATION OF THE WORST-CASE EXECUTION TIME USING MEASUREMENTS IN REAL-TIME SYSTEMS

Florianópolis, 2019


CONTRIBUTIONS TO THE ESTIMATION OF THE WORST-CASE EXECUTION TIME USING MEASUREMENTS IN REAL-TIME SYSTEMS

Thesis submitted to the Programa de Pós-Graduação em Engenharia de Automação e Sistemas for obtaining the degree of Doctor in Automation and Systems Engineering.

Advisor: Prof. Dr. Rômulo Silva de Oliveira

Florianópolis, 2019


Silva, Karila Palma

Contributions to the Estimation of the Worst Case Execution Time Using Measurements in Real-Time Systems / Karila Palma Silva; advisor, Rômulo Silva de Oliveira, 2019.

171 p.

Doctoral thesis – Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia de Automação e Sistemas, Florianópolis, 2019.

Includes references.

1. Automation and Systems Engineering. 2. WCET. 3. MBPTA. 4. Complex computer architectures. 5. Input data. I. Oliveira, Rômulo Silva de. II. Universidade Federal de Santa Catarina. Programa de Pós-Graduação em Engenharia de Automação e Sistemas. III. Title.


CONTRIBUTIONS TO THE ESTIMATION OF THE WORST-CASE EXECUTION TIME USING MEASUREMENTS IN REAL-TIME SYSTEMS

This Thesis was judged suitable for obtaining the title of "Doutor em Engenharia de Automação e Sistemas", and was approved in its final form by the Programa de Pós-Graduação em Engenharia de Automação e Sistemas.

Florianópolis, March 21st, 2019.

Prof. Dr. Rômulo Silva de Oliveira
Advisor

Prof. Dr. Werner Kraus Junior

Coordinator of the Programa de Pós-Graduação em Engenharia de Automação e Sistemas

Examination Committee:

Prof. Dr. Rômulo Silva de Oliveira
Chair

Prof. Dr. Adenilso da Silva Simão
Universidade de São Paulo (USP)

Prof. Dr. Márcio Basto Castro

Universidade Federal de Santa Catarina (UFSC)

Prof. Dr. Alex Sandro Roschildt Pinto Universidade Federal de Santa Catarina (UFSC)


First of all, I thank God, who has always enlightened my path, giving me strength, wisdom, health and protection.

Second, I thank my parents, Marcia Terezinha Palma Silva and Carlos Antonio Tanajura da Silva, for their trust and support during my academic life, which gave me access to high-end education. A special thanks to my sister, Kauana Palma Silva, who encouraged and supported me throughout all these years. I also thank all my family; without them, the way would certainly have been more difficult.

Third, I thank my advisor, Rômulo Silva de Oliveira, for his readiness, dedication, and the remarkable suggestions that helped me stay on the right path throughout this work.

Fourth, I would like to thank my laboratory colleagues and friends Leonardo Rodrigues, Daniel Bristot, Sidney, Flávio, Fernando Gonçalves, Fábio, Kleber, Guilherme, Lya, Franciele, João, Carlos, Angelica, Rejane, Daniel, Alexandre, Stephanie, Thayse and Tadeu. Everyone participated actively in this process, sharing moments, ideas and motivation. A special thanks to my friend Luís Fernando Arcaro, who helped make this work possible through our joint work and long, priceless discussions.

Finally, I thank CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) for the financial support.


As the use of computing systems proliferates in society, applications with real-time requirements become ever more common. A critical step in developing such systems is determining whether tasks meet their timing constraints. For that, worst-case execution time (WCET) analyses are necessary. At the same time, modern systems' demand for increasingly complex software requires more powerful and complex computers, which makes WCET analysis an ever harder challenge. In this context, the objective of this work is to investigate and propose methods that can be used to deal with complex computer architectures when estimating the WCET using measurements. Measurement-Based Probabilistic Timing Analysis (MBPTA) promises to produce probabilistic WCETs for real-time tasks based on the analysis of execution time measurements through Extreme Value Theory (EVT). MBPTA requires the maximum observed execution times to follow an extreme value distribution, and allows determining execution time values expected to be exceeded only with arbitrarily low probabilities (pWCETs). Some works suggest using the Generalized Pareto or Generalized Extreme Value models, while others consider the Gumbel or Exponential models more adequate. In this work we first assess the tightness of the WCET bounds determined through these models. For that, we considered a time-randomized platform, on which MBPTA should yield remarkably reliable results in order to be deemed usable in the general case. We then assessed the adequacy of MBPTA for estimating the pWCET of a task executed on a complex architecture with Linux. We chose to analyse a task that is frequently used in timing analysis, with fixed input data known to induce high execution times, and employed a validation sample collected using the Perf tool, which we also used to discount the direct interference of other tasks and interrupt handlers. However, real-time tasks executed on complex architectures suffer large indirect interference from other activities executing on the same system, which generates noise in the execution times. In this context, it is difficult or impossible to determine the worst-case scenario for measurement-based timing analysis, i.e., the hardware state and the execution path that produce the longest execution time; the timing variability induced both by hardware effects and by the execution paths depends, directly or indirectly, on the input data used. Our experiments show that input data known to induce high execution times can produce shorter execution times on such platforms. To provide reliable WCET estimates, it is necessary to select the input data that induce high execution times. In this work we analysed the execution times of real-time tasks with respect to the input data, i.e., (1) we verified the sensitivity of task execution times to the input data, and (2) we quantitatively evaluated their impact on the resulting execution times. Tasks that are sensitive to input data require the input data that maximize the execution time. One approach to selecting input data is to use optimization algorithms. However, the timing variability observed on complex architectures hinders the comparison of execution time measurements obtained using different input data. In this context, we (1) evaluated different measurement methods, (2) implemented a genetic algorithm whose fitness function is based on execution time measurements selected using both classical and novel methods, and (3) estimated pWCET bounds for a task using the worst input data obtained through the developed genetic algorithm. We highlight that the proposed work is justified by its scientific relevance and by the potential positive economic impact arising from methods for determining WCET estimates.

Keywords: WCET; MBPTA; Complex computer architecture; Input data.


Extended Abstract

Introduction

Real-time computing systems are systems subject to requirements of temporal nature in addition to requirements of logical nature: their results must be correct not only from the logical point of view, but must also be produced at the correct time. The timing requirements such systems are subject to are expressed in terms of the deadlines within which results must be produced. Schedulability tests are employed to demonstrate that, even in the worst case, the tasks of a given system can be scheduled while meeting their deadlines. Besides knowledge of the period and the deadline imposed on the system by its operating context, these tests require the determination of the Worst-Case Execution Time (WCET) of each task of the system, or of an upper bound for it, derived considering the software and hardware used (LIU, 2000; WILHELM et al., 2008).

A software task may present different execution times when executed multiple times on the same hardware platform. The main sources of timing variability are (1) the processor hardware used, and (2) the execution paths that are effectively measured. With respect to (1), execution times are affected by the latencies of the internal elements of the processor that execute the instructions. The introduction of acceleration elements (e.g., pipelines, instruction and data caches, and branch prediction) to improve performance decreases execution times, but makes them variable and dependent on the execution history (i.e., the execution history needs to be taken into account to predict the processor state at the beginning of the execution of a specific instruction). In relation to (2), finding the Worst-Case Execution Path(s) (WCEPs) and ensuring that they have been tested is a major challenge. Which path is executed is defined by the input variables and by the permanent variables changed in previous executions, and some states of permanent variables are reached only after many executions of the task. Therefore, several executions of a task can produce different times due to the characteristics of the hardware and to the inputs used in the tests (ENGBLOM, 2002; SCHNEIDER, 2000; ZHANG; BURNS; NICHOLSON, 1993; YAN; ZHANG, 2008).

For the determination of WCET bounds there are both static and measurement-based methods. Static methods generate safe bounds through a detailed analysis of the task code and of the hardware architecture, which implies large modelling and computational efforts. Measurement-based methods analyse the task execution times effectively produced during execution. Such methods significantly reduce the analysis effort, but require the determination of safety margins to account for possibly unobserved timing effects, and require execution time samples that are representative of the worst case with respect to the input data and the execution context used (CAZORLA et al., 2016; WILHELM et al., 2008; ABELLA et al., 2015).

Measurement-Based Probabilistic Timing Analysis (MBPTA) was proposed for determining WCET bounds based on the statistical analysis of execution time measurements (CUCU-GROSJEAN et al., 2012) through Extreme Value Theory (EVT), a branch of statistics initially designed to estimate the probability of unusual natural events (COLES, 2001). The application of EVT within MBPTA involves fitting an extreme value distribution to the maximum execution times observed while the analysed task executes on the target hardware platform. The distribution is used to determine a value expected to be exceeded only with a sufficiently low probability, called a Probabilistic Worst-Case Execution Time (pWCET) (ABELLA et al., 2014; CUCU-GROSJEAN et al., 2012).

However, EVT was proposed to obtain probabilistic WCETs on relatively simple, preferably time-randomized architectures, with exceedance probabilities in the order of 10^-15 (KOSMIDIS et al., 2013; ABELLA et al., 2014). As the use of computing systems proliferates in our society, applications with real-time requirements are becoming ever more common. It is hence necessary for the Real-Time Systems (RTS) industry to provide effective, reliable and flexible systems, while making them available to the market as quickly as possible. This results in increasing software complexity: we have more and more equipment with computers controlling and making decisions that affect the physical world and that, consequently, must use complex hardware elements. Providing guarantees that the timing requirements of such systems are met through WCET upper bounds becomes increasingly difficult, considering that these systems include industrial equipment with millions of lines of code and even Linux-based home appliances.

WCET analysis techniques are hampered by the ever-growing cost and complexity of obtaining accurate knowledge of the internal operation of advanced processors for static analysis, and by the difficulty of reliably associating variable execution times with worst-case behaviour through measurement-based techniques (WILHELM et al., 2008).

Objectives

The objective of this work is to investigate and propose methods that can be used to estimate the worst-case execution time of tasks in real-time systems using measurements. In this work we also address this problem when complex computer architectures and operating systems are used, since in such systems static analysis is generally not possible.

The proposed methods aim at balancing reliability and cost in the application of techniques that improve systems' safety and robustness. The analysis cost is reduced by minimizing the effort spent on testing, and reliability is increased by maximizing the confidence that the timing requirements are met. To achieve this goal, we apply several statistical methods, together with optimization algorithms in some cases.

Among the elements related to the measurement of execution times, we can mention:

• Difficulties in reliably associating execution times with tasks' worst-case behaviour through measurement-based techniques. This involves the processor hardware used and the execution paths that are effectively measured.

• The determination of input data is a critical issue for the timing analysis of a task through measurement-based methods. One possible approach for finding the worst-case input data of a task with respect to its execution time is to employ optimization algorithms. However, the large timing variability observed on complex computer architectures hinders the comparison of execution time measurements obtained using different input data. This implies the need for measurement methods capable of minimizing the noise generated by the hardware platform in such comparisons, hence providing increased confidence that the differences observed in execution times are in fact due to variations in the input data.

• Evidence that the applicability requirements of EVT are met, hence supporting the argument that the collected data can in fact be analysed through the probabilistic models it employs. This requires execution times to be deemed independent and identically distributed, and the analysed tasks' maximum observed execution times to adhere to an extreme value distribution.

The architectures explored in this thesis consist of (1) a time-randomized platform, on which we consider conditions under which MBPTA should yield remarkably reliable results, for the purpose of empirically assessing the use of EVT models in the analysis of real-time systems' execution times and detecting whether they can be considered suitable for determining pWCET estimates; and (2) a complex processor, ensuring that our evaluations are performed under a scenario in which such complex execution environments are typically employed in real computing systems. The latter imposes additional challenges, which are explored in this thesis in order to contribute to the estimation of the WCET using measurements in this context.

Contributions

The contributions of this thesis to the state of the art are:

• As previously mentioned, MBPTA requires the analysed tasks' maximum observed execution times to adhere to an extreme value distribution, and allows determining execution time values expected to be exceeded only with arbitrarily low probabilities. Several works suggest that the Generalized Pareto (GP) or the Generalized Extreme Value (GEV) models should be employed in such analyses, while others consider the Gumbel or the Exponential models more adequate for providing upper bounds with increased reliability. Contribution: an empirical assessment of the tightness of the WCET bounds determined through these models. For that, we considered a time-randomized platform, on which MBPTA should yield remarkably reliable results in order to be considered usable in the general case.

• MBPTA (through EVT) has been proposed to obtain pWCETs on relatively simple, preferably time-randomized architectures. However, many applications subject to firm real-time requirements must run on complex computer architectures using an operating system such as Linux. Contribution: an empirical assessment of the adequacy of MBPTA for estimating the pWCET of a task executed on a complex computer with Linux. EVT would be a highly valuable tool for system developers, since it could enable pWCET estimates that cover cases whose observation is far beyond what is feasible in practice through standard testing, e.g., due to limitations in the number of tests that can be performed.

• Real-time tasks executed on complex computer architectures suffer large indirect interference from other activities executing on the same system, hence generating noise in the observed execution times. In this context, it is difficult or impossible to determine the worst-case scenario for tasks' measurement-based timing analysis, i.e., the hardware state and execution path that generate the longest execution time. The timing variability induced both by hardware effects and by the execution paths depends, directly or indirectly, on the input data used during the collection of measurements. It is therefore necessary to select the input data that induce high execution times in this context, in order to provide reliable WCET estimates on modern hardware architectures using MBPTA. We propose a method for the software tester to obtain information about the impact of the input data on task execution times, and hence about the importance of identifying the worst-case input data (with respect to the execution time) to be used when testing the tasks.

• In order to select input data for performing MBPTA, one possible approach is to employ optimization algorithms, e.g., a genetic algorithm. However, the large timing variability observed on complex computer architectures hinders the comparison of execution time measurements obtained using different input data. In this context, we (1) propose a novel method for comparing different input data with respect to the execution times of tasks executed on complex computer architectures, (2) implement a genetic algorithm whose fitness function is based on execution time measurements selected using both traditional and novel measurement methods, and (3) estimate pWCETs for a task using the worst input data obtained through the developed genetic algorithm, in order to show the importance of the input data within MBPTA and to reinforce the relevance of the proposed measurement method.

Final Remarks

The use of complex computer architectures will be fundamental to support the computational demand of the applications executed on real-time systems. In this context, static WCET analysis is generally not feasible, and conventional measurement methods may fail to obtain execution time samples that are representative of the worst case, because the conditions for it to occur are rare and difficult to determine. Flexible timing analysis methods are therefore necessary.

Since the execution of a large number of tests is generally infeasible in practice, EVT inference techniques can be considered a valuable tool for system developers whenever the EVT applicability conditions hold. However, there are difficulties in ensuring the reliability of the produced results, such as choosing an adequate EVT model to obtain safe pWCET estimates, and selecting the input data used in the tests.

Therefore, in this work we investigate and propose methods that can be used to deal with complex computer architectures when estimating the worst-case execution time of tasks in real-time systems using measurements.

Keywords: Worst-case execution time; Measurement-based probabilistic timing analysis; Complex computer architecture; Input data.


As the use of computers proliferates in our society, systems with strict timing requirements – or Real-Time Systems (RTS) – become ever more common. A critical step in designing such systems is determining whether tasks meet their timing constraints. For that, Worst-Case Execution Time (WCET) analyses are necessary. At the same time, modern systems' demand for increasingly complex software requires more powerful and complex computers. These facts make RTSs' WCET analysis an increasingly harder challenge. In this context, the objective of this work is to investigate and propose methods that can be used to deal with complex computer architectures for estimating the WCET using measurements. The technique known as Measurement-Based Probabilistic Timing Analysis (MBPTA) promises to produce WCET bounds for RTS tasks based on the analysis of execution time measurements through Extreme Value Theory (EVT). For that, MBPTA requires the analysed tasks' maximum observed execution times to adhere to an extreme value distribution, and allows determining execution time values expected to be exceeded only with arbitrarily low probabilities (pWCETs). Several works suggest the use of the Generalized Pareto or the Generalized Extreme Value models, while others consider the Gumbel or the Exponential models more adequate. In this work, we first perform an assessment of the tightness of the WCET bounds determined through these models. For that, we considered a time-randomized platform, on which MBPTA should yield remarkably reliable results in order to be deemed usable in the general case. Subsequently, we performed an assessment of the adequacy of MBPTA to estimate the pWCET of a task executed on a complex architecture with Linux. For that, we chose to analyse a task that is frequently used in temporal analysis, with fixed input data known to induce high execution times, and employed a large execution time validation sample collected using the Perf tool. We used the Perf tool in order to discount the direct interference from other tasks and interrupt handlers. Real-time tasks executed on complex architectures suffer large indirect interference from other activities executing on the same system, hence generating noise in the observed execution times. In this context, it is difficult to determine the worst scenario for tasks' measurement-based temporal analysis. The timing variability induced both by hardware effects and by the execution paths depends, directly or indirectly, on the input data used. Our experiment shows that input data known to induce high execution times may produce shorter execution times for a task executed on such platforms. It is hence necessary to select the input data that induce high execution times in this context, to provide reliable WCET estimates. In this work, we performed an assessment of the execution times of real-time tasks with respect to the input data, i.e., (1) verifying the sensitivity of the task execution times to the input data used, and (2) quantitatively assessing their impact on the resulting execution times. Tasks that are sensitive to input data require the input data that maximize their execution times to be searched for, in order to obtain reliable pWCETs. In order to select input data for performing MBPTA on complex architectures, one possible approach for finding the worst-case input data of a task with respect to its execution time is to employ optimization algorithms, e.g., a genetic algorithm. However, the large timing variability observed on complex architectures hinders the comparison of execution time measurements obtained using different input data. In this context, we (1) performed an assessment of different measurement methods, (2) implemented a genetic algorithm in which the fitness function is based on execution time measurements selected using both traditional and novel methods, and (3) estimated probabilistic WCET bounds for a task, using the worst input data obtained through the developed genetic algorithm. We highlight that the proposed work is justified by its scientific relevance and by the potential positive economic impact arising from methods for the determination of WCET estimates.

Keywords: WCET; MBPTA; Complex computer architectures; Input data.


Figure 1: Example distribution of execution times (adapted from (WILHELM et al., 2008))
Figure 2: CFG (OLIVEIRA, 2018)
Figure 3: Block Maxima
Figure 4: Peaks Over Threshold
Figure 5: GEV family
Figure 6: GP family
Figure 7: Goodness-of-fit
Figure 8: No goodness-of-fit
Figure 9: Trace output
Figure 10: MBPTA applicability statistical tests' p-values
Figure 11: Real hardware sample applicability evidence
Figure 12: Real hardware sample applicability evidence
Figure 13: Real hardware pWCET tightness analysis
Figure 14: Real hardware pWCET tightness analysis
Figure 15: pWCET tightness analysis for ξ = −1/2
Figure 16: pWCET tightness analysis for ξ = −1/4
Figure 17: pWCET tightness analysis for ξ = −1/8
Figure 18: pWCET tightness analysis for ξ = 0
Figure 19: pWCET tightness for GEV data of ξ = −1/2
Figure 20: pWCET tightness for GP data of ξ = −1/2
Figure 21: pWCET tightness for GEV data of ξ = −1/4
Figure 22: pWCET tightness for GP data of ξ = −1/4
Figure 23: pWCET tightness for GEV data of ξ = −1/8
Figure 24: pWCET tightness for GP data of ξ = −1/8
Figure 25: pWCET tightness for GEV data of ξ = 0
Figure 26: pWCET tightness for GP data of ξ = 0
Figure 27: Statistical tests' p-values
Figure 28: GEV - Block size 50
Figure 29: GEV - Block size 250
Figure 30: GEV - Block size 500
Figure 31: GEV - Block size 1000


Figure 34: pWCET reliability
Figure 35: pWCET tightness
Figure 36: Genetic algorithm
Figure 37: IDS - bsort
Figure 38: IDS - cnt
Figure 39: IDS - dijkstra
Figure 40: IDS - qurt
Figure 41: bsort - Min
Figure 42: bsort - Max
Figure 43: dijkstra - Min
Figure 44: dijkstra - Max
Figure 45: IDI - bsort
Figure 46: IDI - dijkstra
Figure 47: Consistent measurements example
Figure 48: Inconsistent measurements example
Figure 49: Identical distribution tests' p-values for SM
Figure 50: Execution times obtained using SM
Figure 51: Identical distribution tests' p-values for BAVG
Figure 52: Execution times obtained using BAVG
Figure 53: Identical distribution tests' p-values for BMEDIAN
Figure 54: Execution times obtained using BMEDIAN
Figure 55: Identical distribution tests' p-values for BMAX
Figure 56: Execution times obtained using BMAX
Figure 57: Identical distribution tests' p-values for BMIN
Figure 58: Execution times obtained using BMIN
Figure 59: Identical distribution tests' p-values for MBMAX
Figure 60: Execution times obtained using MBMAX
Figure 61: Identical distribution tests' p-values for MBMIN
Figure 62: Execution times obtained using MBMIN
Figure 63: Different input data execution times
Figure 64: Validation samples' histograms
Figure 65: Validation samples' histograms


Figure 68: GEV fitness for GA-MBMIN using block size 1000
Figure 69: bsort on DPCpArrr pWCET tightness analysis
Figure 70: insertsort on DPArptdm pWCET tightness analysis
Figure 71: insertsort on DPCpArrr pWCET tightness analysis
Figure 72: bs on DPArptdm pWCET tightness analysis
Figure 73: bs on DPCpArrr pWCET tightness analysis
Figure 74: bsort on DPArptdm pWCET tightness analysis
Figure 75: bsort on DPCpArrr pWCET tightness analysis
Figure 76: insertsort on DPArptdm pWCET tightness analysis
Figure 77: insertsort on DPCpArrr pWCET tightness analysis
Figure 78: bs on DPArptdm pWCET tightness analysis
Figure 79: bs on DPCpArrr pWCET tightness analysis
Figure 80: expint on DPArptdm pWCET tightness analysis
Figure 81: expint on DPCpArrr pWCET tightness analysis
Figure 82: fdct on DPArptdm pWCET tightness analysis
Figure 83: fdct on DPCpArrr pWCET tightness analysis
Figure 84: Diagnostic artefacts for the applicability of EVT — cnt
Figure 85: Diagnostic artefacts for the applicability of EVT — dijkstra
Figure 86: Diagnostic artefacts for the applicability of EVT — huffenc
Figure 87: Diagnostic artefacts for the applicability of EVT — lcdnum
Figure 88: Diagnostic artefacts for the applicability of EVT — qurt
Figure 89: Diagnostic artefacts for the applicability of EVT — sha


Table 1: Confidence interval of the shape parameter
Table 2: pWCET estimates
Table 3: Execution times MIN obtained by GA - bsort
Table 4: Execution times MAX obtained by GA - bsort
Table 5: Execution times MIN obtained by GA - dijkstra
Table 6: Execution times MAX obtained by GA - dijkstra
Table 7: Identical distribution statistical tests
Table 8: Direct execution time comparison
Table 9: Measurement variation comparison
Table 10: Confidence interval of the shape parameter
Table 11: pWCET estimates
Table 12: Confidence interval of the shape parameter — cnt
Table 13: pWCET estimates using Gumbel model — cnt
Table 14: Confidence interval of the shape parameter — dijkstra
Table 15: pWCET estimates using Gumbel model — dijkstra
Table 16: Confidence interval of the shape parameter — huffenc
Table 17: pWCET estimates using Gumbel model — huffenc
Table 18: Confidence interval of the shape parameter — sha
Table 19: pWCET estimates using Gumbel model — sha


ACET Average-Case Execution Time

AD Anderson-Darling

ALU Arithmetic Logic Unit

BAVG Block Average

BCET Best-Case Execution Time

BM Block Maxima

BMAX Block Maximum

BMIN Block Minimum

BMEDIAN Block Median

CAN Controller Area Network

CFG Control Flow Graph

CRPS Continuous Ranked Probability Score

EQMAE Estimated Quantiles' Mean Absolute Error

EVT Extreme Value Theory

FPGA Field-Programmable Gate Array

FPU Floating-Point Unit

GA Genetic Algorithm

GEV Generalized Extreme Value

GMLE Generalized Maximum Likelihood Estimation

GP Generalized Pareto

HWM High Water Mark

IDI Input Data Impact Analysis

IDS Input Data Sensitivity Analysis


KS Kolmogorov-Smirnov

LB Ljung-Box

MBMAX Multi-Block Maximum

MBMIN Multi-Block Minimum

MBPTA Measurement-Based Probabilistic Timing Analysis

MIPS Microprocessor without Interlocked Pipeline Stages

MLE Maximum Likelihood Estimation

MMU Memory Management Unit

NMI Non-Maskable Interrupt

OS Operating System

POT Peaks Over Threshold

pWCET Probabilistic Worst-Case Execution Time

RAM Random Access Memory

RTS Real-Time System

SM Single Measurement

TLB Translation Lookaside Buffer

WCEP Worst-Case Execution Path

WCET Worst-Case Execution Time

WCRT Worst-Case Response Time


1 INTRODUCTION
1.1 WCRT AND WCET
1.2 COMPLEX COMPUTER ARCHITECTURES
1.3 MOTIVATION
1.4 OBJECTIVE OF THIS THESIS
1.5 ORIGINAL CONTRIBUTIONS
1.6 ORGANIZATION OF THE TEXT
2 VARIANCE OF EXECUTION TIMES
2.1 VARIANCE CAUSED BY SOFTWARE
2.2 VARIANCE CAUSED BY HARDWARE
2.3 INPUT DATA
2.4 PROBLEMS
2.4.1 Data-dependent execution path
2.4.2 Context dependence of execution times
2.4.3 Timing anomalies
2.5 WCET ANALYSIS
3 THE DETERMINATION OF WCET BOUNDS THROUGH MBPTA
3.1 STATISTICAL TESTS
3.2 EXTREME VALUE THEORY (EVT)
3.3 SAMPLE COLLECTION
3.4 SELECTION OF THE MAXIMUM OBSERVED VALUES
3.4.1 Block Maxima (BM)
3.4.2 Peaks Over Threshold (POT)
3.5 FITTING AN EXTREME VALUE MODEL
3.5.1 Generalized Extreme Value (GEV)
3.5.2 Generalized Pareto (GP)
3.6 EVALUATION OF THE MODEL FITTING
3.7 pWCET ESTIMATES
3.8 DIFFICULTIES IN THE APPLICABILITY OF EVT IN THE CONTEXT OF WCET
4 EXPERIMENTAL ENVIRONMENT
4.1 TASKS
4.2 HARDWARE PLATFORM
4.2.1 Time-Randomized Hardware Platform
4.2.2 Complex Hardware Platform
4.3 STATISTICAL SOFTWARE
5 ASSESSMENT OF THE EVT MODELS WITH RESPECT TO pWCET TIGHTNESS
5.1 SUMMARY OF RELATED WORKS
5.2 EXPERIMENT OBJECTIVE
5.3 REAL HARDWARE PLATFORMS
5.3.1 Applicability evidence
5.3.2 Sample size step
5.3.2.1 Selecting the maximum observed values through BM
5.3.2.2 Selecting the maximum observed values through POT
5.3.3 pWCET tightness
5.3.3.1 GEV and Gumbel models
5.3.3.2 GP and Exponential models
5.4 SYNTHETIC DATA SAMPLES
5.4.1 Applicability evidence
5.4.2 pWCET tightness
5.4.2.1 GEV and Gumbel models
5.4.2.2 GP and Exponential models
5.5 RECOMMENDATIONS
6 MBPTA FOR TASKS EXECUTED ON COMPLEX COMPUTER ARCHITECTURES
6.1 SUMMARY OF RELATED WORKS
6.2 COMPLEX HARDWARE PLATFORM WITH LINUX
6.2.1 Applicability evidence
6.2.2 Selection of the maximum observed values
6.2.3 Model fitting
6.2.4 pWCET estimates
6.2.5 pWCET reliability and tightness
6.3 CONCLUSION
7 A METHOD FOR EVALUATING THE SENSITIVITY OF THE EXECUTION TIME WITH RESPECT TO INPUT DATA
7.1 SUMMARY OF RELATED WORKS
7.2 EXPERIMENT OBJECTIVES
7.2.1 Genetic Algorithm (GA)
7.2.2 Input Data Impact Analysis (IDI)
7.3 COMPLEX HARDWARE PLATFORM WITH LINUX
7.3.1 Input Data Sensitivity Analysis (IDS)
7.3.2 Input Data for Test
7.3.3 Impact of Input Data on Task Execution Time
7.4 CONCLUSION
8 SELECTION OF INPUT DATA FOR APPLYING MBPTA ON COMPLEX COMPUTER ARCHITECTURES
8.1 RELATED WORKS
8.2 METHODS FOR COMPARING THE EXECUTION TIME OF DIFFERENT INPUT DATA
8.2.1 Measurement Methods Assessment
8.2.1.1 Consistency Statistical Hypothesis Tests
8.2.1.2 Direct Execution Time Comparison
8.2.2 Single Measurement (SM)
8.2.3 Block Average (BAVG)
8.2.4 Block Median (BMEDIAN)
8.2.5 Block Maximum (BMAX)
8.2.6 Block Minimum (BMIN)
8.2.7 Multi-Block Maximum (MBMAX)
8.2.8 Multi-Block Minimum (MBMIN)
8.3 COMPARISON OF MEASUREMENT METHODS
8.4 MEASUREMENT METHODS RELIABILITY EVALUATION
8.5 USE OF GENETIC ALGORITHM TO SELECT INPUT DATA
8.6 IMPACT OF THE INPUT DATA IN MBPTA
8.6.1 Applicability Evidence
8.6.2 Selection of the maximum observed values
8.6.3 Model Fitting
8.6.4 pWCET Estimates
8.7 CONCLUSION
9 FINAL REMARKS
9.1 FUTURE WORK
9.2 PUBLICATIONS
9.3 AWARD
9.4 ACKNOWLEDGEMENT
REFERENCES
APPENDIX A -- Assessment plots associated to other scenarios with respect to the pWCET tightness
APPENDIX B -- The applicability evidence of the EVT for pWCET estimates of tasks executed on a complex computer architecture with Linux


1 INTRODUCTION

Real-Time Systems (RTSs) are computing systems subject to requirements of both functional and temporal nature, and vary greatly in size, complexity and criticality. In other words, RTSs must not only produce correct results, but must also deliver them within acceptable timing deadlines. Schedulability tests must hence be applied for demonstrating that RTSs' underlying timing requirements (tasks' deadlines) are met even in the worst-case execution scenario. Such tests require as input upper-bounding estimates of each task's Worst-Case Execution Time (WCET), which can be defined as the longest time possibly taken by the target hardware platform to execute the software that implements the task, disregarding the direct interference of other tasks. This is different from the Worst-Case Response Time (WCRT), which also includes the time consumed to execute other tasks (LIU, 2000; WILHELM et al., 2008).

The difficulty of obtaining the WCET is presented by (WILHELM et al., 2008) and is illustrated in Figure 1. A task typically shows a certain variation of execution times depending on the input data or on different behaviour of the environment. In most cases, the state space is too large for the designer to exhaustively explore all possible executions and thereby determine the exact worst-case execution time.

Figure 1: Example distribution of execution times (adapted from (WILHELM et al., 2008))

[The figure plots the distribution of times against the time axis, marking the measured execution times within the possible execution times, the maximal observed execution time, the actual WCET, and the upper timing bound.]

For the determination of WCET bounds there are both static and measurement-based methods available. Static methods generate reliable bounds through a detailed joint analysis of the task code and of the hardware architecture, hence implying large computational efforts and limited applicability (WILHELM et al., 2008). Measurement-based methods analyse tasks' execution times effectively produced during execution, hence significantly reducing analysis efforts but requiring the determination of safety margins to account for possibly unwitnessed timing effects (CAZORLA et al., 2016).

Measurement-Based Probabilistic Timing Analysis (MBPTA) was proposed for determining WCET bounds based on the statistical analysis of execution time measurements (CUCU-GROSJEAN et al., 2012) through Extreme Value Theory (EVT), a branch of statistics initially designed to estimate the probability of unusual natural events (COLES, 2001). The application of EVT for performing MBPTA involves fitting an extreme value distribution to the maximum execution times observed while the analysed task is executed on the target hardware platform. The distribution is used to determine a value which is expected to be exceeded only with a sufficiently low probability — and which is therefore called a Probabilistic Worst-Case Execution Time (pWCET) estimate (DAVIS et al., 2014; ABELLA et al., 2014; CAZORLA et al., 2016; CUCU-GROSJEAN et al., 2012; KOSMIDIS et al., 2016).
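For reference, the extreme value family involved can be written explicitly; the following is the standard formulation (cf. COLES, 2001), reproduced here only as a reminder. The Generalized Extreme Value (GEV) distribution has the form

    G(x) = exp{ -[1 + ξ((x - μ)/σ)]^(-1/ξ) },   defined for 1 + ξ(x - μ)/σ > 0,

with location μ, scale σ > 0 and shape ξ; the Gumbel model is its ξ → 0 limit, G(x) = exp{-exp[-(x - μ)/σ]}. A pWCET estimate at exceedance probability p is then the quantile x_p satisfying G(x_p) = 1 - p.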

Standard measurements are not enough to obtain pWCETs, since they may lack completeness: through measurements alone there is no guarantee of having experienced all execution conditions. Nonetheless, measurements are important for extracting observable features such as average behaviours and trends that appear while executing tasks. EVT makes statistical inference on the tail region of a distribution function, making it possible to determine WCET estimates associated with arbitrarily low exceedance probabilities (BEREZOVSKYI et al., 2014; ABELLA et al., 2014).
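As a concrete illustration of this workflow, the sketch below (in C) selects block maxima from a sample of execution times, fits a Gumbel model by the method of moments, and inverts its CDF at a target exceedance probability. This is a minimal sketch for illustration only: the sample array is hypothetical, and MBPTA as used in this thesis relies on maximum-likelihood fitting plus goodness-of-fit and i.i.d. testing rather than this shortcut.

#include <math.h>

#define EULER_GAMMA 0.5772156649015329
#define PI 3.141592653589793

/* Illustrative only: estimate a pWCET by fitting a Gumbel model to the
 * block maxima of `times` using the method of moments, then inverting
 * the Gumbel CDF at the requested exceedance probability. */
double gumbel_pwcet(const double *times, int n, int block_size,
                    double exceedance_prob)
{
    int nblocks = n / block_size;
    double sum = 0.0, sumsq = 0.0;

    for (int b = 0; b < nblocks; b++) {
        double max = times[b * block_size];          /* block maximum */
        for (int i = 1; i < block_size; i++)
            if (times[b * block_size + i] > max)
                max = times[b * block_size + i];
        sum += max;
        sumsq += max * max;
    }

    double mean  = sum / nblocks;
    double var   = sumsq / nblocks - mean * mean;
    double sigma = sqrt(6.0 * var) / PI;             /* Gumbel scale    */
    double mu    = mean - EULER_GAMMA * sigma;       /* Gumbel location */

    /* Quantile x such that P(X > x) = exceedance_prob under the model */
    return mu - sigma * log(-log(1.0 - exceedance_prob));
}

For instance, gumbel_pwcet(sample, 100000, 100, 1e-9) would return the execution time expected to be exceeded with probability 10^-9 per run, under the fitted model.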

EVT has been proposed to obtain pWCETs in relatively simple architectures, preferably time-randomized ones, with exceedance probabilities in the order of 10^-15, targeting hard real-time systems (KOSMIDIS et al., 2013; ABELLA et al., 2014). However, as the use of computational systems proliferates in our society, applications with real-time requirements are becoming ever more common. It is hence necessary for the RTS industry to provide effective, reliable and flexible systems, while still making them available to the market as quickly as possible. This results in increasing software complexity: we have more and more equipment with computers controlling and making decisions that affect the physical world and, consequently, that must use complex hardware elements. Providing guarantees that RTSs' timing requirements are met through WCET upper bounds becomes increasingly difficult, considering such systems comprise industrial equipment with millions of lines of code and even Linux-based embedded controllers.

State-of-the-art WCET analysis techniques are hampered by the ever-growing cost and complexity of obtaining accurate knowledge of the internal operation of advanced processors for static analysis, and by difficulties in reliably associating variable execution times with tasks' worst-case behaviour through measurement-based techniques (WILHELM et al., 2008).

1.1 WCRT AND WCET

Any approach that aims to provide guarantees about meeting deadlines needs to know the worst-case behaviour of the system, for both software and hardware. The deadlines should be met even in the most unfavourable scenarios for the application, i.e., assuming the worst case in every respect: the worst control flow for each task, the worst-case synchronization scenario between tasks (e.g., mutual exclusion), the worst input data, the worst combination of external events (e.g., interrupts), the worst cache behaviour, and the worst processor behaviour (e.g., pipeline); in other words, a scenario composed of the worst possible combination of events.

The Worst-Case Response Time (WCRT) is obtained by combining the execution of various tasks, i.e., the task's own computation plus interference, blocking, and any other delay the task may suffer. WCET analysis considers each task individually, i.e., only its own computation, as if the task were running without interruptions.

In this work the focus is on obtaining the WCET. Obtaining the WCRT from WCETs is consolidated in the real-time systems literature; in general it means deriving the WCRT from the WCETs by using a utilization test, response time analysis or a cyclic executive (FARINES; da Silva Fraga; OLIVEIRA, 2000).
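For illustration, the classical response-time recurrence for fixed-priority tasks is R_i^(n+1) = C_i + sum over j in hp(i) of ceil(R_i^n / T_j) * C_j, iterated from R_i^0 = C_i until a fixed point is reached or the deadline is exceeded. The sketch below is this textbook formulation under the usual assumptions (independent periodic tasks, no blocking); the arrays are hypothetical placeholders, not data structures from this thesis.

#include <math.h>

/* Classical response-time analysis for fixed-priority scheduling.
 * Tasks are indexed by decreasing priority; C[] holds WCETs and T[]
 * periods. Returns the WCRT of task i, or -1.0 if it exceeds deadline D. */
double wcrt(const double *C, const double *T, int i, double D)
{
    double R = C[i], prev = 0.0;

    while (R != prev && R <= D) {
        prev = R;
        R = C[i];
        for (int j = 0; j < i; j++)   /* interference from higher-priority tasks */
            R += ceil(prev / T[j]) * C[j];
    }
    return (R <= D) ? R : -1.0;
}

This makes explicit why upper-bounding WCET estimates are the critical input: any underestimation of C[] propagates directly into an optimistic WCRT.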

1.2 COMPLEX COMPUTER ARCHITECTURES

Providing guarantees that temporal requirements are met on complex computer architectures is becoming increasingly difficult. A software task may present different execution times when executed multiple times on the same hardware platform, since execution times are affected by a number of factors associated with the underlying system. For instance, the latencies of the processor's internal elements directly affect execution times, and the introduction of acceleration elements makes them variable and dependent on execution history. The cache memory yields different latencies for hits and misses, and may cause the longest execution path to run faster. The pipeline may introduce complex timing effects, e.g., due to: (1) instruction-level parallelism, (2) shared resources' allocation, and (3) dynamic and out-of-order instruction scheduling. The branch prediction mechanism recognizes branch patterns for improving the pipeline behaviour and, similarly to caches, it involves an aspect of global history to decide which instructions' information is stored. In addition, the prediction decision logic can also be complex (GRIFFIN; BURNS, 2010; PETTERS, 2002; KOSMIDIS et al., 2013; WILHELM et al., 2008).

Moreover, there are Operating System (OS) activities that may interrupt tasks' execution, which include process scheduling, kernel events, OS services and system maintenance tasks. These sources of indirect interference may prove difficult or even impossible to determine and/or to control, and cause delays in tasks' execution that induce noise in execution time measurements. Indirect interference includes changes in cache contents while performing other tasks, and changes in the Translation Lookaside Buffer (TLB) of the Memory Management Unit (MMU) and in the history table of the branch predictor (WILHELM et al., 2008).

The task execution time can be expressed according to Equation 1.1:

    C_task = f(input data, processor state, OS effect)        (1.1)

As mentioned, in complex computer architectures the task execution time is subject to noise from the environment, where great (and often uncontrollable) variance comes from the processor state and the OS effect. Timing analysis of tasks executed in such scenarios is therefore necessary, but is made difficult by timing effects that arise from both the hardware and the OS.
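As a concrete example of how such measurements can be collected on a Linux platform, the sketch below times repeated activations of a task with clock_gettime(CLOCK_MONOTONIC). The task_under_test function and its input are hypothetical placeholders; on a complex architecture the spread of the resulting sample already reflects the processor-state and OS noise of Equation 1.1.

#include <stdint.h>
#include <time.h>

extern void task_under_test(const int *input);   /* hypothetical task */

/* Collect n execution-time measurements, in nanoseconds, of single
 * activations of the task. Variability across the n values stems from
 * the processor state (caches, pipeline, branch predictor) and from
 * indirect OS interference. */
void measure(const int *input, uint64_t *out, int n)
{
    struct timespec t0, t1;

    for (int i = 0; i < n; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        task_under_test(input);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        int64_t ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
                   + (t1.tv_nsec - t0.tv_nsec);
        out[i] = (uint64_t)ns;
    }
}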

1.3 MOTIVATION

As previously mentioned, a software task may present different execution times when executed multiple times on the same hardware platform. The main sources of time variability are (1) the processor hardware used, and (2) the execution paths that are effectively measured. With respect to (1), execution times are affected by the latencies of the internal elements of the processor that perform the execution of the instructions. The introduction of acceleration elements — e.g., pipelines, instruction and data caches, and branch prediction — to improve performance decreases execution times, but makes them variable and dependent on execution history (i.e., the execution history needs to be taken into account to predict the processor state at the beginning of the execution of a specific instruction). In relation to (2), finding the Worst-Case Execution Paths (WCEPs) and ensuring that they have been tested is a major challenge. Which path is executed is defined by the input variables and by the permanent variables changed in previous executions, and some states of permanent variables are reached only after many executions of the task. Therefore, several executions of a task can produce different times due to the characteristics of the hardware and to the inputs used in the tests (ENGBLOM, 2002; SCHNEIDER, 2000; ZHANG; BURNS; NICHOLSON, 1993; YAN; ZHANG, 2008).

Thorough testing of all execution paths with all possible combinations of input data is often impractical. The easiest way to generate input data is randomly, but random generation does not perform well in terms of coverage (WILHELM et al., 2008, 2009; LAW; BATE, 2016).

In industry, the usual practice is to perform conventional testing and to add a safety margin to deal with the uncertainty of whether the worst-case execution time is covered by the tests. On the other hand, hard real-time systems, where guarantees are required, are often restricted to simple processors to which static analysis techniques can be applied.

The use of complex architectures associated with OSs such as Linux will potentially become fundamental for supporting the computational demand of RTSs' applications, which are subject to temporal requirements. In this context, static WCET analysis is generally not feasible, yet the estimation of timing properties must avoid as much uncertainty as possible. Therefore, flexible methods of timing analysis are necessary.

Since the execution of a large number of tests is generally infeasible in practice, EVT inference techniques can be considered a valuable tool for system developers, whenever the EVT applicability conditions hold.

As previously mentioned, among the factors that influence execution times, some are affected by the tasks' input data (e.g., hardware effects and execution path) and others by OS-related effects. In complex computer architectures the task execution time is subject to noise from the environment, where great (and often uncontrollable) variance comes from the processor state and the OS effect. For this reason, it is difficult to find the input data that maximizes C_task. In this work we seek to (1) find worst-case input data, and (2) use them in EVT to estimate the pWCET.

1.4 OBJECTIVE OF THIS THESIS

The objective of this work is to investigate and propose methods that can be used in estimating the worst-case execution time of tasks in real-time systems by using measurements. In this work we also address this problem when complex computer architectures and operating systems are used, since in these systems static analysis is usually not possible.

The newly proposed methods aim at balancing efforts between reliability and cost in the application of techniques that improve systems' safety and robustness. Analysis cost is reduced by minimizing the effort spent in testing, and reliability is increased by maximizing the confidence that the timing requirements are met. In order to achieve such a goal we apply several statistical methods, together with optimization algorithms in some cases.

Among the elements related to the measurement of execution times, we can mention:

• Difficulties in reliably associating variable execution times with tasks' worst-case behaviour through measurement-based techniques. This involves the processor hardware used and the execution paths that are effectively measured.

• The determination of input data is a critical issue for the timing analysis of a task through measurement-based methods. One possible approach for finding the worst-case input data of a task with respect to its execution time is to employ optimization algorithms. However, the large timing variability observed on complex computer architectures hinders the comparison of execution time measurements obtained using different input data. This implies the need for measurement methods capable of disregarding the noise generated by the hardware platform in such comparisons, hence providing increased confidence that differences observed in execution times are in fact due to variations in input data.

• Evidence that EVT applicability requirements are met, hence supporting the argument that the collected data can, in fact, be analysed through the probabilistic models it employs. This requires execution times to be deemed independent and identically distributed, and the analysed tasks' maximum observed execution times to adhere to an extreme value distribution.

The architectures explored in this thesis consist of (1) a time-randomized processor, on which we consider conditions under which MBPTA should yield remarkably reliable results, for the purpose of performing an empirical assessment of the use of the EVT models in the analysis of RTSs' execution times, with the objective of detecting whether they can be considered suitable for determining pWCET estimates and hence deemed usable in the general case; and (2) a complex processor, ensuring our evaluations are performed under a scenario in which such complex execution environments are typically employed in real computing systems. In relation to (2), the architecture imposes additional challenges, which are explored in this thesis in order to contribute to the estimation of the WCET using measurements in this context.

1.5 ORIGINAL CONTRIBUTIONS

The contributions of this thesis to the state of the art are:

• As previously mentioned, MBPTA requires the analysed tasks' maximum observed execution times to adhere to an extreme value distribution, and allows determining execution time values expected to be exceeded only with arbitrarily low probabilities. Several works suggest that the Generalized Pareto (GP) or the Generalized Extreme Value (GEV) models should be employed in such analyses, while others consider the Gumbel or the Exponential models more adequate for providing upper bounds with increased reliability. Contribution: an empirical assessment of the tightness of the WCET bounds determined through these models. We considered a time-randomized platform, on which MBPTA should yield remarkably reliable results in order to be considered usable in the general case.

• MBPTA (through EVT) has been proposed to obtain pWCETs in relatively simple architectures, preferably time-randomized ones. However, there are many applications that must run on complex computer architectures using an operating system such as Linux, and which are subject to firm real-time requirements. Contribution: an empirical assessment of the adequacy of MBPTA to estimate the pWCET of a task executed on a complex computer with Linux. EVT would be a highly valuable tool for system developers, since it could enable pWCET estimates that cover cases whose witnessing is far beyond what is feasible in practice through standard testing, e.g., due to limitations in the number of tests that can be performed.

• Real-time tasks executed on complex computer architectures suffer large indirect interference from other activities executing on the same system, hence generating noise in the observed execution times. In this context, it is difficult or impossible to determine the worst scenario for tasks' measurement-based temporal analysis, i.e., the hardware state and execution path that generate the longest execution time. The timing variability induced both by hardware effects and by the execution paths depends, directly or indirectly, on the input data used during the collection of measurements. It is hence necessary to select the input data that induce high execution times in this context, to provide reliable WCET estimates on modern hardware architectures using MBPTA. In this context, we propose a method for the software tester to obtain information about the impact of the input data on the task execution times, and hence about the importance of identifying the worst-case input data (with respect to the execution time) to be used in the testing of tasks.

• In order to select input data for performing MBPTA on complex computer architectures, one possible approach for finding the worst-case input data of a task with respect to its execution time is to employ optimization algorithms, e.g., a genetic algorithm. However, the large timing variability observed on complex computer architectures hinders the comparison of execution time measurements obtained using different input data. In this context, we (1) propose a novel method to be used for comparing different input data with respect to the execution times they cause RTS tasks to produce on complex computer architectures, (2) implemented a genetic algorithm in which the fitness function is based on execution time measurements selected using both traditional and novel methods, and (3) estimated pWCET bounds for a task, using the worst input data obtained through the developed genetic algorithm, in order to show the importance of input data within MBPTA and reinforce the relevance of the proposed measurement method (a simplified sketch of such a noise-suppressing comparison follows this list).
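The sketch below illustrates the kind of noise-suppressed comparison the last two contributions rely on. Each input-data candidate is measured in several blocks; the minimum of each block is kept (indirect interference can only inflate execution times, so low quantiles are comparatively stable across runs), and the candidate is scored by the largest of these block minima, to be used as a genetic-algorithm fitness value. It reuses the hypothetical measure() collector sketched in Section 1.2, and it is an illustrative reduction, not the exact definition of the measurement methods evaluated in Chapter 8.

#include <stdint.h>

#define MAX_BLOCK 1024

extern void measure(const int *input, uint64_t *out, int n);  /* Section 1.2 sketch */

/* Noise-suppressed fitness of one input-data candidate: the maximum over
 * blocks of the per-block minimum execution time. block_size must not
 * exceed MAX_BLOCK. */
uint64_t fitness(const int *input, int blocks, int block_size)
{
    uint64_t sample[MAX_BLOCK];
    uint64_t score = 0;

    for (int b = 0; b < blocks; b++) {
        measure(input, sample, block_size);
        uint64_t min = sample[0];
        for (int i = 1; i < block_size; i++)
            if (sample[i] < min)
                min = sample[i];
        if (min > score)                  /* keep the largest block minimum */
            score = min;
    }
    return score;
}

A genetic algorithm can then rank candidate inputs by this score with far less sensitivity to sporadic OS interference than ranking by raw single measurements.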


1.6 ORGANIZATION OF THE TEXT

This work is structured as follows. Chapter 2 gives the necessary background on the variance of execution times. Chapter 3 summarizes the determination of WCET bounds through MBPTA. Chapter 4 describes the environment and conditions of the experiments. Chapter 5 assesses the EVT models with respect to pWCET tightness. Chapter 6 contains an empirical study on the adequacy of MBPTA for tasks executed on a complex computer architecture with Linux. Chapter 7 presents an empirical method for evaluating the sensitivity of real-time task execution times with respect to input data. Chapter 8 deals with the selection of input data for applying MBPTA on complex computer architectures. Finally, Chapter 9 presents our final remarks.


2 VARIANCE OF EXECUTION TIMES

Several aspects inherent to computer programs (software), added to the characteristics of modern processors (hardware), make the execution time of a task vary.

In this chapter we examine constructive aspects of software and hardware that make task execution times vary, even when the task runs completely alone on the computer. More information can be found in (OLIVEIRA, 2018; KOSMIDIS et al., 2013; GRIFFIN; BURNS, 2010).

2.1 VARIANCE CAUSED BY SOFTWARE

The variance in execution time caused by the software is related to the idea of control flow. The control flow of a task indicates which parts of the task code are executed. The usual way to represent the possible control-flow paths is the Control Flow Graph (CFG). Which path is executed is defined by the program's input variables and by permanent variables changed in previous runs; some states of permanent variables are reached only after many executions of the task.

The number of existing paths may or may not be tractable:

• Task without branches and without loops: there is only one path.

• Task with branches but without loops: it depends on the combination of branches, but in general it is explicitly tractable.

• Task without branches but with loops: it depends on the combination of the loops, but it is usually explicitly tractable.

• Task with branches and loops: it depends on the combination of loops and branches, and is usually explicitly intractable.

Consider a simple hypothetical example, shown in Figure 2. The execution options between START and END are:

• Execute the right branch:
  – I.

• Execute the branch on the left, which contains a loop with 5 branches:
  – ADFH.
  – ADGH.
  – ACH.
  – ACBH.
  – ACBEH.

Figure 2: CFG (OLIVEIRA, 2018)

Assuming the number of iterations of the loop is 10, we have:

• Executing the loop once: 5^1 possibilities.

• Executing the loop twice: 5^2 possibilities.

• Executing the loop ten times: 5^10 possibilities, totaling 9,765,625 possible paths.

The total number of possibilities depends on how many times the loop is executed. We can observe that, even for a simple example composed of branches and a loop, exhaustive testing of all execution paths is usually impossible.
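As a minimal illustration of this combinatorial explosion, the following C sketch (hypothetical, not part of the analysed tasks) counts the paths of the example CFG as a function of the number of loop iterations, assuming each iteration independently takes one of the 5 inner paths:

#include <stdio.h>

/* Counts the execution paths of the example CFG: one path through the
 * right branch, plus 5^n paths through the left branch when the loop
 * iterates exactly n times (5 inner paths per iteration). */
unsigned long long count_paths(unsigned int iterations)
{
    unsigned long long left_paths = 1ULL;
    for (unsigned int i = 0; i < iterations; i++)
        left_paths *= 5ULL;   /* one factor of 5 per iteration */
    return 1ULL + left_paths; /* the right branch contributes 1 path */
}

int main(void)
{
    for (unsigned int n = 1; n <= 10; n++)
        printf("%2u iterations: %llu paths\n", n, count_paths(n));
    return 0;
}

For n = 10 the left branch alone already yields 5^10 = 9,765,625 paths, matching the count above.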

The number of possible control-flow paths of a task determines the variance of its execution time, with respect to the variance caused by the software.


However, some paths can never be executed, because they are (1) impossible by the semantics of the program, for example due to consecutive contradictory conditions, or (2) impossible by the semantics of the environment, for example due to inputs that cannot happen in the real operating environment.

Example: Path impossible by program semantics:

 1: void task(float x, float y) {
 2:     for (int i = 1; i <= 100; i++) {
 3:         if (x > 10)
 4:             x = x * 2;
 5:         else
 6:             x = x + 2;
 7:         if (x < 0)
 8:             y = x;
 9:     }
10: }

The path 2-3-4-7-8 is impossible, because if x > 10 then x * 2 > 0, so the condition in line 7 cannot be true.

Even if a task has only one path, its execution time is not guaranteed to be constant. In this case we can say that there is no variance in the execution time caused by the software; however, there may still be a lot of variance caused by the hardware. Even though exactly the same machine instructions are always executed by the task, the execution time may differ at each execution.

2.2 VARIANCE CAUSED BY HARDWARE

In modern computational architectures there are increasingly complex hardware elements whose temporal behavior is not deterministic, which makes it difficult or even impossible to determine the WCET.

The execution time varies according to multiple factors, among them the hardware elements used in the processor on which the task is executed, such as cache memories, branch prediction mechanisms and pipelines (OLIVEIRA, 2018; WILHELM et al., 2008; GONZALEZ; LATORRE; MAGKLIS, 2010; SHEN; LIPASTI, 2013; ENGBLOM, 2002; CULLMANN et al., 2010).

Caches are fast memories that store the most recently accessed instructions and data, along with their neighbors. When a piece of information is present in the cache, the time to access it is significantly lower than if it had to be loaded from main memory, thus reducing the average access time.

A pipeline is a processor implementation technique in which the execution of instructions is divided into several stages, so that a new instruction can enter a stage as soon as the previous instruction leaves it, thus increasing instruction throughput. Modern processors employ pipelines with a large number of stages and often support out-of-order execution of instructions for better use of resources.

Branch prediction mechanisms aim to reduce the number of pipeline flushes, which are instruction sequence reloads caused by branches in the execution flow, by automatically loading into the pipeline the execution path most likely to be followed after a branch instruction.

Execution times are affected by the latencies of the internal elements of the processor that carry out the execution of instructions. Data input and output elements can also influence execution times, either directly or indirectly. The introduction of acceleration elements makes execution times lower on average, but also variable and dependent on the execution history. Therefore, multiple runs of a task can produce different times due to hardware characteristics.

In the context of WCET analysis this inherent variability induces problems of sample representativity, since (1) the worst combination of latencies does not occur frequently, and (2) determining conditions to force its occurrence is difficult.

There are many variable-latency elements:

• Dependent on instructions and data previously accessed:
  – Example: cache memories have different latencies on hit and miss, and accessed data and their neighbors remain stored.
  – The null initial state (GRIFFIN; BURNS, 2010) is not necessarily the worst case.
  – Determining the worst-case initial state is a difficult problem (WILHELM et al., 2008).
  – Randomizing the initial state during measurements (PETTERS, 2002) does not guarantee that the worst case will be effectively observed (it can be very rare).

• Dependent on the data on which they operate (see the sketch after this list):
  – Example: multiplication and division operations in Arithmetic Logic Units (ALUs) and Floating-Point Units (FPUs) can be completed faster for low values.
  – Maximum latency can be achieved during measurements (KOSMIDIS et al., 2016).

• Dependent on the behavior of other elements:
  – Example: shared buses require exclusive access, so delays may be required for arbitration. Temporal isolation (KOTABA et al., 2014) can eliminate this dependence.
  – Measurement under maximum interference:
    * Specific approaches are required for each element.
    * Determining the worst-case condition can be difficult.
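As a rough illustration of data-dependent latency, the following C sketch (a hypothetical micro-benchmark, not one of the tasks analysed in this work) times a batch of floating-point divisions with two different divisor values using the POSIX clock_gettime() call; on some processors, subnormal or otherwise special operand values can make the observed times differ noticeably:

#include <stdio.h>
#include <time.h>

/* Times 'reps' dependent floating-point divisions by 'd' and returns
 * the elapsed time in nanoseconds (coarse; includes loop overhead). */
static long long time_divisions(volatile double d, long reps)
{
    struct timespec t0, t1;
    volatile double acc = 1e30;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < reps; i++)
        acc = acc / d + 1.0;  /* keep a data dependency between iterations */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (t1.tv_nsec - t0.tv_nsec);
}

int main(void)
{
    long reps = 1000000;
    printf("divisor 3.0:    %lld ns\n", time_divisions(3.0, reps));
    printf("divisor 1e-310: %lld ns\n", time_divisions(1e-310, reps)); /* subnormal */
    return 0;
}

Whether and by how much the two measurements differ depends on the specific FPU implementation; the point of the sketch is only that the operand values themselves can be a source of latency variation.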

2.3 INPUT DATA

Both the hardware effects and the execution paths depend, directly or indirectly, on the input data used. On complex computer architectures a task may suffer great interference from other activities existing in the system (e.g., when the Linux operating system is used). This makes it even harder to determine whether one input data generates longer or shorter execution times than another. There is direct interference when the operating system suspends the task in question to execute higher-priority tasks or interrupt handlers. This interference is relatively easy to measure and exclude from the task execution time. On the other hand, there are several indirect interferences, such as the changing of cache memory contents by other tasks. The same happens with the Translation Lookaside Buffer (TLB) of the Memory Management Unit (MMU) and with the branch history table used by the branch predictor. These changes can increase the execution time of a task by evicting its information from the hardware and replacing it with information about other tasks. Because these changes are difficult to predict and detect, they can mask the effects of input data, i.e., an input data that generates the worst execution time may be favored in a given measurement because, by chance, there were few changes due to other operating system activities.

2.4 PROBLEMS

The time for a particular execution depends on the path through the task taken by control and on the time spent in the statements or instructions on this path on this hardware. Accordingly, the determination of execution-time bounds has to consider the potential control-flow paths and the execution times for this set of paths (WILHELM et al., 2008).

2.4.1 Data-Dependent execution path

The task to be analysed attains its WCET on one (or sometimes several) of its possible execution paths. If the input and the initial state leading to the execution of this worst-case path were known, the problem would be easy to solve. The task would then be started in this initial state with this input, and the execution time would be measured. However, the worst-case input and initial state are not known, being hard (or even impossible) to determine.

2.4.2 Context dependence of execution times

Previous approaches to the timing-analysis problem assumed context independence of the timing behavior: the execution times of individual instructions were independent of the execution history and could be found in the processor manual. With modern processors, however, this information is no longer available due to a series of complications, the main ones certainly being caches and pipelines. The execution time of individual instructions may vary by several orders of magnitude depending on the state of the processor in which they are executed. Currently, the behavior of the processor must be analysed as a specific subtask of the WCET analysis.

2.4.3 Timing Anomalies

The high complexity of current processors also affects the applicability of techniques that analyse processor behavior. Modern processors suffer from effects called timing anomalies: counterintuitive influences of the (local) execution time of one instruction on the (global) execution time of the whole task. Such effects are called anomalies because a better local case may induce a worse overall case. Processor components that can cause anomalies include speculation and out-of-order execution mechanisms.


2.5 WCET ANALYSIS

There is a growing demand for processing capacity, including in real-time systems. Increased processing power requires the use of modern and complex computer architectures. However, complex computer architectures are designed to reduce the Average-Case Execution Time (ACET). Such architectures generate pathological execution times in the worst case, which have a very small probability of occurring, but a probability greater than zero (i.e., they can occur).

For the worst-case execution time of a task to occur, the worst behaviors of software and hardware must occur at the same time. This includes the worst flow of control (path) within the task code, the worst input data, the worst values for global variables, the worst behavior of the caches, the worst behavior of the pipeline, and the worst behavior of the branch predictor.

On modern and complex computer architectures the static analysis of WCET is very difficult or impossible. The most commonly used method for estimating the WCET is simply to execute the task and measure it. The execution time of the task alone is measured for a number of test cases, yielding the maximum observed execution time, called the High Water Mark (HWM). However, it is highly unlikely that the true WCET will be observed in tests, so the estimate generated by measurement will always be optimistic. A common practice is to use the HWM, plus a safety margin, as an estimate of the WCET, but it is still not possible to guarantee that the resulting value is equal to or greater than the true WCET.
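A minimal sketch of this measurement procedure is shown below, assuming a hypothetical task_under_test() function and an arbitrary 20% safety margin (both are illustrative choices, not prescriptions of this work):

#include <stdio.h>
#include <time.h>

/* Hypothetical task being analysed; replace with the real task. */
static void task_under_test(void)
{
    volatile double x = 1.0;
    for (int i = 0; i < 1000; i++)
        x = x * 1.000001 + 0.5;
}

int main(void)
{
    struct timespec t0, t1;
    long long hwm_ns = 0;  /* High Water Mark, in nanoseconds */

    for (int i = 0; i < 10000; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        task_under_test();
        clock_gettime(CLOCK_MONOTONIC, &t1);

        long long ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL
                     + (t1.tv_nsec - t0.tv_nsec);
        if (ns > hwm_ns)
            hwm_ns = ns;  /* keep the maximum observed time */
    }

    /* HWM plus an arbitrary 20% margin: no guarantee that this bounds
     * the true WCET. */
    printf("HWM: %lld ns, HWM + 20%%: %lld ns\n", hwm_ns, hwm_ns + hwm_ns / 5);
    return 0;
}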

MBPTA is a recently proposed approach for deriving probabilistic WCET estimates, based on the principle that future behavior tends to follow a pattern similar to what has already been observed, and that this pattern can be extrapolated. It makes it possible to produce predictions even for exceedance probabilities much smaller than would be possible by directly examining only the data obtained from the measurements. In this work the focus is on obtaining WCET estimates through MBPTA.


3 THE DETERMINATION OF WCET BOUNDS THROUGH MBPTA

The technique known as Measurement-Based Probabilistic Timing Analysis (MBPTA) promises to produce WCET bounds for real-time systems' tasks based on the analysis of execution time measurements through EVT. The role of EVT in MBPTA is to associate probabilities to the observation of values that largely exceed the effectively measured ones, by using models that describe the statistical behaviour expected for samples' maxima (COLES, 2001).

However, the application of EVT for performing MBPTA requires execution times to be deemed independent and identically distributed (i.e., i.i.d.) and the analysed tasks' maximum observed execution times to adhere to an extreme value distribution, so as to allow determining execution time values expected to be exceeded only with arbitrarily low probabilities (i.e., pWCETs) (CUCU-GROSJEAN et al., 2012; MILUTINOVIC et al., 2017; CAZORLA et al., 2016; ABELLA et al., 2014).

The probability distributions employed by EVT for modelling the analysed data (execution times, in the case of MBPTA) present a right tail with decreasing slope, i.e., the probabilities of extreme execution times are expected to decrease as their values grow further from the mean. Witnessing such a behaviour in the frequency distribution of a task's maximum observed execution times gives reasons to believe that pathological execution times, i.e. times that largely exceed the maximum observed ones, are associated to extremely or even negligibly low probabilities (CAZORLA et al., 2013a). EVT extends this intuition by using models that describe the statistical behaviour expected for samples' maxima, allowing probabilities to be associated to the observation of values that largely exceed the effectively measured ones (COLES, 2001).
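For reference, in the block-maxima branch of EVT (COLES, 2001), samples' maxima are modelled by the Generalized Extreme Value (GEV) family, whose cumulative distribution function, for location $\mu$, scale $\sigma > 0$ and shape $\xi \neq 0$, can be written as:

\[
G(x) \;=\; \exp\!\left\{-\left[1 + \xi\!\left(\frac{x-\mu}{\sigma}\right)\right]^{-1/\xi}\right\},
\qquad 1 + \xi\!\left(\frac{x-\mu}{\sigma}\right) > 0,
\]

with the Gumbel case obtained in the limit $\xi \to 0$. Under such a model, a pWCET for an exceedance probability $p$ can be read as a value $x_p$ such that $1 - G(x_p) \le p$.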

This chapter gives the necessary background on statistical hypothesis tests for evidencing measurements’ independence and identical distribution (Section 3.1), and EVT application (Section 3.2).

3.1 STATISTICAL TESTS

Statistical hypothesis testing is based on the evaluation of a hypothesis which is assumed to be true unless evidence is found to reject it.
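As a simple illustration of the kind of check involved (not one of the specific tests employed in this work), the following C sketch computes the lag-1 sample autocorrelation of a series of execution times; values far from zero suggest dependence between consecutive measurements:

#include <stdio.h>

/* Lag-1 sample autocorrelation of n execution times:
 * r1 = sum_{t=0}^{n-2} (x[t]-mean)(x[t+1]-mean) / sum_{t=0}^{n-1} (x[t]-mean)^2 */
double lag1_autocorr(const double *x, int n)
{
    double mean = 0.0, num = 0.0, den = 0.0;
    for (int t = 0; t < n; t++)
        mean += x[t];
    mean /= n;
    for (int t = 0; t < n; t++) {
        den += (x[t] - mean) * (x[t] - mean);
        if (t < n - 1)
            num += (x[t] - mean) * (x[t + 1] - mean);
    }
    return num / den;
}

int main(void)
{
    /* Illustrative execution times (e.g., in microseconds). */
    double times[] = {105, 98, 110, 102, 99, 107, 101, 104, 100, 106};
    int n = sizeof(times) / sizeof(times[0]);
    printf("lag-1 autocorrelation: %f\n", lag1_autocorr(times, n));
    return 0;
}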
