

Universidade de Aveiro, Departamento de Electrónica, Telecomunicações e Informática, 2014

Jesse Wayde Brandão

Análise do Modelo de Resposta Truncado para Prioridade Fixa em HMPSoCs

Analysis of the Truncated Response Model for Fixed Priority on HMPSoCs


Dissertation presented to the Universidade de Aveiro in fulfillment of the requirements for the degree of Master in Electronics and Telecommunications Engineering, carried out under the scientific supervision of Dr. Paulo Bacelar Reis Pedreiras, Assistant Professor at the Departamento de Electrónica, Telecomunicações e Informática of the Universidade de Aveiro, and of Dr. Orlando Miguel Pires dos Reis Moreira, Principal DSP Systems Engineer at Ericsson (Eindhoven).


o júri / the jury

presidente / president Professor Doutor José Alberto Gouveia Fonseca

Associate Professor, Departamento de Electrónica, Telecomunicações e Informática, Universidade de Aveiro

vogais / examiners Professor Doutor Mário Jorge Rodrigues de Sousa

Assistant Professor, Faculdade de Engenharia, Universidade do Porto

Doutor Orlando Miguel Pires dos Reis Moreira


agradecimentos / acknowledgements

I would like to express that I am sorry for my inability to mention everyone and the roles they played in making this project a reality, but I will attempt to include at least some of the people who, at some fundamental level, shaped the very course of my life.

First of all, I would like to express my gratitude to my supervisor Orlando Moreira at Ericsson, as well as to Ph.D. students Alok Lele and Hrishikesh Salunkhe, for the useful comments, remarks and engagement throughout the learning process of this master's thesis. I would also like to thank them for the often intellectually stimulating discussions during our lunch and coffee breaks.

I would like to show my deepest appreciation to the University of Aveiro for easing my burdens as a student and thereby allowing me to finish my degree. I am indebted to many of my fellow colleagues for providing me with an often exciting and fun environment in which I could learn and grow, not just as a student, but also as a human being.

I am especially grateful to Sérgio Vieira for being the best of friends since my very first year at university, and who has played a large part in who I am today.

I would like to express immense gratitude to my friends across the globe, especially my friend Tekin Bursal and his family for being such gracious hosts during my stay in Turkey, and my friend Syed Imran Farough for keeping me in good company during my time in Eindhoven.

I cannot thank my wife Asmaa enough for her unwavering support, humor, advice and patience.

Lastly, but by no means least, I take this opportunity to express the profound gratitude to my beloved parents, grandparents, siblings and in-laws for their love, generosity and continuous support.


Key Words Schedulers, Fixed Priority, Time Division Multiplexing, Data Flow, Real-Time, Multiprocessors, Burst, Embedded Systems, Temporal Analysis

Abstract With the ever more ubiquitous nature of embedded systems and their increasingly demanding applications, such as audio/video decoding and networking, the popularity of MultiProcessor Systems-on-Chip (MPSoCs) continues to increase. As such, their modern uses often involve the execution of multiple applications on the same system. Embedded systems often have applications that are faced with timing restrictions, some of which are deadlines, throughput and latency. The resources available to the applications running on these systems are finite and, therefore, applications need to share the available resources while guaranteeing that their timing requirements are met.

These guarantees are established via schedulers, which may employ some of the many techniques devised for the arbitration of resource usage among applications. The main technique considered in this dissertation is the Preemptive Fixed Priority (PFP) scheduling technique.

Also, there is a growing trend in the usage of the data flow computational model for analysis of applications on MultiProcessor Systems-on-Chip (MPSoCs). Data flow graphs are functionally intuitive, and have interesting and useful analytical properties.

This dissertation intends to further previous work done in temporal analysis of PFP scheduling of Real-Time applications on MPSoCs by implementing the truncated response model for PFP scheduling and analyzing its results. This response model promises tighter bounds for the worst case response times of the actors in a low priority data flow graph by considering the worst case response times over consecutive firings of an actor rather than just a single firing.

As a follow up to this work, we also introduce in this dissertation a burst analysis technique for actors in a data flow graph.


Resumo Com a natureza cada vez mais ubíqua de sistemas embutidos e as suas aplicações cada vez mais exigentes, como a decodificação de áudio/vídeo e rede, a popularidade de MultiProcessor Systems-on-Chip (MPSoCs) continua a aumentar. Como tal, os seus usos modernos muitas vezes envolvem a execução de várias aplicações no mesmo sistema. Sistemas embutidos frequentemente correm aplicações que são confrontadas com restrições temporais, algumas das quais são prazos, taxa de transferência e latência. Os recursos disponíveis para as aplicações que estão a correr nestes sistemas são finitos e, portanto, as aplicações necessitam de partilhar os recursos disponíveis, garantindo simultaneamente que os seus requisitos temporais sejam satisfeitos.

Estas garantias são estabelecidas por meio de escalonadores que podem empregar algumas das muitas técnicas elaboradas para a arbitragem do uso de recursos entre as aplicações. A técnica principal considerada nesta dissertação é Preemptive Fixed Priority (PFP).

Além disso, existe uma tendência crescente no uso do modelo computacional data flow para a análise de aplicações a correr em MPSoCs. Grafos data flow são funcionalmente intuitivos e possuem propriedades interessantes e úteis.

Esta dissertação pretende avançar trabalho prévio na área de escalonamento PFP de aplicações ao implementar o modelo de resposta truncado para escalonamento PFP e analisar os seus resultados. Este modelo de resposta promete limites mais estritos para os tempos de resposta de pior caso para atores num grafo de baixa prioridade ao considerar os tempos de resposta de pior caso ao longo de várias execuções consecutivas de um ator em vez de uma só.

Como seguimento a este trabalho, também introduzimos nesta dissertação uma técnica para a análise de execuções em rajada de atores num grafo data flow.


Contents

Contents
List of Figures
Acronyms

1 Introduction
  1.1 Streaming Applications
  1.2 Real-Time Applications
    Timing Constraints
  1.3 Fixed Priority Scheduling
    Preemption and Non-Preemption
  1.4 Data flow Graphs
    1.4.1 Response modeling
  1.5 Current Temporal Analysis Techniques
  1.6 Problem Statement
  1.7 Contributions
  1.8 Thesis Organization

2 Data flow Computation Models
  2.1 Graphs
    2.1.1 Paths and Cycles in a graph
    2.1.2 External sources in a graph
  2.2 Data Flow Graphs
    2.2.1 Multi-Rate Data Flow Graphs
    2.2.2 Single-Rate Data Flow Graphs
    2.2.3 Cyclo-Static Data Flow Graphs
  2.3 Schedules
    2.3.1 Schedule notation
    2.3.2 Self-Timed Schedules
    2.3.3 Static Periodic Schedules
  2.4 Temporal Analysis
    2.4.1 Throughput Analysis
    2.4.2 Latency Analysis
      Maximum Latency from a periodic source
      Maximum Latency from a bursty source
  2.5 Modeling schedulers in Data Flow graphs
    2.5.1 Task scheduling
    2.5.2 Static-Order scheduling
    2.5.3 Time Division Multiplexing scheduling
    2.5.4 Preemptive Fixed Priority scheduling
      Load on a processor from a SRDF graph
      Maximum load on a processor from a SRDF graph

3 Software Framework
  3.1 Heracles Data Flow Simulator
    3.1.1 Language
    3.1.2 Heracles Flows
    3.1.3 Simulator flow
    3.1.4 Scheduler flow
    3.1.5 Temporal Analysis flow
    3.1.6 Set of minor contributions

4 Problem Description
  4.1 Response Modeling
  4.2 Single Actor response model
    4.2.1 Simple Example
  4.3 The problem

5 Truncated Response Model for Preemptive Fixed Priority Scheduling
  5.1 Expected solution
  5.2 Truncated response model
    5.2.1 SRDF representation
      Example
    5.2.2 CSDF representation
    5.2.3 Reduction of the actors to a single cyclo-static actor
    5.2.4 Splitter and Collector actors
    5.2.5 Implemented form of the truncated response model for PFP
    5.2.6 Example

6 Implementation of the Truncated Model
  6.1 Outline of the implementation
  6.2 Acquisition of Worst Case Response Times
    6.2.1 WCRT for Preemptive Fixed Priority
      Preparing the inputs for interference analysis
      Interference analysis algorithm
      Slot filling modification
      WCRT acquisition algorithm
    6.2.2 WCRT for Time Division Multiplexing
  6.3 Building the model
  6.4 Analysis of the model

7 Results
  7.1 Methodology
  7.2 Experiments
  7.3 Experiment 1
  7.4 Experiments 2 and 3
  7.5 Experiments 4 and 5
  7.6 Case Studies
    7.6.1 WLAN - TDSCDMA
    7.6.2 WLAN - WLAN

8 Burst Analysis
  8.1 Introduction
    8.1.1 Analysis for single cycled graphs
  8.2 Concepts and notation
    8.2.1 Edge distances and edge phases
    8.2.3 Ordered edge list
    8.2.4 Token phase matrix and maximum phase vector
    8.2.5 Maximum burst
  8.3 Burst Analysis for Multiple Cycles
      Reformulation of single cycled analysis
      Burst analysis for a graph with multiple cycles

9 Conclusions and Further Work
  9.1 Conclusions
  9.2 Further Work


List of Figures

1.1 A data flow graph
1.2 A data flow model before response modeling
1.3 The response model
1.4 The response modeled graph

2.1 Simple data flow graph
2.2 A MRDF graph
2.3 A MRDF graph and its CSDF equivalent
2.4 The accumulation of tokens on the edge
2.5 Static ordering of actors on a processor
2.6 Time Wheel

3.1 A flow chart representing each of Heracles' flows
3.2 Flowchart of the simulator

4.1 Diagram of two applications run on a processor with Preemptive FPS
4.2 Timeline of the actors on the processor
4.3 Response modeled graph GB
4.4 Timeline of the response modeled graph GB
4.5 Graph with a low priority source that produces 2 tokens with each firing
4.6 Timeline of the behavior of BSA
4.7 Timeline with actor B
4.8 Service curve graph of B and BSA

5.1 Generic SRDF representation of the truncated model
5.2 Graph with a low priority source that produces 2 tokens with each firing
5.3 Timelines of B with its original execution time and twice its execution time
5.4 Service curve graph of B and BTruncated
5.5 Generic CSDF representation of the truncated model
5.6 Simple graph with four actors mapped on two processors
5.7 Graph with reduced actors
5.8 Usage of splitter and collector actors for equivalency
5.9 Implemented form of the truncated model
5.10 Example graph
5.11 Response modeled example graph

6.1 Flow chart of the implementation
6.2 Slot filling process
6.3 Modified slot filling process
6.4 Worst Case Response Time Acquisition Algorithm for PFP
6.5 Worst Case Response Time Acquisition Algorithm for TDM

7.1 Diagram of Experiment 1
7.3 Diagram of Experiment 2
7.4 Results of Experiment 2
7.5 Diagram of Experiment 3
7.6 Results of Experiment 3
7.7 Diagram of Experiment 4
7.8 Results of Experiment 4
7.9 Diagram of Experiment 5
7.10 Results of Experiment 5
7.11 SRDF Model of WLAN
7.12 SRDF Model of TDSCDMA
7.13 Results of the WLAN - TDSCDMA case study
7.14 Depth vs Analysis time for WLAN - TDSCDMA
7.15 Results of the WLAN - WLAN case study

8.1 Single cycled graph with 3 tokens
8.2 Gantt chart of the single cycled graph
8.3 A simple graph
8.4 Transformation of the single cycled graph
8.5 A two cycled graph
8.6 The cycle expansion of the two cycled graph according to actor A


Acronyms

ABS Anti-lock Braking System.
CSDF Cyclo-Static Data Flow.
CTS Compile-Time Scheduling.
DPS Dynamic Positioning System.
FADEC Full Authority Digital Engine Controller.
FIFO First-In First-Out.
HMPSoC Heterogeneous MultiProcessor System-on-Chip.
HP High Priority.
IP Internet Protocol.
LP Low Priority.
MCM Maximum Cycle Mean.
MoC Model of Computation.
MPSoC MultiProcessor System-on-Chip.
MRDF Multi-Rate Data Flow.
NPNBRR Non-Preemptive Non-Blocking Round Robin.
OCaml Objective Categorical Abstract Machine Language.
PFP Preemptive Fixed Priority.
ROSPS Rate-Optimal Static Periodic Schedule.
RTS Run-Time Scheduling.
SA Single Actor.
SDF Synchronous Data Flow.
SPS Static Periodic Schedule.
SRDF Single-Rate Data Flow.
STS Self-Timed Schedule.
TDM Time Division Multiplexing.
TDSCDMA Time Division Synchronous Code Division Multiple Access.
VoIP Voice over IP.
WCET Worst Case Execution Time.
WCRT Worst Case Response Time.
WCSTS Worst Case Self-Timed Schedule.
WLAN Wireless Local Area Network.


Chapter 1

Introduction

"What does the future look like? Well, it's a network full of services." These were the words David D. Clark left us with in 1999, in his talk "The Post PC Internet"[1]. He described a world where "everything" is connected to the internet and where storage and computing have shifted to a service model. In such a world, previously offline everyday objects could now, with their newfound connectivity, inform us of their state or react accordingly in relation to our own. It is not difficult to imagine devices that are connected to us on a daily basis, such as our mobile devices or the up-and-coming wearable devices, sharing information with objects in our surroundings or miles away across the Internet. Situations may include adjusting the ambiance of one's home, a store preparing our purchases in advance, informing our relatives of our current state, and a myriad of other potential interactions. Many such devices may in fact be connected to service providers, which store, manipulate, compute and analyze the data produced by our devices, then share with us the results, implications and, in the case of problems, potential solutions. As is the case with many a technological advancement, this once hypothetical future is now an emergent reality, carried under the banners of "The Internet of Things"[2] and "Cloud Computing"[3].

Embedded systems play an ever increasing role[4] in this Post PC era. These are the systems that truly put the "things" in the "Internet of Things." According to IDC's statistical forecast[5], the biggest growth in target-rich (i.e. high value) data will come from embedded systems: by 2020 they will make up 21% of the digital universe's target-rich data, up from 8% in 2014.

Embedded systems are microprocessor-based systems typically designed for a single application or a specific range of applications[6]. Connected, they may sense, communicate or respond to orders. As devices so entrenched in the real world, they often face real-world timing restrictions in order to effectively fulfill their purposes. The increasingly resource-demanding modern applications that run on these systems, such as networking or multimedia streaming applications, which often may even need to run in parallel, increase the workload to the point that multiple processors are required in order to achieve the necessary computational performance, cost effectiveness and power efficiency, among other potential requirements[7]. An important branch of this multiprocessor technology is the group of multiprocessors known as MPSoCs. In the taxonomy of multiprocessors, MPSoCs introduce at least two branches: the homogeneous model, introduced by the Lucent Daytona [8] in 2000, and the heterogeneous model, introduced by the C-5 network processor [9] and the Philips Viper Nexperia [10] in 2001. The remainder of this dissertation will focus on the multiprocessors belonging to the heterogeneous model, i.e. Heterogeneous MultiProcessor Systems-on-Chip (HMPSoCs). These are comprised of multiple processing units (general-purpose, vector processing, application specific, etc.) laid out and connected on a single chip[11].

Vinton G. Cerf, in order to express the expectations of users, said: "Users will not tolerate less than instant availability, nearly 100% reliability, and the minimum possible delay in accessing supported services"[12]. For service providers, this statement provides the necessary framing to understand the typically stringent requirements these systems must successfully handle, even in contexts which are not safety-critical. As such, it is necessary to manage the resources and the scheduling of the applications running on the system in an efficient and robust fashion.

A piece of software which is part and parcel of an embedded system, in particular a system which is going to run multiple real-time streaming applications, is the scheduler. The scheduler is the piece of software that manages how resources will be distributed among the different applications set to run on the Heterogeneous MultiProcessor System-on-Chip (HMPSoC). For this dissertation in particular, our main focus will be the temporal analysis of a PFP scheduler. In Chapter 2, we will provide more information on schedulers and temporal analysis.

Most real-time streaming applications are built upon signal processing functions[13]. In such a function, a sequence of quantized data items (a digital signal) is typically received from some form of external source, then manipulated in a series of steps involving the processing and transfer of data until it is eventually discarded, producing an output that is also a sequence of quantized data items. Real-time streaming applications may also exhibit concurrency. According to [14], the data flow model is both intuitive and provides the necessary analytical tools with which one can analyze such applications.

The aim of this dissertation is to build upon previous work on PFP scheduling for hard real-time applications on a HMPSoC. As a result, the major contribution of the dissertation is the implementation and analysis of the truncated response model for fixed priority scheduling. Furthermore, as a follow-up to this work, we also introduce a burst analysis technique for actors in a data flow graph.

The remainder of this chapter will focus on equipping the reader with the necessary conceptual tools to understand the problem this dissertation proposes to solve, as well as introduce the organization of the rest of the thesis.

1.1 Streaming Applications

A streaming application, as per [15], is an application that operates over a data stream, i.e. an extended (potentially infinite) sequence of input data items. As alluded to in the previous section, this data is received via an external source then processed with finite duration prior to being discarded. Like the input, the output is also a data stream.

In order to gain an intuitive feel for streaming applications, let us take a look at someone playing a music rhythm game. In such a game, a series of symbols is presented to the player one after another. Let us then consider this series of symbols as analogous to an external source serving as input to an HMPSoC, and the player as analogous to the HMPSoC itself, which processes said input. The player needs to identify each symbol presented, within a set time frame, and then carry out a specific action or set of actions; custom dictates this to usually be the pressing of some predetermined combination of buttons. A player failing to execute the tasks within the expected time frame can incur a penalty to the player's overall score or, worse, suffer immediate loss.

Currently, there exists a plethora of different uses for streaming applications, including audio compression, sound reinforcement, digital communications, economic forecasting, seismic data processing, medical imaging and sonar.

According to [11], three characteristics are typically present in this sort of application:

• High computational intensity: The application has a large number of arithmetic operations to execute per I/O.

• Data Parallelism: A data item's temporal behavior does not depend on the outcome of the processing of a prior element; as such, items can be processed simultaneously.

• Data locality: After the data is produced and read, it is never used again.

1.2 Real-Time Applications

Recall the analogy given for a streaming application in the previous section. This analogy is also an example of a real-time application, because it requires the player to carry out the task within a limited time frame, i.e. the execution of the task has timing requirements which need to be met. Also, remember that the penalty for failure varied in severity depending on the game. Within an industrial scenario, there are at least two accepted classifications for real-time applications:

• Hard Real-Time: Under no circumstance is it permissible to violate the timing constraints. To do so would imply either completely unacceptable results or even system failure. Examples:

– Full Authority Digital Engine Controller (FADEC), which controls the activities of an aircraft jet engine

– Dynamic Positioning System (DPS) found in a marine vessel or offshore drilling platform, it provides the capability to automatically maintain a position or heading using propellers, rudders, and thrusters.

– Anti-lock Braking System (ABS) found in automobiles, it allows the wheels on a motor vehicle to maintain tractive contact with the road surface during the process of braking, preventing the wheels from locking up and skidding uncontrollably.

• Soft Real-Time: It is tolerable for the necessary timing constraints to be violated to a reasonable degree. The implication is typically a decrease in quality, accuracy or responsiveness. Examples:

– Voice over IP (VoIP) is a methodology and group of technologies for communications and multimedia sessions over the Internet.

– Television and radio

It should be noted that the classifications of real-time applications vary within the literature; for example, in [16] and [17] applications are referred to as hard only if failure to satisfy the temporal requirements leads to catastrophic results, as firm if late results are merely useless, and as soft if late results are still useful but cause some degradation in the behavior of the system.

Timing Constraints

During the course of this introduction, the term timing constraint has been used multiple times and in this section we intend to clarify its nature. Within the data flow paradigm, timing constraints are of two kinds:

• Throughput: The rate at which the application should produce results

• Latency: The time difference between the moment the output data is produced and the moment the input data has arrived

An application may have one or both of these constraints. An example of an application with a throughput constraint is internet radio, because it needs to provide audio data at a consistent rate. An example of an application with a latency constraint is online gaming: for a player to be successful, there must be minimal delay between the player's action and the reaction of the server.

1.3 Fixed Priority Scheduling

Let us begin by defining an application, in this context, as a set of tasks which need to execute over a sequence of values in order to produce some desired outcome. Then let us suppose we have an application A and an application B whose tasks we want to schedule on a processor p. In PFP scheduling, this is carried out by attributing to A and B fixed priorities, i.e. priorities that do not change over time, Pa and Pb respectively, such that, according to some predefined criterion, Pa > Pb. Whenever the scheduler is faced with a decision on which task of application A or B it should execute on p, it will always select the task of A, if one is available. This sort of scheduling technique is particularly useful if each application has a different level of strictness with respect to its timing requirements.


Figure 1.1: A data flow graph

Preemption and Non-Preemption

In a preemptive fixed priority scheduling scheme, the tasks of a higher priority application are not only favored at the moment the scheduler decides which application is going to run on a given processor, but it is also possible for the lower priority application to be preempted mid-execution. If we continue within the context introduced at the beginning of the section, this is to say that if a task belonging to B is already executing on p, then the scheduler will preempt this task, temporarily interrupting its execution until, once again, no task of A requires p. In a non-preemptive scheme, a task belonging to a higher priority application would have to wait until the execution of the lower priority task comes to an end. Non-preemption, although potentially improving the behavior of lower priority applications, increases the difficulty of predicting the behavior of higher priority applications, since tasks of lower priority applications can then interfere with the execution of tasks belonging to higher priority applications.

1.4 Data flow Graphs

Data flow concepts play a central role in this thesis and, as such, we dedicate chapter 2 to the introduction and explanation of each of the essential concepts. However, we provide here a brief introduction to the matter.

Data flow is a computational model which is said to be distributed, since there is no single locus of control, and it is considered asynchronous; more precisely, since computation progresses due to the availability of data, it is said to be data-driven.

A data flow graph, as presented in figure 1.1, is a directed graph constituted of nodes, which are referred to as actors, and arcs. The actors represent functions, or computations, that act on data. The edges represent paths that can contain data, more precisely First-In-First-Out (FIFO) queues. The data takes the form of tokens that are placed on the edges. An actor only performs its computation when the necessary data is found on its incoming edges; when this condition is met, the tokens on its incoming edges are consumed, and the execution of the computation is called a firing of the actor. At the inputs of an actor one finds the number of tokens the actor needs to fire, and consumes when it does; at the outputs one finds the number of tokens produced once the actor finishes processing.
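The firing rule just described can be sketched as follows (a hypothetical single-rate example; the helper names and the tiny two-actor graph are illustrative, not taken from this thesis):

```python
def can_fire(tokens, in_rates):
    """An actor may fire iff every incoming edge holds at least
    as many tokens as the actor's consumption rate on that edge."""
    return all(tokens[e] >= n for e, n in in_rates.items())

def fire(tokens, in_rates, out_rates):
    """One firing: consume from incoming edges, produce on outgoing edges."""
    for e, n in in_rates.items():
        tokens[e] -= n
    for e, n in out_rates.items():
        tokens[e] += n

# Two actors A and B connected in a cycle; edge (A, B) starts with one token.
tokens = {("A", "B"): 1, ("B", "A"): 0}
if can_fire(tokens, {("A", "B"): 1}):                    # B's firing rule
    fire(tokens, {("A", "B"): 1}, {("B", "A"): 1})       # B fires once
print(tokens)  # {('A', 'B'): 0, ('B', 'A'): 1}
```

Repeatedly applying these two functions to every actor yields a simple self-timed simulation of the graph.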

1.4.1 Response modeling

A simple data flow graph typically only shows the division of an application into tasks, such as the one presented in figure 1.1, where the application is constituted of two tasks represented by actors A and B. Starting from the simple data flow graph, real-time analysis is made possible by graph transformation, wherein actors are replaced by sub-graphs that contain timing information regarding each task's execution under a scheduler [18]. In order to account for more nuanced behaviors, the sub-graphs may become increasingly complex, and therefore the decision between more and less detailed models is one where the trade-off between simplicity and more accurate timing analysis results needs to be taken into consideration. The benefit gained with improved accuracy is a decrease in the estimated worst-case response time of a given task and the ability to reduce the over-allocation of resources, thus permitting the system to potentially run additional applications it would otherwise be incapable of running.

Figure 1.2: A data flow model before response modeling

Figure 1.3: The response model

We offer a simple example to illustrate how response modeling may improve the accuracy of a model. Let us begin by assuming we have an application that we want to run on a HMPSoC, and that this application is initially modeled by a simple data flow graph, as presented in figure 1.2. In this graph, each actor takes the longest time the task it represents needs to process a token. However, this hypothetical task takes half the time to process every second token, for some particular reason. Since this behavior is not captured by the original graph, and could be exploited to free up resources, we can replace the actors in figure 1.2 with a new model that does capture it. This model is represented in figure 1.3. Briefly, this model describes the behavior previously mentioned in the following way: actors A1 and A4 exist merely to split the edge at the input and join the edges at the output, respectively, and as such they process tokens instantaneously. When a token is received by A1, it produces a token on each of its output edges. First, A2, which processes tokens in an interval of time equal to the original time, consumes the two tokens and produces two tokens on each of its outputs. A4, having at least one token on each edge, also fires. It should be noted that the original distribution of tokens has now shifted: whereas there used to be a token between A1 and A2, there is now only a token between A1 and A3, and whereas there used to be a token between A3 and A4, there is now a token between A2 and A4. Now, when the second token arrives, A3 will have two tokens at its input and will thus fire, processing for an interval of time equal to half that of A2. Finally, it also produces two tokens at its output, leading A4 to fire again and bringing the model back to its initial state. The over-estimation of the time it takes the actor to process its second token has thereby been reduced. As can be seen in figure 1.4, however, the complexity of the graph has increased.

1.5 Current Temporal Analysis Techniques

1.6 Problem Statement

Figure 1.4: The response modeled graph

The current response model utilized for a HMPSoC running multiple hard real-time streaming applications under a preemptive fixed priority scheduling scheme consists of analyzing the extent of the interference generated by the higher priority applications upon a lower priority application. The final form of the worst-case response time for an actor is merely the original execution time of the actor with the addition of the worst-case load imposed by the higher priority graphs [11][19]. It is known that in a burst of firings (when the conditions of the next firing are ready before the end of the current one), the maximum response time is bounded by the first firing in the burst. As such, the current response model is more conservative than it needs to be.

With an understanding of the currently used response model and of the concepts introduced so far, our problem statement can thus be formulated in the following manner:

Let us assume a set of n hard real-time streaming applications, each of which is represented by a data flow graph, that we want to run on a HMPSoC with a set of processors Π. Let us also assume that we are using a preemptive fixed priority scheduling scheme, utilizing one of the algorithms introduced thus far, in which each application has associated with it a distinct and permanent priority from 1 through n. How then, in this scenario, can one improve the response model for preemptive fixed priority scheduling such that we obtain more accurate worst-case response times without loss of conservativity?

1.7 Contributions

For the duration of this dissertation, the work done yielded a set of contributions that can be summarized as follows:

• Extension of the analysis and simulation tool: We implemented within Heracles a cyclo-static truncated response model for PFP and Time Division Multiplexing (TDM) which takes into account bursty behavior in a system.

• Analysis: We conducted an analysis of the behavior of this model with both the PFP and TDM schedulers.

• Introduction of a burst analysis technique: We introduced a technique for analyzing the maximum bursts of actors in a strongly connected data flow graph.

1.8 Thesis Organization

This thesis is organized according to the following scheme: in chapter 2 we review the concepts introduced by data flow and their properties. We also introduce the mathematical notation used throughout the analysis of the truncated model for PFP, methods of conducting temporal analysis of data flow graphs, and how schedules can be derived for these graphs. In chapter 3 we give an overview of Heracles, the simulation and analysis tool, and briefly explain its flow and usage. In chapter 4 we formalize our problem statement. In chapter 5 we present and discuss the truncated model and go over its functional details. In chapter 6 we discuss the implementation of the truncated model within the simulation and analysis tool. In chapter 7 we go over the experiments and results we obtained with our implementation of the truncated model. In chapter 8 we propose a burst analysis technique which provides the maximum burst for any actor in a strongly connected graph. In chapter 9 we make our concluding remarks and discuss further work.


Chapter 2

Data flow Computation Models

A model of computation (MoC), according to [20], defines how computation takes place in a structure of concurrent processes, thus providing semantics to said structure. Such semantics can then be used to formulate an abstract machine capable of executing a model. As stated in the introduction, data flow is used throughout this dissertation as the base MoC. In particular, among the varied styles of data flow in existence (SRDF, MRDF, MCDF, SADF, etc.), the variants of interest are those that are insensitive to the values in the data streams, because these variants exhibit analytical properties of significant import. The focus of this chapter is on the mathematical notation and some of the more significant analytical properties pertaining to the relevant data flow models. The material exposed in this chapter can be found in greater detail in [13, 11, 14, 21, 22, 23], among others in the literature.

2.1 Graphs

A graph G is an ordered pair G = (V, E) where V is the set of vertices or nodes and E is the set of links or edges [24]. If e ∈ E is an ordered pair e = (i, j), i, j ∈ V, then edge e is said to be directed from i to j; i is called the source node and j the sink node, denoted src(e) and snk(e), respectively. Also, if ∀e ∈ E ∃i, j ∈ V : e = (i, j) where (i, j) is an ordered pair, then G is said to be a directed graph, or digraph. For the purposes of this dissertation only directed graphs will be considered, so the term graph will be used as shorthand for directed graph. The same applies to the term edge.
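The pair (V, E) with its src/snk accessors maps directly onto code; a minimal Python sketch (the node names are arbitrary illustrations):

```python
# A digraph as an ordered pair (V, E): V is a set of nodes, E a list of
# ordered pairs, each directed from its source to its sink.
V = {"i", "j", "k"}
E = [("i", "j"), ("j", "k"), ("k", "i")]

def src(e):
    """Source node of a directed edge e = (i, j)."""
    return e[0]

def snk(e):
    """Sink node of a directed edge e = (i, j)."""
    return e[1]

# Every edge connects nodes of V, as required for a digraph.
assert all(src(e) in V and snk(e) in V for e in E)
```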

2.1.1 Paths and Cycles in a graph

We define a path within a graph as a finite, non-empty sequence of edges (e1, e2, ..., en) such that snk(ei) = src(ei+1), for i = 1, 2, ..., n − 1. A path is said to be directed from src(e1) to snk(en); it can also be said that a path traverses src(e1), src(e2), ..., src(en), snk(en). If the nodes traversed by a path appear only once, the path is said to be simple. A simple path where src(e1) = snk(en) is termed a cycle. Also, if for a path the statement ∃ek, ek+m : src(ek) = snk(ek+m), m ≥ 0 holds, then the path is a circuit.
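These definitions translate directly into small predicates; the following Python sketch mirrors them, with edge sequences given as lists of (src, snk) pairs:

```python
def is_path(edges):
    # A finite, non-empty edge sequence where each sink meets the next source.
    return len(edges) > 0 and all(
        edges[i][1] == edges[i + 1][0] for i in range(len(edges) - 1))

def is_simple(edges):
    # Simple: the traversed nodes src(e1), ..., src(en) appear only once
    # (the final sink is allowed to close back on the first source).
    sources = [e[0] for e in edges]
    return is_path(edges) and len(sources) == len(set(sources))

def is_cycle(edges):
    # A cycle is a simple path that starts and ends on the same node.
    return is_simple(edges) and edges[0][0] == edges[-1][1]

assert is_cycle([("a", "b"), ("b", "a")])
assert not is_path([("a", "b"), ("c", "a")])
```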

2.1.2 External sources in a graph

As stated in the introductory chapter, streaming applications operate over a data stream whose source is typically an external one. Like in [13], we shall represent these external sources in our data flow graphs as single actors with a self edge.

2.2 Data Flow Graphs

Figure 2.1: Simple data flow graph

Figure 2.2: A MRDF graph

Data flow graphs are graphs whose nodes and edges are referred to as actors and arcs, respectively. Actors represent time consuming entities, whereas arcs represent First-In-First-Out (FIFO) queues that direct data from the output of one actor to the input of another. This data is transported as discrete segments termed tokens, represented in data flow graphs as black dots. Figure 2.1 represents a simple data flow graph. For the remainder of this dissertation, unless explicitly stated otherwise, omission of production and consumption values indicates the value 1. When an actor is activated, it is said that the actor has fired. The condition which brings about the activation of an actor is called the firing rule [25]. The availability of the necessary tokens at the input shall be considered the firing rule for every actor in every graph. It is also assumed that the tokens required for an actor to fire are known at compile time. With the firing of an actor, the necessary tokens at the input are consumed at the instant the actor starts firing, whereas tokens are produced at the instant the actor ceases firing. Also, any given firing of an actor has associated with it a value k ∈ N0 which identifies the current iteration of the actor: for any given firing of an actor, its iteration is the number of times the actor has fired since the beginning of the execution of the graph until said firing. Lastly, a graph is said to be strongly connected if for any actor in the graph there exists a path to any other actor.

2.2.1 Multi-Rate Data Flow Graphs

Multi-Rate Data Flow (MRDF) graphs, otherwise known as Synchronous Data Flow (SDF) graphs, are defined by a tuple G = (V, E, t, d, prod, cons) where V and E are, as alluded to previously, the set of actors and the set of arcs, respectively. Associated with every actor i ∈ V is the valuation t : V → N0; t(i) maps each actor to an execution time. Also associated with every arc e ∈ E in the graph are the valuations d, prod and cons, where d : E → N0; d(e) maps an arc to its initial distribution of tokens; prod : E → N0; prod(e) maps e to the number of tokens produced by src(e); and cons : E → N0; cons(e) maps e to the number of tokens consumed by snk(e). The execution time of each actor, as well as the token production and consumption values, are constant and known at compile time.

As the streaming applications which run on an HMPSoC need to process potentially infinitely long sequences of data with finite resources, it is necessary to be able to schedule these applications periodically with a finite bound on memory utilization. If this is possible for a given application's MRDF graph, we say that the MRDF graph is correctly constructed. Only correctly constructed graphs will be considered in this dissertation.

Assuming then a correctly constructed graph, we introduce another important concept: the iteration of a graph. An iteration of an MRDF graph is complete once all actors have fired the number of times necessary for the graph's token distribution to return to its initial state. Since the number of firings may be different for every actor, a single value is insufficient to define an MRDF graph's iteration. Therefore, a graph iteration is usually expressed by resorting to a repetition vector, symbolized by r. The repetition vector r of a graph is a column vector of length |V| where, if every actor is numbered from 1 to |V|, each entry represents the number of times the corresponding actor needs to fire for the graph to return to its initial token distribution.
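The repetition vector can be computed from the balance equations r[src(e)] · prod(e) = r[snk(e)] · cons(e). A sketch of the usual procedure in Python (assuming a connected graph; this is an illustration, not Heracles code):

```python
from fractions import Fraction
from math import lcm

def repetition_vector(actors, edges):
    """edges: list of (src, snk, prod, cons). Solves the balance equations
    r[src] * prod == r[snk] * cons and scales to the smallest integers."""
    adj = {}
    for (i, j, p, c) in edges:
        adj.setdefault(i, []).append((j, Fraction(p, c)))
        adj.setdefault(j, []).append((i, Fraction(c, p)))
    q = {actors[0]: Fraction(1)}   # fix one actor's rate, propagate the rest
    work = [actors[0]]
    while work:
        i = work.pop()
        for (j, ratio) in adj.get(i, []):
            if j not in q:
                q[j] = q[i] * ratio
                work.append(j)
            elif q[j] != q[i] * ratio:
                # Inconsistent rates: the graph is not correctly constructed.
                raise ValueError("no repetition vector exists")
    scale = lcm(*(q[a].denominator for a in actors))
    return {a: int(q[a] * scale) for a in actors}

# Arc A -> B producing 2 tokens, consuming 3: A fires 3 times per 2 of B.
print(repetition_vector(["A", "B"], [("A", "B", 2, 3)]))
```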


Figure 2.3: A MRDF graph and its CSDF equivalent

2.2.2 Single-Rate Data Flow Graphs

Single-Rate Data Flow (SRDF) graphs are a special case of MRDF graphs where ∀e ∈ E : prod(e) = cons(e). Any MRDF graph can be converted into an SRDF graph [13]. This is of particular import because SRDF graphs have very useful analytical properties. One such property is that an SRDF graph is deadlock-free if and only if, in the initial distribution of tokens, there is at least one token in every cycle. In an SRDF graph, one can define a cycle mean as:

µ(c) = ( Σ_{i ∈ N(c)} t(i) ) / ( Σ_{e ∈ E(c)} d(e) )    (2.1)

where N(c) is the set of all nodes in the cycle c and E(c) is the set of all edges in the cycle c. For a given graph G it is possible to establish its maximum cycle mean (MCM) simply as:

µmax(G) = max_{c ∈ C} µ(c)    (2.2)

From [13] we know that the MCM of a given SRDF graph is directly related to the graph's maximum throughput, expressed as 1/µmax(G).
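Equations 2.1 and 2.2 can be evaluated directly once the cycles are known; the Python sketch below takes an explicitly supplied list of cycles rather than enumerating them (real tools enumerate cycles or use a dedicated MCM algorithm), and all numbers are illustrative:

```python
def cycle_mean(nodes, edges, t, d):
    # mu(c): total execution time of the cycle's nodes over the total
    # number of initial tokens on the cycle's edges (equation 2.1).
    return sum(t[i] for i in nodes) / sum(d[e] for e in edges)

def maximum_cycle_mean(cycles, t, d):
    # Equation 2.2: the maximum cycle mean over all cycles of the graph,
    # given here as explicit (nodes, edges) pairs.
    return max(cycle_mean(n, e, t, d) for (n, e) in cycles)

t = {"A": 2, "B": 3}                                # execution times
d = {("A", "B"): 0, ("B", "A"): 1, ("A", "A"): 2}   # initial tokens
cycles = [
    (["A", "B"], [("A", "B"), ("B", "A")]),  # A -> B -> A
    (["A"], [("A", "A")]),                   # self-loop on A
]
print(maximum_cycle_mean(cycles, t, d))
```

Here the A→B→A cycle dominates, so the guaranteed throughput is its reciprocal.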

2.2.3 Cyclo-Static Data Flow Graphs

In 1996, Greet Bilsen et al. introduced Cyclo-Static Data Flow (CSDF) graphs [26]. Much like MRDF graphs, CSDF graphs can be represented by a tuple G = (V, E, d, t, prod, cons). However, in these graphs, execution times and the productions and consumptions of tokens may differ with every iteration of an actor. As such, the valuations t, prod and cons become t : V × N0 → N0, prod : E × N0 → N0 and cons : E × N0 → N0, respectively. These variations are cyclical with unchanging order; formally, ∀u, i, j ∈ V ∃m ∈ N ∀l ∈ N : t(i, k) = t(i, k + l·m) ∧ prod((i, j), k) = prod((i, j), k + l·m) ∧ cons((u, i), k) = cons((u, i), k + l·m), where m is the smallest satisfactory value. It is therefore possible to describe the execution times, production rates and consumption rates of an actor with a sequence of length m, wherein each element of the sequence is referred to as a phase of the actor. It is also known [27] that it is possible to convert a cyclo-static graph into an MRDF graph and therefore, from the previous subsection, also into an SRDF graph. We show in figure 2.3 a MRDF graph and its cyclo-static equivalent. Since actors B and C always execute in the same order, they are substituted by a single cyclo-static actor with two phases. As the first actor to fire is B followed by C, the cyclo-static actor BC will first fire with an execution time equal to that of B, then with an execution time equal to that of C. Since there is a token for each actor, so should there be a token for each phase.
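In code, a cyclo-static valuation is just a lookup into the phase sequence by k mod m; a minimal Python sketch (the two phase times below are hypothetical, as figure 2.3 gives no numbers):

```python
def phase(sequence, k):
    # Firing k of a cyclo-static actor with m phases uses element k mod m;
    # the same rule applies to the prod and cons sequences.
    return sequence[k % len(sequence)]

# Actor BC from figure 2.3: fires first with B's execution time, then C's.
t_bc = [4, 2]  # hypothetical execution times of B and C
print([phase(t_bc, k) for k in range(4)])
```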


2.3 Schedules

We utilize data flow graphs to model applications running on a HMPSoC. As such, actors of data flow graphs are mapped onto processors. It is assumed that each processor processes only one actor at any time. For a correctly constructed and deadlock-free data flow graph there is a schedule that defines the time at which each iteration of every actor is to start its execution. In this section we introduce the scheduling notation which will provide the mathematical tools necessary to discuss the type of schedules we will be analyzing within the context of this dissertation, i.e. self-timed schedules. A schedule represents the time at which each iteration of every actor starts its execution; since it is assumed that only one actor is processed at a time on a given processor, the next actor on the same processor may start, at the earliest, immediately after the previous one finishes. In order for this to be guaranteed, the execution time of an actor must be expressed conservatively with respect to the real execution time of the task it models. We will also introduce in this section self-timed schedules and static periodic schedules, and show how the latter is related to the former in order to bound the start times of firings in a self-timed execution. Further information can be found in [13]. For the sake of simplicity, only SRDF graphs will be mentioned for the rest of this section.

2.3.1 Schedule notation

We start by introducing the function s : V × N0 → N0; s(i, k), where i is an actor of the graph, k is the actor's iteration, and time is represented as a non-negative integer. This function defines the start time of iteration k of actor i. Moreover, we define another function f : V × N0 → N0; f(i, k) which represents the finishing time of iteration k of actor i. If t(i) is the worst-case execution time (WCET) of an actor, it should hold that ∀k ∈ N0 : t(i) ≥ t(i, k). The finishing time function f can also be seen as f(i, k) = s(i, k) + t(i, k). Lastly, an SRDF schedule is considered admissible if, for every edge (x, i) ∈ E, the following inequality is satisfied:

s(i, k) ≥ s(x, k − d(x, i)) + t(x),    (2.3)

where d(x, i) is the initial number of tokens on the edge (x, i). The reason why x's iteration index is decreased by the initial number of tokens on the edge can be easily understood if one realizes that, since in an SRDF graph only one token is consumed with each firing and i already has d(x, i) tokens available at its input, i's dependency on x is delayed by d(x, i) iterations. A more general equation for admissibility of a schedule can be found in [13, 28].

2.3.2 Self-Timed Schedules

A Self-Timed Schedule (STS) of an SRDF graph is a schedule where every actor fires as soon as the necessary tokens are available on every input edge. If every iteration of every actor executes with t(i), the schedule is said to be the Worst-Case Self-Timed Schedule (WCSTS). In a WCSTS, every firing k of an actor i is given by:

s(i, k) = max_{(x,i) ∈ E} { s(x, k − d(x, i)) + t(x, k − d(x, i))  if k − d(x, i) ≥ 0;  0  if k − d(x, i) < 0 }    (2.4)

where t(x, k − d(x, i)) denotes the worst-case execution time of actor x. This equation merely states that i fires as soon as the latest of the actors it depends on finishes its execution. If a graph is strongly connected, it will always enter the periodic regime after a fixed number of iterations. Since this number of iterations depends only on the graph, a single simulation of the graph is sufficient to discover the length of the graph's transient period.
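Equation 2.4 can be evaluated by simple relaxation for a bounded number of firings; a small Python sketch for a deadlock-free SRDF graph (an illustration, not the Heracles implementation):

```python
def wcsts_start_times(actors, edges, t, d, n_firings):
    """Start times of equation 2.4: firing k of actor i waits for the finish
    of firing k - d(x, i) of every producer x; dependencies on negative
    firing indices (covered by initial tokens) impose no constraint.
    edges: list of (x, i) pairs. Relaxation loop, fine for small graphs."""
    s = {(i, k): 0 for i in actors for k in range(n_firings)}
    changed = True
    while changed:
        changed = False
        for (x, i) in edges:
            for k in range(n_firings):
                kp = k - d[(x, i)]
                if kp < 0:
                    continue  # the needed token is present at time 0
                finish = s[(x, kp)] + t[x]
                if finish > s[(i, k)]:
                    s[(i, k)] = finish
                    changed = True
    return s

# Two-actor cycle A -> B (no tokens), B -> A (one token), t(A)=2, t(B)=3.
s = wcsts_start_times(["A", "B"], [("A", "B"), ("B", "A")],
                      {"A": 2, "B": 3}, {("A", "B"): 0, ("B", "A"): 1}, 3)
print(s[("B", 0)], s[("A", 1)], s[("A", 2)])
```

On this example the start times of A settle into a period of 5, matching the cycle mean (2 + 3)/1.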


2.3.3 Static Periodic Schedules

A Static Periodic Schedule (SPS) of a SRDF graph is a schedule where every actor of the graph fires periodically. This can be expressed as:

∀i ∈ V, ∃T ∈ N0 : s(i, k) = s(i, 0) + T · k, k > 0 (2.5)

A SPS schedule can be uniquely represented by its initial start times s(i, 0) and period T. Next we present a theorem that is essential to understanding how the start times of a graph's actors in a self-timed execution can be bounded. The proof of the theorem can be found in [13].

Theorem 2.1 For a SRDF graph G = (V, E, t, d), it is possible to find an SPS schedule with period T if and only if T ≥ µmax(G). If T < µmax(G), then no SPS schedule exists with period T.

Therefore, in a strongly connected SRDF graph, the highest throughput achievable by any actor is 1/µmax(G). If T = µmax(G) we say the schedule is a Rate-Optimal Static Periodic Schedule (ROSPS).

2.4 Temporal Analysis

To verify whether or not a given data flow graph used to model an application meets its throughput and latency requirements, various temporal analysis techniques have been developed. It is assumed, as specified earlier, that the graphs are self-timed. As such, self-timed execution can be divided into two regimes: the transient and the periodic regime. It has been shown in [29] that a data flow graph will reach the periodic regime so long as its execution times are constant [30]; however, the number of firings until this regime is reached is not efficiently computable [31]. We will present some techniques in this section which allow us to reason about the temporal behavior of the types of data flow graphs introduced in the previous section. The techniques are applied directly on SRDF graphs but, as previously mentioned, it is possible to transform both MRDF and CSDF graphs into Single-Rate Data Flow (SRDF) equivalents. So far it has been established that the WCSTS will eventually settle into a periodic behavior with an average throughput of 1/µmax(G), but it is also necessary to introduce another two theorems on the monotonicity of a self-timed execution:

Theorem 2.2 (Monotonicity of a self-timed execution) For a SRDF graph G = (V, E, d, t) with worst-case self-timed schedule s_WCSTS, for any i ∈ V and k ≥ 0, it holds that, for any self-timed schedule s_STS of G:

s_STS(i, k) ≤ s_WCSTS(i, k)    (2.6)

Theorem 2.3 In any admissible SPS of a SRDF graph G = (V, E, t, d), all start times can only be later than or at the same time as in the WCSTS of that graph; that is, for all i ∈ V, k ≥ 0, and all admissible static periodic schedules s_SPS of G, it must hold that:

s_SPS(i, k) ≥ s_WCSTS(i, k).    (2.7)

Theorem 2.2 allows one to state that if any actor finishes its execution earlier than in the WCSTS, subsequent firings can never occur later than in the WCSTS. With theorem 2.3 we can draw the important conclusion that any SPS start time can be used as an upper bound on the start time of the same firing of the same actor in the WCSTS. The proofs for these theorems can be found in [13].

2.4.1 Throughput Analysis

In the introduction we stated that the two metrics taken into consideration for the purposes of this dissertation are throughput and latency. Throughput is defined as the rate at which a graph completes iterations over an elapsed period of time. Typically, there are two ways to ascertain the throughput of a given graph. The first method is, as alluded to previously, calculating the MCM. The second method involves simulating the graph to determine the throughput of a strongly connected graph: the graph is simulated until it reaches its periodic phase, at which point one can analyze its throughput behavior. The iteration with the longest duration is taken as the value for the maximum throughput.

Figure 2.4: The accumulation of tokens on the edge

2.4.2 Latency Analysis

Latency, as the other important metric to be considered, is defined as the interval between two start times of two specific actors, i.e.:

L(i, k, j, p) = s(j, p) − s(i, k), (2.8)

where i, j ∈ V, and k and p are firings of i and j, respectively. In this setup, i is said to be the source of the latency, whereas j is said to be the sink. However, we will be dealing with streaming applications which can run for potentially infinite periods of time; therefore it is also necessary to define the maximum latency between two actors, over all firings. This concept is stated in the following manner:

L̂(i, j, n) = max_{k ≥ 0} ( s(j, k + n) − s(i, k) ),    (2.9)

where n is a fixed iteration distance.

Maximum Latency from a periodic source

We stated earlier that source actors cannot be delayed; as such, a functional graph needs to be able to keep up with the source, that is to say, the behavior and execution rate of the graph are imposed by the source. To illustrate this, suppose that the graph indeed executed at a rate slower than that imposed by the source. As can be seen in figure 2.4, tokens would accumulate on the edge, which corresponds to a gradual filling up of the FIFO queue. As memory is in reality finite, eventually the system would have to discard data.

The start times of a periodic source are defined by:

s(i, k) = s(i, 0) + T · k,    (2.10)

where T is the period of the source. Since it is necessary for the graph to keep up with the source, we say that the graph is bounded by the period of the source: only graphs that exhibit an MCM lower than or equal to the period T of the source shall be accepted.

Since the graph can at best be scheduled with a period equal to T, that is to say, its schedule is a ROSPS, the maximum latency can be determined through the following expression:

L̂(i, j, n) = max_{k ≥ 0} ( s(j, k + n) − s(i, k) ) ≤ š_ROSPS(j, 0) + T · n − s(i, 0),    (2.11)

where š_ROSPS(j, 0) is the earliest start time of j in an admissible ROSPS. Given equation 2.11, it is then possible to calculate the maximum latency by determining a ROSPS with the earliest start time of j and a WCSTS for the earliest start time of i. The proof for equation 2.11 can be found in [13].

Maximum Latency from a bursty source

Another type of source we want to present in this chapter is the bursty source. A bursty source is defined as a source that may fire consecutively at most n times within any interval of time T. The interval between the firings is represented as ∆t. In this situation, for the graph to be able to keep up with the source, it should hold that µmax(G) ≤ T/n in order to guarantee that the application can be executed for an indefinite period of time utilizing a bounded amount of memory. It should be noted that ∆t ≠ T/n. If µmax(G) ≤ ∆t, then the maximum latency is given by:

L̂(i, j, n) ≤ š_µmax(j, 0) − s(i, 0) + µmax(G) · n    (2.12)

This equation also describes what is called the maximum latency of a sporadic source; more information on this can be found in [13]. If µmax(G) ≥ ∆t, there is the possibility that tokens accumulate indefinitely. However, due to the property of monotonicity, making the source fire faster does not make the sink fire slower or faster. Therefore, so long as the source is limited to n firings within T, even if µmax(G) ≥ ∆t, tokens will not accumulate indefinitely as long as µmax(G) ≤ T/n. The maximum latency for a bursty source is given by:

L̂(i, j, n) ≤ š_ROSPS(j, 0) − s_ROSPS(i, 0) + (n − 1)(µmax(G) − ∆t)    (2.13)
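Equations 2.12 and 2.13 are straightforward to evaluate once the schedule start times are known; a small Python helper sketch, where all inputs are placeholders for values obtained from the analysis:

```python
def max_latency_sporadic(s_j0, s_i0, n, mu_max):
    # Equation 2.12: bound for a sporadic/bursty source when mu_max <= dt.
    return s_j0 - s_i0 + mu_max * n

def max_latency_bursty(s_j0, s_i0, n, mu_max, dt):
    # Equation 2.13: bound for a bursty source firing at most n times per T
    # with inter-firing interval dt (assumes mu_max(G) <= T / n).
    return s_j0 - s_i0 + (n - 1) * (mu_max - dt)

# Placeholder numbers: earliest ROSPS start of the sink at 8, source at 0,
# bursts of n = 4 firings, MCM of 5 time units, dt = 2 between firings.
print(max_latency_bursty(8, 0, 4, 5, 2))
```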

2.5 Modeling schedulers in Data Flow graphs

In this section we introduce the types of task scheduling and two scheduling techniques that are essential within the context of this dissertation.

2.5.1 Task scheduling

We are interested in dealing with two types of scheduling mechanisms:

• Compile-Time Scheduling (CTS): Scheduling decisions are fixed at compile-time.

• Run-Time Scheduling (RTS): Scheduling decisions are determined while the system is running the applications.

According to [23], CTS is a popular strategy for scheduling data flow graphs. However, applications running on a HMPSoC can start and stop independently; therefore, in order to appropriately use CTS, one would have to derive a schedule for all possible combinations of applications that could be active at any given moment. Since this can simply be unfeasible, RTS allows scheduling decisions to be computed at run-time. However, for these decisions to be computed, an overhead is necessarily introduced which does not exist in CTS. In this dissertation, as in [13], a combination of CTS and RTS is used.

2.5.2 Static-Order scheduling

For a set of actors A = {a_0, a_1, a_2, ...} mapped onto a processor, a static order is defined as a sequence so = [a_x0, a_x1, ..., a_xn, a_xm], where x0, x1, ..., xm ∈ [0, |A|[. The sequence so generates extra dependency constraints, aside from the dependency constraints already existent in the applications containing the actors of A, such that the processor must execute actors in the order of so. That is to say, first a_x0 executes, then a_x1, and so on until a_xm. As a_xm is the last element in the list, after it has executed the sequence starts from the beginning again.

In an SRDF graph, a static order among the actors of A is generated by adding tokenless edges (a_x0, a_x1), (a_x1, a_x2), ..., (a_xn, a_xm), followed by the generation of the edge (a_xm, a_x0). After these edges are generated, a token is added to the edge (a_xm, a_x0) so that actor a_x0 may execute first. This process can be visualized in figure 2.5.

Figure 2.5: Static ordering of actors on a processor

Figure 2.6: Time Wheel

Please note that, in this configuration, the tokens and edges don’t really represent data transfer between actors, but exist merely to model a desired behavior.

2.5.3 Time Division Multiplexing scheduling

The general idea behind Time Division Multiplexing (TDM) scheduling is the following: an interval of time P is defined as a replenishment period, wherein a subsection of P is attributed to every application that is going to run on the processor in a circular fashion. This subsection is referred to as a slice. The circular fashion in which the applications are given time on the processor leads naturally to the presentation of the idea shown in figure 2.6 which, for the purposes of this dissertation, shall be designated by the term time wheel.

The effect of TDM can be modeled by replacing the WCET of an actor by its worst-case response time under TDM scheduling. The response time of an actor i is the time necessary to complete a firing of actor i in a single iteration, taking into account resource arbitration. Let us assume TDM is implemented on a processor with a replenishment period P and a slice S, where S ≤ P and S is the slice designated for an actor i. An interval of time will pass beginning at the instant where i has enough tokens on its input edges and ending after the actor has completed its execution. This interval of time is equal to or larger than i's execution time t(i) and is called the processing time. When this interval is greater than t(i), one or two effects of TDM scheduling are at play. The first is arbitration time, defined as the time it takes for the TDM scheduler to grant i permission to execute. If we define the worst-case arbitration time as r_A, then:

r_A(i) = P − S    (2.14)

The second effect appears when S < t(i), that is to say, when the size of the slice is less than the execution time. If we denote the processing time as r_P, then:

r_P(i) = ⌊t(i)/S⌋ · (P − S) + t(i)    (2.15)

Combining these two, we get the total worst-case response time of i in a TDM schedule:

r(i) = (P − S) · (⌊t(i)/S⌋ + 1) + t(i)    (2.16)
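Equation 2.16 is easy to compute; a minimal Python sketch with illustrative numbers:

```python
def tdm_wcrt(t_i, P, S):
    """Total worst-case response time under TDM (equation 2.16): one
    worst-case arbitration delay (P - S) on arrival, plus one more for
    every slice the execution time t_i exhausts."""
    assert 0 < S <= P
    return (P - S) * (t_i // S + 1) + t_i

# An actor with t(i) = 5 running in a slice of S = 2 out of P = 10:
# it waits (10 - 2) on arrival and again after each exhausted slice.
print(tdm_wcrt(5, 10, 2))
```

Note that when the application owns the whole wheel (S = P), the response time collapses to the plain execution time.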

2.5.4 Preemptive Fixed Priority scheduling

Within the context of a preemptive fixed priority scheduling scheme, low priority tasks can be preempted by higher priority tasks. As such, in order to guarantee that every application meets its requirements, concrete bounds must be determined for the worst-case response time of a given application. The current method of analysis involves establishing the effects felt by a lower priority application due to interference from higher priority applications. The details of the analysis methods for preemptive fixed priority scheduling are introduced in [11]; we will briefly introduce the more essential concepts.

Load on a processor from a SRDF graph

Let us assume we have a SRDF graph G = (V, E, t, d) and resources Π. Now, let us also assume we have an actor i ∈ V mapped to a processor m ∈ Π. The first concept that needs to be introduced is the concept of load: the amount of time a given resource spends on an actor or actors within a known time frame. The mapping is formalized through a function map : V → Π; if map(i) = m, we say that i is a load actor of m, otherwise we say it is a non-load actor. The time frame being considered is termed the load window and is typically defined by a beginning δ0 and a length δ, such that it ends at δ0 + δ. Formally, the load of an actor i ∈ V of graph G on processor m ∈ Π can be defined as:

L(G, m, i, s, t, δ0, δ) = Σ_k [ min(s(G, t, i, k) + t(i, k), δ0 + δ) − max(s(G, t, i, k), δ0) ],    (2.17)

where s is the start time function and the sum ranges over the firings k of i such that δ0 − t(i, k) < s(G, t, i, k) < δ0 + δ; s(i, k) is the start time of the k-th firing of i and t(i, k) the execution time of the k-th firing of i. In order to simplify the notation, the following simplified version of equation 2.17 will be used:

L(m, i, δ0, δ) = Σ_k [ min(s(t, i, k) + t(i, k), δ0 + δ) − max(s(t, i, k), δ0) ].    (2.18)

Due to the fact that functional graphs are not composed of singular actors, it is important to extend the previous notion of the load of an actor to that of a graph. This extension is done simply by summing over all actors of the graph:

L(m, G, δ0, δ) = Σ_{i ∈ V} L(m, i, δ0, δ).    (2.19)
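The window-clipping in equations 2.17–2.19 amounts to intersecting each firing's busy interval with the load window; a Python sketch, with firings given as explicit (start, execution time) pairs:

```python
def actor_load(firings, delta0, delta):
    # Equation 2.18: time the actor occupies the processor inside the
    # window [delta0, delta0 + delta); each firing contributes the overlap
    # of its busy interval [s_k, s_k + t_k) with the window.
    total = 0
    for (s_k, t_k) in firings:
        overlap = min(s_k + t_k, delta0 + delta) - max(s_k, delta0)
        if overlap > 0:
            total += overlap
    return total

def graph_load(firings_per_actor, delta0, delta):
    # Equation 2.19: the load of a graph is the sum over its actors.
    return sum(actor_load(f, delta0, delta) for f in firings_per_actor)

# Two firings of 4 time units starting at 0 and 6, window [2, 8):
# each firing overlaps the window by 2 time units.
print(actor_load([(0, 4), (6, 4)], 2, 6))
```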

Maximum load on a processor from a SRDF graph

We start by analyzing the maximum load of a single actor. Looking at equation 2.18, we can conclude that, in order to characterize the maximum load, we must analyze three parameters:

• the start times of load and non-load actors
• the execution times of load and non-load actors
• the start time of the load window

The effects of each of these parameters are analyzed in depth in [11], from which we get the list of conditions that lead to the maximum load:

• the start of the load window coincides with the firing of a load actor
• the graph is blocked, and all future firings activate at their natural distance from the first firing of the load actor in the given load window
• the worst-case execution time is assumed for each firing of the load actor
• the best-case execution time is assumed for each firing of all other actors

For the case of a complete SRDF graph, this process is repeated for every load actor in the graph, where the load actor that produces the maximum load corresponds to the maximum load of the SRDF graph.

The worst-case response time (WCRT) can now be defined as follows. Let G be a high priority SRDF graph on resources Π with schedule s, execution time function t, and processor m ∈ Π. Let i be an independent low priority task with execution time t_i. Then the response time of i, i.e. the elapsed time between the start (δ_i) and the end of the execution of i considering the interference from the high priority application, is the smallest positive value r̂_i that satisfies the equation:

r̂_i = t_i + L̂_i(G, m, s, t, δ_i, r̂_i),    (2.20)

where δ_i is equal to the scheduled start time of actor i and L̂_i is the maximum load characterization of graph G for actor i.
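The smallest positive solution of equation 2.20 is typically found by the standard fixed-point iteration, starting from r = t_i and re-evaluating the load term until it stabilizes; a Python sketch, where the load function is a stand-in for the maximum load characterization L̂_i:

```python
def wcrt(t_i, max_load, r_limit=10**6):
    """Smallest r satisfying r = t_i + max_load(r) (equation 2.20).
    max_load(r) must give the maximum interference the higher priority
    graphs cause in any window of length r. The iteration diverges if the
    processor is overloaded, hence the r_limit guard."""
    r = t_i
    while r <= r_limit:
        r_next = t_i + max_load(r)
        if r_next == r:
            return r
        r = r_next
    raise ValueError("no response time below r_limit")

# Toy interference: a higher priority graph that takes 2 time units out
# of every started window of 10 (ceiling division counts the windows).
interference = lambda r: -(-r // 10) * 2
print(wcrt(5, interference))
```

Because max_load is non-decreasing in r, the iterates only grow, so the first repeated value is indeed the smallest fixed point.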

Fixed Priority Analysis for N Applications

The techniques shown thus far for fixed priority analysis have been for a single low priority (LP) application and a single high priority (HP) application. However, they are easily extensible to n applications.

Looking at equation 2.20, it is possible to understand how the load of a single HP application affects an LP application. In order to determine the interference from a set of n higher priority applications, one simply needs to sum the load caused by each application. The modified version of equation 2.20 then becomes:

r̂_i = t_i + Σ_{j=1}^{n} L̂_i(G_j, m, s, t, δ_i, r̂_i).    (2.21)


Chapter 3

Software Framework

This chapter introduces the software framework upon which the work for this dissertation was conducted. We start with an overview of Heracles and of the modules that were essential for carrying out this project.

The two main contributions of this dissertation are the integration of the truncated response model into the Heracles data flow simulator and the analysis of the results attained from the model. During the execution of this project, other minor contributions were made, such as the integration and unification of separate code bases, as well as bug fixes and the streamlining of certain algorithms essential to the project. For this reason, the Heracles data flow simulator can be considered a fundamental cornerstone of this project. Heracles is a simulation and analysis tool originally conceived and developed by Orlando Moreira. It is a complex and flexible tool for designing, scheduling, simulating, programming and analyzing data flow graphs. We shall, for the purposes of this work, focus on the simulation, response modeling and temporal analysis. It should also be noted that throughout this section, for the sake of easing the presentation, the initial state of the graphs will include the tokens necessary for the HP and LP actors to fire from the beginning. All this allows one to do is start the firings from time zero and make better use of the timelines, rather than presenting the initial segments of the timelines blank until the first firing.

3.1

Heracles Data Flow Simulator

3.1.1 Language

Heracles was written in OCaml (Objective Categorical Abstract Machine Language). OCaml was designed and implemented by Inria's (Institut National de Recherche en Informatique et en Automatique) Formel team, led by Gérard Huet [32]. It is a language that provides a slew of useful features, among which the most important are the following:

• Type inference: Allows one to define a function without having to explicitly provide the type of its parameters and of its result.

• Parametric Polymorphism: Allows one to define functions that work over a list or array independently of the types of its elements.

• Pattern Matching: A data structure can be defined as a combination of records and sums. Functions that operate over this structure can then be defined through pattern matching, which is a generalized form of the switch statement that offers a simple way of both examining and naming the content of a data structure.

• Hybrid Paradigm: Allows one to express functions in either functional or imperative form.

• Performance: In some cases, OCaml is as fast as C in terms of execution time.


Figure 3.1: A flow chart representing each of Heracles' flows

• Portability and Efficiency: OCaml offers two batch compilers: a bytecode compiler and a native code compiler. The bytecode compiler generates small, portable executables, whereas the native code compiler generates more efficient machine code.

• Safety: Programs are verified by the compiler before they can be executed, protecting the integrity of the data manipulated by an OCaml program.

This language also provides debugging and profiling tools along with its compiler. More on this language can be found in [33, 34, 35, 32].

3.1.2 Heracles Flows

Heracles provides the user with a series of tools which grant one the ability to simulate and analyze the behavior of data flow graphs on a HMPSoC. Depending on the particular aspects of the graph one wants to analyze, three different flows are provided. Each of these flows is represented in figure 3.1: the simulation flow in green, the scheduling flow in blue and, finally, the temporal analysis flow in orange. We shall briefly describe how each one of these flows functions in the following subsections.

Heracles consists of a single executable that takes instructions from the command line. Graphs are provided to it by typing in the path of a graph file, identified by a ".sdf" extension. It is also necessary to provide a ".sys" file, which is a resource-based description of the system the graph will be running on. It is also possible to provide the tool with a mode sequence file, used for the execution of Mode Controlled Data Flow (MCDF) graphs. These graphs are not within the domain of this project; more information on MCDF graphs can be found in [13, 11].

The graph file includes all the basics necessary to identify a graph, such as the name of each node and the source and sink of each edge. It also includes other information, like the execution time of each actor. Optional arguments are also included, such as worst-case execution time, best-case execution time, jitter, and a slew of other options. The system file includes information on every processor, such as its type, name and scheduling scheme.

Figure 3.2: Flowchart of the simulator

3.1.3 Simulator flow

The Heracles simulator, represented in figure 3.2, is currently event-based: events are run one after another until a stop condition is satisfied in the compare function. In order to understand how this is done, one must first describe what an event consists of in this context. An event is defined by the following properties:

• id: Identifies whether the event is a start or a finish.

• issue: Uniquely identifies the event.

• time: Time at which the event is to be processed.

• multiplicity: Number of simultaneous firings of an actor.

• Internal Actor: Identifies the actor that issued the event.
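Such an event record, together with the finish-before-start tie-breaking that the simulator's compare function enforces for coinciding times, might be sketched as follows. Heracles itself is written in OCaml; all Python names below are hypothetical illustrations:

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

FINISH, START = 0, 1     # FINISH sorts before START at equal times

_issue = count()         # unique, monotonically increasing issue numbers

@dataclass(order=True)
class Event:
    time: int                       # time at which the event is processed
    kind: int                       # FINISH or START (the "id" property)
    issue: int = field(default_factory=lambda: next(_issue))
    actor: str = field(compare=False, default="")
    multiplicity: int = field(compare=False, default=1)

# A finish and a start scheduled at the same instant: the finish must be
# extracted first, so resources it releases are visible to the start.
q = []
heapq.heappush(q, Event(time=5, kind=START, actor="LP"))
heapq.heappush(q, Event(time=5, kind=FINISH, actor="HP"))
first = heapq.heappop(q)
print(first.kind == FINISH)   # prints True
```

Ordering on the tuple (time, kind, issue) gives exactly this behavior, with the issue number breaking any remaining ties deterministically.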

Secondly, one must explain what the compare function does, because it is one of the central components of the simulator. The compare function reorders the set of events such that, if the time of a finish event coincides with the time of a start event, the finish event will be the first event extracted from the ordered set. For an intuitive understanding of why this is the case, let us suppose one finds oneself in a cafe with two friends, having only enough change at hand for two drinks, which then need to be shared among the three. Let us then say that the start event is when someone picks up a drink in order to drink, and the finish event is when someone puts a drink back down after drinking. With these assumptions in mind, consider then the state where one isn't drinking and his two friends are. Also, consider that the instant
