Fixed-priority schedulers in real-time multiprocessors


Universidade de Aveiro

2012

Department of Electronics, Telecommunications and Informatics

Ricardo Daniel Lopes Almeida

Fixed-Priority Schedulers in Real-Time Multiprocessors

Dissertation presented to the Universidade de Aveiro in fulfilment of the requirements for the degree of Master in Electronics and Telecommunications Engineering, carried out under the scientific supervision of Dr. Paulo Bacelar Reis Pedreiras, Assistant Professor at the Department of Electronics, Telecommunications and Informatics of the Universidade de Aveiro, and the scientific co-supervision of Dr. Orlando Miguel Pires dos Reis Moreira, Principal DSP Systems Engineer at ST-Ericsson.

Financial support from FCT and FSE under the Third Community Support Framework.


The jury

President

Prof. Dr. José Alberto Gouveia Fonseca
Assistant Professor at the Department of Electronics and Telecommunications of the Universidade de Aveiro

Prof. Dr. Paulo Bacelar Reis Pedreiras
Assistant Professor at the Department of Electronics and Telecommunications of the Universidade de Aveiro

Dr. Orlando Miguel Pires dos Reis Moreira
Principal DSP Systems Engineer at ST-Ericsson

Prof. Dr. Luís Miguel Pinho de Almeida
Associate Professor at the Department of Electrical and Computer Engineering of the Faculdade de Engenharia da Universidade do Porto


Keywords

Fixed-priority schedulers, data-flow, real-time, multiprocessors, computational load, embedded systems, digital signal processing.

Summary

Owing to the technological evolution of recent years, embedded systems with multiprocessing capability have become common. In these devices, the scarcity of resources requires that they be distributed optimally among the various supported activities.

Devices of this type usually include a general-purpose processor, typically from the ARM family, and one or more processors aimed at specific tasks, such as vector processors (EVP), used for example in digital signal processing systems.

The distribution of resources among the system's tasks is performed by a scheduler, which may follow one of several known disciplines: Round Robin, First In First Out, Time Division Multiplexing, Fixed Priority, etc.

The main objective of this work is to investigate real-time schedulers based on fixed priorities, with special attention to streaming applications running on multiprocessor platforms, using dataflow.

Dataflow is a paradigm that uses graph theory for the modelling, programming and analysis of applications and systems.

The first part of this project is dedicated to the analysis and modelling of dataflow graphs in which resources are distributed by a fixed-priority scheduler. The second part is dedicated to the study of interference between tasks with different priority levels in independent graphs, when mapped for execution on the same processor. In embedded systems there are high-priority tasks (periodic or sporadic) that must be served as quickly as possible once they are ready to execute. Serving them interferes with the execution of lower-priority tasks running on the same platform, since these are blocked while the higher-priority tasks execute. This interference has two direct consequences: a shorter response time for the high-priority tasks and a longer execution time for the low-priority tasks.

With this work we intend to determine the advantages and disadvantages that a fixed-priority scheduler can offer in this type of situation, compared with other schedulers.


Keywords

Fixed priority schedulers, data-flow, real-time, multiprocessors, computational

load, embedded systems, digital signal processing.

Abstract

Due to recent technological evolution, embedded systems with multiprocessing capabilities are becoming common. Application requirements often impose resource constraints, making it necessary to distribute resources efficiently.

Devices of this type normally include a general-purpose processor, typically from the ARM family, and one or more task-specific processors, such as vector processors (EVP), used for instance in digital signal processing systems.

Resources are distributed among tasks by a scheduler. Scheduling can follow one of several known policies: Round Robin, First In First Out, Time Division Multiplexing, Fixed Priority, etc.

The main goal of this project is to investigate fixed-priority real-time schedulers, with special focus on streaming applications executing on multiprocessor platforms, using dataflow.

Dataflow is a paradigm that uses graph theory for the modelling, programming and analysis of applications and systems.

The first part of this project is dedicated to the analysis and modelling of dataflow graphs whose shared resources are distributed by a fixed-priority scheduler. The second part is dedicated to the study of interference between tasks with different priority levels in independent graphs, when mapped for execution on the same processor.

Embedded systems frequently have high-priority tasks (periodic or sporadic) that need to be dispatched as soon as they become ready to execute. Dispatching them interferes with the execution of lower-priority tasks running on the same platform, since these are blocked while the high-priority tasks execute. This interference has two direct consequences: a lower response time for the high-priority tasks and an increase in the execution time of the lower-priority tasks.

With our work, we intend to investigate the advantages and disadvantages that a fixed-priority scheduler can offer in this type of situation, compared with other schedulers.


1 Introduction
1.1 Fixed priority scheduling: A historical review
1.2 Streaming applications
1.3 Real-Time applications
1.3.1 Timing requirements
1.3.2 Scheduling
1.4 Fixed priority scheduling applications
1.4.1 Preemption
1.5 State of the art
1.5.1 Classical real-time theory
1.5.2 SymTA/S
1.5.3 Real-Time calculus
1.6 Data flow graphs
1.7 Problem description
1.8 Developed work
1.9 Thesis organization

2 Data Flow computation models
2.1 Graphs
2.1.1 Directed graphs
2.1.2 Path and cycles in a graph
2.2 Data Flow
2.2.1 Actor firings
2.3 Temporal analysis
2.3.1 Schedules
2.3.2 Single Rate Data Flow
2.3.3 Timed Single Rate Data Flow graphs
2.3.4 Application graphs
2.4.2 Time Division Multiplexing scheduling (TDM)
2.4.3 Non-Preemptive Non-Blocking Round-Robin scheduling
2.4.4 Static-Order scheduling
2.4.5 Static periodic schedulers
2.5 Data Flow temporal analysis techniques
2.5.1 Throughput analysis
2.5.2 Latency analysis
2.6 Conclusion

3 Software framework
3.1 The Heracles dataflow simulator
3.1.1 Heracles tool flow
3.1.2 Explanation of the software model
3.1.3 Usage of the tool

4 Implementations introduced in the software framework
4.1 Main changes to the code

5 Intra-Graph fixed priority analysis for data-flow graphs
5.1 Problem definition
5.2 Theory
5.2.1 Data-Flow analysis of a fixed priority system
5.3 Multi processor mapping analysis
5.3.1 Overview
5.3.2 Worst-case response time
5.3.3 Analysis of start times
5.4 Software implementation

6 Inter-graph fixed priority analysis
6.1 Problem definition
6.2 Theory
6.2.1 Definition of load of a processor
6.2.2 Initial considerations and evolution of the concept
6.2.3 Establishing time intervals

7 Results
7.1 Analysis of intra-graph fixed priority dataflow graphs
7.1.1 Data Flow analysis of a fixed priority graph
7.1.2 Simulation results
7.1.3 Multi Processor mapping analysis
7.2 Load analysis of a Wireless LAN and a TDSCDMA job
7.2.1 Theoretical approach
7.2.2 Simulation results

8 Conclusion and future work


Introduction

Multiprocessor systems are now common. Owing to technological advances in this area, it is today more practical and efficient to build systems with more than one processor, delegating specific tasks to specific processors. These devices are known as Multi-Processor Systems on Chip (MPSoC). In this way we not only benefit from the parallel execution of tasks but can also exploit the unique traits of each processor to increase processing capability. Most embedded computing systems that perform digital signal processing possess at least two types of processing units: a general-purpose core and a vector processor. All flow-control decisions are made by the general-purpose processor, while vector and matrix operations are carried out in the vector processor, taking advantage of its ability to perform multiply-accumulate operations on many input values simultaneously.

In order to maximize the productivity of such devices, it is usual to map several applications onto the same MPSoC device. With such computational power at our disposal, we need an efficient mechanism to distribute the computational load across the available platforms. Every computational system with limited shared resources, such as memory, processor cores or peripheral access, needs a proper resource-sharing mechanism. A scheduler is essentially a program that coordinates access to resources. In most embedded systems, it is the scheduler that decides which task can be executed at a given point in time. Since every task needs a defined set of resources in order to execute, the scheduler is responsible for ensuring that a given task is only set to execute when the corresponding set of resources is available.

Given the nature of the applications for which embedded systems are designed, they can be expected to perform one or more tasks with real-time computing constraints.

This dissertation has two major goals: the characterization of fixed-priority graphs, i.e. the determination of the worst-case response time and start times of the tasks that compose the system, and the study of the interference between tasks with different priorities when mapped onto the same processing platform.

In the remainder of this chapter, we define the fundamental concepts needed to understand and define our problem.

1.1 Fixed priority scheduling: A historical review

A real-time system is one with explicit deterministic or probabilistic timing requirements.


[...] approach to scheduling produced systems that were inflexible and difficult to maintain [11]. More advanced techniques were required for the design, analysis and implementation of hard real-time systems. It was hoped that these techniques would provide additional flexibility whilst enabling the predictability of such systems to be guaranteed.

Subsequently, a wide range of scheduling strategies have been proposed. These strategies can be characterized by their prescribed run-time behaviours and the forms of associated analysis provided for predicting/optimizing system behaviour. At one extreme, for a simple application model, static scheduling (cyclic executives) provides very deterministic yet inflexible behaviour. The other extreme is often known as best-effort scheduling [18]; it facilitates maximum run-time flexibility but, at best, allows only probabilistic predictions of run-time performance. Fixed priority scheduling falls between these two extremes: it is often criticised as being too static by the proponents of best-effort scheduling and as being too dynamic by the supporters of cyclic executives. However, it is a predictable approach: off-line guarantees regarding process deadlines can be afforded using appropriate analysis. In reality it represents a practical, highly effective approach to scheduling a large class of real-time applications.

Work in fixed priority scheduling concentrated on two separate issues: policies for the assignment of priorities to processes and feasibility tests for process sets. The assumptions and constraints for much of this work are identical to those described by Liu and Layland in 1973 [24]:

1. All processes are periodic;

2. All processes have a deadline equal to their period;

3. All processes are independent;

4. All processes have a fixed computation time;

5. No process may voluntarily suspend itself;

6. All processes are released as soon as they arrive;

7. All overheads are ignored (assumed to be 0).

Development of real-time theory progressed steadily, before a resurgence in the 1980s. The motivation for this renewed interest stemmed from many diverse factors, including the realization that the requirements of hard (i.e. safety-critical) real-time systems outstripped available theoretical analysis (for example, formal methods, scheduling theory, etc.) and implementation techniques. Typical real-time systems implemented prior to the mid 1980s included basic avionics control, laboratory control, etc. Looking forward from this point in time, the future real-time systems were considered to be applications such as the space station, robots, intelligent manufacturing and advanced avionics control. The common requirements shared by these systems were the need for dynamic and adaptive behaviour, including elements of artificial intelligence, together with an increased demand for predictability and reliability.

Another factor in the renaissance of real-time systems research was the rapid development of hardware (e.g. minicomputers in the 1970s and microcomputers in the 1980s), which led to [...]processors became more complex, with the inclusion of pipelines and caches, and peripheral devices became more intelligent. The availability of such hardware within the context of hard real-time applications prompted further work in terms of analysis [2].

1.2 Streaming applications

A streaming application is an application that operates over a long (potentially infinite) sequence of input data items, also referred to as a data stream. The data is normally fed into the application from an external source and each data token is processed in a limited time before being discarded [34]. The application likewise outputs a long (potentially infinite) sequence of output data items. Applications of this type are common in signal processing, where the external source of data is usually some kind of antenna. In this situation, the application has no control over the arrival or the volume of the data to be processed. Examples of streaming applications include software-defined radio, radar tracking, audio and video decoding, audio and video processing, cryptographic kernels and network processing [26]. Streaming applications follow a reactive model and, when the application requires synchronization with the data stream, temporal restrictions also apply to it.

1.3 Real-Time applications

The validity of the results produced by a real-time application depends on their functional correctness, as in any other type of application, but also on the time at which these results are produced. Although correct, an output from a real-time application may be irrelevant if it violates its temporal deadline. The word "may" in the previous sentence implies the existence of more than one type of real-time system. A real-time application can be categorized into three types according to its temporal restrictions [6]:

• Soft - If this type of restriction is violated, the associated result retains some of its utility to the application, although there is degradation in the quality of service. Consider an automated gate as an example. If there is a significant delay between the reception of the activation signal from the open button and the activation of the gate motor, it is annoying for the driver but the end result is still usable.

• Firm - If a firm deadline is missed, the resulting output is unusable but the integrity of the system and the user are not compromised. As an example, consider data collected from a sensor array that is used for autopilot navigation. If some of the samples arrive after the established deadline, they are useless. But as long as some other data arrives in a timely fashion, the system is still able to function correctly.

• Hard - For this type of restriction, a deadline violation could also imply a catastrophic consequence for the system. Every safety-critical system is characterized by having at least one hard temporal restriction. Examples are a life support system or the traction control system of a car. If the control system is not able to meet its deadlines,


according to the previous temporal restrictions:

• Soft Real-Time - These systems only possess soft or firm temporal restrictions.

• Hard Real-Time - All the systems that possess at least one hard temporal restriction are categorized under this label.

1.3.1 Timing requirements

Timing requirements come in two basic types: throughput and latency. If the rate at which an iterative application produces results is important, then we are in the presence of a throughput requirement. If the minimum or maximum time interval between the arrival of an input and the production of the corresponding output must be respected, then the application has a latency requirement. A heart rate monitor is an example of an application with throughput requirements. In order to return a correct value for this measure, all the heart beats in a given time interval must be read and the time between them must be respected, although the processing and output of the final value could suffer some delay, resulting in a service degradation. The navigation and actuation signals in a car exemplify a system with latency requirements. It is important, for instance, that the maximum time between the actuation of the brake pedal and the actuation of the brake system is respected. In this case, due to the random nature of all the possible stimuli to the system, no throughput requirements are present, at least not in the systems considered.

Temporal requirements can also appear in the form of a required worst-case timing, best-case timing or both. If the worst-case timing coincides with the best-case timing, the result is designated an on-time requirement. In this project we concentrate on worst-case and best-case timing calculations.
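As a minimal sketch of the two requirement types, the following Python fragment computes the best-case and worst-case latency and the average throughput from a trace of input arrival times and output production times. The function name, the trace values and the choice of measuring throughput between the first and the last output are illustrative assumptions, not part of the analysis developed in this thesis.

```python
def latency_and_throughput(arrivals, completions):
    """Return (best latency, worst latency, throughput) for a trace.

    arrivals[k] is the arrival time of the k-th input and completions[k]
    the production time of the corresponding output (same time unit).
    """
    latencies = [done - arrived for arrived, done in zip(arrivals, completions)]
    # Throughput is measured as results per time unit between the first
    # and the last output (an assumption made for this illustration).
    span = completions[-1] - completions[0]
    throughput = (len(completions) - 1) / span if span > 0 else float("inf")
    return min(latencies), max(latencies), throughput


# Hypothetical trace: one input every 10 time units.
arrivals = [0, 10, 20, 30]
completions = [4, 16, 25, 37]  # latencies are 4, 6, 5 and 7 time units
best, worst, rate = latency_and_throughput(arrivals, completions)
print(best, worst, rate)
```

A latency requirement would constrain `best` and `worst`, whereas a throughput requirement would constrain `rate`.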

1.3.2 Scheduling

A typical computational system comprises several resources (processors, memory, peripheral devices, etc.) that are used concurrently by different tasks. These resources need to be assigned to the concurrent tasks in an orderly and efficient fashion. The set of predefined criteria that regulates the allocation of resources to tasks is called a scheduling policy. The set of rules that, at any time, determines the order in which tasks are executed is called a scheduling algorithm. The specific operation of allocating a resource to a task selected by the scheduling algorithm is referred to as dispatching [6]. Several scheduling algorithms are known: First In First Out, Round Robin, Shortest Remaining Time, Fixed Priority, Time Division Multiplexing, etc. Each of these algorithms has advantages and disadvantages that have been studied throughout the years. Our project focuses mainly on the Fixed Priority scheduling algorithm.

1.4 Fixed priority scheduling applications

In a fixed priority scheme, all tasks are characterized by an immutable priority value. Normally this value is numeric. The order in which these values are assigned depends essentially on the system specifications, but conventionally higher priorities receive smaller values.


where $T$ designates a task, and $i$ and $j$ indicate numerical priority values with $i, j \in \mathbb{N}_0$. The scheduler uses priorities to determine the next job to be scheduled. These are calculated at design time and never change during execution, hence the term fixed [33]. In fixed priority scheduling, the dispatcher makes sure that, at any time, the highest-priority runnable task is actually running.

1.4.1 Preemption

In a pre-emptive system, if a low-priority task is running and a high-priority task arrives, i.e. some event has occurred and the dispatcher needs to deploy a task into execution, the low-priority task is suspended and the high-priority task starts running. If, while the high-priority task is running, a medium-priority task arrives, the dispatcher leaves it unprocessed and the high-priority task carries on running until it finishes its computation. Only when both the high- and medium-priority tasks have completed can the low-priority task resume its execution. This low-priority task can then carry on executing until either more higher-priority tasks arrive or it has finished its work [35]. If the platform in use does not support preemption, tasks with higher priorities are only set to execute ahead of lower-priority ones if they could be started at the same time instant. Otherwise, if a lower-priority task is already executing on the platform when a higher-priority task becomes ready for execution, the latter is simply blocked, at least until the executing task finishes its current execution.
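The scenario described above can be reproduced with a small discrete-time simulation. This is only a sketch under stated assumptions: a single processor, integer time steps, positive integer execution times, one activation per task, and the convention that a smaller value means a higher priority. The `simulate` function and the task parameters are hypothetical and are not taken from the tools used in this thesis.

```python
def simulate(tasks, preemptive=True):
    """Simulate one processor in unit time steps.

    tasks maps a name to (arrival_time, execution_time, priority);
    a smaller priority value means a higher priority.
    Returns a dict mapping each task name to its finish time.
    """
    remaining = {name: exec_time for name, (_, exec_time, _) in tasks.items()}
    finish = {}
    running = None
    now = 0
    while len(finish) < len(tasks):
        ready = [n for n, (arrival, _, _) in tasks.items()
                 if arrival <= now and remaining[n] > 0]
        if ready:
            highest = min(ready, key=lambda n: tasks[n][2])
            # Preemptive: the highest-priority ready task always runs.
            # Non-preemptive: a started task keeps the processor until done.
            if preemptive or running is None:
                running = highest
            remaining[running] -= 1
            if remaining[running] == 0:
                finish[running] = now + 1
                running = None
        now += 1
    return finish


# Low-priority task running first; high and medium priority arrive later.
tasks = {"low": (0, 4, 3), "high": (1, 2, 1), "medium": (2, 2, 2)}
with_preemption = simulate(tasks, preemptive=True)
without_preemption = simulate(tasks, preemptive=False)
print(with_preemption)     # high finishes at 3, medium at 5, low at 8
print(without_preemption)  # low finishes at 4, high at 6, medium at 8
```

With preemption the high-priority task finishes two time units after its arrival; without it, the same task is blocked until the low-priority task completes.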

1.5 State of the art

1.5.1 Classical real-time theory

Real-time computing is a subject that has been studied for some time, which led to the development of a considerable body of theory around it, known nowadays as classical real-time theory. In this introductory section, we focus only on on-line scheduling with fixed priorities, with special attention to the two main criteria for classical fixed-priority scheduling: the rate-monotonic and the deadline-monotonic criteria [6] [25]. This type of scheduling has some advantages over off-line scheduling, namely:

• Any alteration in the task characteristics is immediately taken into account by the scheduler.

• It can easily accommodate sporadic tasks.

• Behaviour under overload is deterministic, since overload only affects the tasks with lower priorities.

As expected, there are some disadvantages that go with the advantages mentioned above:

• On-line scheduling has a more complex implementation, since it requires a kernel with fixed priorities.

• This type of scheduling requires the action of a scheduler and a dispatcher, which implies a higher execution overhead.


The Rate Monotonic (RM) scheduling algorithm is a simple rule that assigns priorities to tasks according to their request rates. Specifically, tasks with higher request rates, which means shorter periods, have higher priorities and vice versa. Since periods are constant, RM is a fixed-priority assignment: a priority $P_i$ is assigned to the task before execution and does not change over time. For the remainder of this section, we assume that the platform supports preemption. In the initial analysis performed in [24], the Rate Monotonic algorithm is intrinsically pre-emptive, and all the tasks are independent, i.e. there are no shared resources. In this context, a running task is preempted by a newly arrived task with a shorter period. Since the schedule is built on-line, it may be useful to know a priori whether a given set of tasks respects its temporal requirements. To aid us in this, there are two main types of tests that can be performed on the task set:

• Tests based on the utilization rate of the CPU - These consist of inequalities applied to the task characteristics, such as their worst-case execution time, period and deadline. Verifying these inequalities allows us to conclude whether a given task has guaranteed activations or not. The two reference criteria on this subject are the least upper bound of Liu and Layland and the hyperbolic bound of Bini, Buttazzo and Buttazzo. A more detailed explanation of each can be found in [24] and [3] respectively. Our project does not deal directly with local deadlines, so we will not go any further into this subject.

• Tests based on the response time - For systems with arbitrary fixed priorities, response-time analysis allows us to perform a schedulability test that, assuming that the system allows preemption and synchronous activation, is necessary and sufficient. These tests consist of computing the worst-case response time, i.e. the maximum elapsed time between the activation of a task and its completion, and then checking that it does not exceed the deadline. For further information, please refer to [1].
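The two families of tests can be sketched as follows, using their standard textbook forms: the Liu and Layland bound $U \le n(2^{1/n} - 1)$, the hyperbolic bound $\prod_i (U_i + 1) \le 2$, and the response-time recurrence $R_i = C_i + \sum_{j \in hp(i)} \lceil R_i / T_j \rceil C_j$. The function names and the task set are illustrative assumptions; tasks are given as (C, T) pairs with integer values, deadlines equal to periods, and the list sorted by decreasing priority.

```python
def utilization(tasks):
    """Total processor utilization of a list of (C, T) pairs."""
    return sum(c / t for c, t in tasks)


def liu_layland_test(tasks):
    """Sufficient test: U <= n * (2^(1/n) - 1)."""
    n = len(tasks)
    return utilization(tasks) <= n * (2 ** (1 / n) - 1)


def hyperbolic_test(tasks):
    """Sufficient test: product of (U_i + 1) <= 2."""
    product = 1.0
    for c, t in tasks:
        product *= c / t + 1
    return product <= 2


def response_time(tasks, i):
    """Worst-case response time of task i, or None if it exceeds T_i.

    tasks[:i] are the higher-priority tasks; the recurrence is iterated
    until it reaches a fixed point.
    """
    c_i, t_i = tasks[i]
    r = c_i
    while r <= t_i:
        # -(-r // t_j) is the integer ceiling of r / t_j.
        interference = sum(-(-r // t_j) * c_j for c_j, t_j in tasks[:i])
        nxt = c_i + interference
        if nxt == r:
            return r
        r = nxt
    return None


# Hypothetical task set in rate-monotonic order (shortest period first).
tasks = [(1, 4), (2, 6), (3, 12)]
print(liu_layland_test(tasks), hyperbolic_test(tasks))
print([response_time(tasks, i) for i in range(len(tasks))])
```

For this task set both utilization-based tests fail, yet every response time stays within its period. This illustrates why the utilization bounds are only sufficient conditions, while the response-time test is exact under the stated assumptions.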

1.5.1.2 Deadline-Monotonic scheduling

The Deadline Monotonic (DM) priority assignment weakens the "period equals deadline" constraint within a static priority scheduling scheme. The application of the scheduling algorithm assumes that every task is characterized by a phase $\phi_i$, a worst-case constant computation time $C_i$ for each instance, a constant relative deadline $D_i$ and a period $T_i$. According to the DM algorithm, each task is assigned a fixed priority $P_i$, inversely proportional to its relative deadline $D_i$. Thus, at any instant, the task with the shortest relative deadline is executed. Since relative deadlines are constant, DM is a static priority assignment. As in Rate Monotonic, DM is normally used in a fully pre-emptive mode [6].
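A minimal sketch of the two assignment rules: priorities are obtained by sorting the tasks by period (RM) or by relative deadline (DM), with priority level 0 taken as the highest. The helper `assign_priorities`, the dictionary layout and the task values are hypothetical and serve only to show where the two criteria diverge.

```python
def assign_priorities(tasks, key):
    """Return name -> priority level, 0 being the highest priority.

    tasks maps a name to a dict with "period" and "deadline" entries;
    key selects the attribute used for ordering.
    """
    ordered = sorted(tasks, key=lambda name: tasks[name][key])
    return {name: level for level, name in enumerate(ordered)}


# Hypothetical task set where deadlines may be shorter than periods.
tasks = {
    "a": {"period": 20, "deadline": 20},
    "b": {"period": 50, "deadline": 8},
    "c": {"period": 30, "deadline": 15},
}
rm = assign_priorities(tasks, "period")    # Rate Monotonic
dm = assign_priorities(tasks, "deadline")  # Deadline Monotonic
print(rm)
print(dm)
```

Task "b" has the longest period but the shortest deadline, so it receives the lowest priority under RM and the highest under DM.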


period is equal to the deadline, meaning that, if a task set is schedulable by some fixed priority assignment, then it is also schedulable by DM. The proof of this assumption and a more detailed explanation of this algorithm can be found in [23]. A more comprehensive overview of Rate-Monotonic and Deadline-Monotonic scheduling is available in [2].

1.5.2 SymTA/S

SymTA/S is a system-level performance and timing analysis approach based on formal scheduling analysis techniques and symbolic simulation. It is essentially a software tool used to determine system-level performance data such as end-to-end latencies, bus and processor utilization and worst-case scheduling scenarios. SymTA/S focuses mainly on MPSoC designs, where the complexity arising from all the concurrent hardware makes manual analysis and optimization a very time-consuming and error-prone task.

The core of the SymTA/S tool is a technique to couple local scheduling analysis algorithms using event streams. For a more detailed description of these algorithms, please refer to [29] and [30].

In order to perform a system-level analysis, SymTA/S locally performs existing scheduling analysis using a well-known algorithm, for example Rate-Monotonic, Time Division Multiple Access, Round Robin, etc., and propagates the results to the neighbouring components. This analyse-and-propagate mechanism is repeated iteratively until all components are analysed, which means that all output streams remain unchanged.

A more detailed description of this tool can be found in [16] and [15].

1.5.3 Real-Time calculus

Real-Time Calculus establishes a link between three areas, namely Max-Plus Linear System Theory [9] as used for dealing with certain classes of discrete event systems, Network Calculus [4] for establishing time bounds in communication networks, and real-time scheduling. In particular, it shows that important results from scheduling theory can be easily derived and unified using Max-Plus Algebra. In essence, Real-Time Calculus focuses on the characterization of a set of tasks $T_1, \ldots, T_i, \ldots, T_n$ by a request curve $\alpha_i^r$ and a demand curve $\alpha_i^d$ respectively. These tasks are all processed by one processing unit characterized by a delivery curve $\beta$ using a static priority scheduler with preemption. It is important to note that the tasks are sorted by decreasing priority.

The algorithm consists of an iterative process to determine the task priorities such that the whole task system can be successfully scheduled. The process consists of selecting the tasks in increasing order of priority and performing a schedulability test based on the task deadlines, the demand curve and the delivery curve of the processing unit. If any of the tasks fails this test, the whole set cannot be scheduled. Otherwise, the schedulable task is removed from the set and the whole selection procedure is repeated until there are no more tasks left. A more comprehensive


In this project we intend to use the data flow paradigm to tackle the fixed priority scheduling problem. Dataflow has developed into a useful tool, with extensive use in the analysis of streaming applications, modelling multiprocessor environments and dealing with concurrent applications. In these situations, dataflow is applied by using graph theory to establish mathematical models that can be analysed with the tools provided by the paradigm.

In the most general sense, a dataflow graph is a directed graph with actors represented by nodes and arcs representing connections between the actors. These connections convey values, corresponding to data packets, also designated as tokens, between the nodes. Connections are conceptually FIFO queues, which may hold initial tokens.

The operation in which an actor consumes a certain number of tokens from its incoming edges and then starts executing is known as an actor firing. The set of rules that control this firing, namely the minimum number of tokens present on the incoming edges, is known as the firing rules.

If actors are permitted to produce and consume only one token per activation, the resulting graph is designated a Single Rate Data Flow graph. If, on the other hand, an actor can consume and produce multiple tokens in its activations, the graph is known as a Multi Rate Data Flow graph. Independently of the rate of consumption and production of the actors, if the quantity of tokens in any actor operation is constant and well defined, we obtain a Synchronous Data Flow graph [5].

All these concepts will be addressed in greater detail in later chapters.
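The firing rule described in this section can be illustrated with a small sketch. The `Edge` class and the `can_fire` and `fire` helpers are hypothetical names introduced here for illustration; they are not the data structures of the simulator used later in this thesis. Each edge carries constant production and consumption rates, so setting both to 1 everywhere yields a Single Rate Data Flow graph.

```python
from dataclasses import dataclass


@dataclass
class Edge:
    src: str           # producing actor
    dst: str           # consuming actor
    produce: int = 1   # tokens written per firing of src
    consume: int = 1   # tokens read per firing of dst
    tokens: int = 0    # tokens currently on the edge (initial tokens)


def can_fire(actor, edges):
    """Firing rule: every incoming edge holds enough tokens."""
    return all(e.tokens >= e.consume for e in edges if e.dst == actor)


def fire(actor, edges):
    """Consume tokens from incoming edges, then produce on outgoing ones."""
    if not can_fire(actor, edges):
        raise ValueError(f"actor {actor} is not enabled")
    for e in edges:
        if e.dst == actor:
            e.tokens -= e.consume
    for e in edges:
        if e.src == actor:
            e.tokens += e.produce


# Two actors in a cycle; the back edge B -> A carries one initial token.
edges = [Edge("A", "B"), Edge("B", "A", tokens=1)]
fire("A", edges)  # A is enabled by the initial token; B becomes enabled
print(can_fire("A", edges), can_fire("B", edges))
```

The single initial token on the back edge means that A and B can never fire at the same time, which is how initial tokens constrain the schedule.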

1.7 Problem description

Embedded platforms for streaming applications are expected to handle several streams at the same time, each one with its own rate. This functionality can be divided into jobs. A job is a group of communicating tasks that are started and stopped independently. The approach taken so far for analysis resorts to modelling these systems using dataflow graphs [21].

The overall scheduling strategy mixes static (compile-time) and dynamic (run-time) techniques. The scheduling of tasks that belong to the same job, or intra-job scheduling, is handled by means of static order, i.e. per job and per processor, a static ordering of actors is found that respects the real-time requirements while trying to minimize processor usage.

Inter-job scheduling is handled by means of local Time Division Multiplex (TDM) schedulers. The biggest disadvantage of TDM schedulers is that they waste many resources on low-latency, low-throughput tasks.
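A rough numerical illustration of this disadvantage is sketched below. It assumes a commonly cited worst-case bound for TDM arbitration, $r = C + (P - S)\lceil C / S \rceil$, where $C$ is the execution time, $S$ the slice reserved for the task and $P$ the period of the TDM wheel. This bound, the function names and all the numbers are assumptions made for the example; they are not taken from this chapter.

```python
def tdm_response_time(c, slice_len, period):
    """Assumed worst-case response time under TDM (integer time units)."""
    # -(-c // slice_len) is the integer ceiling of c / slice_len.
    return c + (period - slice_len) * -(-c // slice_len)


def smallest_slice(c, period, deadline):
    """Smallest slice meeting the deadline, or None if none exists."""
    for s in range(1, period + 1):
        if tdm_response_time(c, s, period) <= deadline:
            return s
    return None


# Hypothetical task: 1 time unit of work, must respond within 2 time
# units, but is activated only once every 100 time units.
c, deadline, activation_period = 1, 2, 100
wheel_period = 10

s = smallest_slice(c, wheel_period, deadline)
reserved = s / wheel_period    # fraction of the processor reserved
used = c / activation_period   # fraction actually needed on average
print(s, reserved, used)
```

Under these assumptions the task needs 9 of the 10 time units of the wheel to meet its latency requirement, while it uses only 1% of the processor on average. A fixed-priority scheduler would instead give it a high priority without reserving a budget.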

The goal of this project is to investigate how the flow must be changed to allow the usage of a non-budget-based scheduler, such as Fixed Priority. In order to achieve this goal, we must take the following steps:

• Determine whether the dataflow analysis is still possible under these conditions and under which conditions analysis can still be carried out.

• Propose a method for priority assignment per processor per job and design the scheduler


• The resource manager has to be adapted to handle a Fixed Priority schedule.

Processor usage is an important factor to take into account. In shared-resource platforms, like the MPSoC devices referred to in this document, the response time of a task or a job is related to the capacity that a particular resource, specifically a processor, has to process that instance. On the other hand, this capacity or processor availability is related to the computational load required by other jobs or tasks with higher priority.

The analysis and characterization of the computational load of a shared processor is also one of the focal points of our project.

1.8 Developed work

The organization ofthis projectfollowed thepointsestablished intheprevious section. The

contributions ofthis projectto the state oftheartcan be summarizedinto thefollowing points:

1. Data Flow Analysis - A comprehensive analysis of data flow models of fixed-priority systems comprised the bulk of our initial work. This analysis was centred on the characterization of best- and worst-case response times for fixed-priority data flow graphs. Initially this analysis considered the whole system mapped on a single processing unit and, later on, the behaviour of the same type of systems mapped on different platforms was studied, giving emphasis to the dependence and interference between tasks of the same job that are mapped on different processing units.

2. Computational load analysis - We formalized the concept that quantifies the amount of work required from a processor by a particular task. In Fixed-Priority scheduling, it is useful to characterize the amount of time that a processor is busy with a high-priority task, thus allowing us to determine the availability of the same processor to execute lower-priority tasks.

3. Extension of the tools available - For the analysis of all the systems conceived to study the fixed-priority approach to this scheduling problem we had at our disposal a set of software tools, namely a data flow graph simulator.

These tools did not contemplate either the simulation of fixed-priority data flow graphs or the functionalities to perform load analysis of a processing unit. In order to obtain reliable results to support our study, it was necessary to add these functionalities.

In order to improve the readability of the results provided by this set of tools, we also included the necessary changes for an integration with an external visualization tool.

1.9 Thesis organization

The remainder of this thesis is organized as follows: in chapter 2 we review data flow computation models and their analytical properties. The mathematical notation for representing data flow graphs is also introduced in this chapter. The software framework used throughout this project is introduced in chapter 3, which includes a detailed explanation of the usage and functioning of the set of tools available. The changes and implementations made to provide the


implementations to obtain results. Chapter 6 follows a similar template to the previous chapter, but now relative to inter-graph fixed-priority analysis. The practical results, either from software simulations or from analysis of practical examples, and their respective discussion are presented


Data Flow computation models

This dissertation uses data flow computation models for modelling and analysing various systems. In this chapter, we present the notation for the data flow model that we will use throughout this document and the properties of several data flow computation models that are relevant to our work. This is reference material and most of it can be found in [26] [5] [28] [32] [21].

2.1 Graphs

In this dissertation, we use data flow analysis, which in turn uses graph theory in its formalization. Therefore we first need to introduce graph theory.

2.1.1 Directed graphs

Definition 2.1. A directed graph $G$ is an ordered pair $G = (V, E)$, where $V$ is the set of vertices or nodes and $E$ is the set of edges or arcs. Each edge is an ordered pair $(i, j)$ where $i, j \in V$. If $e = (i, j) \in E$, we say that $e$ is directed from $i$ to $j$. $i$ is said to be the source node of $e$ and $j$ is the sink node of $e$. We also denote the source and sink nodes of $e$ as $src(e)$ and $snk(e)$, respectively.

Figure 2.1: An example of a directed graph (nodes $A$, $B$ and $C$)

The graph depicted in the previous figure is described by the following sets:

$$V = \{A, B, C\} \qquad (2.1)$$

$$E = \{(A, B), (B, C), (B, B), (C, A)\} \qquad (2.2)$$

It is a directed graph: node $A$ is directed to node $B$, node $B$ is directed to node $C$ and to itself through a self-edge, and node $C$ is directed to node $A$.
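As an illustration only, the example graph can be written down directly in Python. This is a minimal sketch: representing edges as tuples and the helper names `successors` and `self_edges` are assumptions made here, while `src` and `snk` mirror the notation of Definition 2.1.

```python
# Nodes and edges of the example graph of Figure 2.1.
V = {"A", "B", "C"}
E = {("A", "B"), ("B", "C"), ("B", "B"), ("C", "A")}


def src(e):
    """Source node of edge e = (i, j)."""
    return e[0]


def snk(e):
    """Sink node of edge e = (i, j)."""
    return e[1]


def successors(node):
    """Nodes reached from `node` through a single edge (hypothetical helper)."""
    return {snk(e) for e in E if src(e) == node}


def self_edges():
    """Edges whose source and sink coincide (hypothetical helper)."""
    return {e for e in E if src(e) == snk(e)}
```

Calling `successors("B")` returns both `"C"` and `"B"`, the latter because of the self-edge.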

2.1.2 Path and cycles in a graph

A path in a directed graph is a finite, nonempty sequence $(e_1, e_2, \ldots, e_n)$ of edges such that $snk(e_i) = src(e_{i+1})$, for $i = 1, 2, \ldots, n - 1$. We say that path $(e_1, e_2, \ldots, e_n)$ is directed from $src(e_1)$ to $snk(e_n)$; we also say that this path traverses $src(e_1), src(e_2), \ldots, src(e_n)$ and $snk(e_n)$; the path is simple if each node is traversed once, that is, $src(e_1), \ldots, src(e_n), snk(e_n)$ are all distinct; the path is a circuit if it contains edges $e_k$ and $e_{k+m}$ such that $src(e_k) = snk(e_{k+m}), m \geq 0$; a cycle is a path such that the subsequence $(e_1, e_2, \ldots, e_{n-1})$ is a simple path and $src(e_1) = snk(e_n)$ [26].

Figure 2.2: Example of a graph with a simple path (nodes $A$, $B$, $C$, $D$ and $E$)

In the previous figure, the path $\{(A, C), (C, D), (D, B), (B, A)\}$ describes a cycle: its first three edges form a simple path and the last edge returns to $A$.
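The definitions of path, simple path and cycle translate almost literally into code. The sketch below is illustrative: the function names are hypothetical, edges are assumed to be `(source, sink)` tuples, and a single self-edge is treated as a cycle by convention, which is an assumption since the definition above does not cover that corner case explicitly.

```python
def src(e):
    return e[0]


def snk(e):
    return e[1]


def is_path(edges):
    """Nonempty sequence where each edge starts where the previous one ends."""
    return len(edges) > 0 and all(
        snk(a) == src(b) for a, b in zip(edges, edges[1:])
    )


def is_simple(edges):
    """Path whose traversed nodes src(e_1), ..., src(e_n), snk(e_n) are distinct."""
    if not is_path(edges):
        return False
    nodes = [src(e) for e in edges] + [snk(edges[-1])]
    return len(set(nodes)) == len(nodes)


def is_cycle(edges):
    """Path whose first n-1 edges are a simple path and which returns to its start."""
    if not is_path(edges):
        return False
    # Assumption: a lone self-edge counts as a cycle.
    prefix_ok = len(edges) == 1 or is_simple(edges[:-1])
    return prefix_ok and src(edges[0]) == snk(edges[-1])


example = [("A", "C"), ("C", "D"), ("D", "B"), ("B", "A")]
```

For the sequence of Figure 2.2, `is_cycle(example)` holds while `is_simple(example)` does not, because node `A` is traversed twice.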

2.2 Data Flow

Data flow is a natural paradigm for describing Digital Signal Processing applications for concurrent implementation on parallel hardware. Data flow programs for signal processing are directed graphs where each node represents a function and each arc represents a signal path. More specifically, in a data flow graph, nodes represent actors. An actor is a time-consuming entity associated with firing rules. An edge or arc in a data flow graph represents a First-In-First-Out queue that directs values from the output of an actor to the input of another.

In data flow, data is transported in discrete chunks, referred to as tokens. When an actor starts an execution, it consumes a defined number of tokens from its incoming edges. Conceptually, this consumption is a reading operation of the data tokens that are needed for beginning the execution. These tokens remain in the edge (FIFO) during the execution of the actor. By the end of that execution, the actor produces a defined number of tokens into its outgoing edges. This production process is a writing operation onto the outgoing edges (FIFOs). It is also possible to perform a reservation of space in the outgoing edges during the start of the execution in order


tokens produced or consumed is specified a priori.

The data flow principle is that any actor can fire (perform its computation) whenever input data are available on all of its incoming edges. An actor with no input edges may fire at any time. This implies that many actors may fire simultaneously, hence the concurrency. Because the program execution is controlled by the availability of data, data flow programs are said to be data-driven [21].

2.2.1 Actor firings

At this point, it is useful to define the firing concept in the data flow context, since it will be referred to later.

As described in the previous section, in data flow, every edge also has two associated valuations: $prod : E \to \mathbb{N}$ and $cons : E \to \mathbb{N}$. For a given edge $e \in E$, $prod(e)$ gives the constant number of tokens produced by $src(e)$ on $e$ in each firing and $cons(e)$ gives the constant number of tokens consumed by $snk(e)$ in each firing.

An actor firing is an indivisible quantum of computation. A set of firing rules gives preconditions for a firing. Firing consumes tokens from the input streams and produces tokens into the output streams. The firings themselves can be described as functions, and the invocation of these firings is controlled by firing rules [20].
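The firing rule and the token bookkeeping can be sketched as follows. This is a hedged illustration rather than the simulator's implementation: timing is ignored (consumption and production are collapsed into one step), and the dictionary layout and the names `enabled` and `fire` are assumptions introduced here.

```python
def enabled(actor, tokens, cons):
    """Firing rule: every incoming edge e holds at least cons(e) tokens.

    `tokens` and `cons` map edges (src, snk) to integers. An actor with no
    incoming edges is always enabled, as stated in the text.
    """
    return all(tokens[e] >= cons[e] for e in tokens if e[1] == actor)


def fire(actor, tokens, prod, cons):
    """Return the token counts after one complete firing of `actor`."""
    if not enabled(actor, tokens, cons):
        raise ValueError(f"{actor} is not enabled")
    after = dict(tokens)
    for e in after:
        if e[1] == actor:  # actor is snk(e): consume cons(e) tokens
            after[e] -= cons[e]
    for e in after:
        if e[0] == actor:  # actor is src(e): produce prod(e) tokens
            after[e] += prod[e]
    return after


# Two actors in a loop; only the edge B -> A starts with a token.
tokens = {("A", "B"): 0, ("B", "A"): 1}
prod = {("A", "B"): 1, ("B", "A"): 1}
cons = {("A", "B"): 1, ("B", "A"): 1}
```

In this example only `A` is enabled initially; after `fire("A", tokens, prod, cons)` the token has moved to the edge `A -> B` and `B` becomes enabled.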

The start time of a firing refers to the time instant at which the firing rules are verified and the tokens from the input streams are consumed. We are going to use the following notation:

$$s(i, k) = m, \quad m \in \mathbb{N}_0 \qquad (2.3)$$

where $i$ denotes the actor and $k$ the instance of the activation.

Likewise, the finish time of a firing corresponds to the time instant at which the tokens resulting from the computation are produced into the output streams. Just like for the start time, to indicate a particular finish time we use a similar notation:

$$f(i, k) = m, \quad m \in \mathbb{N}_0 \qquad (2.4)$$

An actor firing can be designated as a task instance in some contexts. Task instances are used mostly in classical real-time theory while actor firings are their counterpart in data flow.

2.3 Temporal analysis

Execution time of an actor - $\tau$

Before the definition of Timed SRDF it is important to define the concept of execution time of an actor.

Definition 2.2. The execution time $\tau(i)$ of an actor $i$ is the elapsed time between the start time of the firing for that actor and the finish time of the firing, at the end of that execution.

The execution time can be defined in a more general sense, as $\tau(i)$, in which it is assumed that all executions of actor $i$ have a constant execution time, or it can be specified as $\tau(i, k)$, where $k$


in advance, for analytical purposes it is often convenient to use bounds on this value.

A given execution time of an actor $i$ can be upper bounded by a worst-case execution time $\hat{\tau}(i)$ and lower bounded by a best-case execution time $\check{\tau}(i)$. The following property must always hold:

$$\check{\tau}(i) \leq \tau(i, k) \leq \hat{\tau}(i), \quad \forall i \in G, \forall k \in \mathbb{N}_0 \qquad (2.5)$$

2.3.1 Schedules

In the context of this problem, it is necessary to develop a concise definition of schedule that is consistent with the type of result that we plan to obtain. At this point it is important to make a distinction between schedulers in an implementation, as for example Fixed Priority, Time Division Multiplexing, Round Robin, etc., and the execution of data flow graphs using a schedule, as for instance a Self-timed or a Static Periodic schedule. This section will address the latter. It is important to indicate from the beginning that, in this context, we will work with Self-timed schedules. A more intuitive designation for this type of schedule is ASAP schedule (As Soon As Possible), since the start times vector for every actor is determined from the principle that every task should start as soon as its conditions are met. So, in a similar way to how schedules have been defined in other situations, a schedule is defined for a specific actor $i$ which, in our definition, is preceded by another actor $j$. The edge connecting both actors holds a number $d(i, j)$ of tokens, as the next figure illustrates:

Figure 2.3: Simple arrangement of two actors connected through an edge (actor $j$ feeds actor $i$ through an edge holding $d(i, j)$ tokens)

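From this arrangement, we can write the following expression for the schedule of actor $i$:

$$s_{SelfTimed}(i, k) = \begin{cases} -\infty, & k < 0 \\ \max\left(\max_{\forall (i, j) \in E}\left(s(j, k - d(i, j)) + \tau(j)\right), 0\right), & k \geq 0 \end{cases} \qquad (2.6)$$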

For a two-actor arrangement like the one in the previous figure, we can elaborate the following logic:

Figure 2.4: Two actors connected through an edge with no tokens ($A \to B$, $d(A, B) = 0$)

From this figure we can write that:

$$s(B, k) \geq s(A, k) + \tau(A) \qquad (2.7)$$

The start time of the $k$-th iteration of actor $B$ is always at least $\tau(A)$ after the start time of the $k$-th iteration of the preceding actor $A$. But if we have some tokens between the actors,


Figure 2.5: Two actors connected through an edge with one token ($A \to B$, $d(A, B) = 1$)

With a token in the edge connecting the two actors, the previous expression 2.7 needs to be adapted:

$$s(B, k) \geq s(A, k - 1) + \tau(A) \qquad (2.8)$$

Since actor $B$ now does not need to wait for actor $A$ to produce at least one token before it starts executing, the start time of this actor is now referenced to the $(k - 1)$-th iteration of the preceding actor. If we expand this logic to $d(A, B)$ tokens in the interconnecting edge, we reach the bottom branch of expression 2.6. Since a negative value for the start time of an execution does not make sense in the context of this problem, we included the $0$ argument in the max expression, so that in such a case, the minimum start time of an execution is going to be zero.
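The recurrence of expression 2.6 can be evaluated directly for a small graph. The following is a minimal sketch under stated assumptions: edges are stored as `(producer, consumer)` pairs mapped to their initial tokens, execution times are constant per actor, and the graph has no cycle without initial tokens (such a cycle deadlocks and would make the recursion below loop). The function name and data layout are hypothetical.

```python
import functools
import math


def self_timed_schedule(actors, edges, tau, iterations):
    """Start times s(i, k) of a self-timed execution, for k < iterations.

    edges: {(producer, consumer): initial_tokens}
    tau:   {actor: execution_time}
    """
    preds = {a: [] for a in actors}
    for (producer, consumer), d in edges.items():
        preds[consumer].append((producer, d))

    @functools.lru_cache(maxsize=None)
    def s(i, k):
        if k < 0:
            return -math.inf  # top branch of 2.6
        latest = max(
            (s(j, k - d) + tau[j] for j, d in preds[i]),
            default=-math.inf,
        )
        return max(latest, 0)  # bottom branch of 2.6

    return {(i, k): s(i, k) for i in actors for k in range(iterations)}


# Two actors in a loop: A -> B without tokens, B -> A with one token.
schedule = self_timed_schedule(
    actors=["A", "B"],
    edges={("A", "B"): 0, ("B", "A"): 1},
    tau={"A": 2, "B": 3},
    iterations=3,
)
```

In this example actor `A` starts at times 0, 5 and 10 and actor `B` at 2, 7 and 12: each iteration has to wait for the single token to travel around the loop.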

The Worst-Case Self-Timed Schedule (WCSTS) of an SRDF graph is the self-timed schedule of an SRDF graph where every iteration of every actor $i$ takes $\hat{\tau}(i)$ to execute, where $\hat{\tau}(i)$ is the worst-case execution time of the actor. Note that the WCSTS of an SRDF graph is unique.

2.3.2 Single Rate Data Flow

If in a data flow graph we can verify that $prod(e) = cons(e)$ for every edge $e \in E$, then the graph is a Single Rate Data Flow (SRDF) graph. An SRDF graph is one where every actor in it consumes and produces the same number of data tokens. We can formalize this concept with

$$G_{SRDF} = (V, E, d, \tau) \qquad (2.9)$$

$V$ and $E$ are already defined in Definition 2.1. $d$ is a valuation $d : E \to \mathbb{N}_0$. $d(i, j)$ is called the delay of edge $(i, j)$ and represents the number of initial tokens in arc $(i, j)$.

2.3.3 Timed Single Rate Data Flow graphs

We can now take the execution time of every actor of the graph into consideration and define a Timed SRDF graph:

$$G_{TimedSRDF} = (V, E, d, \hat{\tau}) \qquad (2.10)$$

where $\hat{\tau}$ represents the worst-case response-time of an actor.

2.3.4 Application graphs

In the course of our work, we realized that we needed to further specify the definition of Timed SRDF given above by including a new parameter. Our new graph instance differs just slightly from equation 2.10:

$$G_{app} = (V, E, d, \check{\tau}, \hat{\tau}) \qquad (2.11)$$


In order to use data flow to analyse a particular schedule, it first needs to be modelled using the data flow paradigm. In the present section we will present strategies to perform this modelling, using concrete examples to illustrate the process.

2.4.1 Task scheduling

There are two types of task scheduling mechanisms that we are interested in modelling: Compile-Time and Run-Time Scheduling.

Compile-Time Scheduling (CTS) encompasses scheduling decisions that are fixed at compile-time, such as static order scheduling.

Run-Time Scheduling (RTS) refers to scheduling decisions that cannot be resolved at compile-time, because they depend on the run-time task-to-processor assignment, which in turn depends on the dynamic job-mix. This is handled by the local scheduling mechanism of the processor. Modelling the worst-case effect of the local scheduler on the execution of an actor is needed to include in the compile-time analysis the effects of sharing processing resources among jobs. If the WCET of the task, the settings of the local dispatcher, and the amount of computing resources to be given to the task are known, then the actor execution time can be set to reflect the worst-case response-time of that task running in that local dispatcher, with that particular amount of allocated resources [26].

2.4.2 Time Division Multiplexing scheduling (TDM)

The effect of TDM scheduling can be modelled by replacing the worst-case execution time of the actor by its worst-case response-time under TDM scheduling. The response-time of an actor $i$ is the total time necessary to complete a firing of $i$, when resource arbitration effects (scheduling, preemption, etc.) are taken into account. This is counted from the moment the actor meets its enabling conditions to the moment the firing is completed. Assuming that a TDM wheel period $P$ is implemented on the processor and that a time slice with duration $S$ is allocated for the firing of $i$, such that $S \leq P$, a time interval equal to or longer than $\tau(i)$ passes from the moment an actor is enabled by the availability of enough input tokens to the completion of its firing. The first part of this interval is the arbitration time, i.e., the time it takes until the TDM scheduler grants execution resources to the actor once the firing conditions of the actor are met. In the worst case, $i$ gets enabled when its time slice has just ended, which means that the arbitration time is the time it takes for the slice of $i$ to start again. If we denote the worst-case arbitration time as $\hat{r}(i)$ then [12]:

$$\hat{r}(i) = P - S \qquad (2.12)$$
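Expression 2.12 can be checked with a small enumeration. The sketch below assumes integer time, a wheel of period `P` whose first `S` time units belong to the actor, and `0 < S <= P`; the function names are hypothetical and only the arbitration time is modelled, not the full response-time.

```python
def arbitration_wait(t, P, S):
    """Time from the enabling instant t until the actor's slice is active.

    Assumption: the slice of the actor covers [0, S) in every wheel rotation.
    """
    phase = t % P
    return 0 if phase < S else P - phase


def worst_case_arbitration(P, S):
    """Largest wait over all integer enabling instants within one rotation."""
    return max(arbitration_wait(t, P, S) for t in range(P))
```

For a wheel with `P = 10` and `S = 3`, the longest wait occurs when the actor is enabled at phase 3, right after its slice has ended, and equals `P - S = 7`.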

2.4.3 Non-Preemptive Non-Blocking Round-Robin scheduling

In a Non-Preemptive Non-Blocking Round-Robin (NPNBRR) scheduler, all clusters assigned to the same processor are put in a circular scheduling list. The run-time scheduler goes through this list continuously. It picks an actor from the list and tries to execute it. The actor (or the


such that the actor can consume and produce tokens according to its firing rules, the actor executes until the firing is over; if not, the actor is skipped. The process is repeated for the next actor in the circular scheduling list, and so on.

The worst-case arbitration time of an actor is given by the sum of the execution times of all other actors mapped to the same NPNBRR-scheduled processor. The processing time is equal to the actor's execution time, since there is no preemption. The total response-time is therefore equal to the sum of the execution times of all actors mapped to the NPNBRR-scheduled processor [27].
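The three quantities described above are simple sums. A short sketch, with a hypothetical function name and an assumed dictionary of worst-case execution times per actor:

```python
def npnbrr_response_times(tau):
    """Worst-case figures for actors sharing one NPNBRR-scheduled processor.

    tau maps each actor mapped to the processor to its execution time.
    """
    total = sum(tau.values())
    return {
        actor: {
            "arbitration": total - t,  # all other actors run first
            "processing": t,           # no preemption
            "response": total,         # arbitration + processing
        }
        for actor, t in tau.items()
    }


times = npnbrr_response_times({"a": 2, "b": 3, "c": 5})
```

Here every actor has a worst-case response-time of 10, while the arbitration time of `a` is 8.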

2.4.4 Static-Order scheduling

A static-order schedule of a set of actors $A = \{a_0, a_1, \ldots, a_n\}$ mapped to the same processor is a sequence of execution $so = |a_k, a_l, \ldots, a_m|$ that generates extra precedence constraints between the actors in $A$ such that, from the start of the execution of the graph, $a_k$ must be the first one to execute, followed by $a_l$ and so on, up to $a_m$. After $a_m$ executes, the execution restarts from $a_k$ for the next iteration of the graph.

Any static order imposed on a group of Single Rate Data Flow actors executing on the same processor can be represented by adding edges with no tokens between them. From the last to the first actor in the static order, an edge is also added, with a single initial token. This construct reflects the fact that, the graph execution being iterative, when the static order finishes execution for a given iteration, it restarts from the first actor in the static order for the next iteration.

Notice that the new edges represent a series of sequence constraints enforced by the static order schedule and do not represent any real exchange of data between the actors.
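The construction of these extra edges can be sketched as a small function. The name `static_order_edges` and the `{(producer, consumer): initial_tokens}` layout are assumptions made for illustration; the sketch also assumes that each actor appears only once in the static order.

```python
def static_order_edges(order):
    """Sequence-constraint edges that model a static order on one processor.

    Returns {(producer, consumer): initial_tokens}. Consecutive actors are
    linked by token-free edges; the last actor is linked back to the first
    with a single initial token, so that the next iteration can start.
    """
    edges = {(a, b): 0 for a, b in zip(order, order[1:])}
    edges[(order[-1], order[0])] = 1
    return edges


extra = static_order_edges(["a_k", "a_l", "a_m"])
```

These edges would be added to the edges of the application graph before the temporal analysis is run; they carry no data.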

2.4.5 Static periodic schedules

A Static Periodic Schedule (SPS) of an SRDF graph is a schedule such that, for all nodes $i \in V$, and all $k > 0$:

$$s(i, k) = s(i, 0) + T \cdot k \qquad (2.13)$$

where $T$ is the designed period of the SPS. Please note that an SPS can be represented uniquely by $T$ and the values of $s(i, 0), \forall i \in V$ [26].

2.5 Data Flow temporal analysis techniques

Temporal analysis is required in order to verify whether a given timed data flow graph can meet a required throughput or latency requirement of an application. In this section we will cover some of the analysis methods available in this regard.

2.5.1 Throughput analysis

In some systems, rate constraints are often imposed by designers on the execution rate of each process in the system in order to ensure correct timing behaviour and achieve performance


one. These phases occur in the same order as they were mentioned. When the execution of the graph is initiated, the transient phase begins. This phase has a limited duration, during which the initial tokens are distributed through the edges of the graph. Eventually the graph enters the next phase: the periodic one.

The state of a graph is defined by the number of tokens present at each one of its edges. Whenever a graph enters the periodic phase, the same sequence of states repeats itself recurrently. The time period required to repeat the same sequence of states is defined as the graph period.

In the transient phase, the throughput analysis can be derived by simulating the execution of the data flow graph, given worst-case execution times to all actors [26]. Another known technique for temporal analysis is the Maximum Cycle Mean. A simple explanation of these two techniques follows:

2.5.1.1 Simulation

This is perhaps the most direct approach to this problem. By running a reliable simulation of the data flow graph, it is possible to verify whether the throughput requirements are met. The simulation tool that we used, which is going to be described in detail in chapter 3, provides enough information so that, in case of violation of the throughput specifications, one can adjust the graph characteristics (if possible) in order to obtain a throughput-compliant graph.

2.5.1.2 Maximum Cycle Mean

The average weight of a directed cycle is the quotient between the sum of the execution times of all of its actors and the total number of initial tokens present in the cycle, and is called the cycle mean. The maximum mean cycle problem for a directed graph with cycles is to find a cycle having the maximum average weight, called the maximum cycle mean, over all directed cycles in the graph. Such a cycle is called a critical cycle. The maximum mean cycle problem has applications in finding the iteration bound of a data flow graph for digital signal processing, in performance analysis of synchronous, asynchronous, or mixed systems, and in throughput analysis for embedded systems [10].

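The Maximum Cycle Mean (MCM), $\mu(G)$, of an SRDF graph $G$ is defined as:

$$\mu(G) = \max_{c \in C(G)} \frac{\sum_{i \in N(c)} \tau_i}{\sum_{e \in E(c)} d_e} \qquad (2.14)$$

where $C(G)$ is the set of simple cycles in graph $G$, and $N(c)$ and $E(c)$ are the nodes and edges of cycle $c$.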
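For small graphs, expression 2.14 can be evaluated by brute force. The sketch below enumerates every simple cycle and keeps the largest cycle mean; it is an illustration only, since the enumeration grows exponentially and dedicated algorithms for the maximum mean cycle problem are far more efficient. The function names and the `{(producer, consumer): initial_tokens}` layout are assumptions, execution times are assumed to be integers, and parallel edges between the same pair of actors are not supported.

```python
from fractions import Fraction


def simple_cycles(actors, edges):
    """Yield each simple cycle once, as a list of edges (small graphs only)."""
    order = {a: n for n, a in enumerate(actors)}
    succ = {a: [] for a in actors}
    for producer, consumer in edges:
        succ[producer].append(consumer)

    def extend(start, node, path, visited):
        for nxt in succ[node]:
            if nxt == start:
                yield path + [(node, nxt)]
            # Only visit nodes ranked after `start`, so that every cycle is
            # reported exactly once, from its lowest-ranked node.
            elif nxt not in visited and order[nxt] > order[start]:
                yield from extend(
                    start, nxt, path + [(node, nxt)], visited | {nxt}
                )

    for start in actors:
        yield from extend(start, start, [], {start})


def maximum_cycle_mean(actors, edges, tau):
    """Expression 2.14; returns None for a graph without cycles."""
    best = None
    for cycle in simple_cycles(actors, edges):
        tokens = sum(edges[e] for e in cycle)
        if tokens == 0:
            raise ValueError("cycle without initial tokens: the graph deadlocks")
        mean = Fraction(sum(tau[producer] for producer, _ in cycle), tokens)
        best = mean if best is None else max(best, mean)
    return best


actors = ["A", "B", "C"]
edges = {("A", "B"): 0, ("B", "A"): 1, ("B", "C"): 0, ("C", "A"): 2, ("B", "B"): 1}
tau = {"A": 2, "B": 3, "C": 4}
mcm = maximum_cycle_mean(actors, edges, tau)
```

This example graph has three simple cycles with means 5, 9/2 and 3, so the critical cycle is the one through `A` and `B`.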

Theorem 2.1. For an SRDF graph $G = (V, E, d, \tau)$, it is possible to find a Static Periodic Schedule (SPS) with period $T$ if and only if $T \geq \mu(G)$. If $T < \mu(G)$, then no SPS exists with period $T$.

This theorem and its proof are found in greater detail in [26].
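The necessity of $T \geq \mu(G)$ can be sketched in a few lines. This is an illustration and not the proof given in [26]; it assumes, generalizing expressions 2.7 and 2.8, that every edge $(i, j)$ with $d(i, j)$ initial tokens imposes $s(j, k) \geq s(i, k - d(i, j)) + \tau(i)$. Substituting expression 2.13 on both sides, for $k$ large enough:

$$
\begin{aligned}
s(j, 0) + T \cdot k &\geq s(i, 0) + T \cdot (k - d(i, j)) + \tau(i) \\
s(j, 0) - s(i, 0) &\geq \tau(i) - T \cdot d(i, j)
\end{aligned}
$$

Summing the last inequality over all edges of a simple cycle $c$, the left-hand side cancels, because every node of the cycle appears once as a source and once as a sink. In the notation of expression 2.14, and for a cycle with at least one initial token:

$$0 \geq \sum_{i \in N(c)} \tau_i - T \sum_{e \in E(c)} d_e \quad \Longrightarrow \quad T \geq \frac{\sum_{i \in N(c)} \tau_i}{\sum_{e \in E(c)} d_e}$$

Since this must hold for every cycle $c \in C(G)$, it must hold for the critical cycle, which gives $T \geq \mu(G)$.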

2.5.1.3 Monotonicity

The monotonicity of a function, or in our case, of a self-timed execution, is an important concept to introduce at this stage since it has proved very useful in this context.


Definition 2.3. Monotonicity: A function $f(n)$ is monotonically increasing if $m \leq n$ implies $f(m) \leq f(n)$. Similarly, it is monotonically decreasing if $m \leq n$ implies $f(m) \geq f(n)$. A function $f(n)$ is strictly increasing if $m < n$ implies $f(m) < f(n)$ and strictly decreasing if $m < n$ implies $f(m) > f(n)$ [8].

But we are more interested in the application of the monotonicity concept in a Single Rate Data Flow context, specifically when applied to self-timed schedules. As such, we define monotonicity in this context as:

Definition 2.4. Monotonicity of a self-timed execution: In an SRDF graph $G = (V, E, d, \tau)$ with worst-case self-timed schedule $s_{WCSTS}$, for any $i \in V$ and $k \geq 0$, it holds that, for any self-timed schedule $s_{STS}$ of $G$:

$$s_{STS}(i, k) \leq s_{WCSTS}(i, k) \qquad (2.15)$$

Because of the monotonicity of self-timed execution, if any given firing of an actor finishes its execution faster than its worst-case execution time (WCET), then any subsequent firings in any self-timed schedule can never happen later than in the WCSTS, which can be seen as a function that bounds all start times for any self-timed execution of the graph. This was defined as a theorem and proved in [26].
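Expression 2.15 can be observed numerically on a small example. The sketch below is only a spot check, not a proof: it evaluates a self-timed execution with per-firing execution times $\tau(j, k)$, once with the worst-case times and once with some firings finishing earlier. The graph, the execution times and all names are assumptions chosen for illustration, and the graph is assumed to be free of token-less cycles.

```python
import functools
import math


def self_timed(preds, tau, iterations):
    """Self-timed start times with per-firing execution times tau(j, k).

    preds[i] lists (j, d): actor j feeds actor i through an edge with d
    initial tokens.
    """
    @functools.lru_cache(maxsize=None)
    def s(i, k):
        if k < 0:
            return -math.inf
        latest = max(
            (s(j, k - d) + tau(j, k - d) for j, d in preds[i]),
            default=-math.inf,
        )
        return max(latest, 0)

    return {(i, k): s(i, k) for i in preds for k in range(iterations)}


# A -> B without tokens, B -> A with one token.
preds = {"A": [("B", 1)], "B": [("A", 0)]}
worst = {"A": 2, "B": 3}


def wcet(j, k):
    """Every firing takes its worst-case execution time."""
    return worst[j]


def actual(j, k):
    """Hypothetical run where every odd firing finishes one time unit early."""
    return worst[j] - (k % 2)


wcsts = self_timed(preds, wcet, iterations=3)
sts = self_timed(preds, actual, iterations=3)
```

In this run the third firing of `B` starts at time 10, while the WCSTS places it at time 12; no start time in `sts` exceeds its counterpart in `wcsts`.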

2.5.2 Latency analysis

Although throughput is a very useful performance indicator for concurrent real-time applications, another important metric is latency. Especially for applications such as video conferencing, telephony and games, latency beyond a certain limit cannot be tolerated. Usually, the dependencies in an SRDF graph allow some freedom in the execution order of the actors. This order determines performance properties like throughput, storage requirements and latency [13]. Latency is the time interval between two events. We measure latency as the difference between the start times of two specific firings of two actors, i.e.:

$$L(i, k, j, p) = s(j, p) - s(i, k) \qquad (2.16)$$

where $i$ and $j$ are actors, and $k$ and $p$ are firings. We say that $i$ is the source of the latency and $j$ is the sink. Another useful concept to take into account in this section is the maximum latency, here defined as:

$$\hat{L}(i, j, n) = \max_{k \geq 0}\left(s(j, k + n) - s(i, k)\right) \qquad (2.17)$$

where $n$ is a fixed iteration distance. Next we are going to refer to some latency analysis techniques by presenting some concrete situations where this type of analysis is used.
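Given a table of start times, expressions 2.16 and 2.17 are straightforward to evaluate. In the sketch below the schedule is a hypothetical dictionary `{(actor, firing): start_time}`, and the maximum is taken over a finite number of recorded firings, which is an assumption: expression 2.17 ranges over all $k \geq 0$.

```python
def latency(s, i, k, j, p):
    """Expression 2.16: time between firing k of i and firing p of j."""
    return s[(j, p)] - s[(i, k)]


def max_latency(s, i, j, n, iterations):
    """Expression 2.17, restricted to the firings recorded in `s`.

    `iterations` is the number of firings per actor available in `s`;
    it must be larger than the iteration distance n.
    """
    return max(latency(s, i, k, j, k + n) for k in range(iterations - n))


# Hypothetical start times of two actors over three iterations.
s = {
    ("A", 0): 0, ("A", 1): 5, ("A", 2): 10,
    ("B", 0): 2, ("B", 1): 7, ("B", 2): 12,
}
```

With these start times, `latency(s, "A", 0, "B", 0)` is 2 and the maximum latency for an iteration distance of 1 is 7.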

2.5.2.1 Maximum Latency from a periodic source

In data flow, a source is an actor that models a generator of data, such as an antenna for example. Due to the unpredictable behaviour of the modelled device, a source can be represented as an actor that does not have a set of firing rules associated but produces tokens into its outgoing


period $T$ that corresponds to the elapsed time between two consecutive productions.

As we have seen in section 2.4.5, the start times of a periodic source are given by:

$$s(i, k) = s(i, 0) + T \cdot k \qquad (2.18)$$

The period of the execution of the graph is imposed by the source: since the source executes with a period $T$, the period of the execution of the graph is lower bounded by the period of the source. If, on the other hand, the graph has a longer period, then it cannot keep up with the source, and infinite token accumulation on some buffer will happen for the WCSTS. Therefore, we will perform the latency analysis under the assumption that $\mu(G) = T$.

For the determination of the maximum latency for this case, we first need to establish the concept of Rate-Optimal Static Periodic Schedule (ROSPS). This designation is attributed to a Static Periodic Schedule that has a period $T$ equal to the MCM of the SRDF graph, $\mu(G)$. Considering this concept, the maximum latency for a periodic source can be written as:

$$\hat{L}(i, j, n) = \max_{k \geq 0}\left(s_{STS}(j, k + n) - s(i, k)\right) \leq \check{s}_{ROSPS}(j, 0) - s(i, 0) + \mu(G) \cdot n \qquad (2.19)$$

where $\check{s}_{ROSPS}(j, 0)$ represents the smallest start time of $j$ in an admissible ROSPS. We can determine the maximum latency for a periodic source just by calculating an ROSPS with the earliest start time of $j$ and a WCSTS for the earliest start time of $i$. A more extended approach to this subject, including a more detailed explanation of the logic behind expression 2.19, can be found in [26].

2.5.2.2 Maximum latency from a sporadic source

In reactive systems, it frequently happens that the source is not strictly periodic, but produces tokens sporadically, with a minimum interval $\mu$ between subsequent firings. Typically, a maximum latency constraint must be guaranteed. Any graph with this type of source must be able to sustain a throughput of $1/\mu$ in order to guarantee that it cannot be overrun by such a source operating at its fastest rate. This means that the MCM of the graph, $\mu(G)$, is such that $\mu(G) \leq \mu$. The proof of this statement relies on the possibility of bounding the self-timed behaviour of a graph by a static periodic schedule with period $\mu$, which is possible as long as $\mu(G) \leq \mu$.

In order to keep this chapter simple, we present the final expression for the maximum latency of a graph with a sporadic source, omitting all the steps that were taken in its deduction:

$$\hat{L}(i, j, n) \leq \check{s}_{\mu}(j, 0) - s(i, 0) + \mu \cdot n \qquad (2.20)$$

The latency $\hat{L}(i, j, n)$ with a sporadic source has the same upper bound as the latency for the same source $i$, sink $j$ and iteration distance $n$ in the same graph with a periodic source with period $\mu$.


A bursty source is characterized as a source that may fire at most $n$ times within any $T$ time interval, with a minimal $\Delta t$ interval between consecutive firings. A job that processes such a source must have $\mu(G) \leq T/n$ to be able to guarantee its processing within bounded buffer space. Moreover, if $\mu(G) \leq \Delta t$, then we have the previous case, i.e., maximum latency from a sporadic source. If $\mu(G) \geq \Delta t$ then the latency may accumulate over iterations, as the job processes input tokens slower than the rate at which they arrive. The maximum latency must occur when the longest burst occurs, with the minimum interval between firings of the source, that is, a burst of $n$ tokens with $\Delta t$ spacing. Because of monotonicity, making the source execute faster cannot make the sink execute slower, but it also cannot guarantee that it executes faster.

Figure 2.6: Arrival times of tokens from a bursty source relative to a strictly periodic source (timelines for source $i$, source $i'$ and actor $j$ over an interval $T$)

As depicted in figure 2.6, the tokens of the bursty source $i$ will arrive earlier than for the periodic source $i'$. Therefore, at iteration $n - 1$ after the beginning of the burst (iteration 0) happens the earliest time:

$$s(i, n - 1) = s(j, n - 1) \leq \check{s}_{ROSPS}(j, 0) + (n - 1) \cdot \mu(G) \qquad (2.21)$$

As such, a bound on the maximum latency is given by:

$$\hat{L}(i, j, n) \leq \check{s}_{ROSPS}(j, 0) - s_{ROSPS}(i, 0) + (n - 1)(\mu(G) - \Delta t) \qquad (2.22)$$

2.6 Conclusion

By choosing a data flow model as a programming and analysis model, we can now use its analytical properties to our advantage. In this chapter we defined all the concepts essential to understand and use the tools provided by the data flow paradigm, which included a brief introduction to graph theory. Data flow is very useful for analysing streaming applications and modelling multiprocessor systems. All the systems that we are going to work with in future chapters fall into one of these categories, or both, and that is one of the main reasons why we


Software framework

The execution of our project relied heavily on a set of software tools for obtaining results. The core of this set of tools is the Heracles data flow simulator. During the execution of this project, some questions were answered using the existing functionalities of this simulator, but as we progressed deeper into our problem, we had to extend the tools with a set of functionalities that address our specific needs. The simulator has a modular structure, which eases the insertion of new functionalities through integration of custom modules or even through the modification of existing ones, minimizing the need to tamper with the core of the program.

By default, the results provided by the simulation tool were presented in a text format. At the end of a successful simulation, a summary with the results from the data flow graph behaviour was shown in the console while a detailed list with the characteristics of the task executions was stored in a text file. This type of visualization was practical for some cases but created some confusion in others. For example, it is difficult to identify executions that overlap in time just by analysing a list of their start and finish times. So, in order to make the simulation results more readable, we decided to integrate an external visualization tool that allowed the timing behaviour of the simulated tasks to be presented in a coloured Gantt chart form.

In this chapter we explain in detail all the functionalities and changes mentioned so far.

3.1 The Heracles data flow simulator

3.1.1 Heracles tool flow

The Heracles simulator was written using Objective Caml (Objective Categorical Abstract Machine Language). OCaml is a dialect of the ML (Meta-Language) family of languages, which derive from the Classic ML language designed by Robin Milner in 1975 for the LCF (Logic of Computable Functions) theorem prover [14].

OCaml shares many features with other dialects of ML, and it provides several new features of its own. The main characteristics of this language are as follows:

• It is a functional language, meaning that functions are treated as first-class values.

• It is strongly typed, meaning that the type of every variable and every expression in a


• Related to strong typing, OCaml uses type inference to infer types for the expressions in a program.

• The type system is polymorphic, meaning that it is possible to write programs that work for values of any type.

Although ML languages are mostly functional, they also include some imperative traits, which allows a program written in OCaml to exhibit both types of programming traits. The Heracles simulator was written using this mixed paradigm. For more information regarding this programming language, please refer to [17], [7] and [22].

OCaml offers some distinct advantages when compared to imperative languages such as C or Java. One of the most attractive features of this language is the type inference performed by the compiler. This feature allows a user to write programs without explicitly indicating the type of data (integer, float, string, etc.) of the defined variables: the compiler infers them by observing the context where the variable is used. This characteristic allows the compiler to detect and identify a great number of bugs that would otherwise be difficult to discover. This happens because the compiler, as it infers the type of a certain variable, will also check the consistency of the rest of the function regarding all the operations in which this variable is involved. This way, only programs that maintain data type coherence throughout all the operations compile successfully.

Another advantage of OCaml is the abstraction of pointers. While pointers are the source of many troublesome bugs in languages such as C or C++, in OCaml the compiler handles all these references. The user is not allowed to change the intrinsic value of these references: only the variables that they point to. From a user standpoint, the syntax used for defining and changing memory positions referred to through a pointer (in OCaml we use the ref operator to define a reference or pointer) is relatively easier to manipulate than its imperative counterpart.

Finally, one of the most distinctive characteristics of OCaml is the compactness of its code. With this language it is possible to write complex applications using a third of the code lines that the same program would need when written in a fully imperative language. Due to type inference and the functional nature of this language, a function of medium complexity can sometimes be written in a single line of code.

3.1.2 Explanation of the software model

The Heracles simulation module was developed with the purpose of providing accurate timing simulations. The usage of this tool is mainly text based: the system to be simulated is inserted into the tool via a text file containing a description of the graph to be simulated, namely, a list with all the actors and their respective execution times, as well as some other relevant characteristics, such as the mapped processor and the associated priority, and a list of all the edges with the sources and sinks properly identified, the initial tokens, and the production and consumption rates. This text file is then parsed so that all the components and main characteristics mentioned before can be extracted into internal variables.

The simulator operates on a set of objects of a record type (commonly known also as structure or class in other programming languages) called Event. Along with plenty of useful information, an event also represents the start or finish of a scheduled task. These events are aggregated in a