Universidade de Aveiro
2012
Departamento de Eletrónica, Telecomunicações e Informática
Ricardo Daniel Lopes Almeida
Fixed Priority Schedulers in Real-Time
Multiprocessors
Dissertation presented to the Universidade de Aveiro in fulfilment of the
requirements for the degree of Master in Electronics and Telecommunications
Engineering, carried out under the scientific supervision of Dr. Paulo
Bacelar Reis Pedreiras, Assistant Professor at the Departamento de Eletrónica,
Telecomunicações e Informática of the Universidade de Aveiro, and the scientific
co-supervision of Dr. Orlando Miguel Pires dos Reis Moreira, Principal DSP
Systems Engineer at ST-Ericsson.
Financial support from FCT and FSE under the
III Quadro Comunitário de Apoio.
The jury
President
Prof. Dr. José Alberto Gouveia Fonseca
Assistant Professor, Departamento de Eletrónica e Telecomunicações,
Universidade de Aveiro
Prof. Dr. Paulo Bacelar Reis Pedreiras
Assistant Professor, Departamento de Eletrónica e Telecomunicações,
Universidade de Aveiro
Dr. Orlando Miguel Pires dos Reis Moreira
Principal DSP Systems Engineer, ST-Ericsson
Prof. Dr. Luís Miguel Pinho de Almeida
Associate Professor, Departamento de Engenharia Eletrotécnica e de
Computadores, Faculdade de Engenharia da Universidade do Porto
Keywords
Fixed priority schedulers, data-flow, real-time, multiprocessors,
computational load, embedded systems, digital signal processing.
Abstract
Due to the technological evolution observed in recent years, embedded
systems with multiprocessing capabilities have become common.
In these devices, the scarcity of resources requires them to be distributed
optimally among the various supported activities.
Devices of this type usually include a general-purpose processor,
typically from the ARM family, and one or more processors dedicated to
specific tasks, such as vector processors (EVP), used for example in
digital signal processing systems.
The distribution of resources among the tasks of the system is performed by a
scheduler. The scheduler can distribute resources according to one of several
known policies: Round Robin, First In First Out, Time Division Multiplexing,
Fixed Priority, etc.
The main objective of this work is to investigate real-time schedulers based
on fixed priorities, with special attention to streaming applications executing
on multiprocessor platforms, using dataflow.
Dataflow is a paradigm that uses graph theory to model, program and
analyse applications and systems.
The first part of this project is dedicated to the analysis and modelling of
dataflow graphs in which resources are distributed by a fixed priority
scheduler. The second part is dedicated to the study of the interference
between tasks with different priority levels in independent graphs, when
mapped for execution on the same processor. In embedded systems, there are
high priority tasks (periodic or sporadic) that must be served as quickly as
possible once they are ready to execute. Serving them interferes with the
execution of tasks that run on the same platform with lower priority levels,
since these are blocked while the higher priority tasks execute. The direct
consequences of this interference are a reduction of the response time of the
high priority tasks and an increase of the execution time of the low priority
tasks.
With this work, we intend to determine the advantages and disadvantages
that a fixed priority scheduler can offer in this type of situation, when
compared with other schedulers.
Keywords
Fixed priority schedulers, data-flow, real-time, multiprocessors, computational
load, embedded systems, digital signal processing.
Abstract
Due to the technological evolution of recent years, embedded systems
with multiprocessing capabilities are becoming common. Application
requirements often impose resource constraints, leading to the need to
distribute resources efficiently.
Devices of this type normally include a general-purpose processor,
typically from the ARM family, and one or more task-specific processors, such
as vector processors (EVP), used for instance in digital signal processing
systems.
The distribution of resources among the tasks is performed by a scheduler.
Scheduling can follow one of the known scheduling policies: Round
Robin, First In First Out, Time Division Multiplexing, Fixed Priority, etc.
The main goal of this project is to investigate fixed-priority real-time
schedulers, with special focus on streaming applications executing on
multiprocessor platforms, using dataflow.
Dataflow is a paradigm that uses graph theory for modelling, programming and
analysis of applications and systems.
The first part of this project is dedicated to the analysis and modelling of
dataflow graphs whose shared resources are distributed by a fixed priority
scheduler. The second part is dedicated to the study of the interference
between tasks with different levels of priority in independent graphs, when
mapped for execution on the same processor.
Embedded systems frequently have high priority tasks (periodic or sporadic)
that need to be dispatched as soon as they become ready to execute. This
interferes with the execution of tasks running on the same platform with
lower priority levels, since these are blocked during the execution of the
high priority tasks. This interference has two direct consequences: a lower
response time for the high priority tasks and an increase in the execution
time of the tasks at lower priority levels.
With our work, we intend to investigate the advantages and disadvantages that
a fixed priority scheduler can offer in this type of situation, when compared
with other schedulers.
1 Introduction
1.1 Fixed priority scheduling: A historical review
1.2 Streaming applications
1.3 Real-Time applications
1.3.1 Timing requirements
1.3.2 Scheduling
1.4 Fixed priority scheduling applications
1.4.1 Preemption
1.5 State of the art
1.5.1 Classical real-time theory
1.5.2 SymTA/S
1.5.3 Real-Time calculus
1.6 Data flow graphs
1.7 Problem description
1.8 Developed work
1.9 Thesis organization
2 Data Flow computation models
2.1 Graphs
2.1.1 Directed graphs
2.1.2 Path and cycles in a graph
2.2 Data Flow
2.2.1 Actor firings
2.3 Temporal analysis
2.3.1 Schedules
2.3.2 Single Rate Data Flow
2.3.3 Timed Single Rate Data Flow graphs
2.3.4 Application graphs
2.4.2 Time Division Multiplexing scheduling (TDM)
2.4.3 Non-Preemptive Non-Blocking Round-Robin scheduling
2.4.4 Static-Order scheduling
2.4.5 Static periodic schedulers
2.5 Data Flow temporal analysis techniques
2.5.1 Throughput analysis
2.5.2 Latency analysis
2.6 Conclusion
3 Software framework
3.1 The Heracles dataflow simulator
3.1.1 Heracles toolflow
3.1.2 Explanation of the software model
3.1.3 Usage of the tool
3.2 Conclusion
4 Implementations introduced in the software framework
4.1 Main changes to the code
4.2 Conclusion
5 Intra-Graph fixed priority analysis for data-flow graphs
5.1 Problem definition
5.2 Theory
5.2.1 Data-Flow analysis of a fixed priority system
5.3 Multi processor mapping analysis
5.3.1 Overview
5.3.2 Worst-case response time
5.3.3 Analysis of start times
5.4 Software implementation
5.5 Conclusion
6 Inter-graph fixed priority analysis
6.1 Problem definition
6.2 Theory
6.2.1 Definition of load of a processor
6.2.2 Initial considerations and evolution of the concept
6.2.3 Establishing time intervals
6.4 Conclusion
7 Results
7.1 Analysis of intra-graph fixed priority dataflow graphs
7.1.1 Data Flow analysis of a fixed priority graph
7.1.2 Simulation results
7.1.3 Multi Processor mapping analysis
7.2 Load analysis of a Wireless LAN and a TDSCDMA job
7.2.1 Theoretical approach
7.2.2 Simulation results
7.3 Conclusion
8 Conclusion and future work
Introduction
Multiprocessor systems are becoming common nowadays. Due to the technological advances in
this area, it is today more practical and efficient to create systems with more than one processor,
delegating specific tasks to specific processors. These devices are known as Multi Processor
Systems on Chip (MPSoC). By doing this, not only can we benefit from the parallel execution of
tasks, but we can also use some unique traits of each processor to increase our processing capability.
Most embedded computing systems that perform some digital signal processing possess at least
two types of processing units: a general-purpose core and a vector processor. All the flow control
decisions are performed by the general-purpose processor, while vector processing and matrix
operations are done in the vector processor, taking advantage of its capability of handling
multiply-accumulate operations on many input values simultaneously.
In order to maximize the productivity of such devices, it is usual to map several applications
onto the same MPSoC device. With such computational power at our disposal, we need an
efficient mechanism to distribute the computational load through the available platforms. Every
computational system with limited shared resources, like memory, processor cores or peripheral
access, among others, needs a proper resource sharing mechanism. A scheduler is essentially a
program that coordinates the access to resources. In most embedded systems, it is the scheduler
that decides which task can be executed at a given point in time. Since every task needs a defined
set of resources to execute, the scheduler is responsible for ensuring that a given task is only
set to execution when the corresponding set of resources is available.
Due to the nature of the applications for which embedded systems are designed, they are
expected to perform one or more tasks with real-time computing constraints.
This dissertation focuses on two major goals: the characterization of fixed priority graphs,
i.e., the determination of the worst-case response times and start times of the tasks that compose the
system, and the study of the interference between tasks with different priorities when mapped
onto the same processing platform.
In the remainder of this chapter, we define the fundamental concepts needed to
understand and define our problem.
1.1 Fixed priority scheduling: A historical review
A real-time system is one with explicit deterministic or probabilistic timing requirements.
[...] approach to scheduling produced systems that were inflexible and difficult to maintain [11]. More
advanced techniques were required for the design, analysis and implementation of hard real-time
systems. It was hoped that these techniques would provide additional flexibility whilst enabling
the predictability of such systems to be guaranteed.
Subsequently, a wide range of scheduling strategies has been proposed. These strategies can
be characterized by their prescribed run-time behaviours and by the forms of associated analysis
provided for predicting or optimizing system behaviour. At one extreme, for a simple application
model, static scheduling (cyclic executives) provides very deterministic yet inflexible behaviour.
The other extreme is often known as best-effort scheduling [18]; it facilitates maximum run-time
flexibility but, at best, allows only probabilistic predictions of run-time performance. Fixed
priority scheduling falls between these two extremes: it is often criticised as being too static by
the proponents of best-effort scheduling and as being too dynamic by the supporters of cyclic
executives. However, it is a predictable approach: off-line guarantees regarding process deadlines
can be afforded using appropriate analysis. In reality, it represents a practical, highly effective
approach to scheduling a large class of real-time applications.
Work on fixed priority scheduling concentrated on two separate issues: policies for the
assignment of priorities to processes and feasibility tests for process sets. The assumptions and
constraints for much of this work are identical to those described by Liu and Layland in 1973
[24]:
1. All processes are periodic;
2. All processes have a deadline equal to their period;
3. All processes are independent;
4. All processes have a fixed computation time;
5. No process may voluntarily suspend itself;
6. All processes are released as soon as they arrive;
7. All overheads are ignored (assumed to be 0).
Development of real-time theory progressed steadily before a resurgence in the 1980s. The
motivation for this renewed interest stemmed from many diverse factors, including the
realization that the requirements of hard (i.e., safety-critical) real-time systems outstripped the available
theoretical analysis (for example, formal methods, scheduling theory, etc.) and implementation
techniques. Typical real-time systems implemented prior to the mid 1980s included basic
avionics control, laboratory control, etc. Looking forward from this point in time, future real-time
systems were expected to be applications such as the space station, robots, intelligent
manufacturing and advanced avionics control. The common requirements shared by these systems
were the need for dynamic and adaptive behaviour, including elements of artificial intelligence,
together with an increased demand for predictability and reliability.
Another factor in the renaissance of real-time systems research was the rapid development
of hardware (e.g., minicomputers in the 1970s and microcomputers in the 1980s), which led to
[...] processors became more complex, with the inclusion of pipelines and caches, and peripheral devices
became more intelligent. The availability of such hardware within the context of hard real-time
applications prompted further work in terms of analysis [2].
1.2 Streaming applications
A streaming application is an application that operates over a long (potentially infinite)
sequence of input data items, also referred to as a data stream. The data is normally fed into the
application from an external source, and each data token is processed in a limited time before being
discarded [34]. This process also outputs a long (potentially infinite) sequence of output data
items. This type of application is common in signal-processing functions, where there is always
some type of antenna acting as the external source of data. In this situation, the application has no
control over the arrival or the volume of the data to be processed. As examples of streaming
applications, we can indicate software-defined radio, radar tracking, audio and video decoding, audio
and video processing, cryptographic kernels or network processing [26]. Streaming applications
follow a reactive model and, when the application requires synchronization with the data stream,
temporal restrictions also apply to it.
1.3 Real-Time applications
The validity of the results produced by a real-time application depends on their
functional correctness, as in any other type of application, but also on the time at which these
results are produced. Although correct, an output from a real-time application may be irrelevant
if it violates its temporal deadline. The term "may" in the last sentence implies the existence
of more than one type of real-time system. A real-time application can be categorized into three
types according to its temporal restrictions [6]:
• Soft - If this type of restriction is violated, the associated result maintains some of its utility to the application, although there is a degradation in the quality of service.
Let us consider an automated gate as an example. If there is a significant delay between
the reception of the activation signal from the open button and the activation of the gate
motor, it is annoying for a driver, but the end result is still usable.
• Firm - If a firm deadline is overdue, the resulting output is unusable, but the integrity of the system and of the user is not compromised. As an example, consider data collected
from a sensor array used for autopilot navigation. If some of the samples arrive
after the established deadline, they are useless. But as long as other data arrives in
a timely fashion, the system is still able to function correctly.
• Hard - For this type of restriction, a deadline violation could imply a catastrophic consequence to the system. Every safety-critical system is characterized by having at
least one hard temporal restriction. As examples, we can refer to a life support system or
the traction control system of a car. If the control system is not able to meet its deadlines,
[...] according to the previous temporal restrictions:
• Soft Real-Time - These systems only possess soft or firm temporal restrictions.
• Hard Real-Time - All the systems that possess at least one hard temporal restriction are categorized under this label.
1.3.1 Timing requirements
Timing requirements come in two basic types: throughput and latency. If the rate at which
an iterative application produces results is important, then we are in the presence of a throughput
requirement. If the minimum or maximum time interval between the arrival of an input and the
production of the corresponding output is to be respected, then the application has a latency
requirement. A heart rate monitor is an example of an application with throughput requirements.
In order to return a correct value for this measure, all the heart beats in a given time interval
must be read and the time between them must be respected, although the processing and output
of the final value could suffer some delay, resulting in a service degradation. The navigation and
actuation signals in a car exemplify a system with latency requirements. It is important, for
instance, that the maximum time between the actuation on the brake pedal and the actuation on the
brake system is respected. In this case, due to the random nature of all the possible stimuli to
the system, no throughput requirements are present, at least not in the systems considered.
Temporal requirements can also appear in the form of a required worst-case timing,
best-case timing or both. If the worst-case timing coincides with the best-case timing, the result is
designated as an on-time requirement. In this project we concentrate our attention on worst and
best-case timing calculations.
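To make the distinction between latency and throughput concrete, the sketch below computes the best- and worst-case latency and the average throughput observed in a log of input arrival times and output production times. It assumes exactly one output per input, produced in order; the function names and the sample timestamps are illustrative and do not come from this work. Note that observed values are only measurements, not the analytical worst-case bounds pursued later.

```python
def latencies(arrivals, outputs):
    """Per-item latency: time from an input's arrival to its matching output."""
    return [o - a for a, o in zip(arrivals, outputs)]

def latency_bounds(arrivals, outputs):
    """Observed (best-case, worst-case) latency over the whole log."""
    lat = latencies(arrivals, outputs)
    return min(lat), max(lat)

def throughput(outputs):
    """Average number of results per time unit between the first and last output."""
    return (len(outputs) - 1) / (outputs[-1] - outputs[0])

arrivals = [0, 10, 20, 30]
outputs = [4, 13, 27, 34]
print(latency_bounds(arrivals, outputs))  # (3, 7)
print(throughput(outputs))                # 0.1
```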
1.3.2 Scheduling
A typical computational system is comprised of several resources (processors, memory,
peripheral devices, etc.) that should be used concurrently by different tasks. These resources need
to be assigned to the concurrent tasks in an orderly and efficient fashion. The set of predefined
criteria that regulates the allocation of resources to tasks is called a scheduling policy. The set of
rules that, at any time, determines the order in which tasks are executed is called a scheduling
algorithm. The specific operation of allocating a resource to a task selected by the scheduling
algorithm is referred to as dispatching [6]. There are several known scheduling algorithms in
existence: First In First Out, Round Robin, Shortest Remaining Time, Fixed Priority, Time Division
Multiplexing, etc. Each of these algorithms has advantages and disadvantages that have
been studied throughout the years. Our project focuses mainly on the Fixed Priority
scheduling algorithm.
1.4 Fixed priority scheduling applications
In a fixed priority scheme, all tasks are characterized by an immutable priority value.
Normally this value is numeric. The order in which these values are assigned depends essentially
on the system specifications but, conventionally, higher priorities receive smaller values. That is,
a task $T_i$ has a higher priority than a task $T_j$ whenever $i < j$, where $T$ designates a task and
$i$ and $j$ indicate numerical priority values, with $i, j \in \mathbb{N}_0$. The scheduler uses priorities to
determine the next job to be scheduled. These priorities are calculated at design time
and never change during execution, hence the term fixed [33]. In fixed priority scheduling, the
dispatcher makes sure that, at any time, the highest priority runnable task is actually running.
1.4.1 Preemption
In a preemptive system, if a low priority task is running and a high priority
task arrives, i.e., some event has occurred and the dispatcher needs to deploy a task into execution,
the low priority task is suspended and the high priority task starts running. If, while
the high priority task is running, a task with a medium priority arrives, the dispatcher
leaves it unprocessed and the high priority task carries on running, finishing its computation
at a later time. Only when both the high and medium priority tasks have completed can the
low priority task resume its execution. This low priority task can then carry on executing until
either more higher priority tasks arrive or it has finished its work [35]. If the platform in use does
not support preemption, then tasks with higher priorities are only set to execution ahead of
lower priority ones if they could be started at the same time instant. Otherwise, if a lower
priority task is already executing on the platform when a higher priority task becomes ready for
execution, the higher priority task simply gets blocked, at least until the executing task finishes its current execution.
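The preemptive and non-preemptive behaviours described above can be compared with a small discrete-time simulation of a single processor. This is a sketch under simplifying assumptions (integer time units, no overheads, smaller values meaning higher priority, as in Section 1.4); the `Job` record and the job parameters in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    release: int   # instant at which the job becomes ready
    cost: int      # computation time, in integer time units
    priority: int  # smaller value = higher priority

def simulate(jobs, preemptive=True):
    """Return {job name: completion time} for one processor under fixed priorities."""
    remaining = {j.name: j.cost for j in jobs}
    finish = {}
    running = None
    t = 0
    while len(finish) < len(jobs):
        ready = [j for j in jobs if j.release <= t and j.name not in finish]
        if not ready:
            t += 1  # processor idle
            continue
        highest = min(ready, key=lambda j: j.priority)
        # Preemptive: always switch to the highest-priority ready job.
        # Non-preemptive: only choose a new job when the current one has finished.
        if preemptive or running is None or running.name in finish:
            running = highest
        remaining[running.name] -= 1  # execute for one time unit
        t += 1
        if remaining[running.name] == 0:
            finish[running.name] = t
    return finish

jobs = [Job("L", 0, 4, 3), Job("H", 1, 2, 1), Job("M", 2, 1, 2)]
print(simulate(jobs, preemptive=True))   # {'H': 3, 'M': 4, 'L': 7}
print(simulate(jobs, preemptive=False))  # {'L': 4, 'H': 6, 'M': 7}
```

In the preemptive run, L is suspended at t = 1 and M waits until H completes, exactly as in the scenario above; without preemption, H is blocked until L finishes at t = 4.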
1.5 State of the art
1.5.1 Classical real-time theory
Real-time is a subject that has been studied for some time, which led to the development of a
considerable theory around it, known nowadays as classical real-time theory. In this introductory
section, we focus only on on-line scheduling with fixed priorities, with special attention
to the two main criteria for classical scheduling using fixed priorities: the rate-monotonic and the
deadline-monotonic criteria [6] [25]. This type of scheduling has some advantages over
off-line scheduling, namely:
• Any alteration in the tasks' characteristics is immediately taken into account by the scheduler.
• It can easily accommodate sporadic tasks.
• It has a deterministic behaviour on overloads, since these only affect the tasks with lower priorities.
As expected, there are some disadvantages that go with the pros mentioned before:
• On-line scheduling has a more complex implementation, since it requires a kernel with fixed priorities.
• This type of scheduling requires the action of a scheduler and a dispatcher, which implies a higher execution overhead.
1.5.1.1 Rate-Monotonic scheduling
The Rate Monotonic (RM) scheduling algorithm is a simple rule that assigns priorities to
tasks according to their request rates. Specifically, tasks with higher request rates, which means
shorter periods, have higher priorities, and vice versa. Since periods are constant, RM is a
fixed-priority assignment: a priority $P_i$ is assigned to the task before execution and does not
change over time. For the remainder of this section, we assume that the platform supports preemption.
In the initial analysis performed in [24], the Rate Monotonic algorithm is
intrinsically preemptive, and all the tasks are independent, i.e., there are no shared resources. In
this context, a running task is preempted by a newly arrived task with a shorter period. Since
the schedule is built on-line, it may be useful to know a priori whether a given set of tasks respects its
temporal requirements. To aid us in this subject, there are two main types of tests that can be
performed upon the task set:
• Tests based on the utilization rate of the CPU - These consist of inequalities applied to the tasks' characteristics, such as their worst-case execution time, period and deadline.
The verification of these inequalities allows us to conclude whether a given task set has guaranteed
activations or not. The two reference criteria for this subject are the utilization bound of Liu
and Layland and the Hyperbolic bound of Bini, Buttazzo and Buttazzo. A more
detailed explanation of each can be found in [24] and [3], respectively. Our project does not
deal directly with local deadlines, so we will not progress any further in this subject.
• Tests based on the response time - For systems with arbitrary fixed priorities, the analysis of the response time allows us to perform a schedulability test that, assuming that
the system allows preemption and synchronous activation, is necessary and sufficient. These
tests consist of computing the worst-case response time, i.e., the maximum elapsed time
between the activation of a task and its completion, and then checking whether it is below the
deadline. For further information, please refer to [1].
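The RM ordering and both families of tests can be sketched in a few lines. This is a minimal sketch assuming independent periodic tasks given as hypothetical $(C_i, T_i)$ pairs with $D_i = T_i$, synchronous release and full preemption. `response_times` iterates the standard recurrence $R_i = C_i + \sum_{j<i} \lceil R_i / T_j \rceil C_j$ over the higher-priority tasks $j$ until it reaches a fixed point or exceeds the period.

```python
import math

def rm_order(tasks):
    """Rate Monotonic: shorter period -> higher priority (index 0 is highest)."""
    return sorted(tasks, key=lambda task: task[1])

def liu_layland_test(tasks):
    """Sufficient test: U <= n(2^(1/n) - 1), with tasks = [(C, T), ...]."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks)
    return u <= n * (2 ** (1 / n) - 1)

def hyperbolic_test(tasks):
    """Sufficient test of Bini, Buttazzo and Buttazzo: prod(U_i + 1) <= 2."""
    return math.prod(c / t + 1 for c, t in tasks) <= 2

def response_times(tasks):
    """Worst-case response time of each task (ordered by decreasing priority);
    None marks a task whose response time exceeds its period (= deadline)."""
    result = []
    for i, (c_i, t_i) in enumerate(tasks):
        r = c_i
        while True:
            r_next = c_i + sum(math.ceil(r / t_j) * c_j for c_j, t_j in tasks[:i])
            if r_next > t_i:
                result.append(None)
                break
            if r_next == r:  # fixed point reached
                result.append(r)
                break
            r = r_next
    return result

tasks = rm_order([(3, 12), (1, 4), (2, 6)])
print(liu_layland_test(tasks))  # False: U ~ 0.833 > 0.780
print(hyperbolic_test(tasks))   # False: product ~ 2.083 > 2
print(response_times(tasks))    # [1, 3, 10]
```

The example set fails both utilization-based tests, which are only sufficient, yet the response-time test shows that every task completes within its period.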
1.5.1.2 Deadline-Monotonic scheduling
The Deadline Monotonic (DM) priority assignment weakens the "period equals deadline"
constraint within a static priority scheduling scheme. The application of the scheduling algorithm
assumes that every task is characterized by a phase $\phi_i$, a constant worst-case computation time $C_i$
for each instance, a constant relative deadline $D_i$ and a period $T_i$. According to the DM
algorithm, each task is assigned a fixed priority $P_i$ inversely related to its relative deadline $D_i$.
Thus, at any instant, the task with the shortest relative deadline is executed. Since relative
deadlines are constant, DM is a static priority assignment. As with Rate Monotonic, DM is normally
used in a fully preemptive mode [6].
[...] period is equal to the deadline, meaning that, if a task set is schedulable by some fixed priority
assignment, then it is also schedulable by DM. The proof of this assertion and a more detailed
explanation of this algorithm can be found in [23]. A more comprehensive overview of
Rate-Monotonic and Deadline-Monotonic scheduling is available in [2].
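A small example shows why the deadline, rather than the period, matters once $D_i < T_i$. The sketch below uses a hypothetical `Task` record with integer times, assumes $D_i \le T_i$ and synchronous release (so the phases $\phi_i$ are ignored), orders the tasks by RM and by DM, and checks each ordering with the same response-time recurrence, stopping as soon as a response time exceeds $D_i$.

```python
import math
from typing import NamedTuple

class Task(NamedTuple):
    name: str
    c: int  # worst-case computation time C_i
    d: int  # relative deadline D_i (assumed D_i <= T_i)
    t: int  # period T_i

def rm_order(tasks):
    return sorted(tasks, key=lambda k: k.t)  # shortest period first

def dm_order(tasks):
    return sorted(tasks, key=lambda k: k.d)  # shortest relative deadline first

def schedulable(ordered):
    """Response-time test for tasks listed by decreasing priority."""
    for i, task in enumerate(ordered):
        r = task.c
        while True:
            r_next = task.c + sum(math.ceil(r / h.t) * h.c for h in ordered[:i])
            if r_next > task.d:
                return False  # deadline miss at this priority level
            if r_next == r:
                break
            r = r_next
    return True

tasks = [Task("A", 2, 4, 4), Task("B", 1, 2, 10)]
print(schedulable(rm_order(tasks)))  # False: B would finish at 3 > D_B = 2
print(schedulable(dm_order(tasks)))  # True: B finishes at 1, A at 3
```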
1.5.2 SymTA/S
SymTA/S is a system-level performance and timing analysis approach based on formal
scheduling analysis techniques and symbolic simulation. It is essentially a software tool used
to determine system-level performance data such as end-to-end latencies, bus and processor
utilization and worst-case scheduling scenarios. SymTA/S is aimed mainly at MPSoC
designs, where the complexity introduced by all the concurrent hardware makes manual
analysis and optimization a very time-consuming and error-prone task.
The core of the SymTA/S tool is a technique to couple local scheduling analysis algorithms
using event streams. For a more detailed description of these algorithms, please refer to [29] and
[30].
In order to perform a system-level analysis, SymTA/S locally performs existing scheduling
analyses using well-known algorithms, such as Rate-Monotonic, Time Division Multiple
Access, Round Robin, etc., and propagates their results to the neighbouring components. This
analysis-propagation mechanism is repeated iteratively until all components are analysed, which
means that all output streams remain unchanged.
A more detailed description of this tool can be found in [16] and [15].
1.5.3 Real-Time calculus
Real-Time Calculus establishes a link between three areas, namely Max-Plus Linear System
Theory [9], as used for dealing with certain classes of discrete event systems, Network Calculus [4],
for establishing time bounds in communication networks, and real-time scheduling. In particular,
it shows that important results from scheduling theory can be easily derived and unified using
Max-Plus Algebra. In its essence, Real-Time Calculus focuses on the characterization of a set of
tasks $T_1, \ldots, T_i, \ldots, T_n$, each described by a request curve $\alpha_i^r$ and a demand curve $\alpha_i^d$.
These tasks are all processed by one processing unit, characterized by a delivery curve $\beta$,
using a static priority scheduler with preemption. It is important to note that the tasks are sorted by decreasing
priority.
The algorithm consists of an iterative process to determine the task priorities such that the
whole task system can be successfully scheduled. The process consists of selecting the tasks
in increasing order of priority and performing a schedulability test based on the task deadlines, the
demand curve and the delivery curve of the processing unit. If any of the tasks fails this test, the
whole set cannot be scheduled. Otherwise, the schedulable task is removed from the set and the
whole selection procedure is repeated until there are no more tasks left. A more comprehensive [...]
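The lowest-priority-first selection just described resembles Audsley's classical priority assignment. Below is a minimal sketch of that classical search, in which the response-time test of Section 1.5.1 stands in for the curve-based Real-Time Calculus test. That substitution is made only for illustration. In this version a level is given to the first remaining task that meets its deadline with all other remaining tasks above it, and the set is rejected when no remaining task can take the current lowest level. Tasks are hypothetical (name, C, D, T) tuples with D ≤ T and synchronous release.

```python
import math

def meets_deadline_at_lowest(task, higher):
    """Response-time test: can `task` meet its deadline if every task in `higher` preempts it?"""
    _, c, d, _ = task
    r = c
    while True:
        r_next = c + sum(math.ceil(r / t_h) * c_h for _, c_h, _, t_h in higher)
        if r_next > d:
            return False
        if r_next == r:
            return True
        r = r_next

def assign_priorities(tasks):
    """Fill priority levels from the lowest upwards.
    Returns task names from highest to lowest priority, or None if a level cannot be filled."""
    remaining = list(tasks)
    lowest_first = []
    while remaining:
        for task in remaining:
            others = [o for o in remaining if o is not task]
            if meets_deadline_at_lowest(task, others):
                lowest_first.append(task)
                remaining.remove(task)
                break
        else:
            return None  # no remaining task can take the current lowest level
    return [name for name, *_ in reversed(lowest_first)]

print(assign_priorities([("A", 2, 4, 4), ("B", 1, 2, 10)]))  # ['B', 'A']
print(assign_priorities([("X", 3, 4, 4), ("Y", 3, 4, 4)]))   # None
```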
1.6 Data flow graphs
In this project we intend to use the dataflow paradigm to tackle the fixed priority scheduling
problem. Dataflow has developed into a useful tool, with extensive use in the analysis of streaming
applications, in the modelling of multiprocessor environments and in dealing with concurrent applications.
In these situations, dataflow is applied through graph theory, which is used
to establish mathematical models that can be analysed with the tools provided by the paradigm.
In the most general sense, a dataflow graph is a directed graph whose nodes represent actors
and whose arcs represent connections between the actors. These connections convey values,
corresponding to data packets, also designated as tokens, between the nodes. Connections are
conceptually FIFO queues that may hold initial tokens.
The operation in which an actor consumes a certain number of tokens from its incoming
edges and then starts executing is known as an actor firing. The set of rules that control this
firing, namely the minimum number of tokens present in the incoming edges, is known as the
firing rules.
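For the single-rate case, where each firing moves one token per edge, the firing rule can be sketched as follows. Edges are hypothetical (source, destination) pairs mapped to token counts, and firings are treated as instantaneous, so no timing is modelled. The example is a two-actor cycle that runs only because of one initial token.

```python
def enabled(actor, edges, tokens):
    """Single-rate firing rule: every incoming edge holds at least one token."""
    return all(tokens[e] >= 1 for e in edges if e[1] == actor)

def fire(actor, edges, tokens):
    """Consume one token from each input edge and produce one on each output edge."""
    for e in edges:
        if e[1] == actor:
            tokens[e] -= 1
        if e[0] == actor:
            tokens[e] += 1

def run(actors, edges, initial_tokens, steps):
    """Repeatedly fire the first enabled actor; stop early on deadlock."""
    tokens = dict(initial_tokens)
    trace = []
    for _ in range(steps):
        ready = [a for a in actors if enabled(a, edges, tokens)]
        if not ready:
            break  # no actor satisfies its firing rule
        fire(ready[0], edges, tokens)
        trace.append(ready[0])
    return trace

edges = [("A", "B"), ("B", "A")]
print(run(["A", "B"], edges, {("A", "B"): 0, ("B", "A"): 1}, 4))  # ['A', 'B', 'A', 'B']
print(run(["A", "B"], edges, {("A", "B"): 0, ("B", "A"): 0}, 4))  # []
```

Without the initial token on the edge from B to A, neither actor is ever enabled and the cycle deadlocks.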
If actors are permitted to produce and consume only one token per activation, the resulting
graph is designated as a Single Rate Data Flow graph. If, on the other hand, an actor can
consume and produce multiple tokens in its activations, the graph is known as a Multi Rate
Data Flow graph. Independently of the rates of consumption and production of the actors, if the
quantity of tokens in every actor operation is constant and well defined, we obtain a Synchronous
Data Flow graph [5].
All these concepts will be addressed in greater detail in later chapters.
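For synchronous (multi-rate) graphs, a standard first step, not described in this section, is to solve the balance equations $r_{src} \cdot p = r_{dst} \cdot c$ for every edge. Here $p$ and $c$ are the tokens produced and consumed per firing, and the solution $r$ says how many times each actor fires per graph iteration. Below is a minimal sketch using exact fractions, assuming a connected graph given as hypothetical (src, dst, produced, consumed) tuples. It returns the smallest positive integer solution, or None when the rates are inconsistent. It needs Python 3.9+ for `math.lcm`.

```python
from fractions import Fraction
from functools import reduce
from math import lcm

def repetition_vector(actors, edges):
    """Smallest positive integer firing counts r with r[src]*p == r[dst]*c on every edge."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:  # propagate relative rates along edges in both directions
        changed = False
        for src, dst, p, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    for src, dst, p, c in edges:  # every balance equation must hold
        if rate[src] * p != rate[dst] * c:
            return None
    scale = reduce(lcm, (r.denominator for r in rate.values()))
    return {a: int(rate[a] * scale) for a in actors}

print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# {'A': 3, 'B': 2, 'C': 1}
print(repetition_vector(["A", "B"], [("A", "B", 1, 1), ("B", "A", 2, 1)]))  # None
```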
1.7 Problem description
Embedded platforms for streaming applications are expected to handle several streams at
the same time, each one with its own rate. This functionality can be divided into jobs. A job is a
group of communicating tasks that are started and stopped independently. The approach that
has been taken so far for analysis resorts to the modelling of these systems using dataflow graphs
[21].
The overall scheduling strategy mixes static (compile-time) and dynamic (run-time)
techniques. The scheduling of tasks that belong to the same job, or intra-job scheduling, is
handled by means of static order, i.e., per job and per processor, a static ordering of actors is
found that respects the real-time requirements while trying to minimize processor usage.
Inter-job scheduling is handled by means of local Time Division Multiplex (TDM) schedulers.
The biggest disadvantage of TDM schedulers is that they waste many resources on
low-latency, low-throughput tasks.
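A rough model of this waste uses a TDM wheel of period P in which the task owns one contiguous slice of length S. In the worst case a job arrives just after its slice has ended, so it waits P - S before every slice it uses. The helper names and the numbers in the example are hypothetical.

```python
import math

def tdm_wcrt(c, slice_len, period):
    """Worst-case response time of c units of work on a TDM wheel: one wait of
    (period - slice_len) before each of the ceil(c / slice_len) slices used."""
    return math.ceil(c / slice_len) * (period - slice_len) + c

def min_slice(c, period, max_latency):
    """Smallest integer slice that keeps the TDM worst-case response time within max_latency."""
    for s in range(1, period + 1):
        if tdm_wcrt(c, s, period) <= max_latency:
            return s
    return None

s = min_slice(c=1, period=10, max_latency=3)
print(s, s / 10)  # 8 0.8
```

With these numbers the task must own 80% of the wheel, however rarely it is activated. On a preemptive fixed priority processor, the same task at the highest priority would respond in its own computation time of 1 unit, ignoring blocking and overheads.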
The goal of this project is to investigate how the flow must be changed to allow the usage
of a non-budget-based scheduler, such as Fixed Priority. In order to achieve this goal, we must
take the following steps:
• Determine whether the dataflow analysis is still possible under these conditions, and under which conditions the analysis can still be carried out.
• Propose a method for priority assignment per processor per job and design the scheduler.
• Adapt the resource manager to handle a Fixed Priority schedule.
The processor usage is an important factor to take into account. In shared resource platforms,
like the MPSoC devices that we refer to in this document, the response time of a task or a job is
related to the capacity that a particular resource, specifically a processor, has to process that
instance. On the other hand, this capacity, or processor availability, is related to the computational
load required by other jobs or tasks with higher priority.
The analysis and characterization of the computational load of a shared processor is also one
of the focal points of our project.
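Chapter 6 gives this work's own definition of processor load. As a stand-in for that definition, the sketch below uses the classical worst-case demand of higher-priority periodic tasks in a window that starts at a synchronous release, together with the processor time left over for lower-priority work. The tasks are hypothetical $(C_j, T_j)$ pairs, and this formulation may differ from the one developed later.

```python
import math

def higher_priority_demand(window, higher):
    """Worst-case processor time requested by higher-priority periodic tasks (C_j, T_j)
    in a window of the given length that starts at a synchronous release."""
    return sum(math.ceil(window / t_j) * c_j for c_j, t_j in higher)

def availability(window, higher):
    """Processor time left for lower-priority work in the same window (never negative)."""
    return max(0, window - higher_priority_demand(window, higher))

higher = [(2, 5), (1, 10)]
print(higher_priority_demand(10, higher), availability(10, higher))  # 5 5
print(higher_priority_demand(7, higher), availability(7, higher))    # 5 2
```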
1.8 Developed work
The organization of this project followed the points established in the previous section. The
contributions of this project to the state of the art can be summarized in the following points:
1. Data Flow Analysis - A comprehensive analysis of data flow models of fixed-priority
systems comprised the bulk of our initial work. This analysis was centred on the
characterization of best and worst-case response times for fixed-priority dataflow graphs. Initially,
this analysis considered the whole system mapped onto a single processing unit; later on,
the behaviour of the same type of systems mapped onto different platforms was studied,
giving emphasis to the dependence and interference between tasks of the same job mapped
onto different processing units.
2. Computational load analysis - We formalized the concept that quantifies the amount
of work required from a processor by a particular task. In Fixed-Priority scheduling, it
is useful to characterize the amount of time that a processor is busy with a high priority
task, thus allowing us to determine the availability of the same processor to execute lower
priority tasks.
3. Extension of the available tools - For the analysis of all the systems conceived to study the fixed-priority approach to this scheduling problem, we had at our disposal a set of software tools, namely a data flow graph simulator.
These tools supported neither the simulation of fixed-priority data flow graphs nor the load analysis of a processing unit. In order to obtain reliable results to support our study, it was necessary to add these functionalities.
To improve the readability of the results provided by this set of tools, we also made the changes necessary to integrate them with an external visualization tool.
1.9 Thesis organization
The remainder of this thesis is organized as follows: in chapter 2 we review data flow computation models and their analytical properties. The mathematical notation for representing data flow graphs is also introduced in this chapter. The software framework used throughout this project is introduced in chapter 3, which includes a detailed explanation of the usage and functioning of the set of tools available. The changes and implementations made to provide the
implementations to obtain results. Chapter 6 follows a similar template to the previous chapter, but now relative to inter-graph fixed-priority analysis. The practical results, either from software simulations or from the analysis of practical examples, and their respective discussion are presented
Data Flow computation models
This dissertation uses data flow computation models for modelling and analysing various systems. In this chapter, we present the notation for the data flow model that we will use throughout this document and the properties of several data flow computation models that are relevant to our work. This is reference material and most of it can be found in [26], [5], [28], [32] and [21].
2.1 Graphs
In this dissertation, we use data flow analysis, which in turn uses graph theory in its formalization. Therefore, we first need to introduce graph theory.
2.1.1 Directed graphs
Definition 2.1. A directed graph $G$ is an ordered pair $G = (V, E)$, where $V$ is the set of vertices or nodes and $E$ is the set of edges or arcs. Each edge is an ordered pair $(i, j)$ where $i, j \in V$. If $e = (i, j) \in E$, we say that $e$ is directed from $i$ to $j$; $i$ is said to be the source node of $e$ and $j$ is the sink node of $e$. We also denote the source and sink nodes of $e$ as $src(e)$ and $snk(e)$, respectively.

Figure 2.1: An example of a directed graph
The graph depicted in the previous figure is described by the following sets:

$$V = \{A, B, C\} \tag{2.1}$$

$$E = \{(A, B), (B, C), (B, B), (C, A)\} \tag{2.2}$$

It is a directed graph: node A is directed to node B, node B is directed to node C and to itself through a self-edge, and node C is directed to node A.
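Since the simulator used in this work is written in OCaml (chapter 3), Definition 2.1 can be sketched in the same language. The encoding below is a minimal illustration under our own assumptions: node names are strings, and the record type `graph` and the helpers `src`, `snk` and `successors` are hypothetical names, not part of Heracles.

```ocaml
(* A directed graph G = (V, E) as in Definition 2.1 (illustrative encoding) *)
type graph = {
  vertices : string list;          (* V *)
  edges : (string * string) list;  (* E: ordered pairs (i, j) *)
}

(* src(e) and snk(e) of an edge e = (i, j) *)
let src ((i, _) : string * string) = i
let snk ((_, j) : string * string) = j

(* The graph of Figure 2.1, including the self-edge (B, B) *)
let fig_2_1 = {
  vertices = [ "A"; "B"; "C" ];
  edges = [ ("A", "B"); ("B", "C"); ("B", "B"); ("C", "A") ];
}

(* Nodes that node v is directed to, in the order the edges are listed *)
let successors g v =
  List.filter_map (fun e -> if src e = v then Some (snk e) else None) g.edges

let () =
  List.iter
    (fun v ->
      Printf.printf "%s -> [%s]\n" v (String.concat "; " (successors fig_2_1 v)))
    fig_2_1.vertices
```

Running it prints `A -> [B]`, `B -> [C; B]` and `C -> [A]`, one line per node, matching the description of Figure 2.1.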
2.1.2 Paths and cycles in a graph
A path in a directed graph is a finite, nonempty sequence $e_1, e_2, \ldots, e_n$ of edges such that $snk(e_i) = src(e_{i+1})$, for $i = 1, 2, \ldots, n-1$. We say that the path $(e_1, e_2, \ldots, e_n)$ is directed from $src(e_1)$ to $snk(e_n)$; we also say that this path traverses $src(e_1), src(e_2), \ldots, src(e_n)$ and $snk(e_n)$. The path is simple if each node is traversed once, that is, $src(e_1), \ldots, src(e_n), snk(e_n)$ are all distinct; the path is a circuit if it contains edges $e_k$ and $e_{k+m}$ such that $src(e_k) = snk(e_{k+m})$, $m \geq 0$; a cycle is a path such that the subsequence $(e_1, e_2, \ldots, e_{n-1})$ is a simple path and $src(e_1) = snk(e_n)$ [26].

Figure 2.2: Example of a graph with a simple path
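The definitions of path, simple path and cycle can be checked mechanically. The OCaml sketch below is an illustrative encoding (the function names are our own, not from [26] or Heracles). It uses only the four edges (A, C), (C, D), (D, B), (B, A) that form the cycle of Figure 2.2, since the remaining edges of the figure are not listed in the text. Treating a single self-edge as a cycle is an assumption of this sketch.

```ocaml
type edge = string * string  (* (src, snk), as in Definition 2.1 *)

(* Path: non-empty sequence with snk(e_i) = src(e_(i+1)) *)
let rec is_path (es : edge list) =
  match es with
  | [] -> false
  | [ _ ] -> true
  | (_, j) :: ((i', _) :: _ as rest) -> j = i' && is_path rest

(* Nodes traversed: src(e_1), ..., src(e_n), snk(e_n) *)
let traversed (es : edge list) =
  List.map fst es @ [ snd (List.nth es (List.length es - 1)) ]

let all_distinct xs = List.length (List.sort_uniq compare xs) = List.length xs

(* Simple path: every traversed node appears exactly once *)
let is_simple_path es = is_path es && all_distinct (traversed es)

(* First k elements of a list *)
let rec take k = function
  | [] -> []
  | x :: xs -> if k <= 0 then [] else x :: take (k - 1) xs

(* Cycle: (e_1, ..., e_(n-1)) is a simple path and src(e_1) = snk(e_n).
   A single self-edge (n = 1) is accepted as a cycle (assumption). *)
let is_cycle (es : edge list) =
  match es with
  | [] -> false
  | (first, _) :: _ ->
      let n = List.length es in
      let prefix = take (n - 1) es in
      is_path es
      && first = snd (List.nth es (n - 1))
      && (prefix = [] || is_simple_path prefix)

let cycle_fig_2_2 = [ ("A", "C"); ("C", "D"); ("D", "B"); ("B", "A") ]
```

Here `is_cycle cycle_fig_2_2` returns `true`, while `is_simple_path cycle_fig_2_2` returns `false`, because node A is traversed twice.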
In the previous figure, the path $(A, C), (C, D), (D, B), (B, A)$ describes a cycle: its first three edges form a simple path and its last edge returns to node A.

2.2 Data Flow
Data flow is a natural paradigm for describing Digital Signal Processing applications for concurrent implementation on parallel hardware. Data flow programs for signal processing are directed graphs where each node represents a function and each arc represents a signal path. More specifically, in a data flow graph, nodes represent actors. An actor is a time-consuming entity associated with firing rules. An edge or arc in a data flow graph represents a First-In-First-Out (FIFO) queue that directs values from the output of one actor to the input of another.
In data flow, data is transported in discrete chunks, referred to as tokens. When an actor starts an execution, it consumes a defined number of tokens from its incoming edges. Conceptually, this consumption is a reading operation of the data tokens that are needed for beginning the execution. These tokens remain in the edge (FIFO) during the execution of the actor. By the end of that execution, the actor produces a defined number of tokens into its outgoing edges. This production process is a writing operation onto the outgoing edges (FIFOs). It is also possible to perform a reservation of space in the outgoing edges during the start of the execution in order
tokens produced or consumed is specified a priori.
The data flow principle is that any actor can fire (perform its computation) whenever input data are available on all of its incoming edges. An actor with no input edges may fire at any time. This implies that many actors may fire simultaneously, hence the concurrency. Because the program execution is controlled by the availability of data, data flow programs are said to be data-driven [21].
2.2.1 Actor firings
At this point, it is useful to define the firing concept in the data flow context, since it will be used throughout the rest of this document.
As described in the previous section, in data flow every edge also has two associated valuations: $prod : E \to \mathbb{N}$ and $cons : E \to \mathbb{N}$. For a given edge $e \in E$, $prod(e)$ gives the constant number of tokens produced by $src(e)$ on $e$ in each firing and $cons(e)$ gives the constant number of tokens consumed by $snk(e)$ in each firing. An actor firing is an indivisible quantum of computation. A set of firing rules gives the preconditions for a firing. Firing consumes tokens from the input streams and produces tokens into the output streams. The firings themselves can be described as functions, and the invocation of these firings is controlled by firing rules [20].
The start time of a firing refers to the time instant at which the firing rules are verified and the tokens from the input streams are consumed. We are going to use the following notation:

$$s(i, k) = m, \quad m \in \mathbb{N}_0 \tag{2.3}$$

where $i$ denotes the actor and $k$ the instance of the activation. Similarly, the finish time of a firing corresponds to the time instant at which the tokens resulting from the computation are produced into the output streams. As with the start time, to indicate a particular finish time we use a similar notation:

$$f(i, k) = m, \quad m \in \mathbb{N}_0 \tag{2.4}$$

An actor firing can be designated as a task instance in some contexts. Task instances are used mostly in classical real-time theory, while actor firings are their counterpart in data flow.
2.3 Temporal analysis
Execution time of an actor ($\tau$)
Before the definition of Timed SRDF, it is important to define the concept of execution time of an actor.
Definition 2.2. The execution time $\tau(i)$ of an actor $i$ is the elapsed time between the start time of the firing of that actor and the finish time of the firing, at the end of that execution.
The execution time can be defined in a more general sense, as $\tau(i)$, in which it is assumed that all executions of actor $i$ have a constant execution time, or it can be specified as $\tau(i, k)$, where $k$
in advance; for analytical purposes it is often convenient to use bounds on this value.
A given execution time of an actor $i$ can be upper bounded by a worst-case execution time $\hat{\tau}(i)$ and lower bounded by a best-case execution time $\check{\tau}(i)$. The following property must always hold:

$$\check{\tau}(i) \leq \tau(i, k) \leq \hat{\tau}(i), \quad \forall i \in V, \ \forall k \in \mathbb{N}_0 \tag{2.5}$$

2.3.1 Schedules
In the context of this problem, it is necessary to develop a concise definition of schedule that is consistent with the type of result that we plan to obtain. At this point it is important to distinguish between schedulers in an implementation, for example Fixed Priority, Time Division Multiplexing, Round Robin, etc., and the execution of data flow graphs using a schedule, for instance a Self-timed or a Static Periodic schedule. This section addresses the latter. It is important to state from the beginning that, in this context, we will work with Self-timed schedules. A more intuitive designation for this type of schedule is ASAP (As Soon As Possible) schedule, since the start time of every actor firing is determined from the principle that every task should start as soon as its firing conditions are met. So, in a similar way to how schedules have been defined in other situations, a schedule is defined for a specific actor $i$ which, in our definition, is preceded by another actor $j$. The edge $(j, i)$ connecting both actors holds $d(j, i)$ tokens, as the next figure illustrates:

Figure 2.3: Simple arrangement of two actors connected through an edge

From this arrangement, we can write the following expression for the schedule of actor $i$:

$$s_{SelfTimed}(i, k) = \begin{cases} -\infty, & k < 0 \\ \max\left(\max_{\forall (j, i) \in E} \big(s(j, k - d(j, i)) + \tau(j)\big),\ 0\right), & k \geq 0 \end{cases} \tag{2.6}$$

For a two-actor arrangement such as the one in the previous figure, we can elaborate the following logic:
Figure 2.4: Two actors connected through an edge with no tokens

From this figure we can write that:

$$s(B, k) \geq s(A, k) + \tau(A) \tag{2.7}$$

The start time of the $k$th iteration of actor B is always at least $\tau(A)$ after the start time of the $k$th iteration of the preceding actor A. But if there are tokens on the edge between the actors:

Figure 2.5: Two actors connected through an edge with one token

With a token on the edge connecting the two actors, the previous expression 2.7 needs to be adapted:

$$s(B, k) \geq s(A, k - 1) + \tau(A) \tag{2.8}$$

Since actor B no longer needs to wait for the $k$th firing of actor A to produce a token before it can start executing, the start time of this actor is now referenced to the $(k-1)$th iteration of the preceding actor. If we extend this logic to $d(A, B)$ tokens on the interconnecting edge, we reach the bottom branch of expression 2.6. Since a negative value for the start time of an execution does not make sense in the context of this problem, we included the $0$ argument in the max expression, so that in such a case the minimum start time of an execution is zero.
The Worst-Case Self-Timed Schedule (WCSTS) of an SRDF graph is the self-timed schedule of the graph in which every iteration of every actor $i$ takes $\hat{\tau}(i)$ to execute, where $\hat{\tau}(i)$ is the worst-case execution time of the actor. Note that the WCSTS of an SRDF graph is unique.
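Expression 2.6 can be evaluated iteration by iteration to obtain a self-timed schedule, and the WCSTS in particular by using $\hat{\tau}$ for every firing. The OCaml sketch below is an illustration, not the Heracles implementation. Actors are integers, and the hypothetical parameter `tau i k` gives the execution time of firing k of actor i, so the finish time of the predecessor firing is used where 2.6 writes $\tau(j)$. The graph is assumed to be live, i.e. every cycle holds at least one initial token, so that the zero-delay edges form an acyclic subgraph.

```ocaml
(* Edge (src, snk) carrying d initial tokens *)
type edge = { src : int; snk : int; d : int }

(* Self-timed (ASAP) start times following expression 2.6, for firings
   0 .. iters-1 of n actors; s.(i).(k) is the start of firing k of actor i.
   Assumes a live graph: every cycle holds at least one initial token. *)
let self_timed ~n ~(edges : edge list) ~(tau : int -> int -> float) iters =
  let s = Array.make_matrix n iters 0.0 in (* the 0 term of the outer max *)
  for k = 0 to iters - 1 do
    (* Zero-delay edges form a DAG, so at most n + 1 passes are needed *)
    let changed = ref true and passes = ref 0 in
    while !changed && !passes <= n do
      changed := false;
      incr passes;
      List.iter
        (fun e ->
          let k' = k - e.d in
          (* firings with a negative index start at -infinity: no constraint *)
          if k' >= 0 then begin
            let ready = s.(e.src).(k') +. tau e.src k' in
            if ready > s.(e.snk).(k) then begin
              s.(e.snk).(k) <- ready;
              changed := true
            end
          end)
        edges
    done
  done;
  s

(* Example: A (0) -> B (1) without tokens, B -> A with one token *)
let edges = [ { src = 0; snk = 1; d = 0 }; { src = 1; snk = 0; d = 1 } ]
let wcet = [| 2.0; 3.0 |]
let wcsts = self_timed ~n:2 ~edges ~tau:(fun i _ -> wcet.(i)) 4
```

For this example the start times of A are 0, 5, 10, 15 and those of B are 2, 7, 12, 17. Both actors repeat with period 5, the total execution time on the only cycle (2 + 3) divided by its single token. Re-running `self_timed` with any `tau` bounded by `wcet` gives start times no later than these, which is the monotonicity property discussed in section 2.5.1.3.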
2.3.2 Single Rate Data Flow
If in a data flow graph we can verify that $prod(e) = cons(e)$ for every edge $e \in E$, then the graph is a Single Rate Data Flow (SRDF) graph. An SRDF graph is one where every actor consumes and produces the same number of data tokens. We can formalize this concept with

$$G_{SRDF} = (V, E, d, \tau) \tag{2.9}$$

$V$ and $E$ are already defined in Definition 2.1; $d$ is a valuation $d : E \to \mathbb{N}_0$. $d(i, j)$ is called the delay of edge $(i, j)$ and represents the number of initial tokens in arc $(i, j)$.

2.3.3 Timed Single Rate Data Flow graphs
We can now take the execution time of every actor of the graph into consideration and define a Timed SRDF graph:

$$G_{TimedSRDF} = (V, E, d, \hat{\tau}) \tag{2.10}$$

where $\hat{\tau}$ represents the worst-case response time of an actor.

2.3.4 Application graphs
In the course of our work, we realized that we needed to further specify the definition of Timed SRDF given above by taking a new parameter into consideration. Our new graph instance differs only slightly from equation 2.10:

$$G_{app} = (V, E, d, \check{\tau}, \hat{\tau}) \tag{2.11}$$

where the new parameter $\check{\tau}$ is the best-case execution time of each actor, as introduced in section 2.3.
In order to use data flow to analyse a particular schedule, the schedule first needs to be modelled using the data flow paradigm. In the present section we present strategies to perform this modelling, using concrete examples to illustrate the process.
2.4.1 Task scheduling
There are two types of task scheduling mechanisms that we are interested in modelling: Compile-Time and Run-Time Scheduling.
Compile-Time Scheduling (CTS) encompasses scheduling decisions that are fixed at compile time, such as static-order scheduling.
Run-Time Scheduling (RTS) refers to scheduling decisions that cannot be resolved at compile time, because they depend on the run-time task-to-processor assignment, which in turn depends on the dynamic job mix. This is handled by the local scheduling mechanism of the processor. Modelling the worst-case effect of the local scheduler on the execution of an actor is needed in order to include in the compile-time analysis the effects of sharing processing resources among jobs. If the WCET of the task, the settings of the local dispatcher, and the amount of computing resources to be given to the task are known, then the actor execution time can be set to reflect the worst-case response time of that task running in that local dispatcher, with that particular amount of allocated resources [26].
2.4.2 Time Division Multiplexing scheduling (TDM)
The effect of TDM scheduling can be modelled by replacing the worst-case execution time of the actor by its worst-case response time under TDM scheduling. The response time of an actor $i$ is the total time necessary to complete a firing of $i$ when resource arbitration effects (scheduling, preemption, etc.) are taken into account. It is counted from the moment the actor meets its enabling conditions to the moment the firing is completed. Assume that a TDM wheel with period $P$ is implemented on the processor and that a time slice with duration $S$, such that $S \leq P$, is allocated for the firing of $i$. Then a time interval equal to or longer than $\tau(i)$ passes from the moment the actor is enabled by the availability of enough input tokens to the completion of its firing. The first component of this interval is the arbitration time, i.e., the time it takes until the TDM scheduler grants execution resources to the actor once its firing conditions are met. In the worst case, $i$ gets enabled when its time slice has just ended, which means that the arbitration time is the time it takes for the slice of $i$ to start again. If we denote the worst-case arbitration time as $\hat{r}(i)$, then [12]:

$$\hat{r}(i) = P - S \tag{2.12}$$

2.4.3 Non-Preemptive Non-Blocking Round-Robin scheduling
In a Non-Preemptive Non-Blocking Round-Robin (NPNBRR) scheduler, all clusters assigned to the same processor are put in a circular scheduling list. The run-time scheduler goes through this list continuously. It picks an actor from the list and tries to execute it. The actor (or the
such that the actor can consume and produce tokens according to its firing rules, the actor executes until the firing is over; if not, the actor is skipped. The process is repeated for the next actor in the circular scheduling list, and so on.
The worst-case arbitration time of an actor is given by the sum of the execution times of all other actors mapped to the same NPNBRR-scheduled processor. The processing time is equal to the actor's execution time, since there is no preemption. The total response time is therefore equal to the sum of the execution times of all actors mapped to the NPNBRR-scheduled processor [27].
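The two worst-case bounds above reduce to simple arithmetic. The OCaml sketch below only restates expression 2.12 and the NPNBRR rule just described. The function names and the association-list representation of the actors mapped on one processor are illustrative assumptions. The TDM function returns only the arbitration time, not a full response time including the preemptions that occur when a firing spans several slices.

```ocaml
(* Worst-case TDM arbitration time, expression 2.12: r(i) = P - S *)
let tdm_arbitration ~period ~slice =
  if slice <= 0.0 || slice > period then
    invalid_arg "tdm_arbitration: expected 0 < S <= P";
  period -. slice

(* NPNBRR: the worst-case arbitration time of actor [a] is the sum of the
   execution times of all other actors mapped on the same processor *)
let npnbrr_arbitration (actors : (string * float) list) a =
  List.fold_left (fun acc (b, t) -> if b = a then acc else acc +. t) 0.0 actors

(* No preemption: processing time equals the own execution time, so the
   response time is the sum of the execution times of all actors *)
let npnbrr_response actors a =
  npnbrr_arbitration actors a +. List.assoc a actors

(* Hypothetical processor with three actors and their execution times *)
let on_proc = [ ("a", 2.0); ("b", 3.0); ("c", 5.0) ]
```

With a hypothetical wheel of P = 10 and S = 4, `tdm_arbitration ~period:10.0 ~slice:4.0` evaluates to 6.0. For `on_proc`, `npnbrr_arbitration on_proc "b"` is 7.0 and `npnbrr_response on_proc "b"` is 10.0.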
2.4.4 Static-Order scheduling
A static-order schedule of a set of actors $A = \{a_0, a_1, \ldots, a_n\}$ mapped to the same processor is a sequence of execution $so = |a_k, a_l, \ldots, a_m|$ that generates extra precedence constraints between the actors in $A$ such that, from the start of the execution of the graph, $a_k$ must be the first one to execute, followed by $a_l$ and so on, up to $a_m$. After $a_m$ executes, the execution restarts from $a_k$ for the next iteration of the graph.
Any static order imposed on a group of Single Rate Data Flow actors executing on the same processor can be represented by adding edges with no tokens between consecutive actors in the order. An edge with a single initial token is also added from the last to the first actor in the static order. This construct reflects the fact that, the graph execution being iterative, when the static order finishes execution for a given iteration, it restarts from the first actor in the static order for the next iteration. Notice that the new edges represent a series of sequence constraints enforced by the static-order schedule and do not represent any real exchange of data between the actors.
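The construction just described can be sketched in OCaml as a function from a static order to the extra SRDF edges it induces. The record type and the function name are illustrative assumptions. An actor that is alone in its static order receives only the self-edge with one token, which serializes its own consecutive firings.

```ocaml
(* Extra SRDF edge: from [src] to [snk] with [d] initial tokens *)
type edge = { src : string; snk : string; d : int }

(* Edges modelling a static order |a_k, a_l, ..., a_m| on one processor:
   zero-token edges between consecutive actors, plus an edge with one
   initial token from the last actor back to the first. *)
let static_order_edges (order : string list) : edge list =
  let rec chain = function
    | a :: (b :: _ as rest) -> { src = a; snk = b; d = 0 } :: chain rest
    | _ -> []
  in
  match order with
  | [] -> []
  | first :: _ ->
      let last = List.nth order (List.length order - 1) in
      chain order @ [ { src = last; snk = first; d = 1 } ]

let () =
  List.iter
    (fun e -> Printf.printf "(%s, %s) d = %d\n" e.src e.snk e.d)
    (static_order_edges [ "a_k"; "a_l"; "a_m" ])
```

For the order |a_k, a_l, a_m| it prints the edges (a_k, a_l) and (a_l, a_m) with d = 0, and the closing edge (a_m, a_k) with d = 1.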
2.4.5 Static periodic schedulers
A Static Periodic Schedule (SPS) of an SRDF graph is a schedule such that, for all nodes $i \in V$ and all $k > 0$:

$$s(i, k) = s(i, 0) + T \cdot k \tag{2.13}$$

where $T$ is the designed period of the SPS. Note that an SPS can be represented uniquely by $T$ and the values of $s(i, 0), \forall i \in V$ [26].

2.5 Data Flow temporal analysis techniques
Temporal analysis is required in order to verify whether a given timed data flow graph can meet the throughput or latency requirements of an application. In this section we cover some of the analysis methods available in this regard.
2.5.1 Throughput analysis
In some systems, rate constraints are often imposed by designers on the execution rate of each process in the system in order to ensure correct timing behaviour and achieve performance
one. These phases occur in the same order as they were mentioned. When the execution of the graph is initiated, the transient phase begins. This phase has a limited duration, during which the initial tokens are distributed through the edges of the graph. Eventually the graph enters the next phase: the periodic one.
The state of a graph is defined by the number of tokens present on each one of its edges. Whenever a graph enters the periodic phase, the same sequence of states repeats itself recurrently. The time required to repeat the same sequence of states is defined as the graph period.
In the transient phase, the throughput analysis can be derived by simulating the execution of the data flow graph, given worst-case execution times for all actors [26]. Another known technique for temporal analysis is the Maximum Cycle Mean. A brief explanation of these two techniques follows:
2.5.1.1 Simulation
This is perhaps the most direct approach to this problem. By running a reliable simulation of the data flow graph, it is possible to verify whether the throughput requirements are met. The simulation tool that we used, which is described in detail in chapter 3, provides enough information so that, in case of violation of the throughput specifications, one can adjust the graph characteristics (if possible) in order to obtain a throughput-compliant graph.
2.5.1.2 Maximum Cycle Mean
The average weight of a directed cycle, called the cycle mean, is the quotient between the sum of the execution times of all of its actors and the total number of initial tokens present in the cycle. The maximum mean cycle problem for a directed graph with cycles is to find a cycle having the maximum average weight, called the maximum cycle mean, over all directed cycles in the graph. Such a cycle is called a critical cycle. The maximum mean cycle problem has applications in finding the iteration bound of a data flow graph for digital signal processing, in the performance analysis of synchronous, asynchronous, or mixed systems, and in throughput analysis for embedded systems [10].
The Maximum Cycle Mean (MCM), $\mu(G)$, of an SRDF graph $G$ is defined as:

$$\mu(G) = \max_{c \in C(G)} \frac{\sum_{i \in N(c)} \tau_i}{\sum_{e \in E(c)} d_e} \tag{2.14}$$

where $C(G)$ is the set of simple cycles in graph $G$, and $N(c)$ and $E(c)$ are the sets of nodes and edges of cycle $c$.

Theorem 2.1. For an SRDF graph $G = (V, E, d, \tau)$, it is possible to find a Static Periodic Schedule (SPS) with period $T$ if and only if $T \geq \mu(G)$. If $T < \mu(G)$, then no SPS exists with period $T$. This theorem and its proof can be found in greater detail in [26].
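Theorem 2.1 suggests a way to compute an admissible SPS and to estimate the MCM without enumerating cycles. The OCaml sketch below illustrates this under our own assumptions; it is not the method of [26] or of Heracles. Substituting 2.13 into the precedence constraint behind 2.6 gives, for every edge $(i, j)$, $s(j, 0) \geq s(i, 0) + \tau(i) - T \cdot d(i, j)$. A Bellman-Ford style longest-path relaxation either settles on such offsets or keeps growing around a cycle whose weight $\sum \tau - T \sum d$ is positive, i.e. exactly when $T < \mu(G)$. The MCM is then approximated by bisection on $T$, assuming every cycle holds at least one token.

```ocaml
(* Edge (src, snk) of an SRDF graph with d initial tokens *)
type edge = { src : int; snk : int; d : int }

(* Start offsets s(i,0) of an SPS with period t, or None if none exists.
   Constraint per edge: s(snk,0) >= s(src,0) + tau(src) - t * d *)
let sps_offsets ~(tau : float array) ~(edges : edge list) (t : float) =
  let n = Array.length tau in
  let s = Array.make n 0.0 in
  let relax () =
    List.fold_left
      (fun changed e ->
        let cand = s.(e.src) +. tau.(e.src) -. (t *. float_of_int e.d) in
        (* the 1e-9 margin absorbs floating-point rounding *)
        if cand > s.(e.snk) +. 1e-9 then begin
          s.(e.snk) <- cand;
          true
        end
        else changed)
      false edges
  in
  (* Without a positive cycle the values settle within n - 1 passes;
     still changing after n + 1 passes means t < MCM *)
  let rec passes k =
    if not (relax ()) then true else if k = 0 then false else passes (k - 1)
  in
  if passes n then Some s else None

(* MCM estimate by bisection on t (Theorem 2.1: an SPS exists iff t >= MCM).
   Assumes every cycle has at least one token, so MCM <= sum of all tau *)
let mcm_estimate ~tau ~edges =
  let feasible t = sps_offsets ~tau ~edges t <> None in
  let rec bisect lo hi iter =
    if iter = 0 then hi
    else
      let mid = (lo +. hi) /. 2.0 in
      if feasible mid then bisect lo mid (iter - 1) else bisect mid hi (iter - 1)
  in
  bisect 0.0 (Array.fold_left ( +. ) 0.0 tau) 60

(* Example: A (0) -> B (1) without tokens, B -> A with one token *)
let tau = [| 2.0; 3.0 |]
let edges = [ { src = 0; snk = 1; d = 0 }; { src = 1; snk = 0; d = 1 } ]
```

For this two-actor example the only cycle has execution time 2 + 3 and one token, so µ(G) = 5. `sps_offsets ~tau ~edges 5.0` returns `Some [|0.; 2.|]` (A starts at 0 and B at 2 in every period), `sps_offsets ~tau ~edges 4.0` returns `None`, and `mcm_estimate ~tau ~edges` converges to 5 within floating-point tolerance.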
2.5.1.3 Monotonicity
The monotonicity of a function, or in our case of a self-timed execution, is an important concept to introduce at this stage, since it has proved very useful in this context.
Definition 2.3. Monotonicity: A function $f(n)$ is monotonically increasing if $m \leq n$ implies $f(m) \leq f(n)$. Similarly, it is monotonically decreasing if $m \leq n$ implies $f(m) \geq f(n)$. A function $f(n)$ is strictly increasing if $m < n$ implies $f(m) < f(n)$ and strictly decreasing if $m < n$ implies $f(m) > f(n)$ [8].
However, we are more interested in the application of the monotonicity concept in a Single Rate Data Flow context, specifically when applied to self-timed schedules. As such, we define monotonicity in this context as:
Definition 2.4. Monotonicity of a self-timed execution: In an SRDF graph $G = (V, E, d, \tau)$ with worst-case self-timed schedule $s_{WCSTS}$, for any $i \in V$ and $k \geq 0$, it holds that, for any self-timed schedule $s_{STS}$ of $G$:

$$s_{STS}(i, k) \leq s_{WCSTS}(i, k) \tag{2.15}$$

Because of the monotonicity of self-timed execution, if any given firing of an actor finishes its execution faster than its worst-case execution time (WCET), then no subsequent firing in any self-timed schedule can happen later than in the WCSTS, which can be seen as a function that bounds all start times of any self-timed execution of the graph. This was stated as a theorem and proved in [26].
2.5.2 Latency analysis
Although throughput is a very useful performance indicator for concurrent real-time applications, another important metric is latency. Especially for applications such as video conferencing, telephony and games, latency beyond a certain limit cannot be tolerated. Usually, the dependencies in an SRDF graph allow some freedom in the execution order of the actors. This order determines performance properties like throughput, storage requirements and latency [13]. Latency is the time interval between two events. We measure latency as the difference between the start times of two specific firings of two actors, i.e.:
$$L(i, k, j, p) = s(j, p) - s(i, k) \tag{2.16}$$

where $i$ and $j$ are actors, and $k$ and $p$ are the indices of their respective firings. We say that $i$ is the source of the latency and $j$ is the sink. Another useful concept in this section is the maximum latency, here defined as:

$$\hat{L}(i, j, n) = \max_{k \geq 0} \big(s(j, k + n) - s(i, k)\big) \tag{2.17}$$

where $n$ is a fixed iteration distance. Next we describe some latency analysis techniques by presenting concrete situations where this type of analysis is used.
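Expressions 2.16 and 2.17 translate directly into code over recorded start times, for instance those produced by a simulation. In the OCaml sketch below (illustrative names, not a Heracles API), `s_i` and `s_j` are arrays of start times indexed by firing. Because a finite trace covers only finitely many values of k, `max_latency_trace` returns the largest latency observed in the trace. This equals the $\hat{L}(i, j, n)$ of 2.17 only if the trace contains the iteration where the maximum occurs.

```ocaml
(* L(i, k, j, p) = s(j, p) - s(i, k), expression 2.16 *)
let latency (s_i : float array) (s_j : float array) k p = s_j.(p) -. s_i.(k)

(* Largest s(j, k + n) - s(i, k) over the firings present in both traces,
   i.e. expression 2.17 restricted to a finite trace *)
let max_latency_trace s_i s_j n =
  let m = min (Array.length s_i) (Array.length s_j - n) in
  if n < 0 || m <= 0 then invalid_arg "max_latency_trace: trace too short";
  let best = ref neg_infinity in
  for k = 0 to m - 1 do
    best := max !best (latency s_i s_j k (k + n))
  done;
  !best

(* Hypothetical traces: start times of a source A and a sink B *)
let s_a = [| 0.0; 5.0; 10.0; 15.0 |]
let s_b = [| 2.0; 7.0; 12.0; 17.0 |]
```

With these traces, `max_latency_trace s_a s_b 0` gives 2.0 and `max_latency_trace s_a s_b 1` gives 7.0.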
2.5.2.1 Maximum Latency from a periodic source
In data flow, a source is an actor that models a generator of data, such as an antenna. Due to the unpredictable behaviour of the modelled device, a source can be represented as an actor that does not have an associated set of firing rules but produces tokens into its outgoing
period $T$ that corresponds to the elapsed time between two consecutive productions. As we have seen in section 2.4.5, the start times of a periodic source are given by:

$$s(i, k) = s(i, 0) + T \cdot k \tag{2.18}$$

The period of the execution of the graph is imposed by the source: since the source executes with a period $T$, the period of the execution of the graph is lower bounded by the period of the source. If, on the other hand, the graph has a longer period, then it cannot keep up with the source, and infinite token accumulation on some buffer will happen for the WCSTS. Therefore, we will perform the latency analysis under the assumption that $\mu(G) = T$.

For the determination of the maximum latency in this case, we first need to establish the concept of Rate-Optimal Static Periodic Schedule (ROSPS). This designation is attributed to a Static Periodic Schedule that has a period $T$ equal to the MCM of the SRDF graph, $\mu(G)$. Considering this concept, the maximum latency for a periodic source can be written as:

$$\hat{L}(i, j, n) = \max_{k \geq 0} \big(s_{STS}(j, k + n) - s(i, k)\big) \leq \check{s}_{ROSPS}(j, 0) - s(i, 0) + \mu(G) \cdot n \tag{2.19}$$

where $\check{s}_{ROSPS}(j, 0)$ represents the smallest start time of $j$ in an admissible ROSPS. We can determine the maximum latency for a periodic source just by calculating an ROSPS with the earliest start time of $j$ and a WCSTS for the earliest start time of $i$. A more extended treatment of this subject, including a more detailed explanation of the logic behind expression 2.19, can be found in [26].
2.5.2.2 Maximum latency from a sporadic source
In reactive systems, it frequently happens that the source is not strictly periodic but produces tokens sporadically, with a minimum interval $\mu$ between subsequent firings. Typically, a maximum latency constraint must be guaranteed. Any graph with this type of source must be able to sustain a throughput of $1/\mu$ in order to guarantee that it cannot be overrun by such a source operating at its fastest rate. This means that the MCM of the graph, $\mu(G)$, must satisfy $\mu(G) \leq \mu$. The proof of this statement relies on the possibility of bounding the self-timed behaviour of a graph by static periodic
$\mu$, which is possible as long as $\mu(G) \leq \mu$. To keep this chapter simple, we present the final expression for the maximum latency of a graph with a sporadic source, omitting all the steps taken in its derivation:

$$\hat{L}(i, j, n) \leq \check{s}_{\mu}(j, 0) - s(i, 0) + \mu \cdot n \tag{2.20}$$

The latency $\hat{L}(i, j, n)$ with a sporadic source has the same upper bound as the latency for the same source $i$, sink $j$ and iteration distance $n$ in the same graph with a periodic source of period $\mu$.

A bursty source is characterized as a source that may fire at most $n$ times within any interval of length $T$, with a minimal interval $\Delta t$ between consecutive firings. A job that processes such a source must have $\mu(G) \leq T/n$ to be able to guarantee its processing within bounded buffer space. Moreover, if $\mu(G) \leq \Delta t$, then we have the previous case, i.e., maximum latency from a sporadic source. If $\mu(G) > \Delta t$, then the latency may accumulate over iterations, as the job processes input tokens slower than the rate at which they arrive. The maximum latency must occur when the longest burst occurs with the minimum interval between firings of the source, that is, a burst of $n$ tokens with $\Delta t$ spacing. Because of monotonicity, making the source execute faster cannot make the sink execute slower, but it also cannot guarantee that the sink executes faster.
Figure 2.6: Arrival times of tokens from a bursty source relative to a strictly periodic source
As depicted in figure 2.6, the tokens of the bursty source $i$ will arrive earlier than those of the periodic source $i'$. Therefore, at iteration $n - 1$ after the beginning of the burst (iteration 0), the earliest time is:

$$s(i, n - 1) = s(j, n - 1) \leq \check{s}_{ROSPS}(j, 0) + (n - 1) \cdot \mu(G) \tag{2.21}$$

As such, a bound on the maximum latency is given by:

$$\hat{L}(i, j, n) \leq \check{s}_{ROSPS}(j, 0) - s_{ROSPS}(i, 0) + (n - 1)\big(\mu(G) - \Delta t\big) \tag{2.22}$$

2.6 Conclusion
By choosing a data flow model as a programming and analysis model, we can now use its analytical properties to our advantage. In this chapter we defined all the concepts essential to understand and use the tools provided by the data flow paradigm, including a brief introduction to graph theory. Data flow is very useful for analysing streaming applications and modelling multiprocessor systems. All the systems that we are going to work with in future chapters fall into one of these categories, or both, and that is one of the main reasons why we
Software framework
The execution of our project relied heavily on a set of software tools for obtaining results. The core of this set of tools is the Heracles data flow simulator. During the execution of this project, some questions were answered using the existing functionalities of this simulator, but as we progressed deeper into our problem, we had to extend the tools with a set of functionalities that address our specific needs. The simulator has a modular structure, which eases the insertion of new functionalities through the integration of custom modules or even through the modification of existing ones, minimizing the need to tamper with the core of the program.
By default, the results provided by the simulation tool were presented in a text format. At the end of a successful simulation, a summary of the results on the data flow graph behaviour was shown in the console, while a detailed list with the characteristics of the task executions was stored in a text file. This type of visualization was practical for some cases but created some confusion in others. For example, it is difficult to identify executions that overlap in time just by analysing a list of their start and finish times. So, in order to make the simulation results more readable, we decided to integrate an external visualization tool that allows the timing behaviour of the simulated tasks to be presented as a coloured Gantt chart.
In this chapter we explain in detail all the functionalities and changes mentioned so far.
3.1 The Heracles data flow simulator
3.1.1 Heracles tool flow
The Heracles simulator was written in Objective Caml (Objective Categorical Abstract Machine Language). OCaml is a dialect of the ML (Meta-Language) family of languages, which derive from the Classic ML language designed by Robin Milner in 1975 for the LCF (Logic of Computable Functions) theorem prover [14].
OCaml shares many features with other dialects of ML, and it provides several new features of its own. The main characteristics of this language are as follows:
• It is a functional language, meaning that functions are treated as first-class values.
• It is strongly typed, meaning that the type of every variable and every expression in a
• Related to strong typing, OCaml uses type inference to infer the types of the expressions in a program.
• The type system is polymorphic, meaning that it is possible to write programs that work for values of any type.
Although ML languages are mostly functional, they also include some imperative traits, which allows a program written in OCaml to combine both programming styles. The Heracles simulator was written in this mixed style. For more information regarding this programming language, please refer to [17], [7] and [22].
OCaml offers some distinct advantages when compared to imperative languages such as C or Java. One of the most attractive features of this language is the type inference performed by the compiler. This feature allows a user to write programs without explicitly indicating the data type (integer, float, string, etc.) of the variables defined: the compiler infers them by observing the context in which each variable is used. This characteristic allows the compiler to detect and identify a great number of bugs that would otherwise be difficult to discover. This happens because the compiler, as it infers the type of a certain variable, also checks the consistency of the rest of the function with respect to all the operations in which this variable is involved. This way, only programs that maintain data type coherence throughout all their operations compile successfully.
Another advantage of OCaml is the abstraction of pointers. While pointers are the source of many troublesome bugs in languages such as C or C++, in OCaml the compiler handles all these references. The user is not allowed to change the references themselves, only the values that they point to. From a user standpoint, the syntax used for defining and changing memory positions referred to through a pointer (in OCaml we use the ref operator to define a reference or pointer) is relatively easier to manipulate than its imperative counterpart.
Finally,one ofthe mostdistinctcharacteristicsofOCamlisthecodecompression. Withthis
language is possible to write complex applicationsusing a third ofthe code linesthat thesame
programwouldneedwhen writtenonafullyimperativelanguage. Duetotypeinferenceandthe
functional nature of this language, a function of medium complexity can sometimes be written
inasinglecode line.
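The language features mentioned in this section can be illustrated with a few lines of OCaml. This is a generic illustration of type inference, polymorphism and references, not code taken from Heracles:

```ocaml
(* Type inference: no annotations, yet [add] is inferred as int -> int -> int *)
let add x y = x + y

(* Polymorphism: [length] is inferred as 'a list -> int and works on any list *)
let rec length = function
  | [] -> 0
  | _ :: rest -> 1 + length rest

(* References: [ref] creates a mutable cell, [!] reads it, [:=] updates it *)
let counter = ref 0
let tick () =
  counter := !counter + 1;
  !counter

(* Rejected at compile time, because "two" is not an int:
   let wrong = add 1 "two" *)

let () =
  Printf.printf "%d %d %d\n" (add 2 3) (length [ 1; 2; 3 ]) (length [ "a"; "b" ]);
  ignore (tick ());
  Printf.printf "%d\n" (tick ())
```

Running it prints `5 3 2` followed by `2`. Uncommenting the `wrong` binding makes compilation fail with a type error, which is the kind of early bug detection described above.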
3.1.2 Explanation of the software model
The Heracles simulation module was developed with the purpose of providing accurate timing simulations. The usage of this tool is mainly text based: the system to be simulated is inserted into the tool via a text file containing a description of the graph to be simulated. This description consists of a list of all the actors with their respective execution times and other relevant characteristics, for example the mapped processor and the associated priority, and a list of all the edges with their sources and sinks properly identified, their initial tokens, and their production and consumption rates. This text file is then parsed so that all the components and main characteristics mentioned before can be extracted into internal variables.
The simulator operates on a set of objects of a record type (commonly also known as a structure or class in other programming languages) called Event. Along with plenty of useful information, an event also represents the start or finish of a scheduled task. These events are aggregated in a