elatório técnico
echnic report
t al
r
IPP-HURRAY! Research Group
Polytechnic Institute of Porto School of Engineering (ISEP-IPP)
Distributed Computer-Controlled Systems: the DEAR-COTS Approach
Paulo VERÍSSIMO (FCUL) António CASIMIRO (FCUL)
Luís Miguel PINHO Francisco VASQUES (FEUP)
Luís RODRIGUES (FCUL) Eduardo TOVAR HURRAY-TR-0021
October 2000
THIS WORK IS PARTIALLY SUPPORTED BY FCT UNDER PROJECT DEAR-COTS (Praxis/P/EEI/14187/1998)
Distributed Computer-Controlled Sytems: the DEAR-COTS Approach
Paulo VERÍSSIMO, António CASIMIRO, Luís RODRIGUES University of Lisboa, FCUL
Bloco C5, Campo Grande, 1749-016 Lisboa
Portugal
Tel.: +351.21.7500087
E-mail: {pjv,casim,ler}@di.fc.ul.pt http://www.hurray.isep.ipp.pt
Luís Miguel PINHO, Eduardo TOVAR IPP-HURRAY! Research Group
Polytechnic Institute of Porto (ISEP-IPP) Rua Dr. António Bernardino de Almeida, 431 4200-072 Porto
Portugal
Tel.: +351.22.8340502, Fax: +351.22.8340529 E-mail: {lpinho, llf}@dei.isep.ipp.pt
http://www.hurray.isep.ipp.pt
Francisco VASQUES University of Porto (FEUP) Rua Dr. Roberto Frias 4050-123 Porto Portugal
Tel.: +351.22.5081702, Fax:
E-mail: [email protected] http://www.fe.up.pt/~vasques
Abstract:
This paper proposes a new architecture targeting real-time and reliable Distributed
Computer-Controlled Systems (DCCS). This architecture provides a structured approach for
the integration of soft and/or hard real-time applications with Commercial Off-The-Shelf
(COTS) components. The Timely Computing Base model is used as the reference model to
deal with the heterogeneity of system components with respect to guaranteeing the timeliness
of applications. The reliability and availability requirements of hard real-time applications are
guaranteed by a software-based fault-tolerance approach.
SYSTEMS: THEDEAR-COTS APPROACH 1
P. Verssimo
, A.Casimiro
, L.M. Pinho
,
F.Vasques
, L. Rodrigues
,E. Tovar
UniversityofLisboa, FCUL,BlocoC5,Campo Grande,
1749-016 Lisboa, Portugal,E-mail: pjv,casim,[email protected]
Polytechnic InstituteofPorto, ISEP,Ruade S~ao Tome,
4200-072 Porto,Portugal,E-mail: lpinho,[email protected] t
Universityof Porto, FEUP,RuaDr. RobertoFrias, 4200-465
Porto, Portugal,E-mail: [email protected]
Abstract: This paper proposes a new architecture targeting real-time and reliable
Distributed Computer-Controlled Systems (DCCS). This architecture provides a
structured approach for the integration of soft and/or hard real-time applications
with Commercial O-The-Shelf (COTS) components. The Timely Computing Base
model is used as the reference model to deal with the heterogeneity of system
componentswithrespecttoguaranteeingthetimelinessofapplications.Thereliability
and availability requirements of hard real-time applications are guaranteed by a
software-basedfault-toleranceapproach.
Keywords:Real-Time,FaultTolerance,DistributedEmbeddedSystems,COTS
1. INTRODUCTION
Currently, there is a trend to incorporate Com-
mercialO-The-Shelf(COTS)componentsinDis-
tributed Computer-Controlled Systems (DCCS),
inordertominimisecostsanddevelopmenttime.
Therefore, there is the need forarchitectures al-
lowingtheintegrationofCOTScomponents,hard
real-time reliable applications and non-reliable
and/orsoftreal-timeapplications.
This paper proposes the DEAR-COTS (Dis-
tributedEmbeddedARchitectureusingCommer-
cial O-The-Shelf components) architecture for
supportingreal-timeandreliableDCCS.Thisar-
chitecture provides DCCS with a generic frame-
work,inwhichhardreal-timeapplicationscanex-
ecute,guaranteeingtheirtimelinessandreliability
requirements,whilst, at thesametime, embody-
ing soft real-time or non real-time applications.
1
ThisworkwaspartiallysupportedbytheFCT,through
This framework is put together by means of a
TimelyComputingBase(TCB),whichisusedas
thereferencemodeltodealwiththeheterogeneity
ofsystemcomponentswithrespecttoguarantee-
ingthetimelinessofapplications.
The remainder of this paper is organised as fol-
lows. Section 2 presents some of the relevant
workconcerningdependabledistributedreal-time
systems. Afterwards, Section 3 presents thepro-
posed architecture. Section 4 and 5 present the
HardReal-Time and SoftReal-Time Subsystems
oftheDEAR-COTSarchitecture.Finally,Section
6drawssomeconclusionsandpointsoutpossible
future directions.
2. RELATEDWORK
The problem of reliable real-time DCCS is rea-
sonably well understood when considering syn-
chronousandpredictableenvironments.However,
theuseofCOTScomponentsandtheintegration,
softornonreal-timeapplications introducesnew
problems.
The guarantee of these properties cannot be
achieved byan \ad-hoc" solution,as ithas been
shown by structured approaches to the prob-
lem of designing dependable real-time systems
(e.g. DELTA-4 (Powell, 1991), MARS (Kopetz
et al., 1989) orGUARDS (Powellet al., 1999)).
However,specialisedarchitecturesarecostly,thus,
thereistheneedforafault-tolerantreal-timedis-
tributed architecture forDCCS, based on highly
availableand lowcostCOTStechnologies.
A few works haveaddressedthe implementation
oftimelinessrequirementsinenvironmentswhere
noreal-timeguaranteescanbegiven(Cristianand
Fetzer, 1999; Verssimo andAlmeida, 1995).But
the Timely Computing Basemodel presentedin
(Verssimoet al.,2000)providesagenericframe-
work to deal with this problem. It is used as a
referencemodelinthedesignoftheDEAR-COTS
architecture. The appropriateness of the TCB
model to COTS-based environmentsis discussed
in(Casimiroetal.,2000)foradistributedsystem
composed exclusively by COTS components It
concludes that the system must be able to cope
withtheoccurrenceofresidualtimingfailures.
Fault toleranceis thepreferredmeansto achieve
reliability. A fault-tolerantservice can be imple-
mented by co-ordinating a group of processes
replicated on dierent nodes. There are three
main replication approaches: active replication,
primary-backup (passive) replication and semi-
active replication (Powell, 1991). The use of
COTSinducesfail-uncontrolledreplicas, soitbe-
comesnecessarytheuseofactivereplicationtech-
niques. Inactivereplication, allthereplicas pro-
cessthesameinputs,keepingtheirinternalstate
synchronisedandvotingallonthesameoutputs.
It is also clear that the feasibility of the appli-
cations' requirementsmust be ensured bysound
schedulability analysis techniques. The use of
these techniques (e.g., the well-known Response
Time Analysis (Audsley et al., 1993)) allows
checkingifthetask set willmeet its deadlines,a
requiredconditionforhardreal-timeapplications.
3. THEDEAR-COTS ARCHITECTURE
TheDEAR-COTSarchitectureprovidesanexecu-
tionenvironmentforreal-time applications, with
reliabilityand availability requirements,through
theuseofCOTShardwareandsoftware.Itsmain
purposeistoprovidecontinuousandadequateser-
vicetothecontrolledsystem,inordertoincrease
thecondencelevelputinthecontrollingsystem.
TheDEAR-COTSarchitectureisnottargetedto
strictedset offailureassumptions(Laprie,1992).
Besides the reliability and availability require-
ments to guarantee the correct behaviour of
the supported real-time applications, DCCS also
needstobeinterconnectedwithotherpartsofthe
overall system. Currently, these kind of systems
demand formoreexibilityandinterconnectivity
capabilities, while guaranteeing the availability
andreliabilityrequirementsofthesupportedreal-
time applications. Hence,theintegration ofhard
real-time applications, whose requirements have
tobeguaranteed,withsoftreal-timeapplications,
where a more exible approach can be used, is
anothergoaloftheDEAR-COTS architecture.
Acommoncharacteristicamongalltheseapplica-
tions is theirreal-time behaviour. This real-time
behaviour is specied in compliance with timeli-
nessrequirements,whichinessencecallforasyn-
chronoussystemmodel.However,thedemandfor
an environment with exibility and interconnec-
tivity capabilities and based on possibly hetero-
geneous COTS components, makes the enforce-
mentoftimelinessassumptionsverydiÆcult.The
TimelyComputingBase(TCB)modelprovidesa
genericsolutiontothisproblem.
IntheDEAR-COTSarchitecturetheTCBmodel
is used as a reference model to deal with the
heterogeneityofsystemcomponentsandoftheen-
vironment,withrespecttotimeliness.Fromasys-
temmodelperspectivewedeviseagenericDEAR-
COTS architecture to address the fundamental
problemsinaglobalandintegratedway.Froman
engineeringpointofview,wedevisespecicmech-
anismstodealwiththereliabilityandavailability
requirements of hard real-time applications, and
to deal with the requirements of soft real-time
applications.
3.1 TheTCB model
The TCB model provides a generic framework
to deal with theproblem of implementing appli-
cations with timeliness requirements in environ-
ments that are unpredictable or unreliable. The
fundamentalidea,whichisappliedtotheDEAR-
COTS architecture,is that systemsare assumed
to have two distinct parts with respect to syn-
chrony.InasystemwithaTCBthereisacontrol
part,with synchronousproperties,made of local
TCBmodulesinterconnectedbyacontrolchannel.
Thereisalsoapayloadpart,overaglobalnetwork
orpayload channel, that mayhave any degreeof
synchronism.Inparticular, thepayload partcan
either enjoy synchronousproperties but canalso
becompletelyasynchronous.
The synchronous part is supposed to be a very
small and simple part of the overall system.
chronous properties would have to be assumed
for the overall system. Applications execute in
the payload part of the system and use a set of
basic services provided by the control part, the
TCB: Timely Execution, Duration Measurement
and Timing Failure Detection. The denition of
theseservicesandtherespectiveAPIcanbefound
in(Verssimoetal.,2000).
Applications can benet from the TCB by con-
struction, using TCB services to \be aware" of
theirtimelinessand,therefore,tobehavealwaysin
amannerthatpreservesthesafetyofthesystem.
This is the case of many soft real-time applica-
tions,namelyallthosethatcanlivewithsporadic
failures, that can adapt to the available quality
of service, orthose that canswitch to afail-safe
statewhensometimingfailureis detected.
Infact,timingfaults aectingcomponentsorap-
plicationswithsoftreal-timerequirements,canbe
addressedwiththehelpofatimingfailuredetec-
tion service and by applying adequate tolerance
orsafetymeasures,as explainedinsection5.
3.2 AgenericDEAR-COTSnode
Given the above description of the TCB model,
thearchitectureof agenericDEAR-COTS node,
capable of simultaneously handle the require-
mentsofhardandsoftreal-timeapplications,can
beunderstoodinaveryintuitivemanner.Infact,
the basic idea of casting into the architecture
the heterogeneity in systemsynchrony, has been
appliedtothegenericDEAR-COTSnode,asrep-
resentedbyFigure 1.
Control channel
Real-Time channel General purpose channel
TCB DC H DC S
1
(Timely Computing Base) Synchronous Asynchronous synchronous
DC S
n
real-timeHard Soft
Partially real-time real-timeSoft
Fig.1.DEAR-COTSnodestructure.
The several modules depicted in the gure are
intended to fully characterise a node in terms
of the synchrony assumptions. The TCB mod-
ule acts as a gluing element, being always the
most synchronous part of the system. It pro-
vides timely services to less synchronous mod-
ules through an interface that bridges the syn-
chronygap.TheDC
H ,DC
S1
andDC
Sn
modules
constitute the payload part of the system, and
represent theseveral possible environments with
respectto synchronythat mayexistin aDEAR-
withdierentrequirementswillbeexecuted.DC
H
represents a hard real-time subsystem (HRTS),
with synchronous properties, which is supposed
toexistineverynoderunningdistributedand/or
replicated hard real-time applications. Dierent
DC
S
subsystems can be dened, accordingly to
thesynchronypropertiesthattheyenjoy,namely
the asynchronous one, DC
S1
, and several par-
tial synchrony models, DC
S
n
(Cristian and Fet-
zer,1999;VerssimoandAlmeida,1995).Withthe
helpofaTCB,allthesesubsystems(includingthe
asynchronous one) can support the execution of
certainapplicationswithtimelinessrequirements.
Therefore,wewillrefertoallofthemassoftreal-
timesubsystems(SRTS).
Eachnodemayhaveaccesstodierentcommuni-
cation channels with respect tosynchrony. As in
the TCB model, we assume that TCB modules
communicate through a control channel, which
canbeimplementedasavirtualchanneloverthe
physical (real-time) network or can be a sepa-
rate network by itself. There is also a real-time
channel serving DC
H
subsystems, implemented
over areal-time network, and a general purpose
channelserving DC
S
subsystems, whereno real-
timeguaranteesareprovided.
This generic node architecture canindeed be in-
stantiated into dierent node congurations, de-
pendingonthemodulesusedinanode.ADEAR-
COTS systemis builtby interconnecting several
DEAR-COTSnodes,choosingsuitablecongura-
tionsforeachnode.
3.3 ImplementingaDEAR-COTSsystem
ADEAR-COTSsystemisbuiltusingdistributed
processing nodes, where distributed hard real-
time andsoftreal-time applicationsmaycoexist.
Toensurethedesiredleveloffaulttolerance(relia-
bilityandavailability)tothesupportedhardreal-
time applications, specic components of these
applications may be replicated. To ensure the
timelinesspropertiesofsoftreal-timeapplications
andallowsystemresourcestobesharedbyevery
(soft and hard real-time) application, processing
nodesaremodelledasDEAR-COTSnodes.
From ageneric DEAR-COTS node it is possible
to dene particular node instances, with dier-
entcharacteristicsandpurposes.ADEAR-COTS
nodeischaracterisedbythesynchronysubsystems
itiscomposedof.Thereareessentiallythreebasic
node types: Hard real-time nodes (H), soft real-
timenodes(S)andgatewaynodes(H/S).
Hard real-time nodes are those where only a
hard real-time subsystem (HRTS) exists. There-
fore, they will exclusively be used to provide a
frameworkto supportreliableandavailablehard
thesenodesisnotessentialtoallowtheimplemen-
tationofhardreal-timeapplications.However,as
in any system assumed to be synchronous, and
evenmoreinaCOTSbasedone,thereisalwaysa
small probabilitythat timing failures occur.The
needforaTCB, asaninstrumentto amplifythe
coverage of synchronous assumptions, has to be
equatedintermsofthedependabilityrequiredby
thesupportedDCCSapplication.
Softreal-time nodesonlyinclude asoftreal-time
subsystem (SRTS). Thesoftreal-time subsystem
providestheexecutionenvironmentfortheremote
supervision and remote management of the Dis-
tributedComputerControlSystem.Theexistence
of a TCB in these nodes is crucial to allow the
implementationofapplicationswithtimelinessre-
quirements.
A gatewaynodeintegratesbothahard real-time
subsystem(HRTS)andasoftreal-timesubsystem
(SRTS).Inthesenodestherearetwodistinctand
well-dened executionenvironments.The idea is
to allowhardreal-time components,executingin
the HRTS, to interact in a controlled manner
with softreal-time components,executing in the
SRTS. The communication mechanisms between
bothsubsystemsmustbecarefullydesigned,guar-
anteeing that failures in the soft real-time sub-
system (less reliable) do not interfere with the
hard real-time subsystem(concerning itstiming,
availability and reliability requirements). There-
fore, mechanisms for memory partitioning must
be provided, and also the inter-communication
mechanismsmustguaranteetheintegrityofdata
transferred from the SRTS to the HRTS, byup-
gradingitscondencelevel.
Wide Area Network
DEAR-COTS
H Node H/S
Node
"DC Gateway"
H Node
S Node S Node
DEAR-COTS DEAR-COTS
DEAR-COTS DEAR-COTS
Sensors/Actuators
Controller Area Network
General purpose
Real-Time DEAR-COTS nodes:
H/S Node - Gateway node S Node - Soft Real-Time node H Node - Hard Real-Time node
HRT App
SRT App
SRT App
HRT App SRT App HRT
App
Fig.2.GenericDEAR-COTSsystem.
Figure 2 represents a system containing all the
abovenodecongurations.H andH/SNodesare
interconnectedbyareal-timenetwork,whichpro-
vides the communication infrastructure for the
hardreal-time applications (interconnectingcon-
trollers, sensors and actuators). This real-time
networkalsoprovidestheDEAR-COTSarchitec-
turewithacommunicationinfrastructuretosup-
portthereplicamanagementmechanisms.Atthe
above level, as there is the need to interconnect
these nodes with the upper levels of the DCCS
remote management), there is ageneral-purpose
network interconnecting H/S and S nodes. The
control channel requiredfor theTCB isnotnec-
essarilyanindependentnetwork.Theassumption
of a restricted channel with predictable timing
characteristics(control) coexisting with possibly
asynchronous channels is feasible in some of the
currentnetworks(Prycker,1995).
At the hard real-time level, the HRTS is re-
sponsible for providing a framework for reliable
execution. Hence, applications have guaranteed
execution resources, including processing power,
memory and communication infrastructure.This
is the main reason for the need of a separated
real-time communicationnetwork for the HRTS,
wheremessagessentfromonenodetoanotherare
receivedandprocessedinaboundedtimeinterval.
TheSRTSprovidesasetofservicestosupportthe
supervisionand management levelof the DCCS.
At this system level, exibility is a major goal,
sincenewservicescanbeprovidedasthesystem
evolves. However, since there are no hard real-
timeguarantees,eithertheTCBservicesareused
to allowapplications tobeawareof theavailable
qualityofservice(andadaptthemselves,ifpossi-
ble), ortechniques such asbest-eortscheduling
orvalue-basedschedulingmustbeused.
4. THEHARDREAL-TIMESUBSYSTEM
ThesetofHandH/Snodesprovidesaframework
tosupportdistributedhardreal-timeapplications
(Figure3).Thereliabilityandavailabilityrequire-
mentsare guaranteed bythesoftwarereplication
of COTScomponents, rather than the usual so-
lution, which is to build software on top of spe-
cialisedhardware.
Real-Time Network
Kernel Hardware Kernel
Hardware Kernel
Hardware
Kernel Hardware
Hard Real-Time Application Hard Real-Time Application Hard Real-Time Application
Fig.3.HRTSstructure.
One hard real-time application is constituted by
several tasks (processing units). These tasks are
distributedovertheHandH/Snodes.Eachnode
has its own (non-distributed) COTS real-time
kerneland hardware, which provides thedesired
multitaskingsupport.Anadvantageofusingboth
a COTS kernel and hardware is that it allows
for the easy upgradability and portabilityof the
system.
ThesetofHandH/Snodesalsosupportstheac-
tivereplicationofsoftware(Figure4)withdissim-
tolerancetodesignfaultsisincreased,sincenodes
are considered as independent from the point of
view of failures. At the same time exibility is
increased,since nodesarenotjust copies ofeach
other, allowingforamoreexible designof real-
timeapplications.
τ
1Replica Manager Communication Manager
τ
1‘ τ
2τ
2’ τ
3τ
3‘
HRTS Support Software
Fig.4.ReplicatedHardReal-TimeApplication.
The goal of the HRTS support software (Figure
4) isto providethedistributionsupport (includ-
ing both the application distribution itself and
the replication management) to hard real-time
applications. A layeredapproachto theHRTS is
providedtosimplifythedevelopmentoftheHRTS
supportsoftware:
The Communication Manager layer is re-
sponsible for the adequate communication
services;
TheReplicaManagerlayerisresponsiblefor
transparentlymanagethereplicatedcompo-
nents.
4.1 Schedulingmodel
IntheHRTS,eachapplicationconsistsofasetof
related tasks (
1 ...
n
), beingeach task asingle
processingunit.Tasksfrom thesameapplication
canbeallocatedtodierentnodes.Inordertoal-
lowtheuseofcurrento-lineschedulabilityanal-
ysis techniques (e.g., the well-known Response
Time Analysis (Audsley et al., 1993)),each task
isreleased onlybyoneinvocation event, butcan
be released an unbounded number of times. A
periodictaskisreleasedbytheruntime(temporal
invocation),whileasporadictaskcanbereleased
either by another task or by the environment.
Afterbeingreleased,ataskcannot suspend itself
orbeblockedwhileaccessingremotedata(remote
blocking).
Tasksareallowedtocommunicatewitheachother
either throughshared data objectsor byrelease
eventobjects(whichcanalsocarrydata).Shared
dataobjectsareusedforasynchronousdatacom-
municationbetweentasks,whilereleaseeventob-
jects are used for the release of sporadic tasks.
Tasks are designed as small processing units,
which, in each invocation, readinputs; carryout
theprocessing;andoutputtheresults.Thegoalis
to minimisetaskinteraction,in orderto improve
the schedulability analysis and increase the sys-
CeilingProtocols(Shaetal.,1990).
4.2 ReplicationModel
Asthereisthetargetofreliabilitythroughreplica-
tion,itisimportanttodenethereplicationunit.
Fromtheabovedenitions, two dierententities
could be considered: a single task or the com-
plete hard real-time application. However, both
solutionswould beveryrestrictive. Inthelatter,
in order to replicate part of the application, all
of its tasks would have to be replicated, which
wouldunnecessarilyincreasetheprocessing load.
Intheformer,eachtaskoutputwouldhavetobe
consolidated,whichwouldincreasetheinter-task
communicationload.
Therefore,thenotionofcomponent isintroduced.
Applicationsaredividedincomponents,eachone
being a set of tasks and resources that interact
to perform a common job. The component can
includetasksandresourcesfromseveralnodes,or
it canbelocatedin just one node. Ineach node,
several componentsmay coexist.As anexample,
Figure5showsareal-timeapplicationwith4tasks
(
1 ,
2 ,
3 and
4
). The application is divided
in twodierent components(C
1 and C
2
), which
ensurethedesiredlevelofreliability.
C1
C1’
C2’ C2
τ
1τ
2τ
1’ τ
2’
τ
3τ
4τ
3’ τ
4’
Fig.5.Hardreal-timeapplication.
A similar concept is the \capsule" dened in
the Delta-4 architecture (Powell, 1991). As the
DEAR-COTS \component", aDelta-4 \capsule"
is the unit of replication, embodying a set of
tasks(referred asthreads) andobjects.However,
a \capsule" has its own thread scheduling and
separated memory space, and is also the unit of
distribution. Thus, the Delta-4 concept of \cap-
sule" is more related to Unix processes, whilst
the presented component is a more lightweight
concept, which is used to structure replication
units.
By creating components, it is possible to dene
thereplicationdegreeofspecicpartsofthereal-
time application, accordinglyto thereliabilityof
its components and the desired level of reliabil-
ity for the application. However, by replicating
components,eÆciencydecreasesasthenumberof
tasksandmessagesincreases.Hence,itispossible
to trade reliability for eÆciency and vice-versa.
AlthougheÆciencyshouldnotberegardedasthe