
Algorithm 1. (Translation of a TRDA to a Timed Automaton) Here we show that an internal timed transition diagram of a TRDA (without resource expressions) can be represented as a Timed Automaton [1,2]. It is sufficient to present a transformation of a single transition.

Consider a transition s --W[a;b];F[c;d]--> s'. In the resulting TA it will be modelled by three locations (l_s, l_t and l_s') and two transitions (l_s --t_start--> l_t and l_t --t_stop--> l_s').

Additionally we introduce two clocks: wait_s for the state s and fire_t for the transition t. Transition t_start is labelled by the guard expression wait_s ≥ a ∧ wait_s ≤ b and the reset expression fire_t := 0. Transition t_stop is labelled by the guard expression fire_t ≥ c ∧ fire_t ≤ d and the reset expression wait_s' := 0.

This is a syntactical transformation, and the resulting TA has exactly the same structure and timed behaviour as the TRDA (modulo resources).
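As a rough illustration, the single-transition translation above can be sketched in Python. The data structures (dicts for edges, strings for locations, guards and clocks) are ad hoc assumptions for this sketch, not notation from the paper:

```python
def translate_transition(s, s_prime, t, a, b, c, d):
    """Translate one TRDA transition s --W[a;b];F[c;d]--> s' into a TA fragment."""
    locations = [f"l_{s}", f"l_{t}", f"l_{s_prime}"]
    clocks = [f"wait_{s}", f"fire_{t}", f"wait_{s_prime}"]
    edges = [
        {   # t_start: leave l_s within the waiting interval W[a;b]
            "from": f"l_{s}", "to": f"l_{t}",
            "guard": f"wait_{s} >= {a} and wait_{s} <= {b}",
            "reset": [f"fire_{t}"],           # fire_t := 0
        },
        {   # t_stop: complete firing within the interval F[c;d]
            "from": f"l_{t}", "to": f"l_{s_prime}",
            "guard": f"fire_{t} >= {c} and fire_{t} <= {d}",
            "reset": [f"wait_{s_prime}"],     # wait_s' := 0
        },
    ]
    return locations, clocks, edges

locs, clocks, edges = translate_transition("s0", "s1", "t0", 1, 2, 3, 5)
```

The intermediate location l_t represents the firing phase of t, so the two TA edges together reproduce the waiting interval followed by the firing interval.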

Algorithm 2. (Translation of a TRDA to a Time Petri Net) Here we show that a TRDA (with resource expressions) can be represented as a TPN.

TRDA Nets for Distributed Real-Time Systems Modelling 25

First consider a transition diagram of a TRDA. It is replaced by an automaton Petri net (where each transition of the Petri net has one input and one output place) of the same graph topology, but with "doubled" transitions: each transition t of the TRDA is replaced in the TPN by a sequence of two transitions, t_start and t_stop. Assuming W(t) = [a;b] and F(t) = [c;d], we set the firing interval of t_start to [a;b], and the firing interval of t_stop to [c;d].

At this stage the resulting TPN behaves exactly like an internal timed transition diagram of the given TRDA (without resource expressions). Now consider all resource expressions. Let Ω = {A_1, ..., A_k} be the set of all types (the set of all possible "colours" of resource tokens). We add to the net so-called "resource places", one for every type, denoted by p_A1, ..., p_Ak.

And the final step: if the original TRDA transition t is labelled by a resource expression π?a of type A_i, then we add an arc from place p_Ai to transition t_start; if it is labelled by a resource expression π!a of type A_i, then we add an arc from t_stop to p_Ai. Like the previous algorithm, this is just a syntactical transformation.
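The doubling construction together with the resource places can be sketched as follows. The transition record fields ("W", "F", "consume", "produce") and the intermediate "firing" place p_t between t_start and t_stop are illustrative assumptions of this sketch (a sequence of two Petri net transitions needs a place between them), not notation fixed by the paper:

```python
def trda_to_tpn(transitions, resource_types):
    """Build a TPN skeleton from a TRDA transition diagram.

    transitions: list of dicts with keys 'name', 'src', 'dst',
                 'W' (waiting interval), 'F' (firing interval),
                 'consume'/'produce' (resource type of pi?a / pi!a, or None).
    """
    places = {f"p_{s}" for tr in transitions for s in (tr["src"], tr["dst"])}
    places |= {f"p_{A}" for A in resource_types}    # one resource place per type
    tpn_transitions, arcs = [], []
    for tr in transitions:
        t, (a, b), (c, d) = tr["name"], tr["W"], tr["F"]
        # doubled transitions with static firing intervals [a;b] and [c;d]
        tpn_transitions += [(f"{t}_start", (a, b)), (f"{t}_stop", (c, d))]
        places.add(f"p_{t}")                        # intermediate "firing" place
        arcs += [(f"p_{tr['src']}", f"{t}_start"), (f"{t}_start", f"p_{t}"),
                 (f"p_{t}", f"{t}_stop"), (f"{t}_stop", f"p_{tr['dst']}")]
        if tr["consume"]:                           # pi?a: arc p_Ai -> t_start
            arcs.append((f"p_{tr['consume']}", f"{t}_start"))
        if tr["produce"]:                           # pi!a: arc t_stop -> p_Ai
            arcs.append((f"{t}_stop", f"p_{tr['produce']}"))
    return places, tpn_transitions, arcs

places, trs, arcs = trda_to_tpn(
    [{"name": "t0", "src": "s0", "dst": "s1", "W": (1, 2), "F": (3, 5),
      "consume": "A1", "produce": None}],
    ["A1"])
```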

Algorithm 3. (Approximation of a TRDA-net by a Time Petri Net) Here we show that a whole TRDA-net (with resource expressions) can be represented as a Time Petri Net. However, the modelling is not exact (in the sense of dense time simulations), and it cannot be exact, since in a TRDA-net the number of simultaneously firing transitions is unbounded (we can produce any number of TRDA tokens). In a TPN the number of transitions is fixed; hence we can model a TRDA-net by a TPN only in a weaker sense, considering some kind of approximation of the TRDA dense time state space.

The complete transformation uses the method introduced in the previous algorithm. However, now we construct a set of TPNs, one net for each pair (TRDA type, system node), and a set of resource places, one place for each pair (TRDA type, system node). The resource-consuming and resource-producing arcs are introduced just as in the previous algorithm.
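A minimal sketch of the per-pair indexing used in this construction (the names and the dict-based representation are purely illustrative assumptions):

```python
from itertools import product

def build_approximation(trda_types, system_nodes):
    """One TPN copy and one resource place per (TRDA type, system node) pair."""
    nets = {(T, n): f"TPN_{T}_{n}"
            for T, n in product(trda_types, system_nodes)}
    resource_places = {(T, n): f"p_{T}_{n}"
                       for T, n in product(trda_types, system_nodes)}
    return nets, resource_places

nets, rps = build_approximation(["T1", "T2"], ["n1"])
```

Each per-pair net is then populated by the doubling construction of Algorithm 2, with arcs attached to the corresponding per-pair resource places.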

Parallelization Properties of Preconditioners for the Conjugate Gradient Methods

Oleg Bessonov

Institute for Problems in Mechanics of the Russian Academy of Sciences 101, Vernadsky ave., 119526 Moscow, Russia

bess@ipmnet.ru

Abstract. In this paper we present an analysis of the parallelization properties of several typical preconditioners for the Conjugate Gradient methods. For implicit preconditioners, geometric and algebraic parallelization approaches are discussed. Additionally, different optimization techniques are suggested. Some implementation details are given for each method.

Finally, parallel performance results are presented and discussed.

1 Introduction

Conjugate Gradient methods are widely used for solving large linear systems arising in discretizations of partial differential equations in many areas (fluid dynamics, semiconductor devices, quantum problems). They can be applied to ill-conditioned linear systems, both symmetric (plain CG) and non-symmetric (BiCGStab, GMRES etc.). In order to accelerate convergence, these methods require preconditioning. Now, with the proliferation of multicore and manycore processors, efficient parallelization of preconditioners becomes very important.

There are two main classes of preconditioners: explicit, which apply only a matrix-vector multiplication, and implicit, which require the solution of auxiliary linear systems based on an incomplete decomposition of the original matrix.
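To fix notation, a preconditioned CG iteration applies the preconditioner M^{-1} once per step. Below is a minimal pure-Python sketch with the simplest preconditioner, Jacobi (M = diag(A)); it is illustrative only and is not one of the preconditioners analyzed in the paper:

```python
def pcg(A, b, tol=1e-10, max_iter=200):
    """Preconditioned CG for a dense SPD matrix A (list of lists), Jacobi M."""
    n = len(b)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = b[:]                                   # r = b - A x with x = 0
    z = [r[i] / A[i][i] for i in range(n)]     # z = M^{-1} r  (Jacobi step)
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 < tol:             # converged on residual norm
            break
        z = [r[i] / A[i][i] for i in range(n)] # apply preconditioner again
        rz_new = dot(r, z)
        beta = rz_new / rz
        rz = rz_new
        p = [zi + beta * pi for zi, pi in zip(z, p)]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = pcg(A, b)
```

An implicit preconditioner would replace the elementwise division by a forward/backward substitution with the incomplete factors, which is exactly the step whose parallelization is discussed below.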

Explicit preconditioners act locally by means of a stencil of limited size and propagate information through the domain slowly, while implicit preconditioners operate globally and propagate information instantly. Due to this, implicit preconditioners work much faster and have a better-than-linear dependence of convergence on the geometric size of the problem.

The parallel properties of preconditioners strongly depend on how information is propagated in the algorithm. For this reason implicit preconditioners can't be easily parallelized, and much effort is needed to find geometric and algebraic approaches to parallelization. There exists a separate class of implicit methods, Multigrid, which possesses very good convergence and parallelization properties. However, Multigrid is extremely difficult to implement, and in some cases it can't be applied at all. Due to this, classical (explicit and implicit) preconditioners are still widely used in many numerical applications.

Therefore, in this paper we analyze the parallelization properties and performance of several preconditioners for different discretizations and geometries, and their implementation details on modern multicore processors.

V. Malyshkin (Ed.): PaCT 2013, LNCS 7979, pp. 26–36, 2013.

© Springer-Verlag Berlin Heidelberg 2013
