• Nenhum resultado encontrado

Performance Characterization of the Alya Fluid Dynamics Simulator

N/A
N/A
Protected

Academic year: 2022

Share "Performance Characterization of the Alya Fluid Dynamics Simulator"

Copied!
30
0
0

Texto

(1)

Performance Characterization of the Alya Fluid Dynamics Simulator

Guilherme Antonio Camelo, Lucas Mello Schnorr {gacamelo,schnorr}@inf.ufrgs.br

GPPD/INF/UFRGS, Porto Alegre, Brazil

INSTITUTO DE INFORMÁTICA UFRGS

Porto Alegre, September 2016 WSPPD 2016

1 / 18

(2)

Outline

1 Context: Numerical Simulation, Alya

2 Motivation

3 Execution Environment

4 Methodology

5 Preliminary Results

6 Conclusion and Future Work

(3)

Context: Numeric Simulation and HPC

ˆ Numeric simulation is used to understand/study a real life scenario

ˆ High performance computers used as execution platforms

ˆ Problems often require high computational power

ˆ Distributed and parallel programming techniques provide high computational power e.g OpenMPI

ˆ OpenMPI is ideal to deal with complex physical problems such as uid dynamic simulation.

3 / 18

(4)

Context: Numeric Simulation and HPC

ˆ Numeric simulation is used to understand/study a real life scenario

ˆ High performance computers used as execution platforms

ˆ Problems often require high computational power

ˆ Distributed and parallel programming techniques provide high computational power e.g OpenMPI

ˆ OpenMPI is ideal to deal with complex physical problems such as uid dynamic simulation.

(5)

Context: Numeric Simulation and HPC

ˆ Numeric simulation is used to understand/study a real life scenario

ˆ High performance computers used as execution platforms

ˆ Problems often require high computational power

ˆ Distributed and parallel programming techniques provide high computational power e.g OpenMPI

ˆ OpenMPI is ideal to deal with complex physical problems such as uid dynamic simulation.

3 / 18

(6)

Context: Alya - Fluid Dynamic Simulation tool

MPI-based framework to numerically solve physical problems

ˆ Simulates uid dynamic scenarios

ˆ Open source project

ˆ Parallel execution

ˆ Part of the Prace Benchmark from BSC (Barcelona Supercomputing Center)

(7)

Context: Alya - Fluid Dynamic Simulation tool

MPI-based framework to numerically solve physical problems

ˆ Simulates uid dynamic scenarios

ˆ Open source project

ˆ Parallel execution

ˆ Part of the Prace Benchmark from BSC (Barcelona Supercomputing Center)

4 / 18

(8)

Motivation: Irregular Load

Often numerical simulation applications have irregular load during execution.

ˆ Irregular control and data structure

ˆ Adaptive mesh renement

ˆ Irregular iteraction patterns among processes

Those factors usually lead to load imbalance among both resources and time as the simulation advances

(9)

Motivation: Irregular Load

Often numerical simulation applications have irregular load during execution.

ˆ Irregular control and data structure

ˆ Adaptive mesh renement

ˆ Irregular iteraction patterns among processes

Those factors usually lead to load imbalance among both resources and time as the simulation advances

5 / 18

(10)

Motivation: Irregular Load

Often numerical simulation applications have irregular load during execution.

ˆ Irregular control and data structure

ˆ Adaptive mesh renement

ˆ Irregular iteraction patterns among processes

Those factors usually lead to load imbalance among both resources and time as the simulation advances

(11)

Motivation: Load Imbalance

Finding out application behavior is key to apply optimization techniques such as:

ˆ Better balancing algorithms

ˆ More appropriate communication patterns.

Tracing techniques are used to acquire this information

ˆ Keeps track of information regarding the application

ˆ Record events

ˆ Record MPI operations

The Extrae tracing tool from BSC is used in this experiment

6 / 18

(12)

Motivation: Load Imbalance

Finding out application behavior is key to apply optimization techniques such as:

ˆ Better balancing algorithms

ˆ More appropriate communication patterns.

Tracing techniques are used to acquire this information

ˆ Keeps track of information regarding the application

ˆ Record events

ˆ Record MPI operations

The Extrae tracing tool from BSC is used in this experiment

(13)

Goal

Characterize Performance of Alya for the case studied

ˆ Identify irregular execution behavior regarding the resources and the time

ˆ Understand Alya's behavior in a smaller scale plataform

7 / 18

(14)

Execution Environment

Execution performed on 4 nodes that are part of the Draco cluster at INF UFRGS (Draco has 8 nodes).

Each node equipped with:

ˆ Intel (R) Xeon (R) E5-2640 v2 CPU @ 2.00GHz processor

ˆ 16 physical cores (32 with hyperthreading)

ˆ 64 gigabytes of RAM

ˆ Ubuntu 14.04.4 LTS

(15)

Execution Environment

Execution performed on 4 nodes that are part of the Draco cluster at INF UFRGS (Draco has 8 nodes).

Each node equipped with:

ˆ Intel (R) Xeon (R) E5-2640 v2 CPU @ 2.00GHz processor

ˆ 16 physical cores (32 with hyperthreading)

ˆ 64 gigabytes of RAM

ˆ Ubuntu 14.04.4 LTS

8 / 18

(16)

Methodology

After experimenting with various congurations, only 4 node and 64 cores were used in the experiment, running only until the 3rd

timesteps.

Trace size 2,7 Gb.

Files generated by the tracing tool Extrae are too big. Thus, reducing the number of timesteps was necessary.

(17)

Methodology

After experimenting with various congurations, only 4 node and 64 cores were used in the experiment, running only until the 3rd

timesteps.

Trace size 2,7 Gb.

Files generated by the tracing tool Extrae are too big.

Thus, reducing the number of timesteps was necessary.

9 / 18

(18)

Methodology

Execution: OpenMPI 1.10.2

ˆ Outputs simulation result Tracing: Extrae 3.3.0

ˆ Creates individual traces for each processor

ˆ Record communication and execution information

ˆ Trace les created in .mpits format Converting:

ˆ Trace les are converted to .prv format with mpi2prv

ˆ .prv les are converted to .CSV (Comma-Separated Values) using a Perl script

Plot and Analysis:

ˆ Using R, dlyr and ggplot2 dierent graphics and analysis are made

(19)

Methodology

Execution: OpenMPI 1.10.2

ˆ Outputs simulation result Tracing: Extrae 3.3.0

ˆ Creates individual traces for each processor

ˆ Record communication and execution information

ˆ Trace les created in .mpits format Converting:

ˆ Trace les are converted to .prv format with mpi2prv

ˆ .prv les are converted to .CSV (Comma-Separated Values) using a Perl script

Plot and Analysis:

ˆ Using R, dlyr and ggplot2 dierent graphics and analysis are made

10 / 18

(20)

Results

All results are extracted from the same execution with 4 nodes 64 cores and 3 timesteps.

(21)

Overview of Load Imbalance

● ● ● ● ●

● ●

● ● ●

● ●

● ● ● ● ●

●● ● ●

● ● ● ● ● ● ● ●

● ●

● ● ● ● ● ●

0 100 200 300

0 20 40 60

MPI Process Rank

Computation Time (s)

12 / 18

(22)

Space/Time Execution

(23)

Process Behavior by State

Blocking Send Group Communication Running

Send Receive Waiting a message 0

100 200 300

0 100 200 300

0 20 40 60 0 20 40 60

MPI Process Rank

Amount of Time (s)

14 / 18

(24)

Percent Imbalance

The percent imbalance formula is used to calculate the load balance of applications. It characterizes the uneven distribution of work.

ˆ Lmax - is the value of the process with the highest load (seconds executing)

ˆ L is the average computational load among processes

λ= (Lmax

L −1)∗100 (1)

The metric is calculated considering the entire execution. One global metric. Load imbalance is 75%, it indicates that if the load were better distributed we could have potential performance improvement

(25)

Conclusion

ˆ The execution traced using the Extrae, enabled us to discover relevant information about the core operation of Alya in the addressed case

ˆ Signicant time wasted in the beginning of the execution probably to divide the problem

ˆ Creates overhead since remaining processes are kept idle

ˆ Anomalies were detected in some processes

ˆ Root process (zero) sends the partitioned data sequentially to dierent nodes, thus, start of application is considerably slow and not scalable.

ˆ Percent imbalance metric indicates that there is a potential performance improvement. The reason for such imbalance is still being investigated.

16 / 18

(26)

Conclusion

ˆ The execution traced using the Extrae, enabled us to discover relevant information about the core operation of Alya in the addressed case

ˆ Signicant time wasted in the beginning of the execution probably to divide the problem

ˆ Creates overhead since remaining processes are kept idle

ˆ Anomalies were detected in some processes

ˆ Root process (zero) sends the partitioned data sequentially to dierent nodes, thus, start of application is considerably slow and not scalable.

ˆ Percent imbalance metric indicates that there is a potential performance improvement. The reason for such imbalance is still being investigated.

(27)

Conclusion

ˆ The execution traced using the Extrae, enabled us to discover relevant information about the core operation of Alya in the addressed case

ˆ Signicant time wasted in the beginning of the execution probably to divide the problem

ˆ Creates overhead since remaining processes are kept idle

ˆ Anomalies were detected in some processes

ˆ Root process (zero) sends the partitioned data sequentially to dierent nodes, thus, start of application is considerably slow and not scalable.

ˆ Percent imbalance metric indicates that there is a potential performance improvement. The reason for such imbalance is still being investigated.

16 / 18

(28)

Conclusion

ˆ The execution traced using the Extrae, enabled us to discover relevant information about the core operation of Alya in the addressed case

ˆ Signicant time wasted in the beginning of the execution probably to divide the problem

ˆ Creates overhead since remaining processes are kept idle

ˆ Anomalies were detected in some processes

ˆ Root process (zero) sends the partitioned data sequentially to dierent nodes, thus, start of application is considerably slow and not scalable.

ˆ Percent imbalance metric indicates that there is a potential

(29)

Future Work

ˆ Study possible change in the code to improve load balance

ˆ Execute Similar experiment with other congurations to conrm results varying:

ˆ Number of nodes

ˆ Number of cores

ˆ Number of timesteps

ˆ Mark the beginning of each iteration so balancing metrics can be applies in each phase of computation/communication

ˆ Trace with dierent tracing tool that generates smaller trace les e.g. ScoreP

17 / 18

(30)

Thank you

Thank you

Referências

Documentos relacionados

O direito de exoneração conferido ao sócio discordante de deliberação que decida a transferência da sede para o estrangeiro é-lhe atribuído quando a transferência diga respeito

The probability of attending school four our group of interest in this region increased by 6.5 percentage points after the expansion of the Bolsa Família program in 2007 and

Em Pedro Páramo, por exemplo, tem-se uma escritura (des)organizada em 70 fragmentos narrativos, que se desenvolvem em diferentes planos, nos quais a evoca- ção, a fantasia, o mito e

Sobrepondo-se à imagem divina representada pelo formato da águia, que sur- ge como um símbolo natural, ainda que tenha sido resgatado dos livros de história e trazido para

Dessa forma os carregamentos oriundos do vento agindo sobre os cabos condutores mais próximos ao solo foram mantidos, já para os demais cabos, incluindo o cabo para-raios, esses

É possível que a resposta à questão de terceiro nível dependa, sem dúvida, como Nietzsche pensava, de uma “crença” ou “fé”. Para Kierke- gaard, como é sabido, a questão

To cite this article: Falk Huettig, Thomas Lachmann, Alexandra Reis & Karl Magnus Petersson (2018) Distinguishing cause from effect – many deficits associated with

Domains of Mathematical Knowledge for Teaching (Ball et al., 2008, p 403) Within SMK is included common content knowledge (CCK), which corresponds to the sort of general