

6.2.3 KAAPI and the Grid’5000 Topology

This Section describes the results obtained with real application traces gathered from different experiments with KAAPI applications on the Grid’5000 platform. We selected six different scenarios to present these results, all of them considering as network interconnection the topology present in Grid’5000.

Scenario A: 26 processes, two sites, two clusters

The first scenario is a KAAPI application composed of 26 processes. Each process is assigned to a distinct machine, resulting in an allocation of 26 machines. Half of them are allocated in the cluster xiru, at the portoalegre site, and the other half in the cluster grelon, at the nancy site. Figure 6.11 depicts the 3D visualization generated by the Triva prototype for the application trace. The visualization base is configured to hold the network topology that interconnects both sites. In this example, we use a hypothetical topology just to illustrate the analysis; the actual interconnection between the portoalegre site and the rest of Grid’5000 is a VPN, with several physical hops through the internet.

Figure 6.11 – A side-view generated by Triva with traces from 26 processes.

The first thing to notice in Figure 6.11 is the vertical bars representing the processes of the KAAPI application. Light gray represents the Run state and dark gray the Steal state of a given process, as indicated in the leftmost part of the Figure. We can also observe in this Figure the horizontal lines connecting processes from different sites. They represent the work stealing requests performed among the processes of the application. When the user interacts with such a visualization, it is possible to obtain information for every state and link represented. If a resource description with additional data about the interconnections is provided to the prototype, the user can obtain such data through the visualization by pointing the mouse at the squares and lines in the base. We can also notice in the Figure the distribution of steal requests over time.

Scenario B: 60 processes, two sites, three clusters

The second scenario is a KAAPI application composed of 60 processes, one per machine, executed in two sites of Grid’5000. The site nancy contributes 30 machines from the cluster grelon, while the site rennes has an allocation of 25 machines from the cluster paramount and 5 machines from the cluster paraquad. We consider in this case a topology where every site has its own router, to which all clusters of that site are connected. The routers of the two sites have a direct connection. Therefore, in this example, when a message is sent from a cluster in one site to a cluster in the other site, it has to go through both site routers.
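To make the path explicit, the small sketch below encodes this hypothetical topology (one router per site, clusters attached to their site router, routers directly connected) and counts the links crossed by a message. The names and the representation are illustrative only and do not correspond to any actual Triva or KAAPI data structure.

// Minimal sketch of the hierarchical topology assumed in this scenario.
// A message between clusters of different sites traverses both site routers.
#include <iostream>
#include <map>
#include <string>

int main() {
    // cluster -> site router it is attached to (hypothetical names from the text)
    std::map<std::string, std::string> cluster_router = {
        {"grelon", "router-nancy"},
        {"paramount", "router-rennes"},
        {"paraquad", "router-rennes"}};

    auto hops = [&](const std::string& from, const std::string& to) {
        // Same router: cluster -> router -> cluster (2 links).
        // Different routers: cluster -> router -> router -> cluster (3 links).
        return cluster_router[from] == cluster_router[to] ? 2 : 3;
    };

    std::cout << "grelon -> paraquad: " << hops("grelon", "paraquad")
              << " links (through both site routers)\n";
    std::cout << "paramount -> paraquad: " << hops("paramount", "paraquad")
              << " links (inside rennes only)\n";
}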

Figure 6.12 shows two screenshots of the Triva prototype generated during the visualization of the trace file for this scenario. The text and dashed lines were manually inserted to improve the understanding of the example. Image A of this Figure shows the total execution time with a small time scale, bringing all objects close to the visualization base. The dashed line in this image depicts the separation between rennes, with two clusters, and nancy, with only one cluster. We can observe at this time scale that a large number of work stealing requests occur between the grelon and paraquad clusters, mostly because of the higher number of processes executed on them. By analyzing these requests together with the network topology, the Triva prototype lets the user see that all the requests between these clusters must go through the two routers of the interconnection. Such a situation might lead to performance issues. A hierarchical work stealing mechanism is under investigation by the KAAPI team to overcome these problems.

Figure 6.12 – Two screenshots of the Triva prototype during the visualization of an application composed of 60 processes, with different time scales.

The prototype also allows the time scale to be changed dynamically, using the mouse wheel. Image B of Figure 6.12 shows the total execution time for the traces of this scenario, but with a larger time scale. Through this image, it is possible to see differences in the work stealing behavior in different intervals of the execution. It can be noticed that in the beginning there are fewer work stealing requests compared to the end. It is towards the end of the execution that fewer tasks are available and processes start to try to steal more. This behavior is expected considering the current implementation of KAAPI, where random steal requests are performed when processes are idle.
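The time scale mentioned above can be understood as a simple multiplicative factor applied to trace timestamps. The sketch below only illustrates this idea, with hypothetical names; it is not the prototype's actual rendering code.

// Sketch: the height of an event above the visualization base is its
// timestamp times a scale factor, so changing the factor (e.g. with the
// mouse wheel) compresses or stretches the whole timeline.
#include <iostream>

double heightAbove(double timestamp, double timeScale) {
    return timestamp * timeScale;   // visualization units above the base
}

int main() {
    const double t = 30.0;   // an event 30 seconds into the run
    std::cout << heightAbove(t, 0.1) << " units at a small scale\n";   // flat view
    std::cout << heightAbove(t, 2.0) << " units at a larger scale\n";  // stretched view
}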

Scenario C: 100 processes, three sites, four clusters

The third scenario is an application composed of 100 processes, one per machine, allocated in four clusters that belong to three different sites of Grid’5000. The allocation is as follows: cluster grelon with 30 machines at the nancy site; pastel with 40 at toulouse; and 25 machines from paramount and 5 from paraquad at the rennes site. The network interconnection here is constructed as in the previous example. In this scenario, we consider that the three routers are fully connected.

In previous scenarios, we observed screenshots where all the execution time is represented, sometimes with different time scales. Figure 6.13 shows two screenshots where only a part of the execution time is drawn. This is possible in the prototype through an interactive configuration where the user specifies which time slice is rendered. Image A of the Figure shows the work stealing requests at the beginning of the application. The dashed lines separate the three different sites. As in previous cases, each cluster name has a number indicating how many processes are executed on that cluster. We can clearly observe that at the beginning the number of stealing requests is considerably lower than at the end of the execution, shown in image B.
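The time slice configuration can be thought of as a filter that keeps only the trace events overlapping a user-chosen interval. The sketch below illustrates this idea with hypothetical event fields and names; it is not the prototype's actual data structure.

// Sketch of time-slice selection: keep only the steal requests that
// overlap the interval chosen by the user.
#include <algorithm>
#include <iostream>
#include <vector>

struct StealRequest {
    double start;   // seconds since the beginning of the execution
    double end;
    int thief;      // process that issued the request
    int victim;     // process chosen as target
};

// Keep the requests that overlap the interval [t0, t1].
std::vector<StealRequest> timeSlice(const std::vector<StealRequest>& all,
                                    double t0, double t1) {
    std::vector<StealRequest> out;
    std::copy_if(all.begin(), all.end(), std::back_inserter(out),
                 [=](const StealRequest& r) { return r.end >= t0 && r.start <= t1; });
    return out;
}

int main() {
    std::vector<StealRequest> trace = {
        {0.5, 0.6, 3, 17}, {12.0, 12.1, 8, 2}, {70.2, 70.3, 40, 11}};
    auto beginning = timeSlice(trace, 0.0, 10.0);   // e.g. start of the execution
    std::cout << beginning.size() << " request(s) in the first 10 seconds\n";
}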

Figure 6.13 – Two visualizations with different time slices of an application composed of 100 processes.

Image B of Figure 6.13 also shows, through the dashed arrow, the path that all work stealing requests must follow from the cluster pastel to the cluster grelon and vice-versa. With the rendering of the network topology, we can see that these requests must go through two routers to reach their destination. The visualization in this case may suggest that large cluster allocations for this particular execution should be placed in the same site, avoiding two hops for stealing requests. Small allocations could then be placed on other sites, because of the smaller number of steal requests they generate.

Scenario D: 200 processes, 200 machines, two sites, five clusters

The KAAPI application of scenario D is composed of 200 processes, running on 200 machines. The machine allocation is divided between two sites: rennes and nancy. The number of machines allocated in each site is equal, but inside each site the allocation differs in the number of machines per cluster.

Image A of Figure 6.14 shows the number of machines allocated to each cluster and also the network topology that interconnects the two sites. As in previous scenarios, the dashed line is used to separate the sites. To illustrate another benefit of our visualization, we consider for this scenario additional information regarding the network interconnection between the routers and three clusters. We consider here that the bandwidth available between the paravent and grillon clusters, through the two routers, is 100 megabits. The link between the grelon cluster and its router is 1 megabit, as depicted in image A of the Figure.

Figure 6.14 – Two top-views with a network topology annotated with bandwidth limitations, showing the benefits brought by the 3D approach.

In this scenario, there are 87 processes running on the grelon cluster and 61 on the paravent cluster. Let us consider only the work stealing requests between these two clusters, as depicted by the dashed circle in the right image of Figure 6.14. The dashed arrow in the same image indicates that these requests must pass through the 1 megabit link. The visualization suggests that a smaller number of processes should be placed in a cluster with such a slow link. If, for instance, the processes of the cluster grelon were executed on the cluster grillon instead, the execution could achieve better performance.
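The reasoning above can be summarized as a bottleneck computation: the effective bandwidth between two clusters is bounded by the slowest link on the path between them. The sketch below illustrates this with the hypothetical values annotated in image A of Figure 6.14 (100 megabits between the routers and for the grillon uplink, 1 megabit for the grelon uplink); it is an illustration only.

// Sketch: the bandwidth between two clusters is limited by the slowest
// link on the path. Values mirror the hypothetical annotation of image A.
#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    // Links on the path grelon -> router(nancy) -> router(rennes) -> paravent,
    // in megabits.
    std::vector<double> grelon_to_paravent = {1.0, 100.0, 100.0};
    // Links on the path grillon -> router(nancy) -> router(rennes) -> paravent.
    std::vector<double> grillon_to_paravent = {100.0, 100.0, 100.0};

    auto bottleneck = [](const std::vector<double>& path) {
        return *std::min_element(path.begin(), path.end());
    };

    std::cout << "grelon  -> paravent: " << bottleneck(grelon_to_paravent)
              << " megabit(s)\n";   // limited by the 1 megabit uplink
    std::cout << "grillon -> paravent: " << bottleneck(grillon_to_paravent)
              << " megabit(s)\n";
}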

Through the example of this scenario, we can notice the importance of analyzing the application performance together with a topological representation. Without this type of visualization, such as the one in image B of Figure 6.14, the analyst could draw wrong conclusions about the performance of the application.

Scenario E: 648 processes, two sites, five clusters

The KAAPI library has a random work stealing mechanism: whenever a process has no further tasks to execute, it randomly selects another process and sends it a stealing request. This random behavior is an easy and simple way to perform load balancing, being a distributed solution that scales well. Scenario E intends to show the resulting communication pattern caused by the KAAPI work stealing implementation in a large-scale situation with topological data. The network topology configuration is the same as in scenario D, and the same number of machines is used for the execution of the application. The only difference here is that a higher number of processes is launched, resulting in 648 processes.
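To make the random victim selection concrete, the sketch below shows one way an idle process could pick a target uniformly among the other processes; it is an illustration of the idea described above, not KAAPI's actual implementation.

// Sketch: an idle process picks another process uniformly at random and
// sends it a steal request.
#include <iostream>
#include <random>

// Pick a victim uniformly among the other nproc-1 processes.
int pickVictim(int self, int nproc, std::mt19937& rng) {
    std::uniform_int_distribution<int> dist(0, nproc - 2);
    int v = dist(rng);
    return v >= self ? v + 1 : v;   // skip our own rank
}

int main() {
    const int nproc = 648;   // number of processes in scenario E
    std::mt19937 rng(42);
    // An idle process (rank 0 here) issues a few random steal requests.
    for (int i = 0; i < 5; ++i)
        std::cout << "steal request from 0 to " << pickVictim(0, nproc, rng) << "\n";
}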

Figure 6.15 shows a screenshot of the Triva prototype when configured to show the behavior of the whole execution time on top of the network topology. We can see the distribution of processes among the clusters, whose square sizes in the base are directly related to the number of processes in each cluster. Considering the five clusters of this execution and the random work stealing mechanism, it is expected to find steal requests from every cluster to all the others. The four arrows, drawn manually on the view, put this behavior in evidence for the cluster grelon. We can see that the other clusters also perform steal requests in the same way, targeting processes from all other clusters.
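One plausible way to obtain the size relation mentioned above is to make the area of each cluster square proportional to its process count, so the side length grows with the square root of that count. The sketch below illustrates this assumption with hypothetical per-cluster counts that sum to the 648 processes of this scenario; it does not necessarily reproduce the prototype's exact layout rule.

// Sketch: area proportional to process count => side proportional to sqrt(count).
#include <cmath>
#include <iostream>
#include <map>
#include <string>

int main() {
    // Hypothetical process counts (cluster names taken from scenarios B and D);
    // the values are illustrative only and sum to 648.
    std::map<std::string, int> processes = {
        {"grelon", 200}, {"grillon", 120}, {"paravent", 180},
        {"paramount", 88}, {"paraquad", 60}};
    const double unit = 1.0;   // visualization area units per process
    for (const auto& c : processes)
        std::cout << c.first << ": side = "
                  << std::sqrt(c.second * unit) << " units\n";
}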

Scenario F: 2900 processes, four sites, thirteen clusters

The last scenario is an application composed of 2900 processes, executed on 310 machines allocated in clusters of four Grid’5000 sites. The machine allocation is as follows: 60 machines from the lille site (41 - chinqchint, 10 - chti, 3 - chuque, 6 - chicon); 100 from rennes (61 - paravent, 6 - paramount, 33 - paraquad); 50 from bordeaux (5 - bordereau, 22 - bordeplage, 23 - bordermer); and 100 from the sophia site (48 - azur, 42 - sol, 10 - helios). The objective of this scenario is to illustrate the different work stealing patterns that arise in different intervals of time during the execution of a large-scale application. The interconnection topology follows the same policies as before: each site has a router, and all the clusters of a site are connected to the site router. Image A of Figure 6.16 shows the overall organization of the network topology, with dashed lines dividing the sites and each cluster representation with its respective name and the number of processes allocated to it.

The total execution time of this application is 74 seconds. Image A of Figure 6.16 shows the work stealing requests that happened from the sixth to the sixteenth second of execution. In this time slice, most of the requests are performed between the paraquad and paramount clusters. Image B shows the time slice between seconds 16 and 26, with a higher number of steal requests inside the rennes site. Image C shows another time slice, from second 26 to 36, with even more steal requests among the clusters, and image D shows the time slice from second 36 to 50. This last image has too many steal requests, hindering the perception of the network topology in the visualization base. This problem can be alleviated in the prototype by changing the transparency configuration of the link representation. Even so, the example shows an expected behavior of the KAAPI library, with more steal requests towards the end of the application execution.