Introduction to MPI

This course covers the basic concepts of message passing libraries and how a library should provide more than basic data passing primitives, together with the background and development history of the Message Passing Interface (MPI). As we will see, message passing applications require more than the ability to send data or messages.

In practice, all message passing libraries (of which there are many) provide a core set of capabilities for the application developer. As mentioned earlier, a naive view of message passing is the transfer of data between processes. Many of these libraries began as research products. While the publication of research products is usually a good thing, it becomes a hindrance when a large number of message passing libraries become available, few of which are properly developed.

Until recently, the parallel developer coming to message passing for the first time had to choose between a large number of implementations that all did the same job but were mutually incompatible. Some of these implementations were actually wrappers that sat on top of existing message passing libraries.

3 Fundamentals of MPI

The SPMD Model
MPI Preliminaries
Communicators
Library Construction

While MPI implementations try to hide as much of their inner workings as possible, there are some cases where it is useful to pass some data to the application that is meaningful only to a particular MPI implementation. For example, in MPI, as we'll see, it's possible to start an operation, do some useful work in the application, and then have the application ask the MPI library whether the operation has completed. To do this, the MPI library passes to the application a token (a handle) that refers to that operation.

As we've seen, message passing libraries offer a variety of capabilities, but perhaps the most important is the way they allow you to send data between programs. All message passing libraries have the concept of an address for a particular process, but MPI is unusual in that it has both a communicator and a rank for an address. As we will see, user applications that use message passing achieve all of their coordination by exchanging messages.

However, if a library is to be useful to the application, it is important to keep library messages separate from application messages. In most message passing libraries this is impossible: the library and the application simply specify an address, and the destination process reads using a single function, so the read function has difficulty distinguishing between messages destined for the application and messages destined for a library (or, more strictly, for the inner workings of the library).

4 Programming in MPI

Starting MPI
Stopping MPI
Communicators
Writing MPI Programs
Exercise 1
A Note on mpich

Aborting an MPI program (MPI_Abort) requires a communicator and an error code that is passed to the applications being terminated. As shown in Figure 2-1, most MPI programs adhere to the SPMD model, i.e., the program determines at runtime what it will do. MPI provides two functions relevant to this: the first obtains the rank of the process within a given communicator (MPI_Comm_rank), and the second obtains the number of processes in that communicator (MPI_Comm_size).

Together, these functions are usually used in the initial SPMD code to determine which routine to execute. However, we need one more piece of information to write an MPI application: how to link in the library. We have now covered the basic concepts and design of MPI, and to reinforce them we now turn to an exercise.

The exercise program can be completed using the functions described in the previous chapters, which are summarized in Table 1 and Table 2. This course is intended to be taught using one of the major public domain (PD) MPI implementations, mpich.

5 Message Passing in MPI

A Message
Constructing Messages
Point to Point Communication
Synchronous Send

Data Transmission
Back to Synchronous Send

Buffered Sends

Application Buffers
Buffered Send

Other Communication Modes

Standard Mode
Ready Sends

Receiving in MPI

Tags
The Blocking Receive

An Example
Note on Wildcard Receives
Non-Blocking Communications

Non-Blocking Theory

Non-Blocking Sends

N.B. Sync Sends
Other N.B. Sends

Non-Blocking Receives
Testing for Completion

More Advanced Completion Tests

Basic Message Functions

Different machines may represent the same data in different ways. MPI tries to solve this problem by requiring the programmer to tell it what type of data is being sent to the other machine, i.e., the message content. At send time, the MPI implementation can then decide which parts of the message to transform without the programmer having to be aware of this. In other words, the MPI implementation takes responsibility for ensuring that the message can be interpreted by the destination machine, regardless of its internal processor data types.

As we have seen, an MPI message is assumed to be a sequence of data items, all of the same type. All the MPI message passing functions build on this (somewhat idealized) version of the message passing process. In the synchronous send example, both the source and destination processes initially do useful work; the source process then makes a synchronous send call to the destination.

We want to do some useful work while we're waiting for the message to be sent, and a synchronous send can't let us do this because the function call doesn't return until the transmission is complete. By contrast, a send may be considered complete once the message has been sent out by the source machine, which may or may not indicate that the message has arrived at its destination. In most MPI implementations this means that the message will be discarded, but the sender will not be told.

Before looking at the function call, it's worth discussing how the receive function specifies the message it wants to wait for. This is especially important in MPI, because the receive operation must specify the content of the message in the receive call. If the wrong type is specified, an error may occur. The message will consist of count instances of datatype, and will be placed in memory starting at address buf.

The tag and source parameters can be integers, which lets the program be very selective about the message it receives, but they can also be special values that allow wildcard matching. If an application chooses to receive a message with wildcards (i.e., not to be too picky), it may want to find the source or tag of the message afterwards. In the example, the sending process calls a synchronous send and the message is transferred within MPI_COMM_WORLD.

The receiving process receives the result message using a blocking receive and then uses the my_status variable to find where it came from. These calls are all called blocking calls because they do not return (i.e., the calling program cannot continue) until they have completed the work they are doing.

6 Derived Datatypes

Background
What is a derived datatype?
Variables in Memory
Making a new type
Type Maps
Deriving a Structure

Datatype Column
Offset Column
Blocks
Describing the Structure
Committing the Datatype
Using the Datatype

Deriving Vectors

An MPI Vector
Describing the Vector
Committing the Vector

Exercise
Advanced Message Functions

One of the weaknesses of the derived data type technique is that at first glance it looks much more complicated than it needs to be. First, we build a description of the data type in a format that MPI will be able to understand later. MPI then adds this information to its list of recognized types, and from then on you can use it as if it were one of the intrinsic types.

A type map is the template that MPI will use when it wants to read, or write, a structure of the type you've defined. The MPI extent function, MPI_Type_extent, returns the length of a datatype, which is added to the running offset. The second column is that of the offsets from the beginning of the structure, which, as we saw, we get by calling MPI_Type_extent.

Note that the entries in this array are the offsets from the start of the structure, in other words, the total size of the previous entries, not of the current one. If we wanted to describe this structure in our simplified type map, we would have to fill in 12 separate entries, one for each of the 12 variables. The designers of MPI felt that since MPI already handled arrays of similar variables so well, it should be possible to describe an array as a single component of the type map, so that we do not have to list every variable separately.

Given the various elements of the type description, we can now tell MPI about this description. An MPI vector draws its elements from a set of variables of the same type. The offset from the beginning of one block to the beginning of the next is called the stride.

In the vector example shown in Figure 14 there are 3 elements of the old type that we are interested in, followed by 2 skipped elements before the beginning of the next block (giving a stride of 5).

For the exercise, we provide most of the code except for the messaging calls, which you will provide. The master divides the function range into a number of strips and assigns each parcel of strips to a different processor.

7 Conclusions

A Example MPI Programs

a Exercise 1
b Makefile
c Main program (C Version)
d Main Program (Fortran version)
e Exercise 2
f Program (C version)
g Fortran Version

# This is the makefile for the first example in the MAN T&EC Course
# "Introduction to MPI".
# This Makefile takes the form of the default makefiles for the ANL
# implementation of MPI, which is based on the p4 message passing library.

/* This is the sample answer for the first exercise in the MANTEC
   course, “Introduction to MPI”. */

The Fortran version of the first exercise is very similar; the only major difference is the addition of error variables that are passed to most MPI functions.

c This is a sample answer for the first exercise in the MANTEC
c “Introduction to MPI” course.

The second exercise in the course is designed to show how data can be passed between programs as a way to enforce application logic across multiple processes.

When the worker receives this information, it performs an approximate integration and then sends a message to the master containing the area under the function over this range of parameters. The master collects these values, sums them, and prints the result to the user.

/* This is a sample answer for the second exercise in the MANTEC
   course, “Introduction to MPI”. */

/* and the definition of the work_packet that will be sent via MPI calls to the
   workers. */

This is the structure that contains the work packet that we send to the workers. After doing the computation, the worker performs a synchronous send to the root (running on processor 0).

c This is a sample answer for the second exercise in the MANTEC
c “Introduction to MPI” course.
c That's the end of the main program, now define the functions
c it uses.
c This is the main process that distributes work to workers
      integer nw, mytype

Define a common block so that num_strips & start_value match the layout of the derived datatype.

c Now send a message to the worker process telling it to process
c num_this strips, starting at cur_value.