Last modified on: Wednesday, October 15 1997 at 11:09am

3 Understanding HP MPI

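This chapter provides information about the HP MPI implementation of MPI. The topics covered are the directory structure, compatibility issues, compiling applications, running applications, run-time environment variables, and run-time utility commands.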

Directory structure

All HP MPI files are stored in the /opt/mpi directory. The directory structure is organized as shown in Table 1.

Table 1 Organization of the /opt/mpi directory

Subdirectory            Contents
bin                     Command files for the HP MPI utilities
doc/html                HTML version of the HP MPI User's Guide. View cover.html to browse the guide.
help                    Source files for the example programs
include                 Header files
lib/X11/app-defaults    Application default settings for the XMPI trace utility
lib/pa1.1/libfmpi.a     MPI library for 32-bit Fortran applications
lib/pa1.1/libmpi.a      MPI library for 32-bit C and C++ applications
lib/pa1.1/libpmpi.a     MPI 32-bit profiling interface library
lib/pa20_64/libfmpi.a   MPI library for 64-bit Fortran applications
lib/pa20_64/libmpi.a    MPI library for 64-bit C and C++ applications
lib/pa20_64/libpmpi.a   MPI 64-bit profiling interface library
newconfig/              Configuration files and release notes
share/man/man1.Z        Man pages for the HP MPI utilities
share/man/man3.Z        Man pages for HP MPI library and library functions

The man pages located in the /opt/mpi/share/man/man1.Z subdirectory are grouped into three categories: compilation, general, and run time. The compilation and run-time categories correspond to available types of HP MPI utilities. All three categories are described in Table 2.

Table 2 Man page categories

Man page category   Description
Compilation         Describes the available compilation utilities. Refer to "Compiling applications" for more information.
General             Describes the general features of HP MPI. The man page is called MPI.1.
Run time            Describes the available run-time utilities. Refer to "Run-time utility commands" for more information.

Compatibility issues

Several compatibility issues exist for HP MPI V1.3:

Compiling applications

The compiler used to build HP MPI applications depends upon which programming language you use. HP MPI provides separate compilation utilities and default compilers for the languages shown in Table 3.

Table 3 Compilation utilities

Language     Utility   Default compiler
C            mpicc     /opt/ansic/bin/cc
C++          mpiCC     /opt/aCC/bin/aCC
Fortran 77   mpif77    /opt/fortran/bin/f77
Fortran 90   mpif90    /opt/fortran90/bin/f90

Note: If aCC is not available, mpiCC uses CC as the default C++ compiler.
Note: Even though the mpiCC and mpif90 compilation utilities are shipped with HP MPI, all C++ and Fortran 90 applications use the C and Fortran 77 bindings, respectively.

If you want to use a compiler other than the default one assigned to each utility, you can set the environment variables shown in Table 4.

Table 4 Compilation environment variables

Utility   Environment variable
mpicc     MPI_CC
mpiCC     MPI_CXX
mpif77    MPI_F77
mpif90    MPI_F90

To set a compilation environment variable, enter:

% setenv compilation_environment_variable path

where compilation_environment_variable is the name of the variable you want to set and path specifies the path to the compiler you want to use.
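
The lookup implied by Table 3 and Table 4 can be sketched as follows. This is only an illustrative model, not the logic of the HP MPI utilities themselves: the helper name resolve_compiler is hypothetical, the fallback for an empty variable is an assumption, and the aCC-to-CC fallback mentioned in the note above is not modeled.

```python
import os

# Default compilers (Table 3) and override variables (Table 4).
DEFAULT_COMPILER = {
    "mpicc": "/opt/ansic/bin/cc",
    "mpiCC": "/opt/aCC/bin/aCC",
    "mpif77": "/opt/fortran/bin/f77",
    "mpif90": "/opt/fortran90/bin/f90",
}
OVERRIDE_VARIABLE = {
    "mpicc": "MPI_CC",
    "mpiCC": "MPI_CXX",
    "mpif77": "MPI_F77",
    "mpif90": "MPI_F90",
}


def resolve_compiler(utility, environ=None):
    """Return the compiler path a utility would use (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    override = environ.get(OVERRIDE_VARIABLE[utility])
    # Assumption: an unset or empty variable falls back to the default.
    return override or DEFAULT_COMPILER[utility]
```

For example, resolve_compiler("mpif90", {"MPI_F90": "/usr/local/bin/f90"}) returns the overriding path, while an empty environment yields the Table 3 default.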

64-bit support

HP-UX 11.0 is available as a 64-bit operating system for PA2.0 architectures and as a 32-bit operating system for older PA-RISC processor architectures. You must run 64-bit executables on the 64-bit system (though you can build 64-bit executables on the 32-bit system).

HP MPI supports a 64-bit version of the MPI library on platforms running HP-UX 11.0. Both 32- and 64-bit versions of the library are shipped with HP-UX 11.0 (only a 32-bit version is shipped with HP-UX 10.20).

The mpicc and mpiCC commands link the 64-bit version of the library if you compile with the +DA2.0W or +DD64 options. The mpif90 command links the 64-bit version of the library if you compile with the +DA2.0W option. Otherwise, the 32-bit version is used.
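
The selection rule above can be written out as a small decision function. This is a sketch under stated assumptions: mpi_library_dir is a hypothetical name, the directory names come from Table 1, and mpif77 is rejected because the text does not say which options it recognizes.

```python
def mpi_library_dir(utility, compiler_flags):
    """Pick the library directory implied by the compile options (sketch)."""
    flags = set(compiler_flags)
    if utility in ("mpicc", "mpiCC"):
        wants_64 = bool(flags & {"+DA2.0W", "+DD64"})
    elif utility == "mpif90":
        # Per the text, only +DA2.0W selects the 64-bit library for mpif90.
        wants_64 = "+DA2.0W" in flags
    else:
        # Assumption: other utilities are outside what the text specifies.
        raise ValueError(f"no 64-bit rule documented for {utility}")
    return "/opt/mpi/lib/pa20_64" if wants_64 else "/opt/mpi/lib/pa1.1"
```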

Statically-bound binaries built on HP-UX 10.20 platforms can run on HP-UX 11.0 systems. However, dynamically-bound binaries can only run on the HP-UX platform on which they were built.

Language interoperability

HP MPI complies with the language interoperability requirements of the MPI-2 standard. Language interoperability allows you to write mixed-language applications or applications that call library routines written in another language. For example, you can write applications in Fortran or C that call MPI library routines written in C or Fortran, respectively.

MPI provides a special set of conversion functions for converting objects between languages. The types of objects that you can convert include MPI communicators, data types, groups, requests, reduction operations, and status. See "MPI 2.0 extensions" for a list of these MPI conversion functions.

Running applications

Most HP MPI applications are run using the mpirun command. You should invoke the mpirun command with the -j option. This option displays the job ID of your job. The job ID is useful during troubleshooting if you want to check for a hung job using the mpijob command or want to terminate your job using the mpiclean command.

In some cases, you can use the executable -np # syntax to start your application. For example, to start an executable named hello_world with four processes, enter:

% hello_world -j -np 4

For multiprotocol applications that span multiple subcomplexes or multiple hosts, you must use mpirun together with an appfile. For applications that run on a single host and have a single executable, you can use executable -np # syntax, although mpirun is still recommended.

Types of applications

HP MPI supports two programming styles: SPMD applications and MPMD applications.

Running SPMD applications

A single program multiple data (SPMD) application consists of a single program that is executed by each process in the application. Each process normally acts upon different data. Even though this style simplifies the execution of an application, using SPMD can also make the executable larger and more complicated.

Each process calls MPI_Comm_rank to distinguish itself from all other processes in the application. It then determines what processing to do.
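
One common way for a process to decide "what processing to do" is to derive its share of the data from its rank. The sketch below shows that idea only; it does not call MPI. In a real application the rank and size would come from MPI_Comm_rank and MPI_Comm_size, and the helper name block_range and the block-distribution scheme are assumptions chosen for illustration.

```python
def block_range(rank, size, n_items):
    """Return the half-open [start, stop) range of items owned by rank.

    Items are split into contiguous blocks; the first n_items % size
    ranks receive one extra item each.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Every rank runs the same code but computes a different range.
ranges = [block_range(rank, 5, 12) for rank in range(5)]
```

With five processes and twelve items, the first two ranks own three items each and the remaining ranks own two.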

To run an SPMD application, use the mpirun command like this:

% mpirun -np # program

where # is the number of processes and program is the name of your application.

Suppose you want to build a C application called poisson and run it using five processes to do the computation. To do this, use the following command sequence:

% mpicc -o poisson poisson.c
% mpirun -np 5 poisson

Running MPMD applications

A multiple program multiple data (MPMD) application uses two or more separate programs to functionally decompose a problem.

This style can be used to simplify the application source and reduce the size of spawned processes. Each process can execute a different program.

To run an MPMD application, the mpirun command must reference an appfile that contains the number of processes to be created from each program and the list of programs to be run.

A simple invocation of an MPMD application looks like this:

% mpirun -f appfile

where appfile is the path name to a file that contains process counts and a list of programs.

Suppose you decompose the poisson application into two programs: poisson_master (run as a single master process) and poisson_child (run as four child processes).

The appfile for the example application contains the two lines shown below:

-np 1 poisson_master
-np 4 poisson_child

To build and run the example application, use the following command sequence:

% mpicc -o poisson_master poisson_master.c
% mpicc -o poisson_child poisson_child.c
% mpirun -f appfile

See "Creating an appfile" for more information about using appfiles.

Multiprotocol messaging

Multiprotocol messaging refers to process communication that uses different protocols depending upon where the processes are located and what type of Exemplar system is used.

An example configuration for an X-Class server is shown in Figure 1.

Figure 1 Multiprotocol messaging with an X-Class server

(Graphic)

The circles within each hypernode represent processes. The arrows represent message passing. An arrow originates from the sending process and terminates at the receiving process.

Point-to-point and collective protocols on an X-Class server support messaging between:

The communication speed of protocols for servers running under SPP-UX is fastest for processes on the same hypernode, slower for processes on different hypernodes in the same host, and slowest for processes on different hosts.

An example configuration for a K-Class server is shown in Figure 2.

Figure 2 Multiprotocol messaging with a K-Class server

(Graphic)

The circles within each host represent processes. The arrows represent message passing. An arrow originates from the sending process and terminates at the receiving process.

Point-to-point and collective protocols on servers running under HP-UX support messaging between:

Run-time environment variables

Environment variables are used to alter the way HP MPI executes an application. The variable settings determine how an application behaves and how an application allocates internal resources at run time.

Many applications run without setting any environment variables. However, applications that use a large number of nonblocking messaging requests, require debugging support, or need to control process placement may need a more customized configuration.

Environment variables are always local to the system where mpirun is running. To propagate environment variables to remote hosts, you must specify each variable in an appfile using the -e option. See "Creating an appfile" for more information.

The environment variables listed below affect the behavior of HP MPI at run time:

MPI_FLAGS

MPI_FLAGS modifies the general behavior of HP MPI. The MPI_FLAGS syntax is shown below:

[ecxdb,][edde,][exdb,][egdb,][j,][l,][s[a|p][#],][v,][+E2]

where

ecxdb

Starts a separate CXdb session for each process. The debugger must be in the command search path. This option is only provided for backward compatibility on servers running under SPP-UX. See "Debugging HP MPI applications" for more information.

edde

Starts the application under the DDE debugger. The debugger must be in the command search path. This option is only supported on servers running under HP-UX. See "Debugging HP MPI applications" for more information.

exdb

Starts the application under the xdb debugger. The debugger must be in the command search path. This option is only supported on servers running under HP-UX. See "Debugging HP MPI applications" for more information.

egdb

Starts the application under the gdb debugger. The debugger must be in the command search path. This option is only supported on servers running under HP-UX. See "Debugging HP MPI applications" for more information.

j

Prints the HP MPI job identifier.

l

Reports memory leaks caused by erroneous handling of HP MPI objects. Setting this option may decrease performance.

s[a|p][#]

Selects signal and maximum time-delay for guaranteed message progression. The sa option selects SIGALRM. The sp option selects SIGPROF. The # option is the number of seconds to wait before issuing a signal to trigger message progression. The default value of this option is sp604800, which issues a SIGPROF once a week.

This mechanism is used to guarantee message progression in applications that use nonblocking messaging requests followed by prolonged periods of time in which HP MPI routines are not called.

Note: The SIGPROF option is not supported on servers running under SPP-UX when your application executable is in Extended Standard Object Module format.

v

Prints the version number.

+E2

Sets -1 as the value of .TRUE. and 0 as the value of .FALSE. when returning logical values from HP MPI routines called within Fortran 77 applications.
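
The s[a|p][#] option of MPI_FLAGS packs a signal choice and a delay into one token. The sketch below decodes such a token and confirms that the default delay of 604800 seconds is exactly one week. The function name parse_s_option is hypothetical, and treating an omitted letter or number as the corresponding part of the default sp604800 is an assumption.

```python
import re

SIGNALS = {"a": "SIGALRM", "p": "SIGPROF"}
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def parse_s_option(option="sp604800"):
    """Decode an s[a|p][#] token into (signal name, delay in seconds)."""
    match = re.fullmatch(r"s([ap])?(\d+)?", option)
    if match is None:
        raise ValueError(f"not an s[a|p][#] option: {option!r}")
    letter, seconds = match.groups()
    # Assumption: omitted parts fall back to the documented default.
    return SIGNALS[letter or "p"], int(seconds or SECONDS_PER_WEEK)
```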

MPI_GLOBMEMSIZE

MPI_GLOBMEMSIZE specifies the amount of shared memory allocated for all processes in an HP MPI application. MPI_GLOBMEMSIZE has the following syntax:

amount

where amount specifies the total amount of shared memory in bytes for all processes. The default is 2 Mbytes for up to 64-way applications and 4 Mbytes for larger applications.

Note: Be sure that the value specified for MPI_GLOBMEMSIZE is less than the amount of global shared memory allocated for the subcomplex when working with X-Class servers. Otherwise, swapping overhead will degrade application performance.

MPI_TOPOLOGY

MPI_TOPOLOGY controls application process placement within a subcomplex on servers running under SPP-UX (the value is ignored on HP-UX systems). MPI_TOPOLOGY has the following syntax:

[[sc]/[hypernode]:][topology]

where

sc

Identifies the name of a subcomplex.

hypernode

Specifies the logical hypernode within the subcomplex on which to start the first process. By default, the initial logical hypernode is chosen by the operating system.

topology

Is a comma-separated list that specifies the number of processes to start on each logical hypernode in the subcomplex, beginning with logical hypernode 0.

HP MPI uses logical hypernode numbering. The operating system handles the mapping from physical to logical hypernodes. This mapping follows the lowest-to-highest sorted order of physical hypernode numbers. For example, in a 2-node subcomplex using physical hypernodes 3 and 4, physical hypernode 3 would map to logical hypernode 0, and physical hypernode 4 would map to logical hypernode 1.
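
The sorted-order mapping from physical to logical hypernodes can be modeled in a few lines. This only illustrates the numbering rule described above; the operating system performs the real mapping, and logical_hypernodes is a hypothetical name.

```python
def logical_hypernodes(physical_ids):
    """Map physical hypernode numbers to logical ones in sorted order."""
    return {
        physical: logical
        for logical, physical in enumerate(sorted(physical_ids))
    }


# The 2-node subcomplex from the text: physical hypernodes 3 and 4.
mapping = logical_hypernodes([4, 3])
```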

An MPI_TOPOLOGY value of System/3:4,0,4,4 specifies that logical hypernodes zero, two, and three of the subcomplex System each run four processes. The first application process is started on logical hypernode three.

When running a multinode application where some processes run different executables, MPI_TOPOLOGY settings in the appfile override any settings you might have specified by setting MPI_TOPOLOGY from the command line. See "Creating an appfile" for more information.

The number of processes specified using MPI_TOPOLOGY must match the number of processes specified in mpirun. For example, if you set MPI_TOPOLOGY to 2,3 and invoke mpirun with -np 6, the system generates an error message and terminates your job.

Also, be sure that the number of hypernodes specified in MPI_TOPOLOGY matches the number of available hypernodes on the subcomplex you want to use. For example, if you set MPI_TOPOLOGY to 6,2,3 and System only contains hypernodes 0 and 1, the system will generate an error message and terminate your job. To prevent this, use the scm utility to determine the configuration of system subcomplexes before invoking mpirun.
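
The MPI_TOPOLOGY syntax and the two consistency checks above can be sketched as follows. The function names are hypothetical, and the second check flags only the case shown in the text's example (more topology entries than available hypernodes); whether fewer entries are also rejected is not stated, so that case is left alone.

```python
def parse_topology(value):
    """Split [[sc]/[hypernode]:][topology] into its three parts."""
    subcomplex = start = None
    topology = value
    if ":" in value:
        prefix, topology = value.split(":", 1)
        sc_part, _, node_part = prefix.partition("/")
        subcomplex = sc_part or None
        start = int(node_part) if node_part else None
    counts = [int(n) for n in topology.split(",")] if topology else []
    return subcomplex, start, counts


def check_topology(counts, np, hypernodes_available):
    """Return a list of problems that would make the job fail (sketch)."""
    problems = []
    if sum(counts) != np:
        problems.append(f"topology starts {sum(counts)} processes, -np is {np}")
    if len(counts) > hypernodes_available:
        problems.append(
            f"topology names {len(counts)} hypernodes, "
            f"only {hypernodes_available} available"
        )
    return problems
```

Parsing System/3:4,0,4,4 yields the subcomplex System, starting logical hypernode 3, and the per-hypernode counts 4, 0, 4, 4. The value 2,3 combined with -np 6 produces one reported problem, matching the first example in the text.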

Note: The default subcomplex on all systems is called System. Use the mpa utility to change the default to another subcomplex.

MPI_SHMEMCNTL

MPI_SHMEMCNTL controls the subdivision of each process's shared memory for the purposes of point-to-point and collective communications. MPI_SHMEMCNTL syntax is shown below:

[nenv,][frag,][generic]

where

nenv

Specifies the number of envelopes per process pair. The default is 8.

frag

Denotes the size in bytes of the message-passing fragments region. The default is 87.5 percent of shared memory.

generic

Specifies the size in bytes of the generic-shared memory region. The default is 12.5 percent of shared memory.
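
As a worked example of the default split, the sketch below divides a shared-memory amount into the fragments region (87.5 percent) and the generic region (12.5 percent). The function name default_regions is hypothetical, and the 2 Mbyte input is an illustrative figure that assumes 1 Mbyte means 1024 * 1024 bytes.

```python
MBYTE = 1024 * 1024  # assumption: binary megabytes


def default_regions(shared_bytes):
    """Return (frag, generic) sizes in bytes for the default split."""
    frag = shared_bytes * 7 // 8      # 87.5 percent
    generic = shared_bytes - frag     # remaining 12.5 percent
    return frag, generic


frag, generic = default_regions(2 * MBYTE)
```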

MPI_TMPDIR

By default, HP MPI uses the /tmp directory to store temporary files needed for its operations. MPI_TMPDIR is used to point to a different temporary directory. MPI_TMPDIR syntax is shown below:

directory

where directory specifies an existing directory used to store temporary files.

MPI_XMPI

MPI_XMPI specifies options for run-time raw trace generation. These options represent an alternate way to set tracing rather than using the trace options supplied with mpirun.

The argument list for MPI_XMPI contains the prefix name for the file where each process writes its own raw trace data. Each process creates its own filename by concatenating the prefix, a period, and the process's global rank number.

For example, if a process has rank 0 and the prefix is hello_world, the process's raw trace file would be hello_world.0. If the prefix does not begin with a forward slash (/), that is, it is not an absolute path such as /tmp/test, the raw trace file is stored in the directory in which the process is executing when it calls MPI_Init.
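
The naming rule for raw trace files can be expressed directly. This is a sketch of the rule as described, not HP MPI code: raw_trace_path and its init_dir parameter (the directory the process is in when it calls MPI_Init) are hypothetical names.

```python
import posixpath


def raw_trace_path(prefix, rank, init_dir):
    """Build the raw trace file path for one process."""
    name = f"{prefix}.{rank}"  # prefix, a period, and the global rank
    if prefix.startswith("/"):
        return name            # absolute prefix: used as given
    return posixpath.join(init_dir, name)
```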

MPI_XMPI syntax is shown below:

prefix[:bs###][:nc][:off][:s][:nf][:k]

where

prefix

Specifies the tracing output file prefix. This is a required parameter.

bs###

Denotes the buffering size in kbytes for dumping raw trace data. Actual buffering size may be rounded up by the system. The default buffering size is 4096 kbytes. Specifying a large buffering size reduces the need to flush raw trace data to a file when process buffers reach capacity. Flushing too frequently can cause communication routines to run slower. If this problem occurs, increase the buffering size.

nc

Specifies no clobber, which means that an HP MPI application aborts if a file with the name specified in prefix already exists.

off

Denotes that trace generation is initially turned off and only begins after all processes collectively call MPIHP_Trace_on.

s

Specifies a simpler tracing mode by omitting tracing for MPI_Test, MPI_Testall, MPI_Testany, and MPI_Testsome calls that do not complete a request. This option may reduce the size of trace data so that xmpi runs faster.

nf

Denotes that a consolidated trace file is not generated. In addition, raw trace files are not deleted. You may want to use this option if your application contains a large number of processes, and you do not want to wait for MPI_Finalize to consolidate the raw trace files before your application terminates.

k

Specifies that raw trace files are kept.

Note: Even though you can specify tracing options through the MPI_XMPI environment variable, the recommended approach is to use the mpirun command with the -t option instead. In this case, the specifications you provide with the -t option take precedence over any specifications you may have set with MPI_XMPI. Using mpirun to specify tracing options guarantees that multihost applications do tracing in a consistent manner. See "mpirun" for more information.
Note: Trace-file generation (in conjunction with XMPI) and counter instrumentation are mutually exclusive profiling techniques.

MPI_WORKDIR

By default, HP MPI applications execute in the directory where they are started. MPI_WORKDIR changes the execution directory. MPI_WORKDIR has the following syntax:

directory

where directory specifies an existing directory where you want the application to execute.

MPI_CHECKPOINT

You can checkpoint and restart HP MPI applications running under SPP-UX on a single subcomplex by setting MPI_CHECKPOINT. In this case, you cannot start your application using mpirun. MPI_CHECKPOINT does not require specific arguments. For example, to checkpoint and restart the hello_world application:

% setenv MPI_CHECKPOINT
% hello_world -np 4

When you use MPI_CHECKPOINT, the following limitations apply:

MPI_INSTR

MPI_INSTR enables counter instrumentation for profiling HP MPI applications. The measurements collected are similar to the reports generated by mpitrstat. MPI_INSTR has the following syntax:

prefix[:b#1,#2][:nc][:off][:nl][:np][:nm][:c]

where

prefix

Specifies the instrumentation output file prefix. The rank zero process writes the application's measurement data to prefix.instr. If the prefix does not represent an absolute pathname, the instrumentation output file is opened in the working directory of the rank zero process when MPI_Init is called.

b#1,#2

Redefines the instrumentation message bins to include a bin having byte range #1 and #2 inclusive. The high bound of the range can be infinity, representing the largest possible message size.

nc

Specifies no clobber. If the instrumentation output file exists, MPI_Init aborts.

off

Denotes that counter instrumentation is initially turned off and only begins after all processes collectively call MPIHP_Trace_on.

nl

Specifies not to dump a long breakdown of the measurement data to the instrumentation output file (in this case, do not dump minimum, maximum, and average time data).

np

Denotes not to dump a per-process breakdown of the measurement data to the instrumentation output file.

nm

Specifies not to dump message-size measurement data to the instrumentation output file.

c

Specifies not to dump time measurement data to the instrumentation output file.

See "Using counter instrumentation" for more information.

Note: Even though you can specify profiling options through the MPI_INSTR environment variable, the recommended approach is to use the mpirun command with the -i option instead. Using mpirun to specify profiling options guarantees that multihost applications do profiling in a consistent manner. See "mpirun" for more information.
Note: Counter instrumentation and trace-file generation (used in conjunction with XMPI) are mutually exclusive profiling techniques.

Run-time utility commands

HP MPI provides a set of utility commands to supplement the MPI library routines. These commands include mpirun, mpijob, mpiclean, xmpi, mpitrget, and mpitrstat.

mpirun

mpirun starts an HP MPI application.

mpirun syntax has two forms:

where

-np #

Specifies the number of processes to run.

-help

Prints usage information for the utility.

-version

Prints the version information.

-j

Prints the HP MPI job ID.

-p

Turns on pretend mode. That is, mpirun goes through the motions of starting an HP MPI application but does not create any processes. This is useful for debugging and for checking whether the appfile (if used) is set up correctly.

-v

Turns on verbose mode.

-W

Does not wait for the application to terminate before returning.

-t spec

Enables run-time raw trace generation for all processes. spec specifies options used when tracing. See "MPI_XMPI" for the list of options you can use.

-i spec

Enables run-time instrumentation profiling for all processes. spec specifies options used when profiling. See "MPI_INSTR" for the list of options you can use.

-h host

Starts the processes on host (default is localhost).

-l user

Specifies the user name on the target host (default is local username).

-e var[=val]

Sets the environment variable var for the program and gives it the value val if provided. Environment variable substitutions (for example, $FOO) are supported in the val argument.

-sp paths

Sets the target shell PATH environment variable to paths. Search paths are separated by the colon (:) character.

program

Specifies the name of the executable to run.

args

Specifies command-line arguments to the program.

-f appfile

Starts the application described in appfile.

The first syntax is used for applications where all processes execute the same program on the same host. For example:

% mpirun -j -np 3 send_receive

runs the send_receive application with three processes and prints out the job ID.

The second syntax must be used for applications that consist of multiple programs or that run on multiple hosts or subcomplexes. In this case, each program called by the application is listed in a file called an appfile. For example:

% mpirun -t my_trace:k -f my_appfile

enables tracing, sets the prefix of the tracing output file to my_trace, specifies that the raw trace files are kept, and runs an appfile named my_appfile.

Creating an appfile

The format of entries in an appfile is line oriented. Lines that end with the backslash (\) character are continued on the next line, forming a single logical line. A logical line starting with the pound (#) character is treated as a comment. Each program, along with its arguments, is listed on a separate logical line.
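
The line rules above (backslash continuation, then comment detection on the joined logical line) can be sketched as a small reader. This models the description only and is not mpirun's parser: logical_lines is a hypothetical name, and stripping surrounding whitespace and skipping blank lines are assumptions.

```python
def logical_lines(text):
    """Return the non-comment logical lines of an appfile (sketch)."""
    lines, pending = [], ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            # Drop the backslash and continue on the next physical line.
            pending += raw[:-1]
            continue
        line = (pending + raw).strip()
        pending = ""
        # Assumption: blank logical lines are ignored.
        if line and not line.startswith("#"):
            lines.append(line)
    # A continuation left open at end of input is discarded in this sketch.
    return lines
```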

You can specify the -h, -l, -np, -e, and -sp options (from the mpirun command) in an appfile. Options following a program name are treated as the program's command line arguments and are not processed by mpirun.

The ranks of the processes in MPI_COMM_WORLD are guaranteed to be ordered according to their sequential order in an appfile.

The general form of an appfile entry is:

[-h remote_host][-e var[=val][...]] [-l user] [-sp paths] [-np #] program [args]

where

-h remote_host

Specifies the remote host where a remote executable is stored (defaults to local host). remote_host is either a host name or an IP address.

-e var[=val]

Sets the environment variable var for the program and gives it the value val if provided (defaults to not setting environment variables).

-l user

Specifies the user name on the target host (default is current user name).

-sp paths

Sets the target shell PATH environment variable to paths. Search paths are separated by the colon (:) character (default is do not override the path).

-np #

Specifies the number of processes to run (defaults to one).

program

Specifies the name of the executable to run. The executable is searched for in $PATH.

args

Specifies command line arguments to the program.
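
The entry form above can be illustrated with a small parser that reads options up to the program name and treats everything after it as the program's own arguments. This is a sketch, not mpirun's implementation: parse_entry is a hypothetical name, and splitting tokens with shell-style quoting through shlex is an assumption.

```python
import shlex

VALUE_OPTIONS = {"-h": "host", "-l": "user", "-sp": "paths"}


def parse_entry(logical_line):
    """Parse one appfile logical line into a dictionary (sketch)."""
    entry = {"host": None, "user": None, "paths": None,
             "np": 1, "env": {}, "program": None, "args": []}
    tokens = shlex.split(logical_line)
    i = 0
    while i < len(tokens):
        option = tokens[i]
        if option not in VALUE_OPTIONS and option not in ("-np", "-e"):
            # First non-option token is the program; the rest are its args.
            entry["program"] = option
            entry["args"] = tokens[i + 1:]
            break
        if i + 1 >= len(tokens):
            raise ValueError(f"{option} needs a value")
        value = tokens[i + 1]
        if option == "-np":
            entry["np"] = int(value)
        elif option == "-e":
            name, has_value, val = value.partition("=")
            entry["env"][name] = val if has_value else None
        else:
            entry[VALUE_OPTIONS[option]] = value
        i += 2
    return entry
```

Because parsing stops at the program name, a line such as "-np 5 prog -np 9" starts five processes and passes "-np 9" to prog untouched, which matches the rule that options following a program name are not processed by mpirun.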

One way to set environment variables on remote hosts is to use the -e option in the appfile:

-h remote_host -e MPI_TOPOLOGY=val [-np #] program [args]

Alternatively, you can set environment variables using the .cshrc file on each remote host (only for users that use a /bin/csh-based shell).

mpijob

mpijob lists the HP MPI jobs running on the system. The mpijob syntax is shown below:

mpijob [-help] [-a] [-u] [-j id [...]]

where

-help

Prints usage information for the utility.

-a

Lists jobs for all users.

-u

Sorts jobs by user name.

-j id

Provides process status for job id.

When invoked, mpijob reports the following information for each job:

JOB

HP MPI job identifier.

USER

User name of the owner.

NPROCS

Number of processes.

PROGNAME

Program names used in the HP MPI application.

By default, your jobs are listed by job ID in increasing order. However, you can specify the -a and -u options to change the default behavior.

If you specify the -j option, mpijob reports the following information for each job:

RANK

Rank for each process in the job.

HOST

Host where the job is running.

PID

Process identifier for each process in the job.

LIVE

Option that indicates whether the process is running (an x is used) or has been terminated.

PROGNAME

Program names used in the HP MPI application.

An mpijob output using the -a and -u options is shown below. The output lists jobs for all users and sorts them by user name.


JOB        USER      NPROCS   PROGNAME
22623      charlie     12     /home/watts
22573      keith       14     /home/richards
22617      mick       100     /home/jagger
22677      ron          4     /home/wood
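
The two orderings (by job ID, the default, and by user name with -u) can be reproduced from the listing above. The sketch below assumes the four whitespace-separated columns shown in the sample and a single header line; parse_jobs is a hypothetical helper, not part of HP MPI.

```python
SAMPLE = """\
JOB        USER      NPROCS   PROGNAME
22623      charlie     12     /home/watts
22573      keith       14     /home/richards
22617      mick       100     /home/jagger
22677      ron          4     /home/wood
"""


def parse_jobs(listing):
    """Turn an mpijob listing into a list of dictionaries (sketch)."""
    rows = []
    for line in listing.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        job, user, nprocs, progname = line.split(None, 3)
        rows.append({"job": int(job), "user": user,
                     "nprocs": int(nprocs), "progname": progname})
    return rows


jobs = parse_jobs(SAMPLE)
by_job_id = [row["job"] for row in sorted(jobs, key=lambda r: r["job"])]
by_user = [row["user"] for row in sorted(jobs, key=lambda r: r["user"])]
```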

Note: You should invoke mpijob on the host on which you initiated mpirun.

mpiclean

mpiclean kills lingering processes in a running HP MPI application. mpiclean syntax has three forms:

where

-help

Prints usage information for the utility.

-v

Turns on verbose mode.

-m

Cleans up your shared-memory segments.

-j id

Kills the processes of job number id. You can specify multiple job IDs.

-sc name

Restricts the operation to the named subcomplex. This option is mutually exclusive with the -scid option.

-scid id

Restricts the operation to subcomplex number id. This option is mutually exclusive with the -sc option.

prog

Specifies the binary filename to kill. You can specify multiple filenames.

The first syntax is used for all servers. The second syntax is provided for backward compatibility on servers running under SPP-UX. The third syntax is used when an application aborts during MPI_Init, and the termination of processes does not destroy the allocated shared-memory segments.

The MPI library checks for the abnormal termination of processes while your application is running. In some cases, application bugs can cause processes to deadlock and linger in the system. When this occurs, you can use mpijob to identify hung jobs and mpiclean to kill all processes in the hung application.

There are two ways to kill an HP MPI application. The preferred way is to provide mpiclean with the application's job ID (obtained by using the -j option when invoking mpirun). However, you can only kill jobs that you own.

The second way is only provided on servers running under SPP-UX for backward compatibility. In this approach, you specify mpiclean with a list of binary filenames you own. mpiclean locates the matching processes and kills them.

You can restrict the second cleanup method to a single subcomplex by using the -sc or -scid options. This is helpful in cases where the same code is running independently on several subcomplexes and only one of these applications needs to be killed.

Note: You should invoke mpiclean on the host on which you initiated mpirun.

xmpi

xmpi invokes the XMPI utility. The xmpi syntax is shown below:

xmpi [-h][-bg arg][-bd arg][-bw arg][-display arg][-fg arg]
[-geometry arg][-iconic][-title arg]

where

-h

Prints usage information for the utility.

-bg arg

Specifies the background color.

-bd arg

Denotes the border color.

-bw arg

Specifies the width of the border in pixels.

-display arg

Designates the X-window display server to use.

-fg arg

Specifies the foreground color.

-geometry arg

Specifies size and position.

-iconic

Designates that the application start as an icon.

-title arg

Specifies the title of the application.

For more information, see "Using XMPI".

mpitrget

mpitrget combines raw trace files into a consolidated output file with a .tr suffix. This output file is then loaded and reviewed on XMPI.

Note: mpitrget is obsolete for HP MPI V1.3. Consolidation of raw trace files now occurs automatically when your application calls MPI_Finalize.

mpitrstat

mpitrstat provides profiling information for HP MPI applications. Counter instrumentation has subsumed the functionality provided by mpitrstat. However, if you want more information about this command, see the appropriate man page.

