This chapter provides information about utilities used to analyze
HP MPI applications. The topics covered are counter instrumentation,
XMPI, CXperf, and the MPI profiling interface.
Counter instrumentation provides cumulative statistics for your
applications. Counter instrumentation is the recommended method for
collecting profiling data because it is faster and less intrusive than
mpitrstat.
To create an instrumentation profile, enter:
% mpirun -i spec -np # program
where
-i spec
Enables run-time instrumentation profiling for all processes. spec provides options used when profiling. See "MPI_INSTR" for information about options you can use. You must specify the -i option before the program name.
-np #
Specifies the number of processes to run.
program
Specifies the name of the executable to run.
HP MPI provides the nonstandard MPIHP_Trace_on and
MPIHP_Trace_off routines to collect profile information for selected
code sections only (by default, the entire application is profiled from
MPI_Init to MPI_Finalize). You insert the MPIHP_Trace_on and
MPIHP_Trace_off pair around code that you want to profile. Then, you
build the application and invoke mpirun using the appropriate syntax.
A sample instrumentation profile for the compute_pi.f application is shown below. In this case, instrumentation was invoked by entering:
% mpirun -i -np 2 compute_pi.exe
The overhead time in the profile represents the time a process or routine spends inside MPI (for example, the time a process spends doing message packing).
The blocking time in the profile represents the time a process or routine is blocked, waiting for communication to complete before resuming execution.
Version: HP MPI 01.03.00.00 - HP-UX 10.20
Date: Thu Nov 6 11:12:28 1997
Scale: Wall Clock Seconds
Processes: 2
User: 25.23%
MPI: 74.77% [Overhead:74.77% Blocking:0.00%]
Application Summary by Rank:
Rank    Duration    Overhead    Blocking    User      MPI
------------------------------------------------------------------
0       0.710857    0.504448    0.000000    29.04%    70.96%
1       0.723631    0.537273    0.000000    25.75%    74.25%
Routine Summary:
Routine         Calls    Overhead    Blocking
------------------------------------------------
MPI_Bcast       2        1.898292    0.000000
    min                  0.157720    0.000000
    max                  0.249885    0.000000
    avg                  0.189829    0.000000
MPI_Init        2        1.616565    0.000000
    min                  0.135646    0.000000
    max                  0.179480    0.000000
    avg                  0.161656    0.000000
MPI_Reduce      2        0.944703    0.000000
    min                  0.000089    0.000000
    max                  0.128702    0.000000
    avg                  0.094470    0.000000
MPI_Finalize    2        0.941750    0.000000
    min                  0.061623    0.000000
    max                  0.118828    0.000000
    avg                  0.094175    0.000000
Routine Summary by Rank:
Routine         Rank    Calls    Overhead    Blocking
--------------------------------------------------------
MPI_Bcast       0       1        0.249885    0.000000
                1       1        0.157720    0.000000
MPI_Init        0       1        0.135646    0.000000
                1       1        0.157633    0.000000
MPI_Reduce      0       1        0.000089    0.000000
                1       1        0.119893    0.000000
MPI_Finalize    0       1        0.118828    0.000000
                1       1        0.102027    0.000000
Message Summary:
Routine         Message Bin    Count
------------------------------------
MPI_Bcast       [0.32]         2
MPI_Reduce      [0.32]         2
Message Summary by Rank:
Routine         Message Bin    Count    Rank
------------------------------------------
MPI_Bcast       [0.32]         1        0
                [0.32]         1        1
MPI_Reduce      [0.32]         1        0
                [0.32]         1        1
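The per-rank User and MPI percentages in the Application Summary can be reproduced from the Duration, Overhead, and Blocking columns. The following Python sketch assumes, based only on the sample numbers and not on a documented formula, that MPI time is overhead plus blocking and that user time is the remainder of the duration. The function name rank_percentages is illustrative and is not part of HP MPI.

```python
def rank_percentages(duration, overhead, blocking):
    """Return (user_pct, mpi_pct) for one rank.

    Assumption (inferred from the sample profile, not documented):
    MPI time = overhead + blocking; user time = duration - MPI time.
    """
    mpi_pct = 100.0 * (overhead + blocking) / duration
    return 100.0 - mpi_pct, mpi_pct


# Duration, Overhead, Blocking for ranks 0 and 1 of the sample profile.
sample = {0: (0.710857, 0.504448, 0.0), 1: (0.723631, 0.537273, 0.0)}

for rank, (duration, overhead, blocking) in sample.items():
    user_pct, mpi_pct = rank_percentages(duration, overhead, blocking)
    print(f"rank {rank}: User {user_pct:.2f}%  MPI {mpi_pct:.2f}%")
```

With the sample values this yields 29.04% and 70.96% for rank 0, and 25.75% and 74.25% for rank 1, which matches the User and MPI columns of the Application Summary by Rank.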
XMPI is an X/Motif graphical user interface for running applications, monitoring processes and messages, and viewing trace files. XMPI provides a graphical display of the state of processes within an HP MPI application.
XMPI is useful when analyzing programs at the application level (for example, examining HP MPI data types and communicators). Unlike other profilers and debuggers, XMPI runs without requiring you to recompile or relink your application.
XMPI runs in one of two modes: postmortem mode or interactive mode. In postmortem mode, you can view trace information for each process in your application. In interactive mode, you can monitor process communications by taking snapshots while your application is running.
The default X resource settings that determine how XMPI displays on your workstation are stored in /opt/mpi/lib/X11/app-defaults/XMPI. See "XMPI resource file" for a list of these settings.
To use XMPI's postmortem mode, you must first create a trace file. Then, you can load this file into XMPI to view state information for each process in your application.
To create a trace file, enter:
% mpirun -t spec -np # program
where
-t spec
Enables run-time raw trace generation for all processes. spec specifies options used when tracing. See "MPI_XMPI" for information about options you can specify. You must specify the -t option before the program name.
-np #
Specifies the number of processes to run.
program
Specifies the name of the executable to run.
When you use the -t option to enable trace generation, you must specify
the prefix name used for each raw trace file as part of spec. Then, when
mpirun is invoked, a raw trace dump, prefix.n, is created for each
application process where n ranges from 0 to (# - 1). MPI_Finalize
consolidates all the raw trace dump files into a single file (prefix.tr) that
you can load into XMPI.
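The naming scheme above can be sketched in a few lines of Python. This is only an illustration of which files to expect after a traced run. The helper name trace_file_names and the prefix "pi" are hypothetical.

```python
def trace_file_names(prefix, nprocs):
    """Return (raw_dump_names, consolidated_name) for a traced run.

    Raw dumps are named prefix.n for n in 0..nprocs-1; the consolidated
    file that XMPI loads is prefix.tr.
    """
    raw = [f"{prefix}.{rank}" for rank in range(nprocs)]
    return raw, f"{prefix}.tr"


# Example: a 4-process run with the (hypothetical) prefix "pi".
raw, consolidated = trace_file_names("pi", 4)
print(raw)           # raw trace dumps, one per process
print(consolidated)  # file produced when MPI_Finalize consolidates the dumps
```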
HP MPI provides the nonstandard MPIHP_Trace_on and
MPIHP_Trace_off routines to help troubleshoot application problems.
You insert the MPIHP_Trace_on and MPIHP_Trace_off pair around
suspect code in your application. Then, you build the application and
invoke mpirun with -t:off to enable application tracing. The trace
information collected is only for the code between MPIHP_Trace_on and
MPIHP_Trace_off. You can then run the trace file in XMPI to identify
problems during application execution.
Use these instructions to view a trace file:
Enter xmpi to open the XMPI main window (see "xmpi" for
information about other options you can specify).
Note: When viewing trace files containing multiple segments (that is, multiple MPIHP_Trace_on and MPIHP_Trace_off pairs), XMPI prompts you for the number of the segment you want to view. If you want to view a different segment later, reload the trace file and specify the new segment number when prompted.
The XMPI Trace dialog consists of an icon bar across the top, the current magnification and dial time just below, and a trace log display area below that.
The icon bar contains icons that (from left to right):
To set the magnification for viewing a trace file, select the Increase or Decrease icon on the icon bar.
The dial time indicates how long the application has been running in seconds.
The trace log display area shows a separate trace for each process in the application. The dial time is represented as a vertical line. The rank for each process is shown where the dial time line intersects a process trace.
The state of a process at any time is indicated by one of three colors:
Green
Signifies that a process is running outside MPI.
Red
Denotes that a process is blocked, waiting for communication to finish before the process resumes execution.
Yellow
Represents a process's overhead time inside MPI (for example, time spent doing message packing).
Blocking point-to-point communications are represented by a trace for each process showing the time spent in system overhead and time spent blocked waiting for communication. A line is drawn connecting the appropriate send and receive trace segments. The line starts at the beginning of the send segment and ends at the end of the receive segment.
For nonblocking point-to-point communications, a system overhead segment is drawn when a send and receive are initiated. When the communication is completed using a wait or a test, segments are drawn showing system overhead and blocking time. Lines are also drawn between matching sends and receives, except in this case, the line is drawn from the segment where the send was initiated to the segment where the corresponding receive completed.
Collective communications are also represented by a trace for each process showing the time spent in system overhead and time spent blocked waiting for communication.
Owing to the use of partial tracing, some send and receive segments may not have a matching segment. In this case, a stub line is drawn out of the send segment or into the receive segment.
To play the trace file, select the Play or Fast forward icons on the icon bar. For any given dial time, the state of the trace file is reflected in the main window, the Focus dialog, the Datatype dialog, and the Kiviat dialog.
Use these instructions to view process information from a trace.
The current state of a process is indicated by the color of the signal light (either green, red, or yellow) in the hexagon. This color corresponds to the elapsed run time (current dial time) of the trace file in the XMPI Trace dialog. As the trace file is played, the color changes as processes communicate with each other.
The XMPI Focus dialog consists of a process area and a message queue area.
The values in the process area and message queue area fields correspond to the current dial time of the trace file in the XMPI Trace dialog. As the trace file is played, the values in the fields change as processes communicate with each other.
The process area describes the state of a process together with the name and arguments for the HP MPI function being executed. The fields include:
Displays the rank of the displayed function's peer process. A process is identified by its rank in MPI_COMM_WORLD, a slash (/), and the rank of the process within the current communicator.
Shows the communicator being used by the HP MPI function. If you select the icon to the right of the comm field, the hexagons for processes that belong to the communicator are highlighted in the XMPI main window.
Displays the value of the tag argument associated with the message.
Shows the count of the message data elements associated with the message when it was sent. Select the icon to the right of the cnt field to open the XMPI Datatype dialog.
The XMPI Datatype dialog displays the type map of the data type associated with the message when it was sent. This data type can be one of the predefined data types or a user-defined data type.
The data type shown corresponds to the current dial time of the trace file in the XMPI Trace dialog. As the trace file is played, the data type changes as processes communicate with each other.
The message queue area describes the current state of the queue of messages sent to the process but not yet received. The fields include:
Displays the rank of the process sending the message. A process is identified by its rank in MPI_COMM_WORLD, a slash (/), and the rank of the process within the current communicator.
Shows the communicator being used by the HP MPI function. If you select the icon to the right of the comm field, the hexagons for processes that belong to the communicator are highlighted in the XMPI main window.
Displays the value of the tag argument associated with the message when it was sent.
Shows the count of the message data elements associated with the message when it was sent. If you select the icon to the right of the cnt field, the XMPI Datatype dialog displays. The XMPI Datatype dialog displays the type map of the data type associated with the message when it was sent.
Displays the number of copies of the message that was sent. For example, if a process sends 10 messages to another process where six of the messages have one type of message envelope and the remaining four have another, the copy field toggles between 6 of 10 and 4 of 10. In this case, a message envelope consists of the sender, the communicator, the tag, the count, and the data type.
This behavior results from treating the six messages that all have the same envelope as one copy and the remaining four messages as a different copy. That way, if a communication involves a hundred messages all having the same envelope, you can work with a single copy rather than a hundred copies.
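The grouping behind the copy field can be sketched as follows. This Python example is not XMPI's implementation. It only shows how ten queued messages collapse into two groups when they are keyed by the five envelope components named above. The Envelope type, the copy_labels helper, and the sample values are all hypothetical.

```python
from collections import Counter, namedtuple

# The five envelope components named in the text.
Envelope = namedtuple("Envelope", "sender comm tag count datatype")


def copy_labels(messages):
    """Group queued messages by envelope and return the 'k of n' copy labels."""
    groups = Counter(messages)  # keeps first-seen order (Python 3.7+)
    total = len(messages)
    return [f"{n} of {total}" for n in groups.values()]


# Ten queued messages: six share one envelope, four share another.
a = Envelope(sender=1, comm="MPI_COMM_WORLD", tag=7, count=32, datatype="MPI_INT")
b = Envelope(sender=1, comm="MPI_COMM_WORLD", tag=9, count=8, datatype="MPI_DOUBLE")
queue = [a] * 6 + [b] * 4

print(copy_labels(queue))  # ['6 of 10', '4 of 10']
```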
Kiviat graphs are used to display performance data. Use these instructions to view kiviat information from a trace file.
The XMPI Kiviat window shows, in a segmented pie-chart format, the cumulative time up to the current dial time spent by each process in the running, overhead, and blocked states.
The cumulative time for each process corresponds to the current dial time of the trace file in the XMPI Trace dialog. As the trace file is played, the cumulative time changes as processes communicate with each other.
You can use the kiviat view to determine whether processes are load balanced and applications are synchronized. If an application is load balanced, the amount of time processes spend in each state should be equal. If an application is synchronized, the segments representing each of the three states should be concentric.
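The quantities the kiviat view plots can be approximated outside XMPI. The Python sketch below assumes each process's trace is available as a list of (start, end, state) segments, which is a hypothetical layout and not the .tr file format. It sums the time spent in each state up to a given dial time, then applies a simple load-balance check. The tolerance value is an arbitrary illustration.

```python
STATES = ("running", "overhead", "blocked")


def cumulative_state_times(segments, dial_time):
    """Sum the time one process spent in each state up to dial_time."""
    totals = {state: 0.0 for state in STATES}
    for start, end, state in segments:
        # Count only the part of the segment that lies before the dial time.
        totals[state] += max(0.0, min(end, dial_time) - start)
    return totals


def is_load_balanced(per_process_totals, tolerance=0.05):
    """True if, for every state, processes differ by at most tolerance seconds."""
    for state in STATES:
        values = [totals[state] for totals in per_process_totals]
        if max(values) - min(values) > tolerance:
            return False
    return True


# Hypothetical traces for two processes.
rank0 = [(0.0, 2.0, "running"), (2.0, 2.5, "overhead"), (2.5, 3.0, "blocked")]
rank1 = [(0.0, 1.0, "running"), (1.0, 1.5, "overhead"), (1.5, 3.0, "blocked")]

totals = [cumulative_state_times(t, dial_time=3.0) for t in (rank0, rank1)]
print(totals)
print(is_load_balanced(totals))  # False: rank 1 is blocked far longer
```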
Interactive mode allows you to load and run an existing appfile to view state information for each process in your application.
Use these instructions to run and view an appfile:
Enter xmpi to open the XMPI main window (see "xmpi" for
information about other options you can specify).
The current state of a process is indicated by the color of the signal light (either green, red, or yellow) in the hexagon. These process hexagons disappear when the application has run to completion.
Interactive mode provides the snapshot utility to help debug
applications that hang. If automatic snapshot is enabled, XMPI takes
periodic snapshots of the application and displays state information for
each process on the XMPI main window, the XMPI Focus dialog, and the
XMPI Datatype dialog. You can use this information to view the state of
each process while the application hangs.
If automatic snapshot is disabled, XMPI displays information for each process when the application begins, but this information is not updated.
Regardless of whether automatic snapshot is enabled, you can take application snapshots manually by selecting Snapshot from the Application menu. In this case, XMPI displays information for each process, but this information is not updated until you take the next snapshot.
You can take snapshots only while an appfile is running. Also, unlike trace files, snapshots cannot be replayed.
At any time while your application is running, you can select Dump from the Trace menu to open the XMPI Dump dialog.
The Dump option is only available if you have previously selected the
Tracing button on the mpirun options trace dialog. Selecting Dump
consolidates all raw trace-file data collected up to that point into a
single .tr output file.
The single field specifies the name of the consolidated .tr output file. The value you specified for the Prefix field in the mpirun options trace dialog is automatically loaded. You can use this name or choose another. After you have created the .tr output file, you can resume snapshot monitoring.
You can also select Express from the Trace menu while your application is running to open the XMPI Express dialog.
As with the Dump option, the Express option is only available if you have previously selected the Tracing button on the mpirun options trace dialog.
The fields include:
Specifies that the contents of each process buffer (whether partial or full up to that point) are written to a raw trace file. These raw trace files are then consolidated in a .tr output file (previously specified in the Prefix field of the mpirun options trace dialog). Last, the .tr output file is loaded and displayed in the XMPI Trace dialog for viewing.
When you select this field, the XMPI Confirmation dialog displays asking if you are sure you want to terminate the application. You must select Yes before processing will continue.
After the .tr output file is loaded and displayed in the XMPI Trace dialog, you cannot resume snapshot monitoring (the application should have already terminated).
Specifies that the contents of each process buffer are written to a raw trace file only after the buffer becomes full. These raw trace files are then consolidated to a .tr output file (previously specified in the Prefix field of the mpirun options trace dialog). Last, the .tr output file is loaded and displayed in the XMPI Trace dialog for viewing.
After the .tr output file is loaded and displayed in the XMPI Trace dialog, you cannot resume snapshot monitoring even though the application may still be running.
When using interactive mode, XMPI gathers and displays data from the running appfile or a trace file.
When an application is running, the data source is the appfile, and automatic snapshot is enabled. Even though the application may be creating trace data, the snapshot function does not use it. Instead, the snapshot function acquires data from internal hooks in HP MPI.
At any point in interactive mode, you can load and view a trace file using the View or Express commands under the Trace menu. In this case, the data source switches to the loaded trace file, and the snapshot function is disabled. For HP MPI V1.3, you must rerun your application to switch the data source from a trace file back to an appfile.
You should initially run your appfile using the XMPI default settings. You can change these default settings and your viewing options later if you like.
Use these instructions to change XMPI's default settings and your viewing options:
Enter xmpi to open the XMPI main window (see "xmpi" for
information about other options you can specify).
The fields include:
Enables the automatic snapshot function. If automatic snapshot is enabled, XMPI takes snapshots of the application you are running and displays state information for each process.
If automatic snapshot is disabled, XMPI displays information for each process when the application begins. However, you can only update this information manually. Disabling automatic snapshot may lead to buffer overflow problems because the contents of each process buffer are unloaded only when a snapshot is taken. For communication-intensive applications, process buffers can quickly fill and overflow.
You can enable or disable automatic snapshot while your application is running. This could be useful during troubleshooting when the application has run to a certain point and you want to disable automatic snapshot to study process state information.
Determines how often XMPI takes a snapshot when automatic snapshot is enabled.
The single field specifies the size of each process buffer. When you run an application, state information for each process is stored in a separate buffer. You may need to increase buffer size if overflow problems occur.
The fields include:
Enables printing of the HP MPI job ID.
Enables verbose mode.
Enables run-time raw trace generation for all application processes. If you select the Tracing button, the mpirun options trace dialog is opened.
The fields include:
Specifies the prefix name for the file where each process writes its own raw trace data. Each process creates its own filename by concatenating the prefix, a period, and the process's global rank number. This is a required field.
Specifies no clobber, which means that an HP MPI application aborts if a file with the name specified in the Prefix field already exists.
Specifies that trace generation is initially turned off.
Specifies a simpler tracing mode by omitting
MPI_Test, MPI_Testall, MPI_Testany, and
MPI_Testsome calls that do not complete a request.
Specifies that raw trace files are not consolidated into a
single .tr output file when MPI_Finalize is called.
Raw trace-file consolidation can add substantially to
the MPI_Finalize time when working with large
applications.
Specifies that raw trace files are saved after they are
consolidated by MPI_Finalize. The default is to
delete raw trace files after consolidation.
Denotes the buffering size in kilobytes for dumping raw trace data. Actual buffering size may be rounded up by the system. The default buffering size is 4096 kilobytes. Specifying a large buffering size reduces the need to flush raw trace data to a file when process buffers reach capacity. Flushing too frequently can increase the overhead for I/O. If this problem occurs, increase the buffering size.
CXperf allows you to profile each process in an HP MPI application. CXperf replaces the functionality formerly provided by CXpa. The profile information for each process is stored in a separate performance data file. During analysis, you merge the data from these separate files into a single performance data file for the application.
With CXperf, you can analyze data using one or more of the following metrics:
You can display the data as a 3D profile, a 2D profile, a report, or a dynamic call graph. For more information, see Using CXperf.
MPI provides a profiling interface for collecting statistics and measuring performance. The profiling interface allows you to intercept calls to the MPI library at link time and perform some action. For example, you may want to measure the time spent in each call to a certain library routine or create a logfile.
All routines in the MPI library begin with the MPI_ prefix. Based on the
MPI standard, these routines are also callable using the PMPI_ prefix (for
example, PMPI_Send).
To use the profiling interface, you write wrapper versions of the MPI
library routines you want the linker to intercept. These wrapper routines
collect data for some statistic or perform some other action. The wrapper
then calls its corresponding routine in the MPI library using its PMPI
prefix.
For example, suppose you want to measure the elapsed time for each call
to MPI_Send. In this case, you create a wrapper called MPI_Send that
calls MPI_Wtime to record the start time, calls PMPI_Send from the
MPI library to actually send the message, and then calls MPI_Wtime
again. The difference between the two readings is the elapsed time.
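The control flow of such a wrapper can be sketched in Python. This is an analogy only: a real wrapper is written in C or Fortran and takes effect at link time, which Python cannot reproduce. Here pmpi_send is a hypothetical stand-in for the library's PMPI_Send, time.perf_counter plays the role of MPI_Wtime, and the simplified argument list is an assumption.

```python
import time

elapsed_per_call = []  # one entry per intercepted send


def pmpi_send(buf, dest, tag):
    """Stand-in for the library routine reached through the PMPI_ prefix."""
    return 0  # pretend the send succeeded


def mpi_send(buf, dest, tag):
    """Wrapper: same name and arguments the application already uses."""
    start = time.perf_counter()          # plays the role of MPI_Wtime
    status = pmpi_send(buf, dest, tag)   # forward to the real routine
    elapsed_per_call.append(time.perf_counter() - start)
    return status                        # the caller sees the usual result


mpi_send(b"hello", dest=1, tag=0)
print(f"calls: {len(elapsed_per_call)}, total: {sum(elapsed_per_call):.6f} s")
```

The application keeps calling the routine by its usual name, so the measurement needs no changes to application code. Only the wrapper knows about the PMPI_ entry point.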