ACADEMIC COMPUTING and COMMUNICATIONS CENTER

ARGO-NEW: Running Jobs

Overview

Running a program on a cluster is VERY DIFFERENT from running a job on a single machine with one or more CPUs (for example, tigger). To begin with, you do not run your executable on the machine where you create it (the master). The following point cannot be emphasized enough: do not run your programs on the master.
There are monitors that alert systems staff to user programs running on the master. Running a program on the master is a violation of ACCC policy and can result in suspension and termination of your argo account. Two types of programs may be executed on the cluster: sequential jobs and parallel jobs.
For the purposes of the ACCC cluster, a sequential job is a single instance program that runs on one and only one node. A parallel job is composed of:
Serial version of the classic hello_world program - source in C
#include <stdio.h>

/* Print a greeting and exit successfully. */
int main(int argc, char **argv) {
    printf("Hello-world\n");
    return 0;
}
Parallel version of the classic hello_world program using MPI - source in C
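#include <stdio.h>
#include "mpi.h"

/* Each MPI process reports its rank and the total number of processes. */
int main(int argc, char **argv) {
    int rank;   /* this process's id within MPI_COMM_WORLD */
    int size;   /* total number of processes in the job */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello-world, I'm rank %d; Size is %d\n", rank, size);
    MPI_Finalize();
    return 0;
}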

Torque

Torque is a networked subsystem for submitting, monitoring, and controlling a workload of jobs on the cluster. ALL USER JOBS MUST be run via Torque. Years ago, only batch jobs could execute on the cluster. THAT IS NOT THE CASE NOW: Torque does not restrict jobs to batch execution; interactive jobs, including programs with GUIs that users interact with, may also be run.

Queues

Jobs (programs) are submitted to queues for execution. There is one available queue (others may be added if the need arises):

Environmental Variables

There are two environments, each with its own defined variables, available to you:
Shell environmental variables
To see a list of your shell environmental variables, type env | more at your shell prompt. To pass ALL the variables (not just a subset) to your job, include the -V option on the qsub command.
Torque environmental variables
Every user job has the following Torque environmental variables available to it:
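As a minimal sketch of how a job might inspect its Torque environment, the script below prints a few variables. The names PBS_JOBID, PBS_JOBNAME, PBS_O_WORKDIR, and PBS_NODEFILE are the conventional Torque names and are an assumption here; check them against the cluster's own list. Outside a Torque job they are unset, so the sketch prints a placeholder instead.

```shell
#!/bin/sh
# Print selected Torque variables (names assumed from standard Torque usage).
show_var() {
    # $1 is a variable name; eval reads its value indirectly.
    eval "value=\${$1-}"
    # Fall back to a placeholder when the variable is unset or empty.
    [ -n "$value" ] || value='(not set)'
    printf '%s=%s\n' "$1" "$value"
}

for name in PBS_JOBID PBS_JOBNAME PBS_O_WORKDIR PBS_NODEFILE; do
    show_var "$name"
done
```

If the script were saved under a hypothetical name such as show_env.sh, submitting it with qsub -V show_env.sh would also pass your shell variables to the job, as described above.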

Commands
The following five commands are important and you will use them often:

Job Output and Management

After you submit a job, it is assigned a job id in the format xxx.argo-new.cc.uic.edu, where xxx is the job number. To see the status of your job, use:
qstat job-id
For stdout and stderr, batch creates two files. The names of the files are constructed from the job name, the letter e (for stderr) or o (for stdout), and the job number. So for your hello world run that had job-id 338, you would have the following files:
Let's take a look:
Well, the error file is empty, so that's a good sign. Let's see what we have:
Gives:
And that's what we should have.
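The file-naming rule described above can be sketched in a few lines of shell. This assumes the usual Torque layout of jobname.oNNN and jobname.eNNN (the dot separator is an assumption), and the job name hello_world is hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Derive the stdout and stderr file names for a job from its full job id.
job_number() {
    # Drop everything from the first dot: 338.argo-new.cc.uic.edu -> 338
    printf '%s\n' "${1%%.*}"
}

output_files() {
    # $1 = job name (hypothetical), $2 = full job id
    n=$(job_number "$2")
    # First line is the stdout file, second line is the stderr file.
    printf '%s.o%s\n%s.e%s\n' "$1" "$n" "$1" "$n"
}

output_files hello_world 338.argo-new.cc.uic.edu
```

Under those assumptions, the sketch prints hello_world.o338 followed by hello_world.e338.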

Node Selection and Properties

Every node has multiple properties associated with it. The property that clients are most familiar with is the node name, and it serves as the most commonly used criterion for selecting a node. Other properties may also be used to identify nodes. The following table lists all the properties associated with the compute nodes:
The property cpu.XXXXXX gives the type of processor on the machine. Currently, there are two types of processors available: cpu.amd (for AMD Opteron) and cpu.xeon (for Intel Xeon). The property smp identifies dual-processor machines, whereas the property no.smp means a uniprocessor. The generic syntax of the qsub command is:
where node_spec is:
A series of examples follows:
Multiple virtual processors per node can be requested by adding the term ppn=# (for processors per node) to a node expression. For example, to request two virtual processors on each of three nodes:
qsub -l nodes=3:ppn=2
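To make the arithmetic behind such a request concrete, here is a small sketch that builds the nodes/ppn expression and computes how many virtual processors it asks for in total. The function names node_request and total_processors are hypothetical helpers, not part of Torque.

```shell
#!/bin/sh
# Build a Torque node request and count the virtual processors it asks for.
node_request() {
    # $1 = number of nodes, $2 = virtual processors per node (ppn)
    printf 'nodes=%s:ppn=%s\n' "$1" "$2"
}

total_processors() {
    # Total = nodes multiplied by processors per node
    echo $(( $1 * $2 ))
}

spec=$(node_request 3 2)
echo "qsub -l $spec"
echo "total virtual processors: $(total_processors 3 2)"
```

For three nodes with ppn=2, this prints the qsub -l nodes=3:ppn=2 request shown above and a total of 6 virtual processors.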

How Much Is Argo Being Used?

Want to check how much work argo has done? There are Web pages that summarize usage on argo-new, which include links to personalized info for each user. Info from previous months is also available, with URLs of the form:
For example, the July 2005 document is available at www.uic.edu/depts/accc/hardware/argo-new/200507.html
The information on the current month's page is updated every four hours.
2007-3-7 ACCC Systems Group