The format of the qsub command is:
qsub script_file
where script_file is a text file containing, among other things:
- the name of your executable program
- options that tell torque how to run your program
Repeat: the operand to the qsub command MAY NOT BE an executable program (a binary
file).
WRONG: qsub my_executable_program.
Doing so will result in your program not running. Your job will be assigned a job id and will
APPEAR to be running if you immediately run a qstat command:
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
2256.argo-new.c jsmith batch a.out 6427 64 -- -- 336:0 R --
argo4-4/0
It looks like it's running: the status column (S) contains an R for running. But after a minute or
so, the job will be cancelled and a message like the following will appear in your standard error file:
-bash: line 1: /usr/spool/PBS/mom_priv/jobs/2256.argo-n.SC: cannot execute binary file
The 2256 in the message is the job id; your id, obviously, will be something else.
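One way to avoid this mistake is to check the file before handing it to qsub. The following is only a sketch: the function names is_text_script and safe_qsub are made up for illustration, and the test (no NUL bytes in the file) is a rough heuristic for "text file", not something torque itself does.

```shell
# Hypothetical helper: succeed only if the file contains no NUL bytes,
# a rough heuristic for "this is a text script, not a compiled binary".
is_text_script() {
    [ -f "$1" ] || return 1
    # Strip NUL bytes and compare with the original; any difference
    # means the file held at least one NUL, so treat it as binary.
    tr -d '\000' < "$1" | cmp -s - "$1"
}

# Hypothetical wrapper: refuse to hand a binary to qsub.
safe_qsub() {
    if is_text_script "$1"; then
        qsub "$1"
    else
        echo "refusing to submit '$1': not a text script file" >&2
        return 1
    fi
}
```

With these defined, safe_qsub script submits as usual, while safe_qsub my_executable_program should stop with an error instead of queuing a job that cannot run.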
What follows will be a series of examples, the first of which will present the most basic of
commands/options. Each subsequent example will show an additional option (shown in bold text). The
standard "hello_world" program, a program whose sole function
is to print the line "Hello World" to stdout and exit, will be used.
However, before doing so, let's review some basic batch job output and
management principles.
After you submit a job to batch, it is assigned a job id in the format xxx.argo-new.cc.uic.edu, where
xxx is the job number. Assume your job is assigned job id 338. (You don't need the part after the number.)
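If you capture the id that qsub prints, the number can be cut out of it in the shell. This is a small sketch; job_number is a hypothetical name, and it simply drops everything from the first dot onward.

```shell
# Hypothetical helper: print the numeric part of a job id such as
# 338.argo-new.cc.uic.edu by dropping everything from the first dot on.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Usage: capture the id that qsub prints, then query by number.
#   full_id=$(qsub script)
#   qstat "$(job_number "$full_id")"
```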
To see the status of the job:
Generic command | Example
qstat job-id | qstat 338
tracejob job-id | tracejob 338
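To pick out just the state letter from a qstat listing, something like the following may help. It is a sketch that assumes the column layout of the sample listing shown earlier (the S column is the 10th field); job_state is a hypothetical name, and other qstat output formats would need a different field number.

```shell
# Hypothetical helper: read a qstat listing on stdin and print the state
# letter for the given job number. Assumes the 11-column layout of the
# sample listing shown earlier, where the S column is the 10th field.
job_state() {
    awk -v id="$1" 'index($1, id ".") == 1 { print $10 }'
}

# Usage (assumes that qstat -a prints this layout on your system):
#   qstat -a | job_state 2256
```

Header and separator lines are skipped because their first field does not begin with the job number followed by a dot.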
For stdout and stderr, torque creates two files. The names of the files are
constructed from the job name, the letter e (for stderr) or o (for stdout),
and the job number.
So for your hello_world run that had job-id 338, you would have the following
files:
hello_world.o338 <-- this is stdout
hello_world.e338 <-- this is stderr
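The naming rule can be written out as a short shell function. This is only an illustration; default_output_files is a hypothetical name, and it covers the default names only, not files redirected with the options shown in the later examples.

```shell
# Hypothetical helper: print the default stdout and stderr file names
# for a job, given its job name and job number.
default_output_files() {
    printf '%s.o%s\n' "$1" "$2"   # stdout
    printf '%s.e%s\n' "$1" "$2"   # stderr
}

# Usage:
#   default_output_files hello_world 338
```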
Example 1:
Content of file
/home/homes51/jsmith/hello_world
Explanation
The script file, consisting of a single line, has the path to and the name
of the executable:
Path: /home/homes51/jsmith
Name of executable: hello_world
Submission
[jsmith@argo-new jsmith]$ qsub script
1074.argo-new.cc.uic.edu
Results:
[jsmith@argo jsmith]$ ls -al *1074*
-rw------- 1 jsmith sys 0 Jan 17 2006 script.e1074
-rw------- 1 jsmith sys 12 Jan 17 2006 script.o1074
[jsmith@argo-new jsmith]$ cat script.o1074
Hello World
Example 2:
Content of file
#PBS -N hello_world
/home/homes51/jsmith/hello_world
Explanation
Line 1: Instead of using the default name for the job (the default is the
name of the script file), use hello_world. The pound sign must
be the first character on the line, followed immediately by the capital letters PBS (there
is no space between the # character and the letter P).
Line 2: explained in example 1.
Submission
[jsmith@argo-new jsmith]$ qsub script
1074.argo-new.cc.uic.edu
Results:
[jsmith@argo-new jsmith]$ ls -al *1074*
ls: *1074*: No such file or directory
[jsmith@argo-new jsmith]$ ls -al *hello_world*
-rw------- 1 jsmith sys 0 Jan 17 2006 hello_world.e1074
-rw------- 1 jsmith sys 12 Jan 17 2006 hello_world.o1074
[jsmith@argo-new jsmith]$ cat hello_world.o1074
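The formatting rule for the #PBS line in Example 2 is easy to break by accident, and a malformed line would most likely be treated as an ordinary comment rather than reported as an error. The sketch below looks for such lines before you submit; find_bad_pbs_lines is a hypothetical name and the two patterns are only the cases described in the explanation (something before the #, or a space between # and PBS).

```shell
# Hypothetical helper: print (with line numbers) any line that looks like
# a PBS directive but breaks the rule above, either because whitespace
# precedes the # or because whitespace separates # from PBS.
find_bad_pbs_lines() {
    grep -nE '^[[:space:]]+#[[:space:]]*PBS|^#[[:space:]]+PBS' "$1"
}

# Usage: no output means no suspicious lines were found.
#   find_bad_pbs_lines script
```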
Example 3:
Content of file
#PBS -o /home/homes51/jsmith/xxx.output
#PBS -N hello_world
/home/homes51/jsmith/hello_world
Explanation
Line 1: Instead of writing messages for stdout to the default
file (hello_world.o<jobid>, built from the job name), write the messages to the file xxx.output
in my home directory.
Line 2: explained in example 2.
Line 3: explained in example 1.
Submission
[jsmith@argo-new jsmith]$ qsub script
1074.argo-new.cc.uic.edu
Results:
[jsmith@argo-new jsmith]$ ls -al *1074*
ls: *1074*: No such file or directory
[jsmith@argo-new jsmith]$ ls -al hello_world.e*
-rw------- 1 jsmith sys 0 Jan 17 2006 hello_world.e1074
[jsmith@argo-new jsmith]$ ls -al xxx.output
-rw------- 1 jsmith sys 12 Jan 17 2006 xxx.output
[jsmith@argo-new jsmith]$ cat xxx.output
Example 4:
Content of file
#PBS -e /home/homes51/jsmith/xxx.error
#PBS -o /home/homes51/jsmith/xxx.output
#PBS -N new_name
/home/homes51/jsmith/hello_world
Explanation
Line 1: Instead of writing job error messages for stderr to the default
file (new_name.e<jobid>, built from the job name), write the error messages to the file xxx.error
in my home directory.
Line 2: explained in example 3.
Line 3: explained in example 2.
Line 4: explained in example 1.
Submission
[jsmith@argo-new jsmith]$ qsub script
1074.argo-new.cc.uic.edu
Results:
[jsmith@argo-new jsmith]$ ls -al *1074*
ls: *1074*: No such file or directory
[jsmith@argo-new jsmith]$ ls -al hello_world.e*
ls: hello_world.e*: No such file or directory
[jsmith@argo-new jsmith]$ ls -al xxx*
-rw------- 1 jsmith sys 0 Jan 17 2006 xxx.error
-rw------- 1 jsmith sys 12 Jan 17 2006 xxx.output
[jsmith@argo-new jsmith]$ cat xxx.output
Example 5:
Content of file
#PBS -m bea
#PBS -e /home/homes51/jsmith/hello_world.error
#PBS -o /home/homes51/jsmith/hello_world.output
#PBS -N new_name
/home/homes51/jsmith/hello_world
Explanation
Line 1: Send email to my mailbox when the job begins (b),
when the job ends (e), and if the job aborts with an error (a).
You may use one, two, or all three flags. If you use multiple flags, they
must be on one line. The maildrop is identified in the .forward file in your home
directory. The default location is your UIC mailbox, though you may change it to anything
you want.
Line 2: explained in example 4.
Line 3: explained in example 3.
Line 4: explained in example 2.
Line 5: explained in example 1.
Submission
[jsmith@argo-new jsmith]$ qsub script
1074.argo-new.cc.uic.edu
Results:
On my UIC email account, tigger, there are two pieces of email.
Date: Tue, 17 Jan 2006 13:08:43 -0600
From: adm <adm@argo-new.cc.uic.edu>
To: jsmith@argo.cc.uic.edu
Subject: PBS JOB 1074.argo-new.cc.uic.edu
PBS Job Id: 1074.argo-new.cc.uic.edu
Job Name: hello
Begun execution
Date: Tue, 17 Jan 2006 13:08:44 -0600
From: adm <adm@argo-new.cc.uic.edu>
To: jsmith@argo-new.cc.uic.edu
Subject: PBS JOB 1074.argo-new.cc.uic.edu
PBS Job Id: 1074.argo-new.cc.uic.edu
Job Name: hello
Execution terminated
Exit_status=0
resources_used.cput=00:00:00
resources_used.mem=0kb
resources_used.vmem=0kb
resources_used.walltime=00:00:01
Since the program did not abort, no mail was sent for the "a"
option.
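As a quick reminder of what a given -m string asks for, the three letters can be spelled out with a few lines of shell. This is a sketch only; describe_mail_flags is a hypothetical name, and it recognizes just the b, e, and a flags described in this example.

```shell
# Hypothetical helper: expand a -m flag string into one line per event.
# Only the three letters described in Example 5 are recognized.
describe_mail_flags() {
    flags=$1
    while [ -n "$flags" ]; do
        rest=${flags#?}              # everything after the first character
        case ${flags%"$rest"} in     # the first character itself
            b) echo "b: mail when the job begins" ;;
            e) echo "e: mail when the job ends" ;;
            a) echo "a: mail if the job aborts" ;;
            *) echo "unknown flag: ${flags%"$rest"}" >&2; return 1 ;;
        esac
        flags=$rest
    done
}

# Usage:
#   describe_mail_flags bea
```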
Example 6:
Content of file
#PBS -V
#PBS -m bea
#PBS -e /home/homes51/jsmith/hello_world.error
#PBS -o /home/homes51/jsmith/hello_world.output
#PBS -N new_name
/home/homes51/jsmith/hello_world
Explanation
Line 1: Export all my session environment variables to the job.
Line 2: explained in example 5.
Line 3: explained in example 4.
Line 4: explained in example 3.
Line 5: explained in example 2.
Line 6: explained in example 1.
Submission
[jsmith@argo-new jsmith]$ qsub script
1074.argo-new.cc.uic.edu
Results:
No results to display
Some additional options (there are more, but these seem the most relevant):
#PBS -l ncpus=x
Number of CPUs to be used
#PBS -l mem=xxxmb
Amount of memory to be used
#PBS -l walltime=hh:mm:ss
Maximum wall time for job execution
#PBS -l nodes=xxxxx
Identify one or more nodes on which the program should execute. For example
-l nodes=argo4-4
It is very important that the node assignment statement, if used, comes before
the statement that identifies the executable. In the examples above, that statement is the line
/home/homes51/jsmith/hello_world.
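To keep the ordering straight, you could generate the script file from a small shell function that always writes the #PBS lines first and the executable last. This is a sketch under assumptions: write_job_script is a hypothetical name, and the walltime and node values are placeholders to replace with your own.

```shell
# Hypothetical helper: write a job script to $1 that runs executable $2
# under job name $3. The walltime and node values are placeholders.
write_job_script() {
    cat > "$1" <<EOF
#PBS -N $3
#PBS -V
#PBS -m bea
#PBS -l walltime=00:05:00
#PBS -l nodes=argo4-4
$2
EOF
}

# Usage:
#   write_job_script script /home/homes51/jsmith/hello_world hello_world
#   qsub script
```

Because the executable is always the last line written, every option, including the node assignment, comes before it.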