ACADEMIC COMPUTING and COMMUNICATIONS CENTER
Gaussian 03 on argo-new

To use Gaussian 03, you need:

C Login Shell

Your login shell must be the C shell (csh). By default, all newly created accounts use the bash shell (/bin/bash). To find out what shell you currently use:
If you see /bin/bash, then you are using bash. If you see /bin/csh, then you are using the C shell. To change your shell to the C shell, enter the following command:
The result should be
If you include your netid in the command:
you will be prompted to enter your argo login password for security reasons:
The C shell will be available to you the next time you log in.
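As a rough sketch of the check described in this section, the snippet below inspects the SHELL environment variable and reports whether a change is needed. It is written for sh/bash (the default for new accounts, per the text above). The helper name check_shell is hypothetical, and the chsh -s /bin/csh suggestion is an assumption based on the standard Linux chsh utility; the snippet only prints the suggestion and does not change anything.

```shell
# check_shell is a hypothetical helper: prints "ok" when the given path is the
# C shell and "change-needed" for anything else.
check_shell() {
  case "$1" in
    */csh) echo "ok" ;;
    *)     echo "change-needed" ;;
  esac
}

# Read the login shell recorded in the environment (falls back to "unknown").
current_shell="${SHELL:-unknown}"

if [ "$(check_shell "$current_shell")" = "ok" ]; then
  echo "Already using the C shell: $current_shell"
else
  # Assumption: chsh -s is the usual Linux way to change a login shell.
  echo "Current shell is $current_shell; consider running: chsh -s /bin/csh"
fi
```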

.cshrc file

You will need a .cshrc file (the leading period is required, followed by all lowercase letters) in your home directory. If it doesn't exist, create it. The file must include the following lines (copy them exactly as they appear below):
Warning: avoid using Windows/DOS utilities to cut and paste the above into your .cshrc file; they may add DOS-style carriage return/line feed line endings, which differ from the line endings Linux uses and can cause an error when you log in. The permissions for the file should be 755:
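One way to check for stray carriage returns and to set the permissions mentioned above is sketched below. It runs on a throwaway file created with mktemp so nothing real is touched; the variable names and the sample line are illustrative assumptions. For real use, target would point at the .cshrc file in your home directory.

```shell
# Demonstration on a throwaway file; set target to your .cshrc for real use.
target="$(mktemp)"
printf '# sample line\r\n' > "$target"     # simulate a DOS-style line ending

cr="$(printf '\r')"                         # a literal carriage return

# Count the lines that contain a carriage return (grep exits 1 on no match).
cr_before=$(grep -c "$cr" "$target" || true)

# Strip carriage returns into a temporary copy, then replace the original.
tr -d '\r' < "$target" > "$target.unix" && mv "$target.unix" "$target"
cr_after=$(grep -c "$cr" "$target" || true)

# Apply the permissions the text calls for.
chmod 755 "$target"

echo "Lines with carriage returns: before=$cr_before after=$cr_after"
```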

Default.Route file

It is recommended that you keep your own copy of the Gaussian 03 system default file. Changes made to your copy will override the system defaults. To get a copy:
The system defaults are:
The permissions for the file should be 755:

.tsnet.config file

The .tsnet.config file (the leading period is required, followed by all lowercase letters) is required. The file is your local copy of the Linda global system configuration file, and it allows you to customize the Linda/g03 environment. To get a copy, which must be placed in your home directory:
The permissions for the file should be 644:
To understand the changes you must make to the file, you will first need to know more about the subg03 script.

subg03 script

The subg03 script (also referred to as g03sub) starts a Gaussian job. To get a copy:
Change two lines in your copy; the easier of the two to explain is the second one:
The filename must end with the three characters .in. If the name does not end with .in, use the Linux mv command to rename the file. For example, if your input file is named mydata:
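A rename of this kind might look like the following sketch. It works in a throwaway directory and uses an empty stand-in file named mydata; the variable names are illustrative. The essential step is the mv line, which appends .in only when the name does not already end with it.

```shell
# Work in a throwaway directory so no real file is touched.
workdir="$(mktemp -d)"
input="$workdir/mydata"
touch "$input"                       # stand-in for an existing input file

# Append .in only if the name does not already end with it.
case "$input" in
  *.in) ;;
  *)    mv "$input" "$input.in"
        input="$input.in" ;;
esac

echo "Input file is now: $input"
```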
The name is placed after the equal sign in the set input line:
The location of your input file is defined in the third line:
If you prefer to have your input files in your home directory, then:
The first line in the subg03 script indicates the number of worker nodes to use to run your job. There are two mutually exclusive formats for the number of nodes line. Use one or the other but not both in the same run. The two generic formats:
The first format:
Here, you are telling the batch system to execute your g03 job on four nodes. Since you identify only the number of nodes and not which ones, the batch system will use system load averages as the basis for deciding which four nodes. The second format:
Here, you identify the particular nodes to be used as workers. Examples:
The order of the nodes in the set NODES statement is not important; the nodes do not need to be in ascending or descending order. Also, it is not required that the nodes be in the same group though it is STRONGLY recommended. The following is an example of mixing nodes from different groups as well as having them in a random order:
Three of the nodes are from group one; the other is from group three. The first node in the set NODES statement has special significance; more on that later.

Back to the .tsnet.config file. The first line in the file is
where xxxxxx xxxxxx xxxxxx xxxxxx are the names of the nodes listed in the set NODES statement. For example, if you have:
then the Tsnet.Appl.nodelist must have those same nodes:
However, it is not necessary for the order of nodes in the two statements to be the same. The following is permitted:
Notice that in the Tsnet.Appl.nodelist statement the first node is argo1-3, whereas the first node in the set NODES statement is argo1-4.
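Because the two statements must name the same nodes but may order them differently, a quick order-insensitive comparison can catch mismatches before a job is submitted. The sketch below is one way to do that in sh; same_nodes is a hypothetical helper, and the two node lists are illustrative values typed in by hand rather than read from the actual files.

```shell
# same_nodes is a hypothetical helper: succeeds when both space-separated
# lists contain the same node names, in any order.
same_nodes() {
  # $1 and $2 are left unquoted on purpose so each name lands on its own line.
  list_a=$(printf '%s\n' $1 | sort)
  list_b=$(printf '%s\n' $2 | sort)
  [ "$list_a" = "$list_b" ]
}

set_nodes="argo1-4 argo1-3 argo1-2 argo1-1"      # as in the set NODES line
tsnet_nodes="argo1-3 argo1-4 argo1-1 argo1-2"    # as in Tsnet.Appl.nodelist

if same_nodes "$set_nodes" "$tsnet_nodes"; then
  echo "node lists match"
else
  echo "node lists differ"
fi
```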

Work files - Handling and Location

There are several work files that are used in the course of a g03 computation:
The files all have the same name; it is the file extension that identifies the type. By default, these files are given a name that appends the job process ID to the characters Gau-. For example, the following all have the name Gau-13089:
-rw-r--r-- 1 jsmith student 21037056 Feb 14 11:48 Gau-13089.rwf
-rw-r--r-- 1 jsmith student 5107056 Feb 14 11:48 Gau-13089.chk
-rw-r--r-- 1 jsmith student 0 Feb 14 11:47 Gau-13089.int
-rw-r--r-- 1 jsmith student 0 Feb 14 11:47 Gau-13089.d2e
-rw-r--r-- 1 jsmith student 524288 Feb 14 11:48 Gau-13089.scr
If your job completes successfully, the subg03 script will delete the files. If you want to retain the files, comment out the following lines in your subg03 file:
if ( "$exits" == "Normal termination" ) then
echo "Normal termination - erasing: "%JOB%.rwf %JOB%.chk %JOB%.int %JOB%.2de JOB%.src
rm -f %JOB%.rwf; rm -f %JOB%.chk; rm -f %JOB%.int; rm -f %JOB%.2de; rm -f %JOB%.src
endif
One reason to keep the files is that they may be used with GaussView. However, the files
can consume a large amount of disk space, so retain only those that are necessary.
You may override the default location and place all of the files in some other directory via the environment variable GAUSS_SCRDIR. For example, to place them in your scratch directory, enter the following command at the Linux prompt:
As a result, all subsequent runs during the current login session will place the files in your scratch directory. To make this change permanent, add the command to your .cshrc file. If, instead, you wish to relocate one or more of the individual files, you do so with a statement in your input file. For example, if you want to place the rwf file in your scratch directory but leave the remaining files in your home directory, you would do two things. First, undo any global change made via the setenv statement. Second, include the following in your input file:
As a result, the Read_Write file is written to scratch while the other files are written to your home directory. Notice that you must hardcode your netid into the statement; using the environment variable $USER will not work. If you want to relocate a file and also rename it, include the file name after the path:
The system will append the appropriate extension (in this example, rwf). Argo is a 32-bit machine, so no file may be larger than 2 GB. If any of your scratch files will exceed 2 GB, you must break the file up into multiple pieces, each no larger than 2 GB. What follows is the generic syntax to do this:
The format also applies to the Integral file and/or the Derivative file. Obviously, for those files, the rwf would be changed to the appropriate extension. What follows is an example of how to partition the rwf file into two separate pieces in the scratch filesystem:
Several things need to be addressed:
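One consideration is how many pieces a file of a given size needs. The arithmetic can be sketched as below, assuming sizes are expressed in whole megabytes; pieces_needed is a hypothetical helper, and the 3500 MB file and 1900 MB piece size are illustrative values. The cap of 8 reflects the partition limit that applies on this system.

```shell
# pieces_needed is a hypothetical helper: prints how many pieces of at most
# piece_mb megabytes are needed to hold total_mb megabytes (rounding up).
pieces_needed() {
  total_mb=$1
  piece_mb=$2
  echo $(( (total_mb + piece_mb - 1) / piece_mb ))
}

max_pieces=8                        # partition limit on this system
n=$(pieces_needed 3500 1900)        # 3500 MB file, 1900 MB per piece

if [ "$n" -le "$max_pieces" ]; then
  echo "$n pieces needed"
else
  echo "$n pieces exceeds the limit of $max_pieces"
fi
```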
You are limited to a maximum of 8 partitions. One of the advantages of using the scratch system is that it has a substantial amount of space. However, there is a performance penalty for using scratch, which is an NFS-mounted filesystem. If your rwf file is less than 5GB, you may use the /tmp filesystem on one of the worker nodes running your job. To do that (the rwf file will again be used for example purposes):
This will partition the file into two 1.9 GB pieces. Each worker node in your job has its own /tmp directory, so which node's /tmp is used? This is where the name of the first node in your set NODES statement comes into play: the node that is listed first is the one whose /tmp filesystem is used. Examples:
Since each of the /tmp filesystems is only 5GB, it is best to select a node that is currently not in use. To do that, use the qstat command:
In the following example output, only the Gaussian jobs have been included;
the nodes assigned to each job appear on the line below it:
501.argo.cc.uic user1 scali_ex job1.pbs 2218 2 -- -- -- R 43:38
argo2-4/0+argo2-3/0
505.argo.cc.uic user2 scali_ex job2.pbs 897 2 -- -- -- R 43:27
argo2-2/0+argo2-1/0
509.argo.cc.uic user3 scali_ex job3.pbs 26061 4 -- -- -- R 28:10
argo1-4/0+argo1-3/0+argo1-2/0+argo1-1/0
520.argo.cc.uic user4 scali_ex job4.pbs 8512 4 -- -- -- R 19:50
argo1-3/1+argo1-4/1+argo1-2/1+argo1-1/1
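The first node of each job can be pulled out of node lines like those above with a short filter. The sketch below is a minimal example that works on a hand-copied sample of the node lines rather than on live qstat output; the variable names are illustrative.

```shell
# Sample node lines in the format shown above (one line per running job).
node_lines='argo2-4/0+argo2-3/0
argo2-2/0+argo2-1/0
argo1-4/0+argo1-3/0+argo1-2/0+argo1-1/0
argo1-3/1+argo1-4/1+argo1-2/1+argo1-1/1'

# Keep only the node name that precedes the first "/" on each line.
first_nodes=$(printf '%s\n' "$node_lines" | sed 's|/.*||')

echo "$first_nodes"
```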
Notice the first node listed for each job (argo2-4, argo2-2, argo1-4, argo1-3).
Gaussian users are not required to place their scratch files in /tmp, but assume
that each of these users does. Amend your set NODES statement accordingly. If
you opt to use four worker nodes in group one (a group that is already being
used by two other Gaussian jobs, #509 and #520), make the first node in your
set NODES statement either argo1-2 or argo1-1:
As a result, there is a greater likelihood that space will be available in the /tmp filesystem because you will not be sharing it. You may also place partitions on different types of filesystems, combining a local filesystem (/tmp) with an NFS-mounted filesystem (/scratch). For example:
Notice that the first partition, piece1, is written to the /tmp filesystem on the local node, whereas the second partition, piece2, is written to /scratch. This has two advantages:

Warnings and Problems

Once you know where to find the files, you can use remote shell (rsh) to reach that node and locate, and then delete, the offending file in its /tmp directory:
rsh argox-x ls -alrt /tmp
Assume, for example, that your netid is jsmith and you have run out of space in /tmp on argo1-1. To list the files that are there (all files, Gaussian as well as anything else):
Sample output:
srwx------ 1 root 99 0 Sep 12 2005 .fam_socket
drwxrwxrwt 2 xfs xfs 4096 Mar 20 2007 .font-unix
drwx------ 2 root root 16384 Sep 12 2005 lost+found
-rw-r--r-- 1 jsmith student 21037056 Feb 14 11:48 Gau-13089.rwf
-rw-r--r-- 1 jsmith student 5107056 Feb 14 11:48 Gau-13089.chk
-rw-r--r-- 1 jsmith student 0 Feb 14 11:47 Gau-13089.int
-rw-r--r-- 1 jsmith student 0 Feb 14 11:47 Gau-13089.d2e
-rw-r--r-- 1 jsmith student 23411 Feb 14 11:48 Gau-13089.scr
-rw-r--r-- 1 mhoma student 31037056 Feb 11 11:48 Gau-13089.rwf
-rw-r--r-- 1 mhoma student 104686 Feb 11 11:48 Gau-13089.chk
-rw-r--r-- 1 mhoma student 234666 Feb 11 11:47 Gau-13089.int
-rw-r--r-- 1 mhoma student 0 Feb 11 11:47 Gau-13089.d2e
-rw-r--r-- 1 mhoma student 0 Feb 11 11:48 Gau-13089.scr
srwxrwxrwx 1 mhoma student 163840 Oct 23 21:42 mpd2.console_bwang9
srwxrwxrwx 1 mhoma student 2673333 Apr 25 2007 mpd2.console_ksengu1
srwxrwxrwx 1 mhoma student 67122 Mar 19 2007 mpd2.console_mlahir2
srwxrwxrwx 1 root root 0 Mar 20 2007 mpd2.console_root
The output includes ALL files: yours (owned by jsmith) as well as those of other
users. The files are not just those used by Gaussian; non-Gaussian jobs may also
be putting files in /tmp. To restrict the listing to only your Gaussian files:
The system permits you to delete only files you own. For example, to delete just your rwf file:
To delete all of your Gaussian files:
It is also important to note that deleting your Gaussian files does not ensure that your next job submission (using argo1-1 for temporary files) will complete successfully; you may encounter the same error. You have erased your files, freeing space, but there may be other files in /tmp. You do not own those files and you cannot delete them. They may be using just enough space to cause a repeat of your error, only later in the run. If that is the case, you will need to use a different node.
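Restricting a long listing to your own Gaussian files comes down to filtering on the owner column and the Gau- file name prefix. The sketch below shows one way to do that with awk on a hand-copied sample of the listing; my_gaussian_files is a hypothetical helper, not part of the system.

```shell
# my_gaussian_files is a hypothetical helper: reads "ls -l" style lines on
# stdin and keeps those owned by user $1 whose file name starts with "Gau-".
my_gaussian_files() {
  awk -v owner="$1" '$3 == owner && $NF ~ /^Gau-/'
}

# A few lines copied from the sample output above.
listing='-rw-r--r-- 1 jsmith student 21037056 Feb 14 11:48 Gau-13089.rwf
-rw-r--r-- 1 jsmith student 5107056 Feb 14 11:48 Gau-13089.chk
-rw-r--r-- 1 mhoma student 31037056 Feb 11 11:48 Gau-13089.rwf
srwxrwxrwx 1 root root 0 Mar 20 2007 mpd2.console_root'

mine=$(printf '%s\n' "$listing" | my_gaussian_files jsmith)
echo "$mine"
```

In practice the listing would come from the rsh command shown earlier (for example, rsh argo1-1 ls -alrt /tmp) piped into the same filter, which then runs on the machine where you typed the command.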
the problem is one of the nodes on which you were running your job has crashed.
the problem is that the job has lost communication with one of the nodes on which you were running your job. Select other nodes on which to run your job and resubmit it. If you see something like:
one possible cause (there are many) is a permission problem with one or more of the nodes specified in the set NODES statement (in the script subg03). For example, if your set NODES statement is:

Starting your Gaussian job

The commands to execute a Gaussian job are in the file subg03. To run the job, enter:
DO NOT do the following:

Recommendations (Important - Please Read)

Additional Help

2008-4-24 ACCC Systems Group