Job Submission
This page contains information on submitting jobs to the EaStCHEM RCF clusters and compute grid.
The queuing system on the clusters is the Sun Grid Engine (SGE), which aims to give all users fair access to the compute resources. Although the SGE syntax can take a little getting used to, we have put together many tools to help you along.
Queues
At the centre of the job submission system are the queues: you will almost always be submitting jobs to a particular queue. The queue names are designed as mnemonics for the type of job you want to submit. For example, on hare.epcc.ed.ac.uk there is a queue called parallel-short.q, which you would use to run a short (less than 3 hours) parallel calculation. The queues on each cluster and their purpose are listed in the table below. The descriptions can also be viewed on the clusters using the qlist command, e.g.
> qlist
Queue | hare.epcc.ed.ac.uk | burke.st-andrews.ac.uk
fat-medium.q | 8-processor SMP, 1 week limit | -
fat-long.q | 8-processor SMP, 4 week limit | -
fat.q | - | 8-processor SMP, no time limit
parallel-short.q | Up to 24 processors, 3 hour limit | Up to 24 processors, 3 hour limit
parallel-medium.q | Up to 24 processors, 1 week limit | Up to 24 processors, 1 week limit
parallel-long.q | Up to 24 processors, 4 week limit | Up to 24 processors, no time limit
serial-medium.q | Serial jobs, 1 week limit | Serial jobs, 1 week limit
serial-long.q | Serial jobs, 4 week limit | Serial jobs, no time limit
test-run.q | Serial jobs, 3 hour limit | -
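As a rough illustration of how the hare.epcc.ed.ac.uk limits in the table translate into a queue choice, the sketch below picks the shortest queue whose time limit covers a requested run time. The function name `choose_hare_queue` and its arguments are hypothetical (it is not one of the RCF tools), and reserving test-run.q for short test runs is an assumption based on its name; check `qlist` for the current limits before relying on it.

```bash
#!/bin/bash
# Hypothetical helper (not an RCF tool): print the hare.epcc.ed.ac.uk queue
# whose time limit covers a job of the given type and length in whole hours.
# Limits from the table: 3 h, 1 week = 168 h, 4 weeks = 672 h.
# Assumption: test-run.q is only used for short test runs.
choose_hare_queue() {
    local type=$1 hours=$2
    case $type in
        test)
            if [ "$hours" -le 3 ]; then echo test-run.q; return 0; fi ;;
        serial)
            if [ "$hours" -le 168 ]; then echo serial-medium.q; return 0; fi
            if [ "$hours" -le 672 ]; then echo serial-long.q; return 0; fi ;;
        parallel)
            if [ "$hours" -le 3 ]; then echo parallel-short.q; return 0; fi
            if [ "$hours" -le 168 ]; then echo parallel-medium.q; return 0; fi
            if [ "$hours" -le 672 ]; then echo parallel-long.q; return 0; fi ;;
        fat)
            if [ "$hours" -le 168 ]; then echo fat-medium.q; return 0; fi
            if [ "$hours" -le 672 ]; then echo fat-long.q; return 0; fi ;;
    esac
    echo "no hare queue fits a $type job of $hours hours" >&2
    return 1
}

choose_hare_queue parallel 24    # prints parallel-medium.q
```

On burke.st-andrews.ac.uk the long queues have no time limit, so a similar helper for that cluster would need different thresholds.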
Submitting Jobs
How you submit jobs to the queues depends on the software you want to use.
If you are using one of the standard software packages (Gaussian 03, GAMESS-UK, MOLPRO, CASTEP, CPMD, CRYSTAL, DL_POLY, Amber) then you should be able to use the job submission tools designed specifically for that piece of software. For example, to submit Gaussian 03 jobs you would use the g03sub command. A guide to these tools can be found in the EaStCHEM Software Submission Package (SSP) User Guide.
If you are using your own software, you will probably have to write your own job submission script. However, there are tools that will generate a template for you so that you do not have to start from scratch (see the EaStCHEM SSP User Guide). A brief explanation of submission script syntax is given below. If you have any specific questions, please post them to the forum or contact your RCO.
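For a serial program of your own, a hand-written script needs only a handful of directives. The sketch below writes such a script with a here-document, using the hare serial-medium.q queue and the h_rt request listed for it in the resource table; the directives mirror those explained under Submission Script Syntax. The program `./my_program`, the input `input.dat`, the job name `my_job`, the file name `my_job.bash` and the function `write_serial_job` are all placeholders, and this is not the output of the SSP template tools.

```bash
#!/bin/bash
# Hypothetical helper: write a minimal serial SGE job script to the given file.
# The quoted 'EOF' stops the current shell from expanding anything inside
# the here-document, so the #$ lines are written exactly as shown.
write_serial_job() {
    local script=$1
    cat > "$script" <<'EOF'
#!/bin/bash
#$ -cwd -V
#$ -q serial-medium.q
#$ -N my_job
#$ -A rcf
#$ -l h_rt=168:00:00
# Placeholder program and input file: replace with your own.
./my_program input.dat
EOF
}

write_serial_job my_job.bash
echo "Review my_job.bash, then submit it with: qsub my_job.bash"
```

Writing the script to a file first, rather than piping it straight to qsub, leaves a copy you can check and resubmit.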
Using the EaStCHEM RCF Compute Grid
If you have access, you can also submit jobs to other clusters within the EaStCHEM RCF Compute Grid. Information on setting up your account to use the grid and on managing grid jobs can be found in How To set up your account to access the EaStCHEM Grid. As with local job submission, the procedure for submitting jobs depends on the software you are using.
For one of the standard software packages, the same submission tools that are used for local job submission can be used. The guide to these tools can be found in the EaStCHEM SSP User Guide.
For jobs using your own software, you will have to write your own scripts, although tools are available to generate templates. For more information on writing scripts, see How to write EaStCHEM Grid job submission scripts; for information on generating templates, see the EaStCHEM SSP User Guide.
To submit jobs to remote resources you will need to know what queues are available on the system you are submitting the job to. You can use the qlist command in the following way:
> qlist host.domain.com
Here, host.domain.com is the hostname of the machine whose queues you want to list. You can also get a list of the standard codes available on a remote system (and the locations of their executables) using the clist command:
> clist host.domain.com
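When preparing grid jobs it can help to survey several hosts at once. The sketch below simply runs the qlist and clist commands described above for each host given on the command line. The function name `survey_grid` is hypothetical, and which hosts you may query depends on how your grid account is set up.

```bash
#!/bin/bash
# Hypothetical helper: print the queues (qlist) and standard codes (clist)
# for each grid host given as an argument. Stops early with an error if
# either RCF command is not available on this machine.
survey_grid() {
    local cmd host
    for cmd in qlist clist; do
        if ! command -v "$cmd" > /dev/null; then
            echo "$cmd not found; run this on an RCF cluster" >&2
            return 1
        fi
    done
    for host in "$@"; do
        echo "=== $host ==="
        qlist "$host"
        clist "$host"
    done
}

# Example (on an RCF cluster):
#   survey_grid hare.epcc.ed.ac.uk burke.st-andrews.ac.uk
```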
Submission Script Syntax
This section gives an example of how to generate a submission script template and then goes through the meaning of the script line-by-line. Imagine we used the following command:
me@local> cpmdsub -norun -np 8 -q parallel-short.q h2-wave.inp
(For more information on the cpmdsub command see the EaStCHEM SSP User Guide.) The job submission script produced (in h2-wave.inp.bash) would look like:
#!/bin/bash
#$ -cwd -V
#$ -q parallel-short.q
#$ -N h2-wave.inp_cpm
#$ -A rcf
#$ -pe mpich 2
#$ -R y
#$ -l h_rt=03:00:00
cat $HOME/.mpich/mpich_hosts.$JOB_ID | cut -f 1 -d . | sort | fmt -w 30
sed s/$/:4/ $HOME/.mpich/mpich_hosts.$JOB_ID > $HOME/.mpich/ndfile.$JOB_ID
mpirun -np 8 -machinefile $HOME/.mpich/ndfile.$JOB_ID /usr/local/CPMD-3.11.1/BIN/cpmd_mpi.x h2-wave.inp
We will now go through this script to describe what the various lines mean.
The first line
#!/bin/bash
specifies the shell that runs the script. In our case this is always bash, which determines the syntax of the commands that can be used in the script.
The next set of lines
#$ -cwd -V
#$ -q parallel-short.q
#$ -N h2-wave.inp_cpm
#$ -A rcf
#$ -pe mpich 2
#$ -R y
#$ -l h_rt=03:00:00
all begin with #$ and specify the options to the batch submission system (SGE in this case):
-cwd changes the working directory to the directory the script was submitted from
-V exports all the current environment variables into the job script
-q parallel-short.q specifies the queue to send the job to
-N h2-wave.inp_cpm gives the job the name h2-wave.inp_cpm
-A rcf tells the job to run in the rcf account
-pe mpich 2 requests 2 slots in the MPICH parallel environment; here each slot is a 4-processor node, so 2 slots = 2 nodes = 8 processors
-R y switches on resource reservation, which lets the scheduler reserve resources for a large parallel job as they become free so that it is scheduled efficiently rather than repeatedly overtaken by smaller jobs
-l h_rt=03:00:00 specifies the resource requests for the job, in this case a hard run-time (wall-clock) limit of 3 hours. The resources that must be requested for each queue are listed in the Queue Resource Table
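The #$ lines are simply qsub options embedded in the script, so the same request can also be written out as qsub command-line options. The hypothetical function `build_qsub_args` below collects the options for the h2-wave example in a bash array (so each option and value stays a separate word) and prints the resulting command as a dry run instead of submitting it. With the options already embedded in h2-wave.inp.bash, plain `qsub h2-wave.inp.bash` is all that is needed in practice.

```bash
#!/bin/bash
# Hypothetical helper: store the example's SGE options in the global array
# QSUB_ARGS. Arguments: queue, job name, number of mpich slots, run time.
build_qsub_args() {
    local queue=$1 name=$2 slots=$3 runtime=$4
    QSUB_ARGS=(-cwd -V -q "$queue" -N "$name" -A rcf
               -pe mpich "$slots" -R y -l "h_rt=$runtime")
}

build_qsub_args parallel-short.q h2-wave.inp_cpm 2 03:00:00

# Dry run: print the equivalent command rather than running it.
printf 'qsub'; printf ' %s' "${QSUB_ARGS[@]}" h2-wave.inp.bash; printf '\n'
# To submit for real: qsub "${QSUB_ARGS[@]}" h2-wave.inp.bash
```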
The next two lines
cat $HOME/.mpich/mpich_hosts.$JOB_ID | cut -f 1 -d . | sort | fmt -w 30
sed s/$/:4/ $HOME/.mpich/mpich_hosts.$JOB_ID > $HOME/.mpich/ndfile.$JOB_ID
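set up the parallel environment for the calculation. The first prints the short names of the allocated nodes (the part of each hostname before the first dot) to the job's output. The second creates the machine file $HOME/.mpich/ndfile.$JOB_ID by appending :4 to each hostname, so that MPICH knows which nodes to send the processes to and that it should run 4 processes on each.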
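To see what these two lines do, the sketch below runs them on a made-up host list outside SGE. The hostnames, the job ID 12345 and the use of a temporary directory in place of $HOME/.mpich are assumptions for illustration, as is the file format (one fully qualified hostname per line).

```bash
#!/bin/bash
# Demonstrate the two machine-file lines on a made-up host list.
# Assumption: the hosts file holds one fully qualified hostname per line.
workdir=$(mktemp -d)
hosts=$workdir/mpich_hosts.12345   # stands in for $HOME/.mpich/mpich_hosts.$JOB_ID
printf '%s\n' node01.example.org node02.example.org > "$hosts"

# Line 1: print the short node names (text before the first dot) on one line.
cat "$hosts" | cut -f 1 -d . | sort | fmt -w 30      # prints: node01 node02

# Line 2: append ":4" (4 processes per node) to each hostname.
sed 's/$/:4/' "$hosts" > "$workdir/ndfile.12345"
cat "$workdir/ndfile.12345"
# node01.example.org:4
# node02.example.org:4
```

Quoting the sed expression is not required here (bash leaves a `$` followed by `/` alone), but it is a safer habit in hand-written scripts.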
The final line
mpirun -np 8 -machinefile $HOME/.mpich/ndfile.$JOB_ID /usr/local/CPMD-3.11.1/BIN/cpmd_mpi.x h2-wave.inp
is the line that actually runs the program. As this is a distributed-memory parallel (DMP) job, it uses mpirun to start the CPMD program. Notice that it requests 8 processes (-np 8), uses the machine file of compute nodes created by the previous two lines, gives the full path to the executable, and passes the input file named in the cpmdsub command.
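The -np value has to agree with what the machine file provides (2 nodes with 4 processes each gives 8). The hypothetical helper `count_procs` below derives that number from a machine file by summing the `:N` suffixes, counting a line without a suffix as 1, so that -np need not be typed by hand. It is a sketch for illustration, not part of the generated script; the hostnames and temporary file are made up.

```bash
#!/bin/bash
# Hypothetical helper: total number of MPI processes listed in a machine
# file of "host" or "host:N" lines (a bare host counts as 1 process).
count_procs() {
    awk -F: '{ n += (NF > 1 ? $2 : 1) } END { print n + 0 }' "$1"
}

ndfile=$(mktemp)   # stands in for $HOME/.mpich/ndfile.$JOB_ID
printf '%s\n' node01.example.org:4 node02.example.org:4 > "$ndfile"
np=$(count_procs "$ndfile")

# Dry run: print the mpirun line with the derived process count (8 here).
echo "mpirun -np $np -machinefile $ndfile /usr/local/CPMD-3.11.1/BIN/cpmd_mpi.x h2-wave.inp"
```

Inside a real job the helper would be pointed at $HOME/.mpich/ndfile.$JOB_ID instead of the temporary file.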
You can find more about the SGE syntax in the SGE User Guide and about the syntax for submitting grid jobs in the EaStCHEM SSP User Guide.
Table of SGE resource requests associated with each queue
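Cluster | Queue | Resource requests
hare.epcc.ed.ac.uk | test-run.q | -l h_rt=03:00:00
hare.epcc.ed.ac.uk | serial-medium.q | -l h_rt=168:00:00
hare.epcc.ed.ac.uk | serial-long.q | -l h_rt=672:00:00
hare.epcc.ed.ac.uk | parallel-short.q | -l h_rt=03:00:00
hare.epcc.ed.ac.uk | parallel-medium.q | -l h_rt=168:00:00
hare.epcc.ed.ac.uk | parallel-long.q | -l h_rt=672:00:00
hare.epcc.ed.ac.uk | fat-medium.q | -l h_rt=168:00:00 -l bigmem=true
hare.epcc.ed.ac.uk | fat-long.q | -l h_rt=672:00:00 -l bigmem=true
burke.st-andrews.ac.uk | serial-medium.q | none
burke.st-andrews.ac.uk | serial-long.q | none
burke.st-andrews.ac.uk | parallel-short.q | none
burke.st-andrews.ac.uk | parallel-medium.q | none
burke.st-andrews.ac.uk | parallel-long.q | none
burke.st-andrews.ac.uk | fat.q | -l bigmem=true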
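If you generate scripts yourself, the hare.epcc.ed.ac.uk rows of the table can be encoded as a simple lookup so that the required requests are never forgotten. The function `hare_resources` below is a hypothetical sketch that copies those rows; it does not query SGE, so it must be updated by hand if the queue configuration changes.

```bash
#!/bin/bash
# Hypothetical lookup of the -l requests the table lists for each
# hare.epcc.ed.ac.uk queue. On burke.st-andrews.ac.uk only fat.q needs
# a request (-l bigmem=true).
hare_resources() {
    case $1 in
        test-run.q|parallel-short.q)       printf '%s\n' "-l h_rt=03:00:00" ;;
        serial-medium.q|parallel-medium.q) printf '%s\n' "-l h_rt=168:00:00" ;;
        serial-long.q|parallel-long.q)     printf '%s\n' "-l h_rt=672:00:00" ;;
        fat-medium.q) printf '%s\n' "-l h_rt=168:00:00 -l bigmem=true" ;;
        fat-long.q)   printf '%s\n' "-l h_rt=672:00:00 -l bigmem=true" ;;
        *) echo "unknown hare queue: $1" >&2; return 1 ;;
    esac
}

# Emit the matching directive line for a job script.
printf '#$ %s\n' "$(hare_resources fat-long.q)"   # prints: #$ -l h_rt=672:00:00 -l bigmem=true
```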



