How to write SGE job submission scripts
This how-to will teach you how to write job submission scripts for the SGE batch submission system. It is designed particularly with the EaStCHEM RCF facilities in mind but the information should be useful to anyone wanting to learn to write SGE submission scripts.
More information and advanced tutorials can be found on the SGE wiki.
To use many of the programs on the EaStCHEM RCF you do not need to write the submission scripts yourself. All you need to do is run the appropriate EaStCHEM SSP command. You can even use these commands to generate submission script templates that you can modify for your own use.
The Basics
At its most basic, an SGE submission script can consist of just a set of commands to execute. For example, the following script to show the host name and time would be valid on many systems:
#!/bin/bash
#
hostname -f
date
This script does not set any SGE options. All it does is set the shell (/bin/bash) and run two commands: hostname -f and date. You submit your script (or job) to the SGE queuing system using the qsub command (for a script file called script.bash):
qsub script.bash
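When submissions are scripted, it is often useful to capture the job ID that qsub reports. The sketch below parses a sample of the message that qsub typically prints on success. The message wording and the job ID 12345 are assumptions made for illustration, so check the output on your own system before relying on this.

```shell
#!/bin/bash
# Sample of the message qsub typically prints after a successful submission.
# ASSUMPTION: the exact wording can differ between SGE versions and sites.
# In real use you would capture it with: qsub_output=$(qsub script.bash)
qsub_output='Your job 12345 ("script.bash") has been submitted'

# The job ID is the third whitespace-separated field of the message.
job_id=$(printf '%s\n' "$qsub_output" | awk '{ print $3 }')

echo "Submitted job ID: ${job_id}"
```

The captured ID can then be passed to other SGE commands, for example qstat -j or qdel.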
Of course, in many cases you will want to include options to the SGE queuing system. These are added to the submission script using lines beginning with #$. For example, to give your submitted job a name we would use the -N option:
#!/bin/bash
#
#$ -N get_time
hostname -f
date
This would give this job the name get_time.
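The job name also determines the names of the output files that SGE writes. As a rough illustration, the sketch below builds the default file names for the get_time job. It assumes the default naming scheme has not been overridden with the -o, -e or -j options, and the job ID 12345 is a made-up value.

```shell
#!/bin/bash
# Predict the default SGE output file names for a named job.
# ASSUMPTION: default naming (<job_name>.o<job_id> and <job_name>.e<job_id>)
# is in effect, i.e. -o, -e and -j have not been used.
job_name="get_time"   # matches the -N option in the script above
job_id=12345          # hypothetical job ID, as reported by qsub

stdout_file="${job_name}.o${job_id}"   # standard output of the job
stderr_file="${job_name}.e${job_id}"   # standard error of the job

echo "stdout: ${stdout_file}"
echo "stderr: ${stderr_file}"
```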
A list of the most commonly used options is given in the table below:
Option | Description
-N name | Use name as the name for the job. This identifies the job in the queue and sets the name of the files generated by SGE.
-q queue_name | Send the job to queue queue_name.
-A account_name | Run the job under account account_name. Does not affect charging on the EaStCHEM RCF.
-l resource_request | Request the resource specified. The most common example is the time required by the job, e.g. -l h_rt=03:00:00 would request 3 hours of wall clock time.
-cwd | Change the working directory for the job to the one that the script was submitted from.
-V | Export the current environment variables into the script.
-pe parallel_environment slots | Request slots slots of the parallel environment parallel_environment, e.g. -pe mpich 2 would request 2 MPICH slots.
-t min-max | Run an array job from index min to index max.
-R y | Turn on resource reservation for this job. Allows for more efficient scheduling of parallel jobs.
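The h_rt resource request in the table takes a wall clock time in HH:MM:SS form. As a small convenience, the sketch below converts a number of minutes into that form. The helper name minutes_to_hrt is hypothetical and is not part of SGE.

```shell
#!/bin/bash
# minutes_to_hrt MINUTES: print MINUTES as HH:MM:SS for use with -l h_rt.
# Hypothetical helper for illustration; it is not an SGE command.
minutes_to_hrt() {
    local minutes="$1"
    printf '%02d:%02d:00\n' $(( minutes / 60 )) $(( minutes % 60 ))
}

minutes_to_hrt 180     # 3 hours, as in the table example
minutes_to_hrt 90      # 1.5 hours
minutes_to_hrt 10080   # 168 hours (7 days)
```

The result can be used on the command line, for example qsub -l h_rt=$(minutes_to_hrt 180) script.bash.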
Serial Jobs
Writing a script for a serial job is relatively straightforward as we do not have to worry about parallel execution environments. In this example we will be writing a script (molecule.bash) to run a GAMESS-UK (molecular electronic structure code) job. The input for the code is in a file called molecule.in and the code should be run in the scratch directory due to the large temporary files it generates. The GAMESS-UK serial executable is located in /usr/local/GAMESS-UK/bin/gamess.
The first step is to write a script to run the job without any SGE options. In this case the script would look like (molecule1.bash):
#!/bin/bash
#
# Script to run serial GAMESS-UK job
#

# Set the current working directory
wkdir=`pwd`

# Move to the scratch directory
cd $scratch

# Run the GAMESS-UK job
/usr/local/GAMESS-UK/bin/gamess < $wkdir/molecule.in > $wkdir/molecule.out
The next step is to decide which SGE options we need to include. On the EaStCHEM RCF system we need to include a number of mandatory requests: a queue name and the resource requests associated with that queue. For more information on the required options see the Job Submission topic. In this case we are going to use the serial-medium.q queue on hare.epcc.ed.ac.uk. This means we need to request 168 hours of wall clock time and set the mediumprios resource. We also need to change to the directory that the script (and the input file) are in and export the environment variables (so that we have access to the $scratch variable). The submission script (with the mandatory SGE options) would look like (molecule2.bash):
#!/bin/bash
#
# Script to run serial GAMESS-UK job
#
# SGE submission options
#$ -q serial-medium.q       # Select the queue
#$ -l h_rt=168:00:00        # Set 168 hours of wall clock time
#$ -l mediumprios=true      # Set mandatory resource request
#$ -cwd                     # Change to current working directory
#$ -V                       # Export environment variables into script

# Set the current working directory
wkdir=`pwd`

# Move to the scratch directory
cd $scratch

# Run the GAMESS-UK job
/usr/local/GAMESS-UK/bin/gamess < $wkdir/molecule.in > $wkdir/molecule.out
Finally, we will give the job a name so that we can easily identify it in the queue and easily access the output files produced by the SGE system. The final, completed SGE submission script looks like (molecule.bash):
#!/bin/bash
#
# Script to run serial GAMESS-UK job
#
# SGE submission options
#$ -q serial-medium.q       # Select the queue
#$ -l h_rt=168:00:00        # Set 168 hours of wall clock time
#$ -l mediumprios=true      # Set mandatory resource request
#$ -cwd                     # Change to current working directory
#$ -V                       # Export environment variables into script
#$ -N molecule_gam          # A name for the job

# Set the current working directory
wkdir=`pwd`

# Move to the scratch directory
cd $scratch

# Run the GAMESS-UK job
/usr/local/GAMESS-UK/bin/gamess < $wkdir/molecule.in > $wkdir/molecule.out
Now to submit the job to the SGE system we would simply use the command:
qsub molecule.bash
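The script above depends on the $scratch variable being exported into the job and on the input file being readable. If either is missing, the job can fail only after waiting in the queue. One possible safeguard is sketched below. The function name check_job_env is hypothetical and this is not part of the EaStCHEM setup.

```shell
#!/bin/bash
# check_job_env INPUT_FILE: return non-zero if the job cannot start cleanly.
# Hypothetical helper; it assumes the scratch location is held in $scratch,
# as in the job script above.
check_job_env() {
    local input="$1"
    if [ -z "${scratch:-}" ] || [ ! -d "${scratch}" ]; then
        echo "scratch directory is not set or does not exist" >&2
        return 1
    fi
    if [ ! -r "${input}" ]; then
        echo "cannot read input file: ${input}" >&2
        return 1
    fi
    return 0
}

# Possible use in the job script, before the cd to the scratch directory:
#   check_job_env "$wkdir/molecule.in" || exit 1
```

Failing early in this way means the error message appears at the top of the job's error file rather than being buried in the program output.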
Shared Memory Parallel (SMP) Jobs
The only difference between writing a submission script for a serial job and writing a submission script for an SMP job is that we now need to specify the SMP parallel environment that we want to run the job under. An example of an SMP job submission script is shown below for a 4-processor Gaussian 03 job (smp.bash).
#!/bin/bash
#
# Script to run 4-processor SMP Gaussian 03 job
#
# SGE submission options
#$ -q parallel-medium.q     # Select the queue
#$ -l h_rt=168:00:00        # Set 168 hours of wall clock time
#$ -l mediumpriop=true      # Set mandatory resource request
#$ -cwd                     # Change to current working directory
#$ -V                       # Export environment variables into script
#$ -N methanol_g03          # A name for the job
#$ -pe smp 1                # Select the parallel environment
#$ -R y                     # Switch on resource reservation

# Set the scratch directory
export GAUSS_SCRDIR=$scratch

# Setup the Gaussian 03 environment
source $g03root/g03/bsd/g03.bash

# Run the Gaussian 03 job
g03 methanol
There are a few changes from a serial job. Two obvious changes are that we now ask for a parallel job queue (parallel-medium.q) and request the corresponding mandatory resource (mediumpriop=true). We also switch on resource reservation, which allows SGE to make more sophisticated decisions regarding the scheduling of parallel jobs. The only other change is the request for the SMP parallel environment with the line
#$ -pe smp 1
Parallel environments tell SGE which parallel protocol to use to run a multi-processor job. As we have four processors per slot (or node) on the EaStCHEM RCF, we should request 1 smp slot (= 1 node = 4 processors). Once again, the job can be submitted to the queuing system using the qsub command:
qsub smp.bash
Distributed Memory Parallel (DMP) Jobs
DMP job scripts are generally very similar to SMP job scripts. The major differences tend to come when the code is executed rather than in the job specification options. There are a number of DMP parallel environments but we will concentrate on the mpich environment as it is the most commonly used on the EaStCHEM RCF.
An example 8-processor DMP job submission script (for Amber 9) is given below (dmp.bash). A trailing \ indicates a line continuation.
#!/bin/bash
#
#$ -cwd -V                              # Shift directories and export variables
#$ -q parallel-short.q                  # Select the queue
#$ -N polyAT_vac_san                    # Set the name for the job
#$ -A rcf                               # Account to charge to
#$ -pe mpich 2                          # Set the parallel environment
#$ -R y                                 # Switch on resource reservation
#$ -l shortpriop=true -l h_rt=03:00:00  # Mandatory resource requests

cat /users/aturner/.mpich/mpich_hosts.$JOB_ID | cut -f 1 -d . | sort | fmt -w 30
sed s/$/:4/ /users/aturner/.mpich/mpich_hosts.$JOB_ID > \
    /users/aturner/.mpich/ndfile.$JOB_ID

mpirun -np 8 -machinefile /users/aturner/.mpich/ndfile.$JOB_ID \
    /usr/local/amber9/exe/sander.MPI -O -i polyAT_vac.in \
    -o polyAT_vac.out -c polyAT_vac.inpcrd -p polyAT_vac.prmtop \
    -r polyAT_vac.rst
The SGE options are reasonably self-explanatory. On the EaStCHEM resources we have 4 processors per node, so we need to request 2 MPICH slots (= 2 nodes = 8 processors). As for the SMP job, we turn on resource reservation, select a parallel queue and request the appropriate resources.
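The relationship between slots and processors described above is simple arithmetic, sketched below. The function names are hypothetical, and the value of 4 processors per slot is taken from the text for the EaStCHEM RCF. Other systems will differ.

```shell
#!/bin/bash
# Processors per slot; the text above gives 4 for the EaStCHEM RCF.
# ASSUMPTION: adjust this value for any other system.
procs_per_slot=4

# slots_for_procs N: smallest number of slots providing at least N processors.
slots_for_procs() {
    echo $(( ($1 + procs_per_slot - 1) / procs_per_slot ))
}

# procs_for_slots N: number of processors provided by N slots.
procs_for_slots() {
    echo $(( $1 * procs_per_slot ))
}

slots_for_procs 8   # slots to request with -pe for an 8-processor job
procs_for_slots 2   # processor count to give mpirun -np for "-pe mpich 2"
```

Inside a running parallel job SGE sets the NSLOTS variable to the number of slots granted, so on a system configured this way procs_for_slots "$NSLOTS" could stand in for a hard-coded processor count.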
The next two lines:
cat /users/aturner/.mpich/mpich_hosts.$JOB_ID | cut -f 1 -d . | sort | fmt -w 30
sed s/$/:4/ /users/aturner/.mpich/mpich_hosts.$JOB_ID > \
    /users/aturner/.mpich/ndfile.$JOB_ID
print the names of the nodes that the job will run on and set up a file that lists those nodes with the number of processors per node (:4) appended to each name. These lines are not needed if you are using a DMP parallel environment that has been tightly integrated into the SGE system (this is not the case on the EaStCHEM RCF).
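To see what these two lines do, they can be tried on a small stand-in for the hosts file. The sketch below assumes the hosts file holds one fully qualified host name per line. The host names are made up for illustration.

```shell
#!/bin/bash
# Build a stand-in for the mpich_hosts.$JOB_ID file.
# ASSUMPTION: one fully qualified host name per line; the names are invented.
hosts_file=$(mktemp)
printf '%s\n' 'node012.example.org' 'node007.example.org' > "$hosts_file"

# First line of the original: strip the domain, sort the short names and
# wrap them at 30 columns. This only prints the list to the job output.
short_names=$(cut -f 1 -d . "$hosts_file" | sort | fmt -w 30)
echo "$short_names"

# Second line of the original: append ":4" (processors per node) to each host.
machine_list=$(sed 's/$/:4/' "$hosts_file")
echo "$machine_list"

rm -f "$hosts_file"
```

The sed expression is quoted here so that the shell leaves the $ to sed, which treats it as the end-of-line anchor.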
The final line:
mpirun -np 8 -machinefile /users/aturner/.mpich/ndfile.$JOB_ID \
    /usr/local/amber9/exe/sander.MPI -O -i polyAT_vac.in \
    -o polyAT_vac.out -c polyAT_vac.inpcrd -p polyAT_vac.prmtop \
    -r polyAT_vac.rst
uses the mpirun command to run the Amber 9 sander executable (/usr/local/amber9/exe/sander.MPI). As the MPICH parallel environment is not tightly integrated into the SGE system, the mpirun call also references the machine file created by the previous two lines, which lists the machines to run the job on.
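The value given to -np has to agree with the contents of the machine file. One way to avoid hard-coding it is to add up the per-node counts, as in the sketch below. The function name count_procs and the host names are hypothetical, and the file is assumed to hold host:count lines like those produced by the sed command.

```shell
#!/bin/bash
# count_procs FILE: sum the processor counts in a host:count machine file.
# Hypothetical helper for illustration.
count_procs() {
    awk -F: '{ total += $2 } END { print total + 0 }' "$1"
}

# Stand-in machine file with invented host names.
# ASSUMPTION: each line has the form host:count.
machine_file=$(mktemp)
printf '%s\n' 'node007.example.org:4' 'node012.example.org:4' > "$machine_file"

nprocs=$(count_procs "$machine_file")
echo "Processors listed in machine file: ${nprocs}"

rm -f "$machine_file"
```

In a job script the result could be passed to mpirun as -np "$nprocs" in place of the literal 8.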
Finally, we can submit the job using:
qsub dmp.bash
Array Jobs
Array jobs are a bit different from the three job types that have already been detailed above. They provide a mechanism for submitting a set of related jobs. Basically, every job in the collection is assigned a task ID which can be used in the job script to control the behaviour of that job. The jobs are submitted as a block and controlled by one job ID. This approach has a number of advantages for tasks where you want to repeat a job on a large number of input sets. Rather than submit hundreds or thousands of individual jobs, we can submit a single job that will perform the same task for each input set.
As a more concrete example, consider the following task. We have 100 sets of data in files (data.1 to data.100) on which we want to use the (serial) program analysis_program. Our array submission script would look something like (array.bash):
#!/bin/bash
#
# Script to run serial array job
#
# SGE submission options
#$ -q serial-medium.q       # Select the queue
#$ -l h_rt=168:00:00        # Set 168 hours of wall clock time
#$ -l mediumprios=true      # Set mandatory resource request
#$ -cwd                     # Change to current working directory
#$ -V                       # Export environment variables into script
#$ -t 1-100                 # Set the array indices

# Run the analysis program
$HOME/bin/analysis_program < data.$SGE_TASK_ID > output.$SGE_TASK_ID
The only additional SGE option required (compared to a non-array serial job) is the -t min-max option to specify the range of tasks to perform. We can refer to an individual task using the $SGE_TASK_ID variable which is set and controlled by the SGE job submission system. The SGE system will use as many slots as there are available to run as many tasks concurrently as it can.
The line:
$HOME/bin/analysis_program < data.$SGE_TASK_ID > output.$SGE_TASK_ID
runs the analysis, taking its (standard) input from whichever data.* file corresponds to the current task and sending its (standard) output to the matching output.* file. As with all of the submission scripts, we would submit the job using:
qsub array.bash
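The example above relies on the input files being numbered data.1 to data.100. When the inputs have arbitrary names, a common pattern is to list them in a text file and use the task ID as a line number. The sketch below shows one way to do this. The function name input_for_task and the file names are hypothetical, and the task ID falls back to 2 so that the snippet can be tried outside SGE.

```shell
#!/bin/bash
# input_for_task TASK_ID LIST_FILE: print line TASK_ID of LIST_FILE.
# Hypothetical helper for illustration.
input_for_task() {
    sed -n "${1}p" "$2"
}

# Stand-in list of input files, one per line (invented names).
list_file=$(mktemp)
printf '%s\n' 'water.dat' 'methanol.dat' 'ethanol.dat' > "$list_file"

# SGE sets SGE_TASK_ID for each array task; default to 2 for a local dry run.
task_id="${SGE_TASK_ID:-2}"
input=$(input_for_task "$task_id" "$list_file")

# In a real job this line would run analysis_program instead of printing.
echo "task ${task_id}: analysis_program < ${input} > output.${task_id}"

rm -f "$list_file"
```

With a list of three files the job would be submitted with -t 1-3, so that each task picks up one line of the list.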
We can also submit parallel array jobs by using the -t and -pe options in combination. In this case the SGE system will try to run as many array tasks concurrently as it can, each with the specified parallel environment and number of slots.



