How to write EaStCHEM Grid job submission scripts

This how-to will teach you how to write job submission scripts for EaStCHEM Grid jobs. It assumes that you are familiar with writing job submission scripts for the Sun Grid Engine (SGE), although you do not strictly need to know this for Grid jobs. You can learn about SGE submission scripts in the "How to write SGE submission scripts" how-to.

The EaStCHEM Software Submission Package (SSP) is capable of generating valid EaStCHEM Grid submission scripts for standard computational chemistry software. This How To shows you how to build up the Grid scripts from scratch so that you gain a full understanding of their structure. We will start by writing our own script for a very simple Grid job, then see how to use the EaStCHEM SSP to create a Grid job, and finally look at writing a Grid job script for a piece of software not covered by the SSP.

The Basics

To understand how to write a script you need an idea of how a Grid job works. At its most basic level a Grid job consists of three steps:

  1. Transfer input to remote resource
  2. Run job on remote resource
  3. Retrieve output from remote resource

Grid submission scripts give us a way of specifying these three steps. A Grid submission script looks very much like a normal SGE submission script with a few additional options to specify files to transfer and the remote resource. Grid option lines begin with #%. The table below lists the options that can be selected.

Option                        Description

-user username                The username on the remote resource that you are submitting the job to.
-host hostname                The hostname of the remote resource to submit the job to.
-input file #1 file #2 ...    Input files to transfer to remote resource.
-monitor file                 Output file which you would like to monitor during job.
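
To make the difference between SGE option lines (#$) and Grid option lines (#%) concrete, the sketch below pulls the Grid options out of a small example script. It is only an illustration: grid_options is a name made up for this example and is not part of the EaStCHEM tools, and gtsub may well process these lines differently.

```shell
# Illustrative only: grid_options is a hypothetical helper, not an EaStCHEM command.
# Print the Grid options (lines starting with "#%") found in script $1,
# one per line, with the "#%" marker removed. SGE lines ("#$") are ignored.
grid_options() {
  sed -n 's/^#%[[:space:]]*//p' "$1"
}

# A small example script to run the helper on
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/bash
#$ -cwd -V
#% -host burke.st-and.ac.uk
#% -input input.txt
/bin/cat input.txt > output.txt
EOF

# Prints the two Grid option lines without their "#%" prefix
grid_options "$demo"
rm -f "$demo"
```

Run as written, this prints "-host burke.st-and.ac.uk" and "-input input.txt"; the SGE line and the command line are skipped.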

At the moment, the Grid submission software retrieves all the files found in the job directory. Future versions will allow you to specify the files to retrieve.

A Simple Grid Job

Our first, very simple, job submission script looks like this (globus_test.bash):

#!/bin/bash
#
# SGE Options (for SGE on remote resource)
#$ -cwd -V
#$ -N Globus_Test
#$ -q serial-medium.q

# Grid options
#% -host burke.st-and.ac.uk
#% -input input.txt

/bin/cat input.txt > output.txt

This job simply copies a file called input.txt to burke.st-and.ac.uk, and copies its contents to output.txt.
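
Because the job depends on input.txt being transferred, it can be worth confirming that every file named on an -input line exists locally before creating the Grid job. The following is a minimal sketch of such a check. missing_inputs is a hypothetical helper written for this how-to, not part of the Grid submission software, and it assumes that file names contain no spaces or wildcard characters.

```shell
# Illustrative only: missing_inputs is a hypothetical helper.
# Print each file named on a "#% -input" line of script $1 that does not
# exist relative to the current directory. Returns non-zero if any is missing.
missing_inputs() {
  status=0
  # Take everything after "-input" and split it on whitespace.
  for f in $(sed -n 's/^#%[[:space:]]*-input[[:space:]]*//p' "$1"); do
    if [ ! -e "$f" ]; then
      echo "$f"
      status=1
    fi
  done
  return $status
}

# Typical use (not run here):
#   missing_inputs globus_test.bash && gtsub globus_test.bash
```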

We can convert our submission file into an actual Grid job using the gtsub command.

> gtsub globus_test.bash

********************************************************************************
 Creating Globus job.
 Submission script will be in submit_1192616924.bash.
********************************************************************************

Processing script file...
  Remote host: burke.st-and.ac.uk
  Input files: input.txt
...finished. Remote script in rm_globus_test.bash
Creating RSL job descriptions...
  Written 1192616924_mkdir.rsl   <= Create remote directory
  Written 1192616924_unzip.rsl   <= Unzip input files
  Written 1192616924_qsub.rsl    <= Submit job to remote batch system
  Written 1192616924_zip.rsl     <= Zip output
  Written 1192616924_rm.rsl      <= Remove remote directory
...finished
  Using Web Services and XML RSL
Write Globus scripts...
  Written submit_1192616924.bash     <= Submit remote job via Globus
  Written retrieve_1192616924.bash   <= Retrieve remote job output via Globus
  Written tidy_1192616924.bash       <= Tidy up after remote job via Globus
...finished

********************************************************************************
 Globus job created.
 Submit job with submit_1192616924.bash
 Check status with gtstat burke.st-and.ac.uk
 Once job has finished run retrieve_1192616924.bash to retrieve output files.
 Use tidy_1192616924.bash to clean up after remote job.
********************************************************************************

If you read the output from the gtsub command you will see that it has created three executable scripts: submit_1192616924.bash, retrieve_1192616924.bash and tidy_1192616924.bash. The submit script sends the input to burke.st-and.ac.uk and runs the job. Once the job has finished, we use the retrieve script to get our output and the tidy script to tidy up.
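
All three script names share the same numeric identifier, so it can be handy to capture it when scripting around gtsub. The sketch below extracts it from saved gtsub output. It assumes the output contains a "Submission script will be in submit_<ID>.bash." line exactly as in the listing above, and gtsub_job_id is a hypothetical helper, not a command provided by the Grid software.

```shell
# Illustrative only: gtsub_job_id is a hypothetical helper.
# Read gtsub output on standard input and print the numeric identifier taken
# from the "Submission script will be in submit_<ID>.bash." line.
gtsub_job_id() {
  sed -n 's/^.*Submission script will be in submit_\([0-9][0-9]*\)\.bash\..*$/\1/p'
}

# Example using the line printed in the output above
id=$(echo ' Submission script will be in submit_1192616924.bash.' | gtsub_job_id)
echo "submit_${id}.bash retrieve_${id}.bash tidy_${id}.bash"
```

With real output, something like "gtsub globus_test.bash | tee gtsub.log" followed by "gtsub_job_id < gtsub.log" would give the same identifier, provided the output format matches the one shown here.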

Let's submit the remote job:

> ./submit_1192616924.bash
  adding: input.txt (stored 0%)
  adding: rm_test.bash (deflated 16%)
  1192616924_mkdir.rsl     <= Make remote directory
  1192616924_unzip.rsl     <= Transfer input files
  1192616924_qsub.rsl      <= Submit remote job

You can see that the submission consisted of three steps: making a directory for the job to live in, transferring the input files and submitting the job to SGE on the remote site.

How do we know if our job has finished or not? We can check the queues on the remote site using gtstat.

> gtstat burke.st-and.ac.uk
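
Rather than re-running gtstat by hand, the check can be put in a polling loop. The sketch below waits until a given job name no longer appears in the output of a status command. It assumes that the status command prints the job name while the job is queued or running, which is not confirmed here for gtstat, and wait_for_job is a hypothetical helper rather than part of the Grid tools.

```shell
# Illustrative only: wait_for_job is a hypothetical helper.
# Usage: wait_for_job JOB_NAME INTERVAL_SECONDS STATUS_COMMAND [ARGS...]
# Blocks while JOB_NAME appears in the output of the status command.
# Assumption: the status command lists the job name while the job is queued
# or running. An empty or failed listing is treated as "job gone".
wait_for_job() {
  job_name=$1
  interval=$2
  shift 2
  # "$@" is now the status command and its arguments
  while "$@" 2>/dev/null | grep -q -F -- "$job_name"; do
    sleep "$interval"
  done
}

# Typical use (not run here):
#   wait_for_job Globus 60 gtstat burke.st-and.ac.uk
```

Queue listings often truncate long job names, so matching on a short prefix of the name may be safer than matching the full name. Note also that a failed status command ends the loop, so the result should be treated as a hint that the job may have finished, not as proof.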

Once our job is no longer on the queue, we know it has finished (or crashed!) and we can get the output back.

> ./retrieve_1192616924.bash
 extracting: 1192616924_in.zip       
 extracting: Globus_test.o1201       
 extracting: Globus_test.e1201       
 extracting: output.txt
  1192616924_zip.rsl     <= Get output files

Again, you can see the steps that make up a retrieval: compressing the output on the remote host and transferring it back. The tidy script then removes the remote directory and tidies up the Grid job files that we had lying around.

> ./tidy_1192616924.bash
  1192616924_rm.rsl      <= Remove remote directory
  Removing 1192616924_in.zip
  Removing 1192616924_out.zip
  Removing 1192616924_mkdir.rsl
  Removing 1192616924_qsub.rsl
  Removing 1192616924_rm.rsl
  Removing 1192616924_unzip.rsl
  Removing 1192616924_zip.rsl
  Removing retrieve_1192616924.bash
  Removing submit_1192616924.bash
  Removing tidy_1192616924.bash

Finally, we can check our output:

> cat output.txt
If this text appears in output.txt then your Globus Toolkit
test was successful.

Submitting Real Jobs

Submitting real jobs within the EaStCHEM Grid is a relatively simple process, particularly if you want to use one of the computational chemistry programs covered by the EaStCHEM Software Submission Package (SSP). In this section we will look first at a Grid job created using the EaStCHEM SSP and then at how you would go about creating a Grid submission script for your own job.

GAMESS-UK Grid Job using EaStCHEM SSP

In this example we are going to create a 4-processor GAMESS-UK job where the input is in morphine.in.

The first question when using the EaStCHEM Grid is: which hosts can I run my job on? The EaStCHEM SSP includes a way of searching the Grid to find out which hosts are running the software you want to use. In this case we want to run a GAMESS-UK job so we use the -l option of gamesssub to list the hosts on which GAMESS-UK is available.

> gamesssub -l
*******************************************************************************
 EaStCHEM Grid: Software search for GAMESS-UK
*******************************************************************************
 == alchemy.epcc.ed.ac.uk (local) ==

   GAMESS-UK:                    
     Type    np Min. np Max. Location                                        
     serial  1       1       /usr/local/GAMESS-UK-7.0/bin/gamess             
     mpich   1       24      /usr/local/GAMESS-UK-7.0/bin/gamess-uk          

 == burke.st-and.ac.uk ==      

   GAMESS-UK:                    
     Type    np Min. np Max. Location                                        
     serial  1       1       /usr/local/GAMESS-UK-7.0/bin/gamess             
     mpich   1       64      /usr/local/GAMESS-UK-7.0/bin/gamess-uk          

*******************************************************************************

You can see that GAMESS-UK is available on two hosts: alchemy.epcc.ed.ac.uk (the local host) and burke.st-and.ac.uk. In addition, the output reveals that both hosts have both the serial (runs on one processor) and parallel (runs on more than one processor) versions of the code. As I am working from alchemy.epcc.ed.ac.uk I am going to submit this job to burke.st-and.ac.uk.

Now we need to find out which queue on the remote system we want to submit the job to. We can list the queues available on all the Grid hosts using the qlist command.

> qlist
*******************************************************************************
 EaStCHEM Grid: Batch system summary for all hosts
*******************************************************************************

 == alchemy.epcc.ed.ac.uk (localhost) ==

   test-run.q           A queue to test input. Serial only. Time limit of 2 hours
   serial.q             Serial job queue.                                 
   parallel.q           Queue for parallel jobs.                          
   fat.q                Queue for the 8 processor SMP nodes..             

 == burke.st-and.ac.uk ==

   serial-medium.q      Serial job queue. Time limit of 1 week.           
   serial-long.q        Serial job queue. No time limit.                  
   parallel-short.q     Queue for short parallel jobs. Time limit of 3 hours.
   parallel-medium.q    Queue for parallel jobs. Time limit of 1 week.    
   parallel-long.q      Queue for parallel jobs. No time limit            
   fat.q                Queue for the 8 processor SMP node. No time limit.

 == grid.ecdf.ed.ac.uk ==

   ecdf                 The only queue on the system                      
*******************************************************************************

As this is just a short parallel job, we will use the parallel-short.q queue on burke.st-and.ac.uk.

OK, now that we have collected all the information we need, we can use the gamesssub command to create the Grid job. We specify the number of processors using the -np option, the host using the -h option, and the queue using the -q option.

> gamesssub -np 4 -h burke.st-and.ac.uk -q parallel-short.q morphine
Creating GAMESS-UK job:
  Remote job for aturner@burke.st-and.ac.uk
  Using 4 distributed memory processors (4 each on 1 node[s])
  Charging to account: rcf
  Job File: /users/aturner/dev/eastchem-sub/test/gamess/eastchem/morphine.bash

********************************************************************************
 Creating Globus job.
 Submission script will be in submit_1192623545.bash.
********************************************************************************

Processing script file...
  Remote user: aturner
  Remote host: burke.st-and.ac.uk
  Input files:  morphine.in
  Monitor file: morphine.out
...finished. Remote script in rm_morphine.bash
Creating RSL job descriptions...
  Written 1192623545_mkdir.rsl   <= Create remote directory
  Written 1192623545_unzip.rsl   <= Unzip input files
  Written 1192623545_qsub.rsl    <= Submit job to remote batch system
  Written 1192623545_zip.rsl     <= Zip output
  Written 1192623545_rm.rsl      <= Remove remote directory
...finished
  Using Web Services and XML RSL
Write Globus scripts...
  Written submit_1192623545.bash     <= Submit remote job via Globus
  Written monitor_1192623545.bash    <= Monitor remote job output via Globus
  Written retrieve_1192623545.bash   <= Retrieve remote job output via Globus
  Written tidy_1192623545.bash       <= Tidy up after remote job via Globus
...finished

********************************************************************************
 Globus job created.
 Submit job with submit_1192623545.bash
 Check status with gtstat burke.st-and.ac.uk
 Monitor output with monitor_1192623545.bash
 Once job has finished run retrieve_1192623545.bash to retrieve output files.
 Use tidy_1192623545.bash to clean up after remote job.
********************************************************************************

This has created all the Grid job scripts as we did for the simple example above. Have a look at the morphine.bash file to see the script it has created.

#!/bin/bash
#
# Script created by EaStCHEM Submission Package
#
#
# Settings for Sun Grid Engine
#$ -cwd -V
#$ -q parallel-short.q
#$ -N morphine_GAMESS-UK
#$ -A rcf

# Parallel settings for Sun Grid Engine
#$ -pe mpich 1

# Resources for software
#$ -l InfiniPath

# Resources for queue
#$ -l h_rt=02:00:00

# Options for remote job submission
#% -user aturner
#% -host burke.st-and.ac.uk
#% -input  morphine.in
#% -monitor morphine.out

# Set the job name
jobname=morphine

# Pre-execution commands for parallel environment

cat $HOME/.mpich/mpich_hosts.$JOB_ID | cut -f 1 -d . | sort | fmt -w 30
sed s/$/:4/ $HOME/.mpich/mpich_hosts.$JOB_ID > $HOME/.mpich/ndfile.$JOB_ID
      

# Pre-execution commands for software

cwd=`pwd`
cd /tmp
      

# Execute the job
mpirun -machinefile $HOME/.mpich/ndfile.$JOB_ID /usr/local/GAMESS-UK-7.0/bin/gamess-uk < $cwd/$jobname.in > $cwd/$jobname.out

# Post-execution commands for software

if [ -e ftn058 ]; then
  mv ftn058 $cwd/$jobname.pun
fi
if [ -e $jobname.ed3 ]; then
  mv $jobname.ed3 $cwd/$jobname.ed3
fi

You can see the options are similar to those that we used for our simple job but that the gamesssub command has automatically added all the extra commands and options that are needed to run the job.
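
Of the commands that gamesssub adds, the sed line is perhaps the least obvious: it appends :4 to every line of the MPICH hosts file to produce the machine file passed to mpirun. The sketch below applies the same substitution to a made-up hosts file. The host names are placeholders, and reading :4 as "four processes on this host" is based on the usual MPICH machine file convention, not on anything specific to the EaStCHEM setup.

```shell
# Illustrative only: the node names below are placeholders.
hosts=$(mktemp)
printf 'node01\nnode02\n' > "$hosts"

# Same substitution as in the generated script: "$" matches the end of each
# line, so ":4" is appended to every host name.
ndfile=$(sed 's/$/:4/' "$hosts")
echo "$ndfile"
rm -f "$hosts"
```

This prints node01:4 and node02:4 on separate lines.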

We can now use the submit script to run the job, the monitor script to check the output as the job progresses, the retrieve script to get the output once the job has finished and the tidy script to clean up after the grid job. As above, we can use gtstat to monitor the queues on the remote host.

Grid Job for Arbitrary Software

Using the two examples above as a starting point, you should be able to write your own Grid job submission script. To illustrate some other ways of using the Grid submission process we are going to write our own, more complex job. For this job we are going to have our own program (written in Fortran 90) that does not exist on the remote host. This means that our job will now consist of the following stages:

  1. Transfer over program source and input files
  2. Compile our source code
  3. Run our newly compiled program using the input files
  4. Get the output back from the remote host

In this case the Fortran 90 program simply reads some numbers from an input file and calculates their sum. The source file (sum.f90) looks like this (do not worry if you are not familiar with Fortran; it is not important):

program sum
  implicit none

  ! Variable declarations
  integer :: n, i
  real :: a, tot

  ! Start the running total at zero
  tot = 0.0

  ! Read the number of values to sum
  read(5,*) n
  ! Loop over values
  do i = 1, n
    ! Read this value
    read(5,*) a
    ! Sum it into the total
    tot = tot + a
  end do

  ! Write out the answer
  write(6,'(e13.5)') tot

  ! Finish nicely
  stop 0

end program sum
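
The program expects the number of values on the first line of its input, followed by one value per line. As a sketch, a suitable input file can be created and cross-checked with awk as shown below. The three values are arbitrary examples, the file is written to a temporary directory instead of the job directory, and the awk one-liner is only a stand-in for checking the arithmetic: it prints the total in a different format from the e13.5 edit descriptor used by the Fortran program.

```shell
# Create an example input file: a count, then that many values.
# For the real job this would be numbers.in in the job directory.
workdir=$(mktemp -d)
cat > "$workdir/numbers.in" <<'EOF'
3
1.5
2.5
2.0
EOF

# Cross-check: read the count from line 1, then sum that many values.
total=$(awk 'NR == 1 { n = $1; next } NR <= n + 1 { t += $1 } END { print t }' "$workdir/numbers.in")
echo "$total"
```

For these three values the script prints 6.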

As before, I am going to run the job on burke.st-and.ac.uk. This time I will use the serial-medium.q queue as this will not be a parallel job. The only additional piece of information I require is the name and location of the Fortran compiler on burke. We can get this by searching the Grid information system as we did for the software and queues in the previous example.

> querygrid -h burke.st-and.ac.uk
*******************************************************************************
 EaStCHEM Grid: Info for burke.st-and.ac.uk
*******************************************************************************

 System:
  Location: St. Andrews                                                 
  Processors: 64        
  Home directories: /users                                            

 Local batch queue system:
  Sun Grid Engine SGE batch submission system                                 

 Grid Middleware:
  Globus Toolkit  The Globus Toolkit Grid middleware.                         

 Compilers:
   Intel Fortran Compiler    The Intel Fortran Compiler 
                   Location: /opt/intel/fce/10.0.023/bin/ifort  

 Software:
   Name                ID              Description                              
   EaStCHEM Submission EaStCHEM-Sub    The EaStCHEM submission package.     
   GAMESS-UK           GAMESS-UK       General purpose molecular electronic
                                       structure code.
   Gaussian 03         G03             General purpose molecular electronic
                                       structure code
   Castep              Castep          Periodic electronic structure code   
   CPMD                CPMD            Periodic electronic structure code   
   CRYSTAL06           CRYSTAL         Localized gaussian periodic
                                       electronic structure code.
   DL_POLY 2           DLPOLY2         Classical molecular dynamics code.   
   MOLPRO              MOLPRO          General purpose molecular electronic
                                       structure code.
   Amber               Amber           Classical simulation code for
                                       biological systems.

*******************************************************************************
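
The compiler path can also be picked out of saved querygrid output automatically. The sketch below prints the first Location value that follows the Compilers heading. It assumes the output is laid out as in the listing above and that the path contains no spaces; compiler_location is a hypothetical helper, not part of the Grid tools.

```shell
# Illustrative only: compiler_location is a hypothetical helper.
# Read querygrid output on standard input and print the first "Location:"
# value that appears after the "Compilers:" heading.
compiler_location() {
  awk '/^ *Compilers:/ { in_compilers = 1; next }
       in_compilers && $1 == "Location:" { print $2; exit }'
}

# Example using an excerpt of the output above
cat <<'EOF' | compiler_location
 System:
  Location: St. Andrews
 Compilers:
   Intel Fortran Compiler    The Intel Fortran Compiler
                   Location: /opt/intel/fce/10.0.023/bin/ifort
 Software:
EOF
```

The Location line in the System section is skipped because it comes before the Compilers heading, so the excerpt yields /opt/intel/fce/10.0.023/bin/ifort.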

If we look under the compilers section we can see that the Fortran compiler is located at /opt/intel/fce/10.0.023/bin/ifort. Now we can write our submission script (sum.bash).

#!/bin/bash
#
# SGE Options (run in the job directory and export the environment)
#$ -cwd -V
#$ -q serial-medium.q
#$ -N CompileAndRun

# Grid Options
#% -host burke.st-and.ac.uk
#% -input sum.f90 numbers.in

# Compile the program
/opt/intel/fce/10.0.023/bin/ifort -o sum.x sum.f90

# Run the program (the explicit ./ is needed because the job directory is not normally on the PATH)
./sum.x < numbers.in > sum.out
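
As written, the script tries to run the program even if the compilation failed. One possible refinement, shown as a sketch only, is to run the program only when the compiler exits successfully. The function name compile_and_run is invented for this example; the compiler path in the usage comment is the one reported by querygrid above.

```shell
# Illustrative only: compile_and_run is a hypothetical helper.
# Usage: compile_and_run COMPILER SOURCE EXECUTABLE INPUT OUTPUT
compile_and_run() {
  if "$1" -o "$3" "$2"; then
    # Compilation succeeded: run the new executable from the current directory
    "./$3" < "$4" > "$5"
  else
    echo "Compilation of $2 failed; not running $3" >&2
    return 1
  fi
}

# In sum.bash this could replace the compile and run lines:
#   compile_and_run /opt/intel/fce/10.0.023/bin/ifort sum.f90 sum.x numbers.in sum.out
```

If compilation fails, the message goes to standard error, which should end up in the job's error file after retrieval, and no output file is created.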

Next we can produce the grid job scripts using the gtsub command as we saw before.

> gtsub sum.bash
********************************************************************************
 Creating Globus job.
 Submission script will be in submit_1192629576.bash.
********************************************************************************

Processing script file...
  Remote host: burke.st-and.ac.uk
  Input files: sum.f90 numbers.in
...finished. Remote script in rm_sum.bash
Creating RSL job descriptions...
  Written 1192629576_mkdir.rsl   <= Create remote directory
  Written 1192629576_unzip.rsl   <= Unzip input files
  Written 1192629576_qsub.rsl    <= Submit job to remote batch system
  Written 1192629576_zip.rsl     <= Zip output
  Written 1192629576_rm.rsl      <= Remove remote directory
...finished
  Using Web Services and XML RSL
Write Globus scripts...
  Written submit_1192629576.bash     <= Submit remote job via Globus
  Written retrieve_1192629576.bash   <= Retrieve remote job output via Globus
  Written tidy_1192629576.bash       <= Tidy up after remote job via Globus
...finished

********************************************************************************
 Globus job created.
 Submit job with submit_1192629576.bash
 Check status with gtstat burke.st-and.ac.uk
 Once job has finished run retrieve_1192629576.bash to retrieve output files.
 Use tidy_1192629576.bash to clean up after remote job.
********************************************************************************

Now we can actually submit the job:

> ./submit_1192629576.bash
  adding: sum.f90 (deflated 39%)
  adding: numbers.in (stored 0%)
  adding: rm_sum.bash (deflated 26%)
  1192629576_mkdir.rsl     <= Make remote directory
  1192629576_unzip.rsl     <= Transfer input files
  1192629576_qsub.rsl      <= Submit remote job

Once we know it has finished, we can get the results:

> ./retrieve_1192629576.bash

and check the results:

> cat sum.out

and finally tidy up the job:

> ./tidy_1192629576.bash

Now you should be able to write your own Grid submission scripts. If you have any problems or questions do not hesitate to contact your RCO.

Under Construction

ComputationalChemistryActivity/SupportPages/GridScript (last edited 2007-10-17 14:07:58 by AndrewTurner)