The information below relates to the initial phase of the edikt project, which ended in May 2005. Information on the current phase is available via the edikt portal.


edikt :: BioDAS

Eldas Bioinformatics


Background

ScotGrid is a compute and data cluster with sites in Glasgow, Edinburgh, and Durham. Glasgow’s ScotGrid site consists of a large Linux farm, while Edinburgh hosts a large disk cluster. Most ScotGrid systems run Red Hat Linux 7.3, in line with European DataGrid technology requirements.

ScotGrid is currently used by, among others, bioinformatics researchers at Glasgow to run BLAST jobs. BLAST is the primary sequence-matching application used by biologists to analyse a protein or gene sequence. Running BLAST jobs is currently a manual process: the user connects directly to the ScotGrid front end over SSH, splits the large data files by hand into smaller chunks, one per job, and submits each job manually to the ScotGrid compute back end. As a result, computer systems often sit idle while users prepare their data and initiate jobs.
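The chunking step described above can be sketched in a few lines of Python. This is only an illustration, not the BioDAS implementation: it assumes the input is in FASTA format (each record begins with a ">" header line), and the function names and the `records_per_chunk` parameter are hypothetical.

```python
from typing import Iterable, Iterator, List


def read_fasta_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield each FASTA record (header line plus its sequence lines) as one string.

    Assumes FASTA input, where every record starts with a '>' header line.
    Blank lines are skipped.
    """
    record: List[str] = []
    for line in lines:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        if line.strip():
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def chunk_records(records: Iterable[str], records_per_chunk: int) -> Iterator[str]:
    """Group records into chunks of at most records_per_chunk, one chunk per job."""
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    chunk: List[str] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == records_per_chunk:
            yield "".join(chunk)
            chunk = []
    # Emit the final, possibly smaller, chunk.
    if chunk:
        yield "".join(chunk)
```

Each string produced by `chunk_records` could then be written to its own file and used as the input to one BLAST job. Splitting on record boundaries rather than on a fixed byte count keeps every sequence intact within a single chunk.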


Project Goals

1. Provide bioinformaticians with a simple, science-oriented interface for submitting BLAST jobs to a Grid.
2. Develop generic software supporting the required job and data management services.

The BioDAS project automates the process of submitting BLAST jobs to ScotGrid. Edikt’s Eldas technology is used to retrieve protein and gene sequence data as input to auto-scheduled BLAST jobs.


Who is involved?

Edikt is working with BRIDGES, a joint research project, to support the Cardiovascular Functional Genomics (CFG) project funded by the Wellcome Trust. The BRIDGES project is funded to work with CFG and provide analysis and data management tools to facilitate bioinformatics research. Together, Edikt and BRIDGES are developing these tools and services using Edikt’s Eldas technology.


Project Results

The BioDAS software will enable automated scheduling and data retrieval, allowing CFG biologists to request a set of BLAST executions in a single, simple request. Computer systems will be better utilised, and biologists will be able to expand their research efforts by investigating many more sequences in less time. By developing a generic software solution, the project will also create reusable components that support data management services coordinated with job scheduling services for other application domains, such as particle physics and the geosciences.