Managing Digital Biological Information
Computers are used in all aspects of modern biological research. The most valuable resource we have as researchers is our data, stored on computer systems. Funding bodies now insist that digital data be securely stored for a minimum of 5 years, which means that, in practice, data from a particular project should be stored for a minimum of 10 years from the beginning of the project. As in other branches of human activity, data volumes are growing exponentially.
This presents biological researchers with two major challenges. First, the rapidly growing volumes of data must be stored securely, backed up against disaster and kept available for retrieval. Second, and of particular relevance to biological data, the huge number of files and their associated descriptive data (known as metadata) must be organised and annotated so that they can be identified and interrogated. The first problem can be solved by purchasing scalable enterprise storage solutions for hierarchical storage management of data, and will be the subject of other applications.
This project tackles the second problem by building upon pre-existing open source software solutions (see below) to develop and implement a user-friendly system for managing and annotating biological data in a flexible and time-efficient manner. We will start with a specific application to microscope image data, based on our promising preliminary trials of OME at the WTCCB (in collaboration with Jason Swedlow and his team in Dundee). We will then extend these tools to other digital data acquired in all branches of modern biology, such as systems biology, computational biology, mass spectrometry, genomics, proteomics and molecular biology.
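To make the second challenge concrete, the sketch below shows one minimal way that files could be annotated with metadata and then retrieved by annotation. It is an illustration only and is not the OME data model or any other tool's schema: the `ImageRecord` class, the `find` helper, the tag names, and all file names, researcher labels and dates are hypothetical and invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Hypothetical metadata record for one microscope image file."""
    path: str
    researcher: str
    acquired: str  # ISO date string, invented for illustration
    tags: dict = field(default_factory=dict)  # free-form annotations


def find(records, **criteria):
    """Return the records whose tags match every key/value pair given."""
    return [
        r for r in records
        if all(r.tags.get(key) == value for key, value in criteria.items())
    ]


# Made-up example records; none of these describe real data.
records = [
    ImageRecord("img_0001.tif", "researcher_a", "2006-03-14",
                {"specimen": "oocyte", "channel": "GFP"}),
    ImageRecord("img_0002.tif", "researcher_a", "2006-03-15",
                {"specimen": "embryo", "channel": "GFP"}),
    ImageRecord("img_0003.tif", "researcher_b", "2006-04-02",
                {"specimen": "oocyte", "channel": "DAPI"}),
]

# Retrieve files by annotation rather than by file name.
oocyte_images = find(records, specimen="oocyte")
print([r.path for r in oocyte_images])
```

A real system would keep such records in a database rather than in memory, but the principle is the same: files are found through their annotations, not their names.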
As an example of the amount of data generated, we have applied for a new-generation ultra-high-resolution, high-speed microscope, and we predict that over the grant’s 5-year lifespan it will generate 100 Terabytes (100,000 Gigabytes) of multidimensional microscope data. A typical researcher in the Davis lab currently leaves after 3-4 years of work having generated 35,000 microscope image files.
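The figures above can be turned into rough average rates. The short calculation below is a back-of-the-envelope sketch, assuming decimal units (1 Terabyte = 1,000 Gigabytes, as in the conversion above), a 365-day year, and perfectly even data generation over time, which real acquisition will not follow.

```python
# Figures taken from the text above.
TOTAL_TB = 100            # predicted data volume over the grant
GRANT_YEARS = 5           # grant lifespan
FILES_PER_RESEARCHER = 35_000
STAY_YEARS = (3, 4)       # typical length of a researcher's stay

# Assumption: decimal units, matching 100 TB = 100,000 GB.
GB_PER_TB = 1000

total_gb = TOTAL_TB * GB_PER_TB
tb_per_year = TOTAL_TB / GRANT_YEARS
gb_per_day = total_gb / (GRANT_YEARS * 365)

# Files per year for the shortest and longest typical stay.
files_per_year_max = FILES_PER_RESEARCHER / min(STAY_YEARS)
files_per_year_min = FILES_PER_RESEARCHER / max(STAY_YEARS)

print(f"Total volume: {total_gb:,} GB")
print(f"Average rate: {tb_per_year:.0f} TB per year, {gb_per_day:.1f} GB per day")
print(f"Files per researcher per year: "
      f"{files_per_year_min:,.0f} to {files_per_year_max:,.0f}")
```

Under these assumptions the new microscope alone would average about 20 Terabytes per year, or roughly 55 Gigabytes per day, and a single researcher would add on the order of 9,000 to 12,000 image files per year.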
Open Source Software tools to be implemented and customized:
OME: http://www.openmicroscopy.org/ An open source software project to develop a database-driven system for the quantitative analysis and collation of biological images. OME is a collaborative effort among academic labs and a number of commercial entities. OME programmers have developed standardized file formats for the exchange of image data and database schema for simplifying the automated analysis of images.
caLIMS: http://calims.nci.nih.gov/ NIH has developed a general open source Laboratory Information Management System (LIMS), which is freely available for download, including the source code for customization. Note that it is Oracle-based but could be adapted for another backend.
BASE: http://base.thep.lu.se/ A free, comprehensive, web-based database solution for collation and annotation of the massive amounts of data generated by microarray analysis.
Links
Davis Lab
Wellcome Trust Centre for Cell Biology http://www.wcb.ed.ac.uk
Centre Optical Instrumentation Laboratory (COIL) http://www.wcb.ed.ac.uk/COIL/
School of Biology http://www.biology.ed.ac.uk/
People
Russell Hamilton (edikt2 Data Manager, Systems Developer)
Ilan Davis (PI, WTCCB)
Publications
See Publications
