Oncology Project
The Oncology Project began in February 2006 and is a collaboration between EPCC and the Colon Cancer Genetics Group (CCGG) of the MRC Human Genetics Unit at the Western General Hospital. The first phase of the project lasted 12 months, ending in February 2007, and the second phase began in April 2008.
The aim of the project is to investigate the relationship between genetic markers and colorectal cancer. The ultimate goal is to identify individuals at risk of the disease so that appropriate preventative measures can be taken.
The researchers at CCGG have access to a unique and extensive dataset consisting of 565,000 genetic markers with real data from 1000 cancer cases and 1000 matched controls.
The first phase of the project involved porting a serial FORTRAN code, which investigates the effects of each genetic marker individually, to the BlueGene computer. Initial estimates of the runtime suggested that the code would take around 10,800 days to run on a standard desktop machine. After optimisation and parallelisation, the code ran in 6.5 hours on 128 BlueGene/L processors. The results have led to a publication in Nature Genetics.
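To put the two runtimes quoted above on the same scale, the short calculation below converts the desktop estimate to hours and divides by the BlueGene/L runtime. It is only arithmetic on the figures in the text; the variable names are illustrative, and the per-processor figure simply divides by 128 without implying anything about how the gain was split between optimisation and parallelism.

```python
# Figures quoted in the text above.
DESKTOP_ESTIMATE_DAYS = 10_800   # estimated serial runtime on a desktop
BLUEGENE_RUNTIME_HOURS = 6.5     # measured runtime after optimisation
BLUEGENE_PROCESSORS = 128        # BlueGene/L processors used

desktop_hours = DESKTOP_ESTIMATE_DAYS * 24

# Overall reduction in wall-clock time.
overall_speedup = desktop_hours / BLUEGENE_RUNTIME_HOURS

# The same reduction expressed per processor used.
speedup_per_processor = overall_speedup / BLUEGENE_PROCESSORS

print(f"Desktop estimate: {desktop_hours:,} hours")
print(f"Overall speedup: about {overall_speedup:,.0f}x")
print(f"Per processor: about {speedup_per_processor:.0f}x")
```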
The second phase of the project began in April 2008 and will investigate the interactions between pairs of genetic markers, with the final goal being to obtain a ranked list of the pairs which show the greatest interaction. As each of the 565,000 markers must be tested against every other marker, the problem is truly vast: roughly 160 billion (1.6 × 10^11) pairs of markers must be tested and ranked. A calculation of this size is simply not feasible (in terms of either memory or CPU time) on a desktop PC, and access to a parallel computer is therefore essential.
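The size of the pairwise problem follows directly from the number of markers: n markers give n(n - 1)/2 unordered pairs. The snippet below is a minimal check of that count for the 565,000 markers quoted in the text; the function name is illustrative only.

```python
from math import comb

N_MARKERS = 565_000  # number of genetic markers, from the text


def count_pairs(n: int) -> int:
    """Number of unordered pairs of distinct markers: n * (n - 1) / 2."""
    return n * (n - 1) // 2


total_pairs = count_pairs(N_MARKERS)

# Cross-check against the standard library's binomial coefficient.
assert total_pairs == comb(N_MARKERS, 2)

print(f"{total_pairs:,} marker pairs")  # 159,612,217,500 marker pairs
```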
We have devised a complex 2-dimensional decomposition strategy and will parallelise the researchers' code so that it can run on a large number of processors. We will also devise a parallel sorting strategy to produce the final ranked list of marker pairs.
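The project's actual decomposition and sorting strategies are not described here, so the following is only a sketch of one common way such a scheme can be organised, not the method used by EPCC or CCGG. It assumes the marker-pair matrix is cut into square tiles, that only tiles on or above the diagonal are needed (each pair is tested once), that tiles are dealt out to processors round-robin, and that each processor keeps its own top-k list before a final merge. All function names, the tile size and the scoring function `toy_score` are hypothetical, and the processors are simulated serially rather than with MPI.

```python
import heapq
from itertools import combinations


def tile_bounds(n_markers, tile):
    """Split range(n_markers) into consecutive (start, stop) index ranges."""
    return [(s, min(s + tile, n_markers)) for s in range(0, n_markers, tile)]


def upper_triangle_tiles(n_markers, tile):
    """All (row_range, col_range) tiles on or above the diagonal."""
    bounds = tile_bounds(n_markers, tile)
    return [(bounds[i], bounds[j])
            for i in range(len(bounds))
            for j in range(i, len(bounds))]


def pairs_in_tile(rows, cols):
    """Yield marker pairs (a, b) with a < b that fall inside one tile."""
    for a in range(*rows):
        for b in range(max(a + 1, cols[0]), cols[1]):
            yield a, b


def assign_tiles(tiles, n_ranks):
    """Deal tiles out to ranks round-robin (a simple load-balancing choice)."""
    return [tiles[r::n_ranks] for r in range(n_ranks)]


def local_top_k(my_tiles, score, k):
    """Best k (score, a, b) entries among the pairs owned by one rank."""
    scored = ((score(a, b), a, b)
              for rows, cols in my_tiles
              for a, b in pairs_in_tile(rows, cols))
    return heapq.nlargest(k, scored)


def global_top_k(per_rank_lists, k):
    """Merge the per-rank lists into one ranked list of the best k pairs."""
    return heapq.nlargest(k, (item for lst in per_rank_lists for item in lst))


def toy_score(a, b):
    """Hypothetical stand-in for the real interaction statistic."""
    return (a * 31 + b * 17) % 101


# Small serial simulation: 23 markers, tiles of 4, 3 simulated ranks.
N, TILE, RANKS, K = 23, 4, 3, 5
tiles = upper_triangle_tiles(N, TILE)
per_rank = [local_top_k(t, toy_score, K) for t in assign_tiles(tiles, RANKS)]
ranked = global_top_k(per_rank, K)

for score, a, b in ranked:
    print(f"markers ({a}, {b}) score {score}")
```

Keeping only a top-k list on each processor avoids sorting every pair globally: a pair in the overall top k must also be in the top k of the processor that owns it, so merging the short local lists gives the same result as ranking everything at once.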
The collaboration takes advantage of EPCC's considerable expertise in parallel and high-performance computing, releasing the CCGG researchers to focus on the algorithm for the analysis of the results.
Kostas Kavoussanakis gave a presentation on the Oncology Activity at the September 2006 technical workshop.
Florian Scharinger will be giving a presentation at the 8th technical workshop in June 2008.
Articles relating to the project are available in EPCC News, Issue 59 and EPCC News, Issue 60.
Publications
See Publications