Eddie and the ECDF
The building blocks of Eddie are relatively standard machines, known as worker nodes, which are selected for their excellent price:performance ratio and specified according to the requirements of researchers. They will run an industry-standard Linux-based operating system, which should make moving users' tasks onto the systems as painless as possible. A number of additional machines handle data access and system management; these are discussed later.
In the first phase there will be 128 worker nodes, each containing four processing cores. The second phase is expected to add a further 128 nodes with eight cores each, so the two phases combined will give a total of 1536 cores. The worker nodes are interconnected by gigabit Ethernet, and a small number will additionally be equipped with a faster, low-latency interconnect (Infiniband).
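The total follows directly from the node and core counts of the two phases:

```latex
N_{\text{cores}} = \underbrace{128 \times 4}_{\text{phase one}} + \underbrace{128 \times 8}_{\text{phase two}} = 512 + 1024 = 1536
```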
Eddie is ideally suited to 'trivially' or 'embarrassingly' parallel jobs, in which the work divides into pieces that do not need to communicate with one another. One example is the analysis of several hundred data files that would ordinarily take a single machine many weeks to process: because each file is independent, the files can be split up and sent to different machines. Another example is a parameter sweep in a simulation, where the model must be run with many different starting values or conditions. This independence allows a near-linear increase in speed with the number of worker nodes applied to the problem.
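A minimal Python sketch of the file-analysis case is given below. It assumes the inputs are a list of file names and uses a hypothetical `analyse` function as a stand-in for the real per-file work; the helper names `share_for_worker` and `run_worker` are likewise illustrative. Each worker index receives a round-robin share of the files, and no share depends on any other, which is what makes the job embarrassingly parallel.

```python
"""Sketch: splitting an embarrassingly parallel workload into independent tasks.

Assumptions (hypothetical, for illustration only): the inputs are a list of
file names, `analyse` stands in for the real per-file analysis, and each
worker is identified by an integer index in range(n_workers).
"""


def share_for_worker(items, worker_id, n_workers):
    """Return the items this worker handles (round-robin assignment)."""
    if not 0 <= worker_id < n_workers:
        raise ValueError("worker_id must be in range(n_workers)")
    return items[worker_id::n_workers]


def analyse(name):
    """Hypothetical per-file analysis; here it just measures the name length."""
    return len(name)


def run_worker(items, worker_id, n_workers):
    """Process one worker's share; no result depends on any other worker."""
    return {name: analyse(name) for name in share_for_worker(items, worker_id, n_workers)}


if __name__ == "__main__":
    files = [f"sample_{i:03d}.dat" for i in range(10)]
    n_workers = 4
    merged = {}
    # On a cluster each worker_id could be a separate batch job; here they
    # run one after another purely to show that the shares are independent.
    for worker_id in range(n_workers):
        merged.update(run_worker(files, worker_id, n_workers))
    print(f"{len(merged)} files processed by {n_workers} independent workers")
```

Because the shares never overlap and never exchange data, adding workers simply shortens each share, which is where the near-linear speed-up comes from.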
Eddie can also handle distributed-memory parallel tasks, in which some communication between nodes is required. The Infiniband-equipped portion is especially suited to this role: it will comprise 58 nodes, allowing jobs of up to 232-way parallelism, although we expect most such jobs to use 128 cores or fewer.
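The figures for the Infiniband section imply four cores per node (232 / 58 = 4); taking that as an assumption, the maximum job width and the number of nodes a typical 128-core job would occupy work out as:

```latex
\begin{aligned}
P_{\max} &= 58 \times 4 = 232 \ \text{cores},\\
N_{128} &= 128 / 4 = 32 \ \text{nodes}.
\end{aligned}
```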
This raises the question of what sort of work Eddie will not be suitable for. Some limitations result from the fundamental design; others are due to practical and economic constraints. An example of the latter is that each worker node has 2 gigabytes of memory per core: 8 gigabytes per node in phase one, increasing to 16 gigabytes in phase two. If a computational task must load an array larger than this (a 3-D image, for example) into memory, the node's physical memory will be exhausted and the system will resort to using much slower disk space. The task could then run orders of magnitude slower than it would on a large-memory machine. A more fundamental problem arises with code that has been parallelised using a shared-memory model: adding processing cores increases its speed, but only if all of those cores are within the same node (or, more precisely, the same system image). Such tasks require a more traditional 'supercomputer' to improve performance.
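The memory arithmetic above can be checked with a short Python sketch. It uses the text's figure of 2 GB per core, treats 1 GB as 2^30 bytes, and assumes the job has the whole node to itself with no operating-system overhead, so real headroom will be smaller. The 1536 × 1536 × 1024 single-precision image is a hypothetical example: at 9 GB it overflows a phase-one node but fits in a phase-two node.

```python
"""Back-of-envelope check: will an in-memory array fit on one worker node?

Figures from the text: 2 GB per core, 4 cores per node in phase one and 8 in
phase two. Treating 1 GB as 2**30 bytes and ignoring operating-system and
program overheads are simplifying assumptions.
"""
from math import prod

GB = 2**30
MEMORY_PER_CORE_GB = 2


def node_memory_bytes(cores_per_node):
    """Total memory of one node, assuming the job has the whole node."""
    return cores_per_node * MEMORY_PER_CORE_GB * GB


def array_bytes(shape, bytes_per_element):
    """Size of a dense array with the given shape and element size."""
    return prod(shape) * bytes_per_element


def fits_in_node(shape, bytes_per_element, cores_per_node):
    return array_bytes(shape, bytes_per_element) <= node_memory_bytes(cores_per_node)


if __name__ == "__main__":
    # A hypothetical 3-D image stored as 4-byte (single-precision) floats.
    shape = (1536, 1536, 1024)
    size_gb = array_bytes(shape, 4) / GB
    for phase, cores in (("phase one", 4), ("phase two", 8)):
        verdict = "fits" if fits_in_node(shape, 4, cores) else "does not fit"
        print(f"{size_gb:.0f} GB array {verdict} in a {phase} node "
              f"({node_memory_bytes(cores) // GB} GB)")
```

If several processes share a node, each one effectively has only its own share (2 GB per core), so the check should then be made against the per-process figure instead.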
Access to Eddie is via a scheduler and batch system. The batch system accepts large numbers of jobs and queues them; the scheduler decides the order in which they run. That order is based on information such as the user's allocation of time on the machine, the length of their jobs and many other variables, and above all it is designed to give all users fair access. This mode of working allows a great deal of automation (and thus time saving) on the part of the user.
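To make the idea of fair ordering concrete, here is a toy Python sketch. It is not the algorithm used by Eddie's scheduler, which weighs many more variables; it assumes just two: how much of their allocation each user has already consumed, and the requested run time. The `Job` class, the `schedule` function and the user names are hypothetical.

```python
"""Toy fair-share ordering of a batch queue (illustrative only).

The real scheduler weighs many more factors; this sketch assumes just two:
each user's consumption relative to their allocation, and the requested run
time. The job and user names below are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    user: str
    hours: float  # requested run time


def schedule(queue, allocation, used):
    """Return jobs in run order.

    Repeatedly pick the job whose owner has used the smallest fraction of
    their allocation, preferring shorter jobs on ties, then charge that
    job's hours to its owner so heavy users drift down the queue.
    """
    used = dict(used)  # work on a copy; don't mutate the caller's accounting
    pending = list(queue)
    order = []
    while pending:
        job = min(pending, key=lambda j: (used[j.user] / allocation[j.user], j.hours))
        pending.remove(job)
        used[job.user] += job.hours
        order.append(job)
    return order


if __name__ == "__main__":
    queue = [Job("a1", "alice", 10), Job("a2", "alice", 10),
             Job("b1", "bob", 5), Job("b2", "bob", 20)]
    allocation = {"alice": 100, "bob": 100}  # hours each user is entitled to
    used = {"alice": 0, "bob": 0}            # hours already consumed
    print([job.name for job in schedule(queue, allocation, used)])
```

With equal allocations and no prior usage, the two users' jobs come out interleaved (b1, a1, b2, a2) rather than one user's whole batch running first; a user who has already consumed most of their allocation is pushed behind everyone else.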
Eddie is connected to a large amount of high-performance Fibre Channel disk storage via a number of dedicated machines, which share the data across the cluster using a parallel file system. As a result, every node in Eddie sees the same data in the same way. To allow this data to be accessed from outside Eddie (from a user's workstation, for example), further machines export it via the standard Windows and Unix network file systems. A typical workflow might be that a microscope's data acquisition computer writes to this exported disk, the image files are then processed in parallel on Eddie, and the researcher views the finished results on their desktop machine.