May 02, 2011

Clustering is a term you will find in a lot of marketing literature, both for hardware configured as a “cluster” and for software that enables management and use of a “cluster”. What can be misleading is that two distinctly different types of clusters are in common use, and they perform very different functions. Both types of clustering are useful for scientific computing.

High performance computing clusters

Scientific computing commonly uses high performance computing clusters, or HPC clusters. Several (perhaps very many) computers are tied together in a common management framework, usually with a shared file system, and usually with some sort of high-speed internode network. Programs run on the various computers, or nodes. In some cases, each node runs a program working on a separate problem. In other cases, a single program spreads over multiple nodes, with its processes sending messages to each other over the course of the computation. The shared file system needs to support sustained high-speed I/O from several compute nodes, often to the same file. Specialized networked parallel file systems are commonly used, with dedicated storage nodes managing the file system itself.
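
The second case, one computation spread over several nodes, can be illustrated with a small sketch. This is only a single-machine stand-in: worker threads play the role of nodes, and collecting their return values plays the role of message passing. A real HPC code would more likely use a message-passing library such as MPI over the internode network. The function names (`split_range`, `partial_sum`, `cluster_sum_of_squares`) are made up for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for rank in range(parts):
        # The first `extra` chunks each take one additional element.
        stop = start + base + (1 if rank < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(bounds):
    """Work done by one 'node': sum of squares over its own chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def cluster_sum_of_squares(n, nodes=4):
    """Distribute the chunks to the workers, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(partial_sum, split_range(n, nodes)))
    return sum(partials)
```

Calling `cluster_sum_of_squares(1000, nodes=4)` gives the same answer as a plain loop over `range(1000)`; the point is only that each worker handles its own slice of the problem and the results are combined at the end.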

[Figure: High performance computing cluster]

High availability clusters

For other services, it is common to find high availability clusters, or HA clusters. Again, several computers are tied together into a common management framework. There may be a shared file system. Programs supporting various services (such as web or database services) run on the various nodes, but usually a particular service only runs on a single node at a time. (Exceptions are “load-balanced” services that run on multiple nodes, with requests routed to prevent any one node from being overloaded.) If a node should fail, then other nodes will start the services that were running on the failed node. Once the node is restored, services may migrate back to it. HA clusters are typically composed of only a handful of nodes, and the shared file system is primarily used for backing storage for the managed services.
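
The failover and fail-back behavior described above can be sketched as a toy model. This is not how any particular cluster framework is implemented; the `HACluster` class, its methods, and the "least loaded healthy node" placement rule are all assumptions chosen to keep the example short.

```python
class HACluster:
    """Toy model of an HA cluster: each service runs on exactly one node."""

    def __init__(self, nodes):
        self.up = {node: True for node in nodes}
        self.preferred = {}   # service -> node it should normally run on
        self.running_on = {}  # service -> node it currently runs on

    def add_service(self, service, node):
        self.preferred[service] = node
        self.running_on[service] = node

    def _least_loaded_healthy_node(self):
        healthy = [node for node, ok in self.up.items() if ok]
        if not healthy:
            raise RuntimeError("no healthy nodes left")

        def load(node):
            return sum(1 for host in self.running_on.values() if host == node)

        return min(healthy, key=load)

    def fail(self, node):
        """Mark a node as failed and restart its services elsewhere."""
        self.up[node] = False
        for service, host in self.running_on.items():
            if host == node:
                self.running_on[service] = self._least_loaded_healthy_node()

    def restore(self, node):
        """Bring a node back and migrate its preferred services home."""
        self.up[node] = True
        for service, home in self.preferred.items():
            if home == node:
                self.running_on[service] = node


cluster = HACluster(["node1", "node2", "node3"])
cluster.add_service("web", "node1")
cluster.add_service("db", "node2")
cluster.fail("node1")     # "web" is restarted on a surviving node
cluster.restore("node1")  # "web" migrates back to node1
```

A real HA framework also has to detect the failure in the first place and make sure the failed node has truly stopped before starting its services elsewhere; the sketch leaves both out.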

[Figure: High availability cluster]

Both are needed

A large scientific computation facility is going to need HPC clusters for technical computations. HA cluster solutions are not well suited for building HPC clusters. But large scientific computation facilities also need a robust infrastructure to keep the HPC clusters running well, and HA clusters are worth considering for that infrastructure. In the Linux world, two common HA cluster frameworks are the Red Hat Cluster Suite and Pacemaker.
