2.1. A very, very brief introduction to clustering

Most of the time, your computer is bored. Start a program like xload or top that monitors your system use, and you will probably find that your processor load is not even hitting the 1.0 mark. If you have two or more computers, chances are that at any given time at least one of them is doing nothing. Unfortunately, when you really do need CPU power, such as during a C++ compile or while encoding Ogg Vorbis music files, you need a lot of it at once. The idea behind clustering is to spread these loads among all available computers, using the resources that are free on other machines.

The basic unit of a cluster is a single computer, also called a "node". Clusters can grow in size (they "scale") by adding more machines. A cluster as a whole is more powerful the faster its individual computers are and the faster the connections between them are. In addition, the operating system of the cluster must make the best use of the available hardware in response to changing conditions. This becomes more of a challenge if the cluster is composed of different hardware types (a "heterogeneous" cluster), if the configuration of the cluster changes unpredictably (machines joining and leaving the cluster), or if the loads cannot be predicted ahead of time.

2.1.1. HPC vs Failover vs Load Balancing

There are basically three types of clusters: Failover, Load Balancing, and High Performance Computing (HPC). The most widely deployed are probably the Failover cluster and the Load Balancing cluster.

Failover clusters consist of two or more network-connected computers with a separate heartbeat connection between the hosts. The heartbeat connection is used to monitor whether all the services are still running; as soon as a service on one machine breaks down, the other machine tries to take over.
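The take-over logic can be sketched roughly as follows. This is a minimal sketch, assuming a single standby node that takes over once no heartbeat has arrived for longer than a fixed timeout; the class and method names are hypothetical, and real failover software usually also checks individual services rather than relying only on the heartbeat link.

```python
import time


class HeartbeatMonitor:
    """Standby node's view of a two-node failover pair (hypothetical sketch)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # seconds of silence before the peer is assumed dead
        self.clock = clock          # injectable, so the logic can be tested without sleeping
        self.last_seen = clock()    # treat startup as a fresh heartbeat
        self.active = False         # becomes True once this node has taken over

    def heartbeat_received(self):
        """Call whenever a heartbeat message arrives on the dedicated link."""
        self.last_seen = self.clock()

    def check(self):
        """Take over if the peer has been silent for longer than the timeout."""
        if not self.active and self.clock() - self.last_seen > self.timeout:
            self.active = True
            self.take_over()
        return self.active

    def take_over(self):
        # Placeholder: a real setup would start the services and claim the shared address.
        print("peer silent; taking over services")
```

A standby node would call `heartbeat_received()` from whatever reads the heartbeat link and call `check()` periodically; injecting the clock keeps the timing decision separate from the networking.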

With load-balancing clusters, the concept is that when a request (say, for a web server) comes in, the cluster checks which machine is the least busy and then sends the request to that machine. In practice, a load-balancing cluster is usually also a failover cluster, with the extra load-balancing functionality and often with more nodes.
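The "least busy" rule can be sketched in a few lines. This is a minimal sketch, assuming load is measured as the number of open requests per node and that some failover layer marks dead nodes; `Node`, `pick_node` and `dispatch` are hypothetical names, not part of any particular load balancer.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    active_requests: int = 0   # current load, measured here as open requests
    alive: bool = True         # set to False when the failover layer marks the node dead


def pick_node(nodes):
    """Return the live node with the fewest active requests (first one wins ties)."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        raise RuntimeError("no live nodes available")
    return min(candidates, key=lambda n: n.active_requests)


def dispatch(nodes):
    """Assign one incoming request to the least busy node and record the extra load."""
    node = pick_node(nodes)
    node.active_requests += 1
    return node.name
```

Skipping nodes marked as dead is where the failover behaviour comes in: the same selection step that balances load also routes around machines that have stopped responding.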

The last variation of clustering is the High Performance Computing cluster, which is configured specifically for data centers that require extreme performance. Beowulf clusters were developed specifically to give research facilities the computing speed they need. These kinds of clusters also have some load-balancing features: they try to spread different processes over more machines in order to gain performance. What it mainly comes down to in this situation, however, is that a process is parallelized, so that routines that can run independently are spread over different machines instead of having to wait until they finish one after another.
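A back-of-the-envelope model shows why spreading independent routines helps. This is a rough sketch, not how Beowulf or any cluster software actually schedules work: it assumes the routines are fully independent, their run times are known in advance, and network overhead is zero. It uses a simple greedy heuristic (longest routine first, to the node that frees up soonest); the function names are hypothetical.

```python
import heapq


def sequential_time(durations):
    """Wall-clock time when every routine runs one after another on one machine."""
    return sum(durations)


def parallel_time(durations, num_nodes):
    """Wall-clock time when independent routines are spread over num_nodes machines.

    Greedy scheduling: each routine, longest first, goes to the node that will be
    free soonest. Communication overhead is ignored.
    """
    if num_nodes < 1:
        raise ValueError("need at least one node")
    finish_times = [0.0] * num_nodes   # time at which each node becomes free
    heapq.heapify(finish_times)
    for d in sorted(durations, reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + d)
    return max(finish_times)
```

For routines taking 4, 3, 3, 2, 2 and 2 time units, running them one after another takes 16 units, while two nodes finish in 8 and four nodes in 5. The gain flattens out as nodes are added because the longest routines cannot themselves be split.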