Sunday, January 23, 2011

Computer Clustering

A computer cluster is a group of linked computers that work together so closely that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.

Cluster categorizations

High-availability (HA) clusters

High-availability clusters (also known as failover clusters) are implemented primarily to improve the availability of the services that the cluster provides. They operate by having redundant nodes, which take over the service when system components fail. The most common size for an HA cluster is two nodes, which is the minimum needed to provide redundancy. HA cluster implementations use redundancy of cluster components to eliminate single points of failure.
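The two-node failover idea can be sketched in a few lines of Python. This is only an illustration, not how Linux-HA or any other product works: the `Node` and `FailoverMonitor` names, the heartbeat mechanism, and the three-second timeout are all assumptions made for the example. The standby is promoted only when the primary has gone silent and the standby itself is still sending heartbeats.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0  # timestamp of the most recent heartbeat


@dataclass
class FailoverMonitor:
    """Hypothetical monitor for one primary and one standby node."""
    primary: Node
    standby: Node
    timeout: float = 3.0  # seconds of silence before a node is presumed dead

    def heartbeat(self, node_name: str, now: float) -> None:
        """Record that the named node reported in at time `now`."""
        for node in (self.primary, self.standby):
            if node.name == node_name:
                node.last_heartbeat = now

    def check(self, now: float) -> str:
        """Promote the standby if the primary is silent; return the active node."""
        primary_dead = now - self.primary.last_heartbeat > self.timeout
        standby_alive = now - self.standby.last_heartbeat <= self.timeout
        if primary_dead and standby_alive:
            # Swap roles: the standby becomes the new primary.
            self.primary, self.standby = self.standby, self.primary
        return self.primary.name
```

A real HA package would also have to move the service itself (IP address, storage, processes) and guard against both nodes believing they are primary. The sketch only shows the decision logic.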

There are commercial implementations of High-Availability clusters for many operating systems. The Linux-HA project is one commonly used free software HA package for the Linux operating system. The LanderCluster from Lander Software can run on Windows, Linux, and UNIX platforms.

Load-balancing clusters

Load balancing is when multiple computers are linked together to share a computational workload or to function as a single virtual computer. Physically they are multiple machines, but from the user's side they function as a single virtual machine. Requests initiated by users are managed by, and distributed among, all the standalone computers that form the cluster. This balances the computational work among the different machines and improves the performance of the cluster as a whole.
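One simple way to distribute requests is round-robin, where each new request goes to the next machine in turn. The sketch below assumes that policy purely for illustration. The `RoundRobinBalancer` class and the node names are made up, and real load balancers often weigh in node health and current load as well.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next backend in a fixed rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def route(self, request):
        """Return (backend, request): the backend chosen for this request."""
        return next(self._rotation), request


balancer = RoundRobinBalancer(["node1", "node2", "node3"])
assigned = [balancer.route(f"request-{i}")[0] for i in range(6)]
print(assigned)
```

With three backends and six requests, every node ends up with two requests, which is the "balanced computational work" described above in its simplest form.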

Compute clusters

Often clusters are used primarily for computational purposes, rather than for handling IO-oriented operations such as web services or databases. For instance, a cluster might support computational simulations of weather or vehicle crashes. The primary distinction within compute clusters is how tightly coupled the individual nodes are. For instance, a single compute job may require frequent communication among nodes. This implies that the cluster shares a dedicated network, is densely located, and probably has homogeneous nodes. This cluster design is usually referred to as a Beowulf cluster. The other extreme is where a compute job uses one or a few nodes and needs little or no inter-node communication. This latter category is sometimes called "Grid" computing. Tightly coupled compute clusters are designed for work that might traditionally have been called "supercomputing". Middleware such as MPI (Message Passing Interface) or PVM (Parallel Virtual Machine) permits compute clustering programs to be portable to a wide variety of clusters.
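The loosely coupled end of that spectrum can be illustrated with a small scatter-and-gather sketch. This is not MPI or PVM code. It uses threads from Python's standard library as stand-ins for cluster nodes, and the function names and the sum-of-squares workload are invented for the example. Each "node" works on its own chunk without talking to the others, and the partial results are combined only at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, parts):
    """Split data into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partial_sum_of_squares(chunk):
    """Work done independently on one 'node'; needs no inter-node communication."""
    return sum(x * x for x in chunk)


def cluster_sum_of_squares(data, nodes=4):
    """Scatter chunks to workers, then gather and reduce the partial results."""
    chunks = split(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(partial_sum_of_squares, chunks))
    return sum(partials)


print(cluster_sum_of_squares(list(range(10)), nodes=3))  # prints 285
```

A tightly coupled job differs in that the workers would need to exchange intermediate data many times during the computation, which is where a dedicated fast network and message-passing middleware start to matter.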

OS vs Application Server Clustering

In general, operating-system-level clustering (also known as hardware clustering) is designed to manage hardware and OS-level failures. These clusters typically work by starting a backup server when a primary fails, in such a way that the backup fully assumes the role of the primary. Failover generally involves re-assigning the failed server's IP address to the backup (IP takeover), re-permissioning file system access to the backup (if using a shared file system instead of replication), and then running a script that you set up yourself to start up all your applications. This technology is older, takes more time to perform a failover, and is less able to fully utilize all of your hardware resources.

Application server clustering, or, more generally, software clustering, is far more capable and dynamic. First of all, the backup server is usually in at least a warm-standby mode and ideally hot, meaning that it can assume the primary's responsibilities with very little delay. Second, advanced software-level clustering also supports load balancing, so you never have "backup" hardware sitting idle. Instead of re-assigning IP addresses, applications that connect to your clustered environment must already be designed to check for service failure and availability on more than one destination host. Alternatively, you can use some kind of load balancer or traffic router that exposes the cluster as a single IP address. By focusing on the availability of the application service, any lower-level problems in the OS or hardware are automatically covered as well (assuming, of course, that they cause the software to crash).

A final difference is the dynamic nature of newer software clustering techniques. You can generally add or lose capacity on the fly with little or no visible impact on dependent applications. Hardware-level clustering is quite difficult to set up and modify correctly, and it requires regular and fairly painful testing.
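The client-side approach described above, where the application itself checks more than one destination host, might look roughly like the following. It is a sketch under stated assumptions: the function name, the host names in the usage note, and the one-second timeout are hypothetical, and the `connector` parameter exists only so the logic can be exercised without a real network. By default it uses the standard library's `socket.create_connection`.

```python
import socket


def connect_first_available(hosts, port, timeout=1.0,
                            connector=socket.create_connection):
    """Try each host in order; return (host, connection) for the first that answers."""
    errors = []
    for host in hosts:
        try:
            return host, connector((host, port), timeout=timeout)
        except OSError as exc:
            # This host is unreachable or refused the connection; try the next.
            errors.append(f"{host}: {exc}")
    raise ConnectionError("no host available: " + "; ".join(errors))
```

An application would call it with its list of cluster members, for example `connect_first_available(["app1.example", "app2.example"], 8080)`, and fall through to the second host only when the first cannot be reached. Putting a load balancer in front of the cluster moves this same logic out of every client and into one place.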

The ultimate computer cluster would be a group of computers working together to seamlessly provide services and appear as one computer. This cluster would make its services available to applications transparently, meaning that an application written to run on a single computer would see no difference when running on the cluster. In addition, the application would get all the benefits of being on the cluster (scalability, high availability, memory management, and so on) without any impact on its operation. To my knowledge, this type of cluster does not exist.

Hardware and OS-level clustering allows multiple computers to work together for a purpose, but the computers generally don't appear to applications as one big computer. Even when the OS makes this possible, special programming is usually required for an application to take advantage of the clustered OS services. Most of the time, OS clusters are geared toward providing high availability (HA) for fault tolerance rather than the seamless scalability of applications associated with high-performance computing (HPC). Windows Server 2003 clustering and SQL Server clusters are examples of clusters that are geared toward HA rather than HPC.

Sources:
http://en.wikipedia.org/wiki/Computer_cluster
http://www.theserverside.com/discussions/thread.tss?thread_id=41734
http://tompierce.blogspot.com/2008/05/application-vs-os-clustering.html
