load balancing and failover

Created by retep. Last edited by retep, 290 days ago.

Load Balancing

Load balancing lets you spread load over multiple servers.

You would want to do this if you were maxing out the CPU, disk IO or network capacity on a particular server.

An alternative to load balancing is scaling vertically, e.g. getting faster disks, a faster CPU or a fatter network pipe.

Failover

Failover != Load Balancing

The goal of failover is to allow work that would normally be done by one server to be done by another server should the regular one fail.

e.g. you want Server A to respond to all requests unless it has a hardware failure, or someone trips over its network cable, or the data center it is located in burns to the ground. If Server A cannot respond to requests, then Server B takes over.

For failover, Server B would ideally be in a separate data center. If that is not possible, you would at least want to put it on a separate switch from Server A, and on a separate power outlet as well.

Implementing Failover

To implement failover you typically need to have your data replicated from one machine to the other. You could do this via rsync+cron for files/directories, and via something like MySQL replication for databases.
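As a rough illustration of the rsync+cron approach, here is a minimal Python sketch of a script that cron could run. The host name, paths, script location and schedule are all made-up examples (not anything RimuHosting ships), and it assumes rsync over SSH with key-based login already working between the two servers.

```python
import subprocess


def build_rsync_command(source_dir, dest_host, dest_dir, delete=True):
    """Return the rsync argument list that mirrors source_dir to dest_host:dest_dir."""
    cmd = ["rsync", "-az"]  # -a: archive mode, -z: compress during transfer
    if delete:
        # Remove files on the failover server that no longer exist on the primary.
        cmd.append("--delete")
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{dest_host}:{dest_dir}")
    return cmd


def replicate(source_dir, dest_host, dest_dir):
    """Run rsync once; a non-zero exit raises CalledProcessError."""
    subprocess.run(build_rsync_command(source_dir, dest_host, dest_dir), check=True)


# Hypothetical crontab entry, running the script every 15 minutes:
#   */15 * * * * /usr/bin/python3 /usr/local/bin/replicate.py
# where replicate.py ends with something like:
#   replicate("/var/www", "failover.example.com", "/var/www")
```

Letting the script fail loudly (check=True) means cron will mail you the error rather than silently leaving the failover server out of date.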

One way to trigger the failover is to change the IP address your domain points to. IP address changes can take effect within a few minutes of a DNS server update (how quickly depends on the TTL set on the record). Though if a client PC is caching the old IP then it may take a bit longer.

There are some services (e.g. http://zoneedit.com) that operate DNS servers that can detect a failure on a particular IP and automatically update the DNS for you.

http://pingability.com (run by the same people that run RimuHosting) also offers a failover server check type (if you are using the RimuHosting name servers).
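To give a feel for what such a monitoring service does, here is a minimal sketch of the decision logic in Python. It is not how zoneedit or pingability actually work internally. The function names, the "three failed checks in a row" rule and the example IPs are assumptions for illustration, and the actual DNS update is left out since it depends on your DNS provider.

```python
import socket


def is_reachable(ip, port=80, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within the timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_fail_over(check_history, threshold=3):
    """Return True when the most recent `threshold` checks all failed.

    check_history is a list of booleans, oldest first. Requiring several
    failures in a row avoids failing over on a single dropped packet.
    """
    recent = check_history[-threshold:]
    return len(recent) == threshold and not any(recent)


def dns_target(primary_ip, failover_ip, check_history, threshold=3):
    """Return the IP the domain's A record should currently point to."""
    if should_fail_over(check_history, threshold):
        return failover_ip
    return primary_ip
```

A monitor would call is_reachable() on the primary every minute or so, append the result to the history, and update the DNS record whenever dns_target() changes.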

One issue with failover is falling back to the primary server. Say your main server fails. You fail over to the failover server, your customers use that server, and the files and database on that server are updated. When the main server comes back you need to ensure that those changes are reflected on the main server, e.g. by rsyncing the files and by exporting/importing the database. You could do replication (of files and databases) in both directions to automate this, but that approach may lead to conflicts. e.g. what if two different people using two different servers tried to change the same thing? Which person's update would take effect?
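The conflict problem can be shown with a small Python sketch of a three-way merge on a single value. This is an illustration of the general idea only, not how rsync or MySQL replication resolve conflicts, and the names are hypothetical.

```python
class ReplicationConflict(Exception):
    """Raised when both servers changed the same value in different ways."""


def merge_value(base, on_main, on_failover):
    """Merge one value that may have changed on either server.

    base is the value both servers had at the last successful sync.
    """
    if on_main == on_failover:
        return on_main          # no change, or both made the same change
    if on_main == base:
        return on_failover      # only the failover server changed it
    if on_failover == base:
        return on_main          # only the main server changed it
    # Both sides changed it differently: software cannot know who should win.
    raise ReplicationConflict(f"{on_main!r} vs {on_failover!r}")
```

The first three cases can be automated. The last one is the case where somebody (or some policy, like "latest change wins") has to pick a winner.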

Implementing Load Balancing

One simple way to implement load balancing is to split services between servers, e.g. running the web server on one server and the database server on another.

This way is easy since there are no data replication issues: all necessary files are on the web server, and all necessary database data is on the database server.

Another common load balancing option is to have multiple front end servers. To distribute requests to multiple servers you could set up multiple IP addresses for a particular domain. Clients should then get all these addresses and go to a random one, spreading the load around.
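From the client's side, the multiple-IP approach looks roughly like the following Python sketch. The helper names are made up for illustration, and real clients (browsers, resolvers) have their own rules for which address they try first.

```python
import random
import socket


def resolve_all(hostname, port=80):
    """Return the unique IP addresses DNS reports for hostname."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in infos})


def pick_address(addresses, rng=random):
    """Pick one address at random, which is what spreads the load around."""
    return rng.choice(addresses)


# Example usage (needs network access, so left as a comment):
#   ips = resolve_all("www.example.com")
#   print(pick_address(ips))
```

Note that nothing here checks whether the chosen server is actually up, which is why multiple IPs on their own give you load spreading but not failover.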

Another way to distribute requests is to have a single virtual IP (VIP) that all clients use, and for the computer on that VIP to forward each request to one of the real servers.

People can also implement load balancing via HTTP balancers like Pound or mod_proxy_balancer in Apache 2.2.
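Whether it is a VIP or an HTTP balancer, the machine in front has to decide which real server gets each request. A minimal sketch of the simplest policy (round robin) in Python follows. The class and backend names are hypothetical, and real balancers such as LVS, Pound or mod_proxy_balancer also handle health checks, weights and session stickiness, which this ignores.

```python
import itertools


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._cycle = itertools.cycle(self._backends)

    def next_backend(self):
        """Return the backend that should receive the next request."""
        return next(self._cycle)


balancer = RoundRobinBalancer(["10.0.0.11:80", "10.0.0.12:80", "10.0.0.13:80"])
first_four = [balancer.next_backend() for _ in range(4)]
# The fourth request wraps around to the first backend again.
```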

Load Balancing Options At RimuHosting

We do not recommend you load balance on VPSs. VPS owners share their physical hardware with other users. If you are maxing out the CPU, disk IO or network then you are probably using too much of the host server's resources, which would not be fair to other users.

If you are load balancing then you would probably need to have your own hardware (i.e. be on a dedicated server).

If you have multiple dedicated servers in one data center then we can set up one of them with LVS (see http://www.linuxvirtualserver.org/) to distribute requests. Or you could set up round robin DNS.

Failover Options At RimuHosting

For failover we can help set up cron jobs to run rsync to replicate your file systems, and we can set up MySQL replication to replicate your database. You could use an automated service like http://zoneedit.com or http://pingability.com to 'fail over' the IP, or do it manually yourself in the event of failure.

If you want to implement failover we recommend the failover server be in a separate data center, e.g. have your primary server as a dedicated server or VPS in Dallas and the failover server as a VPS or dedicated server in, say, NY.

What About IP Takeover/Heartbeat?

A popular failover technique is IP failover (also called IP takeover). This is where a 'heartbeat' process runs on your servers. In the event one server fails to see the heartbeat of another server, it takes over an IP on that server (i.e. makes that IP route to itself, rather than to the other server that is not sending out heartbeats).
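The "fails to see the heartbeat" part boils down to a timeout check. Here is a minimal Python sketch of just that check. The class name and the 10 second timeout are assumptions, and the actual IP takeover step (reassigning the address and announcing it on the network) is what heartbeat software does for you and is not shown.

```python
import time


class HeartbeatMonitor:
    """Track when the peer server was last heard from."""

    def __init__(self, timeout_seconds=10.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_seen = clock()  # treat startup as the first heartbeat

    def record_heartbeat(self):
        """Call this each time a heartbeat arrives from the peer."""
        self._last_seen = self._clock()

    def peer_is_dead(self):
        """Return True if no heartbeat has arrived within the timeout."""
        return self._clock() - self._last_seen > self._timeout
```

The surviving server would poll peer_is_dead() and, once it returns True, take over the peer's IP. A timeout that is too short risks both servers claiming the IP during a brief network hiccup.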

Typically IP takeover is implemented when two servers are connected to the same switch and are running on the same subnet.

RimuHosting does not really support IP takeover. There are a few reasons for this:

  • IP takeover makes it a bit trickier for us to troubleshoot problems and assist customers (e.g. which physical server are we actually going to?)
  • It requires servers to be connected to the same switch. This may not always be the case. And even if it were, we may need to move servers between cabinets/switches, and we do not have procedures/tools that would give us a heads up that this would impact an IP takeover setup.
  • For some servers (e.g. in the non-Dallas data centers we use) we do not control the switch. There the IP takeover may not work at all, or if it does work the data center staff may disable it at some point.
  • DNS failover is a good (better? more fault tolerant?) alternative.

RimuHosting is a company that hates giving 'no' as an answer. So if a customer really needs to implement IP takeover then just let us know and we can work something out, e.g. a custom setup for you using your own private switch.

What About 'Shared Storage'?

We have had a few people ask about clustered or shared file systems.

These can be tricky.

In an ideal world you would have a file system that any server could read from and write to. That file system would be located on disks spread across multiple servers, and any of those servers or disks could fail without affecting the file system's availability.

In the real world to do this you need a clustered file system. That is a file system that knows it needs to co-ordinate disk access with the other servers in the cluster. To do that you need monitoring software (to check when a device goes down) as well as locking software (e.g. DLMs, distributed lock managers) that ensures that no two servers are writing to the same place, and that one server is not reading something that another server is in the middle of writing.

There are setups like the Red Hat clustered file system, e.g. see http://www.linuxtopia.org/online_books/rhel5/rhel5_clustering_guide/rhel5_cluster_s1-ha-components-CSO.html These systems enable you to run clustered file systems, although the setup can be 'somewhat' (i.e. really) complex, and it can also require specialized hardware to work. Often the system ends up being deployed on a single shared file system (e.g. a SAN) with no failover capability in the event the SAN fails (which SAN vendors will tell you will never happen, but which can actually happen, e.g. due to electricity providers or meteorites).

Often for many applications the 'simple' solution is for one server to export an NFS share that the other servers use. You would set up frequent rsync backups between that server and another, and in the event of a failure export the NFS share from the other server instead.
