In the last few years, the term “cloud computing” has become the ubiquitous buzzword to toss around when talking about flexible, scalable hosting solutions. Unfortunately, as often happens when a new concept hits a large market, it is widely misunderstood, and its benefits and drawbacks are misrepresented (intentionally or not).
The original concept of cloud computing was a utility-like service providing a pool of shared resources for clients to use on demand, allowing customers to leverage a large pool of resources without bearing the capital cost of building or maintaining the entire infrastructure themselves. In the web hosting world, what we refer to as “cloud computing” sometimes aligns with this original concept and sometimes does not, depending on the service in question.
In this post, we hope to remove some of the mystery around cloud computing as it relates to web hosting services; pull back the curtain a bit and explain what is actually happening inside that misty and illusory “cloud”.
Virtualization and The Cloud
All cloud products that are offered as a web hosting service have one thing in common: they are virtualized. “Virtualized” means that your “server” (whether your provider calls it an “instance”, a “slice” or something else) is actually a virtualized operating system referred to as a “virtual machine”, or VM. The VM runs inside a “container” system (more commonly called a hypervisor), which is in charge of managing the resources of multiple VMs running alongside yours. There are several products that do this, some of which you may be familiar with – some of the better known are VMware, VirtualBox, and Xen.
Although all cloud products are virtualized, not all virtualized hosting services are cloud services. What distinguishes a cloud service is that it:
- can scale dynamically
- lets the customer add or remove resources through an API
- bills utilized resources in some small increment, such as by the hour or day (although some providers do use larger billable increments, such as weeks or months)
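As a rough illustration of why small billing increments matter, here is a minimal sketch of the math. The rates are entirely hypothetical, not any real provider’s pricing:

```python
# Illustrative only: hypothetical prices, not any provider's real rates.
HOURLY_RATE = 0.10      # $ per instance-hour for a small cloud VM
MONTHLY_FLAT = 60.00    # $ per month for a comparable always-on server

def cloud_cost(hours_used: float) -> float:
    """Cost when billed by the hour: you pay only for hours actually run."""
    return hours_used * HOURLY_RATE

# A VM spun up only for a 6-hour peak window each day, over 30 days:
peak_hours = 6 * 30
print(cloud_cost(peak_hours))   # 18.0 -- vs. 60.00 for the always-on box
```

The gap narrows as your usage approaches 24/7, which is one reason steady, predictable workloads often do fine on fixed-price servers.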
This flexibility is generally considered the largest advantage of a cloud product over a more standard virtual private server: it lets you dial resources up and down, on demand and as needed, without committing to a lot of hardware that sits as unused overhead during non-peak hours.
Once you are on a virtualized platform, your “server” becomes platform-agnostic. It is now a VM, and it will run anywhere the “container” will run, regardless of the underlying hardware. This means that (compared to an operating system installed directly on hardware) it’s trivially easy to move your VM from one container to another – from server to server, or even from datacenter to datacenter. The various virtualization technologies do this in different ways, with different requirements, but almost all of them have some way to “migrate” a VM from one host to another. Here is where an important distinction between virtual private server (or “VPS”) services and cloud services comes into play.
Under the Hood
In order for cloud services to be able to scale in an elastic fashion, they must use some sort of network attached storage. Network attached storage is pretty much what it sounds like – a physical disk storage system that connects to a server over a network. In the case of the cloud, your VM runs on a server which provides the “brain”, the RAM/CPUs and other hardware necessary for the operating system to run and connect to the Internet. But instead of having hard drives inside that same server, the storage is on a networked storage solution that is connected to the server running your VM (in many cases just using your standard Ethernet interface and switching gear).
Network-based storage is generally a requirement for any kind of “hot migration”, that is, moving a VM from one server to another seamlessly. In this configuration, using Xen for instance, you can migrate a running VM from one server to another without even a reboot. This is useful when you need to allocate more RAM or CPU cores to a VM on a fully utilized server (by migrating it to a server with more resources available), or when a server suffers a hardware failure and needs maintenance: all the VMs on it can be migrated away with minimal interruption.
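The scheduling decision behind such a migration can be sketched roughly as follows. This is illustrative logic with invented host and VM records, not Xen’s actual API – the real move would be performed with a tool like `xm migrate --live`:

```python
# Hypothetical sketch: choose a migration target with enough free RAM and
# CPU cores for the VM. Field names and data here are invented for
# illustration; real hypervisor tooling does the actual memory transfer.

def pick_target(vm, hosts):
    """Return the name of the first host that can absorb the VM, else None."""
    for host in hosts:
        free_ram = host["ram_total"] - host["ram_used"]       # GB
        free_cores = host["cores_total"] - host["cores_used"]
        if free_ram >= vm["ram"] and free_cores >= vm["cores"]:
            return host["name"]
    return None  # no capacity anywhere: time to add hardware

hosts = [
    {"name": "host-a", "ram_total": 32, "ram_used": 30, "cores_total": 8,  "cores_used": 8},
    {"name": "host-b", "ram_total": 64, "ram_used": 16, "cores_total": 16, "cores_used": 4},
]
vm = {"ram": 8, "cores": 4}
print(pick_target(vm, hosts))  # host-b
```

Because the VM’s disk lives on the network rather than inside host-a, only the running memory state has to move – which is exactly why network storage makes hot migration practical.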
More Ins and Outs
There are several drawbacks to using network attached storage, the primary one being I/O. I/O, very simply, is the rate at which data can be transferred “in” and “out” of a device. There are various kinds of I/O that affect performance in various ways, but in this case we are talking about disk I/O. Of the several types of network attached storage products out there, the most popular is iSCSI, which is essentially the SCSI protocol carried over TCP/IP, usually on Ethernet. This means that if you are running Gigabit Ethernet, you have 1,000 Mbit/s of I/O. Serial Attached SCSI (SAS) drives put you at 4,800 Mbit/s, and in a RAID configuration you can get much more than that. If you are running 10 Gigabit Ethernet, you have 10,000 Mbit/s.
Comparing I/O rates between devices and standards is never an apples-to-apples comparison, because so much depends on the hardware and network configuration. For example, it doesn’t matter if you’ve spent all your money on a 10GbE network if you have connected eight servers, each with 8 CPU cores, to a single 10GbE link: you are now sharing that connection among potentially 64 CPU cores, and that many cores could easily saturate your expensive 10GbE network several times over. Of course, even if each server had a dedicated direct-attached SCSI array you could still saturate your disk I/O, but you would have much more headroom to play with, and you could more easily dedicate resources to a specific VM or set of VMs.
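The arithmetic behind that example can be checked in a few lines. The numbers mirror the figures above and are back-of-the-envelope estimates, not benchmarks:

```python
# Back-of-the-envelope oversubscription check for a shared storage link.
# Figures mirror the example in the text; they are illustrative only.

link_mbit = 10_000       # one shared 10GbE link to the storage
servers = 8
cores_per_server = 8
total_cores = servers * cores_per_server      # 64 cores contending

per_core_share = link_mbit / total_cores      # Mbit/s per busy core
print(per_core_share)    # 156.25 -- a small slice per core when all are busy

# Compare with the ~4,800 Mbit/s a direct-attached SAS link offers per server:
oversubscription = (servers * 4_800) / link_mbit
print(oversubscription)  # 3.84 -- the disks could saturate the link ~4x over
```

In other words, with every core busy, each one sees only a sliver of the shared pipe – the “several times over” saturation mentioned above.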
As always, there are some exceptions to the rule, but as of right now, most cloud products will not give you as much I/O as you can get out of dedicated direct attached storage. I/O is not everything, but this is one potential bottleneck.
Keep it down over there! – or – Noisy Neighbors
Perhaps more important than raw I/O is the fact that cloud products (and shared VPS services, for that matter) are based on shared resources. This is great when you don’t want to pay for all the resources you have access to all the time, but what happens when someone sharing those resources maxes them out? Your service suffers. If they saturate the network your storage is on, your service can suffer *a lot*. For disk-I/O-intensive applications (like Drupal) this can mean a slow, frustrating death. This is true not only for shared network resources, but for shared CPU cores (RAM is typically dedicated to each VM, so at least you are safe there). If someone sharing the same resources as you uses more than their “fair share”, your service will be impacted. This is called having “Noisy Neighbors”. Some people mitigate this by rebooting their VM, hoping they boot back up on a less resource-bound server. Some just wait it out. Many just watch the low load numbers on their VM and puzzle over the lackluster performance.
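One way to check for this from inside a Linux VM is to look at CPU “steal” time – the time the hypervisor spent running other guests instead of yours. Here is a minimal sketch that parses a `/proc/stat`-style `cpu` line; the sample values are made up for illustration:

```python
# Sketch: estimate noisy-neighbor impact from CPU "steal" time, as reported
# in the first line of /proc/stat on Linux. The sample line below is invented.

def steal_percent(cpu_line: str) -> float:
    """Fields after 'cpu': user nice system idle iowait irq softirq steal ..."""
    fields = [int(x) for x in cpu_line.split()[1:]]
    steal = fields[7] if len(fields) > 7 else 0  # older kernels omit steal
    return 100.0 * steal / sum(fields)

sample = "cpu 4705 150 1120 16250 520 20 25 1210"
print(round(steal_percent(sample), 1))  # 5.0 -> ~5% of CPU time went to neighbors
```

A consistently high steal percentage, paired with low load numbers of your own, is a classic sign that the host your VM landed on is oversubscribed.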
What should I do?!?
Deciding what type of hosting service to use depends on your particular situation. In most cases we say “the proof is in the pudding” – if the service you are on is making you happy, and the price is sustainable, then stay there, for goodness sake! But if you are experiencing performance problems on a budget hosting or cloud service that are costing you money or traffic, maybe it’s time to look at a dedicated server environment.
Drupal, especially with the default database-based cache enabled, can be particularly I/O heavy on the database. Many cloud providers offer a hybrid model, where you can have virtualized cloud servers that connect to external non-virtualized hardware with direct attached storage. This is a great solution if you can afford it – it allows you to put your database server on dedicated hardware, but keep your web servers on a more flexible solution.
If you can’t afford that, then look at Drupal performance solutions like Pressflow or Mercury. If that’s not an option, the next stop may be a VPS with direct-attached disks and dedicated resources.
If you feel like you are having I/O or “Noisy Neighbor” issues, check whether you are on a “shared” resource plan, and ask your provider what type of infrastructure they are using and whether it’s saturated. If they won’t tell you, and you continue to have problems, it’s probably time to start looking for another provider.
Next up: Caching and Drupal – examining APC, Varnish, Boost, memcache, and content delivery networks, and how they work together.