Hacker News

I've been setting up dedicated machines for years and looked into switching to AWS for TicketStumbler. I determined that it was actually considerably more expensive to get the same amount of resources (i.e., CPU/RAM) on AWS, because the pricing scheme doesn't lend itself well to running many always-on images.

In the end I chose to run Xen on top of dedicated hardware, which has essentially bought us the best of both worlds: simple scaling and low costs. Granted, it would probably take me a couple of hours to start up a dozen more VMs (I'd need to requisition new hardware) as opposed to a few minutes, and S3 is still cheap for mass storage, but neither of these points had any relevance to our situation.

As mentioned by others, it all comes down to what your project needs.



This seems like a very unusual configuration. Can you explain why you chose it? If you control the hardware and software, how does the VM layer help you scale better? Why not just install whatever packages you need on whatever servers you have, and skip the added complexity and performance cost of virtualization? Is there an assumption that at some point you'll move part of your operations to AWS or a similar service? Or is some of your software buggy enough that it needs to be contained within a VM?

I'm not trying to judge, this just seems like a weird choice and I'd like to know what motivated it.


Sure (I really should do a real write-up on this...):

So the main purposes are simple horizontal scaling and efficient use of hardware. Virtualization makes horizontal scaling simple because it's just a matter of cloning a particular machine (or machines). It makes efficient use of hardware because there's no need to have a dozen physical machines for a dozen different purposes, unless they all use a full machine worth of resources.

Let's say I want to add a new (non-static) Web server. Well, Apache is on its own VM; I can clone and migrate it to a new physical box (or just have two on the same hardware). I could also simply add more resources dynamically. If I need to scale the database, same deal. The biggest win here is that when I scale one of those, nothing else comes with it. There's no DNS server on the Web box. There is no NFS server on the database box.

Before virtualization, you basically had two choices: Throw a whole bunch of packages on a single box or spread it out over different physical machines. The first completely ruins encapsulation, thus adding unnecessary complexity, while the second is really uneconomical unless you're using all those resources out of the gate.

Then, what happens when you need to scale? All that crap needs to be set up again! Hopefully we were smart about it and made it as simple as possible, but I have never made a system as scalable as my current one-click-cloning mechanism.
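For the curious, a "clone a Xen guest" workflow often boils down to three steps: copy the guest's disk volume, duplicate its domain config under a new name, and boot it. A minimal sketch of how such a clone script might assemble those commands for an LVM-backed guest, where the volume group (vg0), guest names, and paths are all illustrative assumptions, not the author's actual tooling:

```python
# Sketch: build the shell commands a one-click Xen clone might run.
# All names (vg0, web01, sizes, paths) are hypothetical examples.

def clone_commands(src, dst, vg="vg0", size="10G"):
    """Return the commands to copy an LVM-backed guest's disk,
    duplicate its Xen config, and boot the clone."""
    return [
        # 1. Create a fresh logical volume and copy the source disk.
        f"lvcreate -L {size} -n {dst}-disk {vg}",
        f"dd if=/dev/{vg}/{src}-disk of=/dev/{vg}/{dst}-disk bs=4M",
        # 2. Duplicate the domain config, renaming src -> dst
        #    (MAC/IP would also need adjusting inside the guest).
        f"sed -e 's/{src}/{dst}/g' /etc/xen/{src}.cfg > /etc/xen/{dst}.cfg",
        # 3. Boot the new guest.
        f"xm create /etc/xen/{dst}.cfg",
    ]

for cmd in clone_commands("web01", "web02"):
    print(cmd)
```

In practice a real script would also handle unique MAC addresses and IPs, but the core idea is exactly this small.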

Right now we only have a single physical machine (16 cores, 32gb ram, iSCSI); using your recommendation of "whatever packages on whatever servers" I would end up with this clusterfuck of a server that does a dozen different things at once. What I have now is exactly that, except encapsulated into VMs with their own resources and their own purpose.

Yes, virtualization has a slight performance cost (though bare-metal virts like Xen have a pretty marginal one), but I'll gladly accept it for the massively easier scaling and efficient use of hardware. And, yes, if one VM happens to go insane for some reason, it doesn't affect anything else. For instance, on one of our older servers, the MySQL VM's "drive" had a tendency to become corrupt randomly. I never really did figure out why, but I imagine it was because I make fun of MySQL all the time, but I digress -- the point is, it never affected the rest of the machine and, since MySQL was used for things of little importance (Wordpress), it didn't even take down the website when it happened.

That was a pretty rambling explanation, but hopefully it covers your questions. If not, let me know.


Thanks! That's a pretty thorough reply, and I have to agree that it's hard to imagine an easier way to deploy new server instances than copying a VM. I think you can get pretty darn close with a package system like DEB or RPM, but in terms of being able to bring up and test an environment on a development machine, and then be assured the same environment will work exactly the same in production, this is a reassuring approach. I think it makes a lot of sense if you are expecting to require rapid scaling at some point in the future.

I'm a little unsure why you wouldn't just use EC2 in that case, although it might make sense to use a dedicated box for as long as you can and supplement that with EC2 instances when appropriate. Obviously you can get a lot of EC2-instance-equivalents out of a 16-core 32gb box, and your approach would likely be easy to integrate with EC2 when the time comes.

I'm not sure what your load trends look like, but if you're not likely to surpass what your dedicated server can handle within the next few months, this would seem like overengineering. I know you said it "ruins encapsulation" and makes a "clusterfuck of a server" but I don't quite see it. If this is a web app running on a modern gnu/linux distro, with maybe some DNS and cron jobs and email and blogs, we're talking about a very common setup that's running successfully on zillions of boxes. On the other hand, two or three times in five years I have had buggy Apache modules bring down a machine by leaking processes, and it would have been nice if that didn't take e.g. email down with it.

I think my biggest reservation about your approach is that it's weird. It's basically hand-made custom EC2, right? Is there an open source project somewhere to package up the tools to do this on one's own servers? (If not, maybe you should start one... :)


It's basically hand-made custom EC2, right? Is there an open source project somewhere to package up the tools to do this on one's own servers?

There are two that I know of that support the EC2 wire protocol: Nimbus and Eucalyptus.

I am the primary developer of Nimbus. We had an EC2-like system released before EC2 existed, only later adapting to their protocol because our users wanted to use the EC2 client tools.


Yeah, we should surpass it. As you said, though, it's also very much about simplification (after the initial learning curve, anyway). EC2 might have simplified things a little more eventually, but it was also even more expensive and harder for me to learn up front.

As for a custom EC2, it's really the other way around; EC2 is a custom Xen setup. Having learned virtualization before AWS and similar cloud services existed, EC2 is the strange one to me. I'd only ever known "DIY EC2", so to speak.


"EC2 is the strange one to me. I'd only ever known "DYI EC2", so to speak."

Those EC2-like systems do have their place. There is a lot more to do and think about when you are allowing others to run VMs on your infrastructure. That is often not the case and not your situation either, it sounds like.


This is a very serious question as you clearly know what you're talking about from experience: how do you find it cheaper to run dedicated hardware? The reason I ask is because I've priced out 4-core servers with 16GB of RAM at SoftLayer and ThePlanet and they come out to around $700/mo with 2 drives and RAID 1. Amazon charges $750 for an Extra-Large instance (15GB RAM).

There is the potential that you don't want to delve too much into what you're paying for stuff, but it just seems like AWS is charging similar rates to ThePlanet and SoftLayer, which are the two dedicated hosts that seem to have the most credibility in the community. Even if you were provisioning your own 1.7GB instances on a larger dedicated box, you would still only fit about 8 or 9 of them in 16GB of RAM (leaving room for Xen and such), which would make it the same price as AWS. The only thing I can see is that the included bandwidth could save some money. Maybe I'm not good at looking for dedicated server deals.
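The "8 or 9 instances" packing estimate above checks out with simple division. A back-of-the-envelope sketch, where the ~1.5 GB of headroom reserved for Xen/dom0 is my own assumption, not a figure from the thread:

```python
# Rough check of how many 1.7 GB guests fit on a 16 GB box.
total_ram_gb = 16.0
instance_ram_gb = 1.7         # EC2 small instance size at the time
hypervisor_overhead_gb = 1.5  # assumed headroom for Xen/dom0

usable = total_ram_gb - hypervisor_overhead_gb
instances = int(usable // instance_ram_gb)
print(instances)  # 8, consistent with the "8 or 9" estimate
```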


We've been running these numbers ourselves lately as well.

We find AWS much more expensive.

For instance, we bought a Dell 2950 / 2x Quad Core / 12 GB RAM / 4x500GB RAID 5 w/hot spare for ~$4500. We have it in a colo where it costs about $150 a month for the space + bandwidth.

This is about equivalent to the $750/mo extra-large instance. There will also be additional AWS fees for transfer, storage, etc, but we'll go with $750 for simplicity.

That's a $600/mo premium, or $7200 a year. So I pay for the hardware within 8 months and after that it's $600 a month savings.

There is a lot of value in being able to provision extra servers quickly, use CloudFront, etc., but it comes with a high price, IMO.


Well, there are a few considerations here (all of this is in reference to SoftLayer, whom we use):

- Depending on your storage requirements, 2 drives + RAID 1 (which is more of a convenience than anything, and I almost never recommend getting it) is often more expensive than an iSCSI LUN, which is far superior and offers zero-setup cross-country replication and snapshots (if we're going to pretend that RAID 1 is some kind of backup solution).

- When ordering, if you choose the lowest clock speed CPUs, you're practically guaranteed to get the highest-rated (more expensive) ones for free. This is either due to a scarcity of low-end CPUs, or SoftLayer just loves me. I have ordered numerous boxes from them and this has always been the case.

- They always have "specials" which are usually pretty ridiculous. For instance, 16 of the 32GB of RAM we have was free, as in beer. Right now (and most of the time) they have free double RAM and HDD. Kiss the cost of one of those RAID drives away.

- There are non-monetary considerations, such as support. SoftLayer has without a doubt the best technical support I have ever been provided, and I've been through countless hosts in my tenure. We're talking about an unmanaged host that has better techs than any managed host I've come across. Not to mention conveniences such as automated OS reloads, private network, inter-DC OC3 backbones, VPN, secure backups, optional CDN, etc. (AWS has most of these afaik, minus VPN, but this goes to equivalence.)

- Your 4-core server, if you don't make use of deals, would likely be equivalent to AWS. Once you start getting into high core-counts, that changes fast. As a huge proponent of parallelization, many of the processes run for TicketStumbler make use of multiple CPUs; this means a lot of what we do is CPU-bound, thus the need for higher core counts.

- 2TB of bandwidth is included; I also have no idea how this affects the cost overall. Edit: I added a couple TB of transfer to the AWS calculator, plus 80GB of storage: $854.10 per Extra Large. The difference in cost between this and our machine now amounts to nearly nothing.

So, at the end of the day, the hardware we have is nearly identical in cost (within $100, IIRC) to the Extra Large Linux instance you reported, while having twice the number of CPU cores and twice the amount of RAM. We're also afforded all the other luxuries that come with the myriad services and support the conventional dedicated host provides.

The dedicated hosting environment also allows me to set up and administer the hardware in the method I described in my previous reply; i.e., I don't have to set up a single Extra Large instance (well, technically two) to handle a dozen different jobs.

Hope this helps! Let me know if you have any other questions.



