Sure, we’ve heard it all before. At the end of the day, when it comes to performance benchmarks, someone is always going to complain. Ask a vendor representative a performance question, and you may get the answer taught in IT Marketing 101 classes around the world.

For performance questions, the ol’ reliable “It depends…” always seems to work as a convenient escape tactic. Yes, I know… it really does depend. Really! Now let me get to my point.
If you haven’t seen the latest drama around hypervisor benchmarks, Rick Vanover’s recent Virtualization Review magazine article “Lab Experiment: Hypervisors” is where you should start. You should then head over to the Windows Virtualization Team blog and read Patrick O’Rourke’s take, and follow it up with Eric Horschman’s take on the VMware Virtual Reality blog. For some more perspective, these blog posts provide additional commentary:
- Say it isn’t so: Hyper-V and XenServer outperform ESX (Jason Boche)
- Reaction to “Say it isn’t so: Hyper-V and XenServer outperform ESX” (Ken Cline)
Over the last couple of days, I’ve had the opportunity to speak with Scott Drummonds on VMware’s performance team, as well as with Keith Ward (Virtualization Review Magazine’s editor) and Rick Vanover (the author of the article that sparked the latest performance debate). I spoke with Keith and Rick after reading Eric Horschman’s post, which passionately defended the need for VMware’s EULA restriction on public benchmarks. Keith Ward is one of the most meticulous editors I know, and I’ve known Keith for about nine years. So I was surprised that Keith would have published any benchmark that violated the VMware EULA. Rick is also a stand-up guy, so I doubted that he would knowingly write something that violated a EULA restriction either. To make a long story short, it appears that there were some communication disconnects between VMware, Rick, and Keith. Keith and Rick thought they had approval from VMware on the test methodology. It’s also clear to me that VMware thought otherwise.
One of VMware’s issues with the Virtualization Review benchmark stems from the fact that in the third test, VMware ESX 3.5 took over 50 seconds longer to complete a SQL job than Hyper-V. In his post, Horschman states:
The fact that ESX is completing so many more CPU, memory, and disk operations than Hyper-V obviously means that cycles were being used on those components as opposed to SQL Server.
He’s right, and I agree that the additional CPU (40% more), memory, and disk operations recorded in the ESX test would degrade the SQL job’s response time.
Now let’s shift gears and talk about a benchmark that I believe VMware had no objection to: the recent Network World hypervisor bake-off. VMware wants industry-standard benchmarks? Well, that’s what Network World used, and in some tests ESX lagged behind XenServer and Hyper-V. Take a look at the SPECjbb2005 results, which included a total of 12 tests using Windows Server 2008 and SLES 10 guests, with vCPUs ranging from 1 to 4 and VM counts ranging from 1 to 6. In the most extreme test, six 4-vCPU VMs were run on a four-socket, quad-core host (16 cores in total), resulting in CPU oversubscription of 1.5:1 (24 total vCPUs on 16 physical cores). Here’s the bottom line. Of the 12 SPECjbb2005 tests, XenServer had the best result in nine, while ESX, Hyper-V, and Xen on SLES 10 each came out on top in one. ESX was consistently second in the tests it didn’t win. The Network World I/O tests revealed that Xen on SLES 10 performed best, primarily because Novell enables write caching by default.
I’m mentioning the Network World benchmark because the Virtualization Review benchmark was not the first time a major publication offered hypervisor performance results favorable to VMware’s competitors.
Here’s the deal. What does it all come back to? It depends! Hypervisor performance is very workload-specific, so even industry-standard benchmarks like SPECjbb2005 are not without fault. You need to test a series of workload patterns that mirror your environment in order to draw a sound conclusion. We constantly advise our clients to P2V or V2V their existing systems for internal hypervisor performance testing. That provides the best idea of what’s important: how the hypervisor performs in your environment, with your workloads (which was Rick’s intent in his article). Outside of customized internal testing, SPECvirt is our best hope. For the SPEC Virtualization Committee, I offer this advice: take your time… but hurry up! Yes, please do your due diligence to get the benchmark right, but we really don’t want to wait another couple of years for it. The sooner you can provide a vendor-neutral virtualization benchmark, the better.
Until SPEC delivers SPECvirt, it’s going to be up to us, the virtualization community, to carry the torch. Vendor benchmarks such as VMware’s VMmark will help you compare how a hypervisor performs on different server platforms (such as HP and IBM), but they will not be trusted for comparing different hypervisors. Have you heard Simon Crosby or Mike Neil encourage anyone to use VMmark? Don’t get me wrong. I’m a fan of VMmark, and I think it’s the most comprehensive virtualization benchmark we have. However, it’s owned and maintained by a vendor, so it’s not something that any magazine or independent analyst firm (such as Burton Group) can use for comparative purposes. SPECvirt is needed to diminish the “it depends” factor in virtualization performance evaluations and to give us something that all vendors can agree on.
At Burton Group, we’re doing quite a bit of research to help add clarity to hypervisor performance considerations and evaluations, and you’ll be hearing more from me on that later. Also, I’m waiting on one final commitment for a hypervisor performance debate to take place at our Catalyst Conference in July. More details on that once I have the final speaker confirmed. In the meantime, if something in a vendor benchmark doesn’t look right, tell the world. I blogged about what I thought was a suspect benchmark last summer, and I will continue to do so when I see others that I feel are deceptive, whether intentionally or not.