I’ve been traveling quite a bit lately, so it’s been hard to keep up with the blogs that I normally frequent. Today I played catch-up with Mike DiPetrillo’s blog, which contained an interesting post on VMware’s Distributed Power Management (DPM), an “experimental” feature in VMware’s VI 3.5 software. This is an interesting concept, and one that competitors such as Virtual Iron offer as well. While such features make for a nice story and blog post, I have to question how relevant they are to the typical enterprise. Don’t get me wrong. Power management is critical, and the virtualization platform vendors will have to play a role in adding power management to the hypervisor; however, I’m not convinced that fully shutting down physical hosts is the solution. Sure, the mobility offered by virtualization mitigates the risk of component failures that may result from frequent power cycles: if one physical host doesn’t come back up, its VMs can run on the remaining nodes in the cluster. But is the approach offered by DPM a pill most enterprises are ready to swallow?
This year alone, I have asked hundreds of IT folks about power management, and I can count on one hand those who would allow software to dynamically shut down and power up their production systems. Why? Practically every one of them has cold-booted a server and had it fail to come back up. So the idea of scheduling such an activity to occur daily scares the (insert your favorite adjective here) out of them. Driven by these concerns, Burton Group approached every major server independent hardware vendor (IHV) this year and asked whether they had conducted any testing on the impact of distributed power management (shutting down servers nightly) on mean time between failures (MTBF). Across the board, the answer was “No.”
Some IHVs aren’t even interested in testing such a scenario because they don’t think it will ever fly in most enterprises. Instead, they feel that adding more power management features to their server platforms (e.g., powering down unneeded CPUs, memory, or PCI devices) is the right path to take. Of course, this means that the hypervisor vendors will need to update their software to take advantage of such features. The IHVs will also still need to conduct MTBF testing to ensure that any new power management features do not significantly degrade MTBF. As embedded hypervisors (e.g., ESXi, XenServer OEM) continue to evolve, we’ll reach a point where physical drives are no longer needed in servers; instead, the hypervisor will simply load from flash. Getting the hard disk out of the server will improve power efficiency, but we also need better power management coordination between the hypervisor and the server.
Microsoft has already announced that advanced power management will be included in the Hyper-V update coming in Windows Server 2008 R2. VMware, Citrix, and Virtual Iron (among others) will need to follow a similar path. Until we have advanced power management in both the server and the hypervisor, along with MTBF tests that allow enterprises to adopt such features with confidence, let’s hold off on propping up science-project features like DPM. Using such features in development, test, or training environments may make sense if you feel the good (improved energy efficiency) outweighs the bad (reduced server or component life). In production environments, stay away from features such as DPM until your preferred IHV stands behind a particular power management solution and has the test data to back it up.
Note: Originally posted to Burton Group’s Data Center Strategies blog.

#1 by Craig - December 8th, 2008 at 15:17
Sorry, but I had to nitpick.
SNIP
I’m not convinced that fully shutting down physical hosts is the solution.
/SNIP
DPM puts a host into standby rather than fully shutting it down. It’s not a cold boot but more like a warm reload.
Just an FYI.
#2 by Chris - December 8th, 2008 at 20:59
Good catch, Craig, and thanks for pointing that out. We discussed the impact of standby vs. cold boot on server components (power supplies, drives, memory, etc.) internally, and while you’re right about the difference, the impact on those components of coming out of standby is effectively the same as that of a cold boot. Drue Reeves, a guy I highly respect, was deeply involved in these issues at Dell, HP, and Compaq. My stance has never been that DPM is a bad thing. I don’t think anyone would argue that it won’t be part of our not-too-distant future. My colleagues at Burton Group and I are asking the IHVs to step up and fully support some form of active power management. The good news is that the attention this blog post has generated over the last few weeks is having an impact. I’ve heard positive updates (so far under NDA) from more than one virtualization vendor as well as from a couple of hardware vendors. Hardware vendors that I’ve recently spoken with feel that active power management (e.g., running system resources at a reduced frequency), rather than powering resources down or placing them in standby mode, may become the vendor-recommended solution. Of course, time will tell soon enough. In the meantime, our clients would like some reassurance from the hardware vendors that deploying any type of dynamic power management on a large scale would not void their service contracts or associated warranties. Some have noted that tests in the field have yielded positive results. Still, it’s not too much to ask for the server hardware vendors to simply give their clients what they’re looking for: a simple statement saying they will support VMware DPM and will fully honor service contracts and warranties in environments that use DPM.