VMware Launches the V Series Mainframe


VMware’s clever marketing folks can call their latest release whatever they want (vSphere 4.0 if you’re keeping track), but to me it’s the V Series Mainframe.

VMware is taking mainframe-class availability, performance, and infrastructure management principles and bringing them to commodity hardware. In my opinion, the vSphere 4.0 release makes it hard to argue that VMware is building anything other than a software mainframe. VMware Fault Tolerance (FT), for example, is one of the new features that provides the availability levels required by many tier 1 applications. This is especially critical for homegrown tier 1 apps that do not have built-in resiliency. VMware FT keeps a single VM in lock-step on two physical hosts, so the VM survives the failure of either host. FT is a 1.0 release and does have some limitations. For example, only VMs with a single virtual CPU are supported. Note that this isn’t a VMware-only limitation, since competitors such as Marathon Technologies cannot mirror VMs with multiple vCPUs either. It’s a tough problem to solve and one that’s going to take some time.

Back to the announcement. VMware is further cementing their position that they are a cloud OS, and I agree. Many tasks associated with the traditional OS (e.g., resource scheduling, QoS) are moving down to the hypervisor and associated virtual infrastructure, so the roles of the traditional monolithic OS are changing. VMware isn’t alone here. Microsoft is retooling its OS and application delivery methodologies as well.

Many of our clients are interested in internal cloud and are looking for practical steps they can take today to start down the cloud path. Most aren’t able to seriously consider external cloud because of security and regulatory compliance concerns. However, organizations can begin building an internal cloud architecture that is capable of leveraging external cloud resources once the predominant security and compliance riddles are solved. VMware is banking that if vSphere is the foundation for an enterprise’s internal cloud, the enterprise will look at vSphere-based external cloud resources once the time is right.

vSphere 4.0 includes a laundry list of new features, including:

  • Distributed Power Management (DPM)
  • Availability of the VMsafe API
  • Distributed virtual switch and support for the Cisco Nexus 1000V
  • Host profiles
  • vShield Zones
  • Thin provisioned virtual hard disks

I’m not going to discuss all of the features (you can get the details here and read plenty of additional analysis here), but I wanted to talk about a few takeaways from the launch event:

  • DPM
  • Thin provisioned virtual hard disks
  • I/O as a factor in VM placement
  • Hot memory add
  • Partial host failure detection

When DPM was announced with ESX 3.5, Burton Group advised clients to stay away from using it for two reasons:

  • VMware considered the feature “experimental” and wouldn’t officially support it themselves
  • VMware’s IHV partners would not officially support it either

I blogged on DPM back in November, and while the post incited some strong vendor reaction, it served its purpose: moving the discussion forward on official IHV support for DPM, an assurance many of our clients wanted before they would consider implementing it in production. VMware now supports DPM, which is a good first step. Burton Group has had serious dialogues with all major IHVs on the topic of DPM for nearly a year, and we believe that official support from the server IHVs isn’t far off, so stay tuned. DPM can substantially reduce power and cooling costs by shutting down unneeded servers in a given ESX cluster and then turning them back on once they’re needed again. Once the IHVs step up and do their part, I expect some Burton Group clients to begin implementing DPM for some of their workloads.
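To make the "shut down unneeded servers" idea concrete, here is a minimal sketch of the arithmetic behind a power-management decision. It is not VMware's DPM algorithm; the function names, the 80% utilization ceiling, the two-host minimum, and the demand figures are all assumptions chosen for illustration.

```python
import math


def hosts_needed(total_demand, host_capacity, max_utilization=0.8, min_hosts=2):
    """Smallest number of powered-on hosts that keeps each host at or below
    max_utilization, never dropping below min_hosts (assumed HA headroom)."""
    usable_per_host = host_capacity * max_utilization
    return max(min_hosts, math.ceil(total_demand / usable_per_host))


def hosts_to_standby(cluster_size, total_demand, host_capacity, **kwargs):
    """How many hosts in the cluster could be placed in standby right now."""
    needed = hosts_needed(total_demand, host_capacity, **kwargs)
    return max(0, cluster_size - needed)


# Hypothetical 8-host cluster; each host offers 100 units of capacity.
night = hosts_to_standby(8, total_demand=260, host_capacity=100)
day = hosts_to_standby(8, total_demand=600, host_capacity=100)
print(f"off-peak: {night} hosts can sleep; peak: {day} hosts can sleep")
```

The point of the sketch is that the savings depend entirely on how far demand falls below cluster capacity, which is why off-peak windows are where a feature like this pays for itself.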

Thin provisioned virtual hard disks (i.e., virtual disk files that grow as data is added to them) aren’t a new concept. VMware Workstation has had this feature since its inception. It’s in other hypervisors such as Virtual Server 2005 and Hyper-V too. It was even in ESX 3.5, but wasn’t officially supported. VMware is high on thin provisioned virtual disks, but keep in mind that there will be a small performance overhead associated with using this feature. VMware has yet to publish a benchmark illustrating the performance tax. For enterprise implementations, thin provisioning is best done in the storage array. For smaller deployments, and for deployments involving arrays that don’t support thin provisioned storage, thin provisioned virtual disk files can yield considerable storage savings.
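A quick back-of-the-envelope sketch shows where the storage savings come from. The disk sizes below are made-up numbers, not measurements, and the helper name is hypothetical; the only assumption about thin provisioning itself is the one stated above, that a thin disk consumes roughly what has been written to it rather than what was allocated.

```python
def storage_footprint_gb(disks, thin):
    """Total datastore space consumed by a set of virtual disks.

    disks: list of (provisioned_gb, used_gb) tuples.
    thin:  True  -> each disk consumes only what the guest has written;
           False -> each disk consumes its full provisioned size.
    """
    return sum(used if thin else provisioned for provisioned, used in disks)


# Hypothetical VMs: (provisioned GB, GB actually written by the guest).
disks = [(100, 22), (60, 15), (200, 48)]

thick_gb = storage_footprint_gb(disks, thin=False)
thin_gb = storage_footprint_gb(disks, thin=True)
print(f"thick: {thick_gb} GB, thin: {thin_gb} GB, saved: {thick_gb - thin_gb} GB")
```

The flip side is that a thin datastore can be overcommitted, so the sum of the provisioned sizes still needs to be monitored against real capacity.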

During his keynote, Steve Herrod passionately argued that vSphere 4.0 can run practically any x86 workload, including high-end, I/O-intensive databases. He went on to tout the value of VMware’s Distributed Resource Scheduler (DRS) and DPM features. DRS relocates workloads to balance resource utilization across an ESX cluster, and DPM relocates them so that unneeded physical servers can be shut down. That all sounds good, but there’s one problem. The intelligence used by DRS and DPM only takes memory and CPU utilization into account when determining VM placement. I/O utilization is ignored. This means that a relocated VM could be I/O bound as soon as it lands on a new physical host. I’ve talked to VMware technology partners who are eager to provide deeper I/O utilization information to vCenter. The problem, however, is that vCenter doesn’t have an API that can be used for this purpose, nor does it have the metadata structure for storing this type of information. Until VMware can factor I/O into VM placement decisions, you should use caution when considering enabling DRS or DPM on I/O-intensive workloads. To be fair, VMware’s competitors don’t take I/O into account for VM placement decisions either, but I still see it as something that needs to be pointed out.
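The placement problem is easy to see in a toy model. The sketch below is not how DRS works internally; the `Host` and `VM` classes, the host names, and the utilization figures are all hypothetical, and resources are expressed as simple fractions of host capacity. It only contrasts a placement decision that looks at CPU and memory with one that also looks at I/O.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    cpu_free: float  # fraction of host CPU capacity still free (0.0-1.0)
    mem_free: float  # fraction of host memory still free
    io_free: float   # fraction of host I/O bandwidth still free


@dataclass
class VM:
    name: str
    cpu: float  # demand, as a fraction of one host's capacity
    mem: float
    io: float


def margins(host, vm, consider_io):
    """Headroom left on each considered resource if vm lands on host."""
    result = [host.cpu_free - vm.cpu, host.mem_free - vm.mem]
    if consider_io:
        result.append(host.io_free - vm.io)
    return result


def pick_host(hosts, vm, consider_io=False):
    """Pick the host that fits the VM with the most headroom on its
    tightest considered resource; return None if no host fits."""
    candidates = [h for h in hosts if min(margins(h, vm, consider_io)) >= 0]
    if not candidates:
        return None
    return max(candidates, key=lambda h: min(margins(h, vm, consider_io)))


hosts = [
    Host("esx-a", cpu_free=0.70, mem_free=0.60, io_free=0.10),
    Host("esx-b", cpu_free=0.40, mem_free=0.45, io_free=0.80),
]
db_vm = VM("db01", cpu=0.20, mem=0.30, io=0.50)

print("CPU/memory only:", pick_host(hosts, db_vm).name)
print("with I/O:       ", pick_host(hosts, db_vm, consider_io=True).name)
```

With only CPU and memory in view, the database VM goes to the host with the most free compute, which happens to be the one with almost no I/O headroom. Add the third dimension and the choice flips.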

Hot resource add (e.g. RAM) is a nice new feature too. One thing to remember though is that hot adding memory to a running VM is only useful if the application running inside the VM can take advantage of the new memory without a restart. If the application must be restarted, the VM is as good as offline anyway. That being said, the way applications are able to leverage hot memory add should be something you’re querying prospective software vendors about, and should be something you include in RFPs.

On the high availability side, I’m still waiting on partial node failure detection. Why is this important? Consider an ESX host that’s online but, due to a physical storage controller failure, is unable to meet required service levels. Storage access may remain thanks to multipath support, but you may not have enough available I/O to meet service level requirements. Intelligence that allows the cluster to rebalance VMs in response to reduced I/O availability, for example, would bring vSphere closer to VMware’s goal of building a software mainframe.
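As a rough illustration of what "partial failure" means here, the sketch below classifies a host as healthy, degraded, or failed based on its surviving storage paths. This is not a vSphere feature or API; the `HostHealth` fields, the per-path bandwidth figure, and the three labels are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class HostHealth:
    name: str
    online: bool
    paths_active: int            # storage paths currently usable
    io_capacity_per_path: float  # hypothetical MB/s each path can carry
    io_demand: float             # aggregate MB/s required by resident VMs


def classify(host):
    """Return 'failed', 'degraded', or 'healthy' for a host."""
    if not host.online or host.paths_active == 0:
        return "failed"  # the case a heartbeat-based check already catches
    available = host.paths_active * host.io_capacity_per_path
    if available < host.io_demand:
        return "degraded"  # online, but cannot meet the I/O service level
    return "healthy"


# One of two storage controllers has died: the host still answers
# heartbeats, but half of its I/O bandwidth is gone.
host = HostHealth("esx-a", online=True, paths_active=1,
                  io_capacity_per_path=400.0, io_demand=600.0)
print(host.name, "is", classify(host))
```

A "degraded" result is the signal a cluster would need in order to move VMs off the host before service levels slip, rather than waiting for the host to fail outright.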

Based on the last few paragraphs, it may seem like I’m raining on VMware’s parade, but that’s not my intent. vSphere 4.0 is a major release, and if the massive performance improvements measure up to VMware’s claims, the hardware savings resulting from the associated VM consolidation densities will be enough to cost-justify a vSphere 4.0 upgrade. Of course, the enhanced security (e.g., the VMsafe API) and networking features will accelerate adoption as well.

Once again, VMware has raised the feature bar. Next, I’m looking forward to seeing how Citrix and Microsoft respond at their respective conferences next month.

Note: Originally posted to Burton Group’s Data Center Strategies blog.

  1. #1 by Harley Stagner - April 23rd, 2009 at 08:13

    Excellent post as always. I was waiting for your reaction to vSphere and I was not disappointed. You brought up some good points to consider when designing solutions around VMware’s new suite.

    I especially liked the comments on VM’s being I/O bound and making sure to speak with application vendors about hot-add memory.
