Recently I had the pleasure (again) of designing and building a vSphere Metro Storage Cluster (vMSC). The last time I did this, the environment was built on vSphere 6.0 with HPE StoreVirtual storage; see my previous blog post.
This time the customer already had a storage solution in place: HPE 3PAR F7400. The customer wanted me to design and build a vMSC using the latest available version, which was vSphere 6.5 at the time.
Normally I go through the available best practices from all relevant vendors, in this case VMware, HPE, and Dell (we are using Dell PowerEdge servers). I was a bit disappointed to find that there are no up-to-date best practices from VMware regarding vMSC and vSphere 6.5. I contacted VMware and they confirmed that no vMSC whitepaper is available for vSphere 6.5.
The reason for this blog is that I encountered something I didn’t expect with the new vSphere 6.5 DRS feature “VM Distribution”. But before we dive into that, let me first explain what we tried to achieve.
vSphere Metro Storage Cluster
The customer was in need of a new vMSC. A vMSC is a specific VMware-certified high availability solution. Basically, it is a stretched cluster in which you can lose an entire site and still have your VMs up and running.
Site01 is the primary site, meaning all primary VMs will run in this site. If availability can be arranged at the application level, the secondary VMs are placed in Site02. For example, the customer has two domain controllers: one is placed in Site01 and the other in Site02. The same goes for Citrix Provisioning Services, SQL clustering, and so on.
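For completeness, this is roughly how such a site affinity rule looks when scripted. Below is a minimal sketch using pyVmomi, assuming a cluster named vMSC-Cluster whose first four hosts are the Site01 hosts; the vCenter address, credentials, and all group and rule names are illustrative, not the exact configuration we used:

```python
# Sketch: a "should run" VM-to-host affinity rule that keeps the primary
# VMs on the Site01 hosts, using pyVmomi. Error handling and SSL setup
# are omitted for brevity.
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.example.local',
                  user='administrator@vsphere.local', pwd='***')
content = si.RetrieveContent()

# Locate the cluster in the inventory via a container view.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == 'vMSC-Cluster')

spec = vim.cluster.ConfigSpecEx(
    groupSpec=[
        vim.cluster.GroupSpec(
            operation='add',
            info=vim.cluster.HostGroup(name='Site01-Hosts',
                                       host=list(cluster.host[:4]))),
        vim.cluster.GroupSpec(
            operation='add',
            info=vim.cluster.VmGroup(name='Site01-VMs',
                                     vm=[]))  # add the primary VMs here
    ],
    rulesSpec=[
        vim.cluster.RuleSpec(
            operation='add',
            info=vim.cluster.VmHostRuleInfo(
                name='Site01-VMs-should-run-on-Site01',
                enabled=True,
                mandatory=False,  # "should run", not "must run"
                vmGroupName='Site01-VMs',
                affineHostGroupName='Site01-Hosts'))
    ])
cluster.ReconfigureComputeResource_Task(spec, modify=True)
Disconnect(si)
```

We use “should run” rules rather than “must run” rules so that HA can still restart VMs at the surviving site if a whole site is lost.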
New vSphere 6.5 DRS: VM Distribution
New in vSphere 6.5 is a DRS option called VM Distribution. With this setting, DRS not only load balances based on CPU and memory (and, more recently, network traffic) but also tries to balance the number of VMs per ESXi host. In a normal cluster this is an option you would definitely consider using, because it can limit the number of VMs impacted by an HA event.
If you enable this feature, an advanced DRS option called “LimitVMsPerESXHostPercent” is set to a value of 0, as you can see in the image below.
LimitVMsPerESXHostPercent
This setting determines the number of VMs that can run on a host. Before vSphere 6.5 it was possible to set this advanced setting to any value between 0 and 100. But in vSphere 6.5, when the VM Distribution option is enabled to evenly spread the VMs across hosts, the value of LimitVMsPerESXHostPercent cannot be adjusted: if you change it, it reverts to 0.
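You can verify what the option is actually set to by reading the cluster’s DRS configuration. A minimal sketch, reusing the pyVmomi cluster lookup from the sketch above:

```python
# Sketch: read back the cluster's DRS advanced options to confirm the
# value of LimitVMsPerESXHostPercent (it reverts to 0 when the
# VM Distribution checkbox is enabled).
for opt in cluster.configurationEx.drsConfig.option:
    if opt.key == 'LimitVMsPerESXHostPercent':
        print(opt.key, '=', opt.value)
```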
The maximum number of VMs that will run per host is calculated using the following formula:
(number of VMs / number of hosts) + (number of VMs / number of hosts) × LimitVMsPerESXHostPercent
In the case of my customer we have 4 ESXi hosts per site. Let’s assume we have 120 VMs. This results in the following number of VMs per ESXi host:
120 VMs / 8 hosts = 15
15 + (15 * 0%) = 15
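The same calculation in a few lines of Python, just to make the formula concrete (the function name is mine):

```python
# Sketch: the per-host VM limit DRS enforces when VM Distribution is
# enabled, following the formula above. With 120 VMs, 8 hosts and the
# fixed 0% value, this yields 15 VMs per host.
def limit_vms_per_host(num_vms, num_hosts, limit_percent=0):
    base = num_vms / num_hosts
    return base + base * (limit_percent / 100)

print(limit_vms_per_host(120, 8))  # 15.0
```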
vMSC with VM Distribution
The calculation above shows that we can have 15 VMs on each ESXi host. The problem is that my customer does not have a balanced vMSC: we run most of our VMs at the primary site, so there are a lot more VMs on the primary site than on the secondary site.
Because DRS limits the number of VMs per ESXi host, VMs end up running at the wrong site (even with DRS affinity rules in place) and, more importantly, we are prevented from powering on additional VMs. For example, with a cap of 15 VMs per host, the four Site01 hosts can run at most 60 VMs; once the primary site needs more than that, VMs either land at Site02 despite the affinity rules or fail to power on.
So in an unbalanced vMSC the VM Distribution option does not make sense. Even if the advanced LimitVMsPerESXHostPercent option were adjustable, I don’t see the point of using it: as your (unbalanced) environment grows, you would have to keep recalculating the value of this setting.
I hope VMware will take this into account when updating the vMSC best practices for vSphere 6.5.