vSphere Metro Storage Cluster with HPE StoreVirtual – Part 1
Recently I had the pleasure of designing and building a vSphere Metro Storage Cluster (vMSC). In this blog I will provide information about setting up vMSC with HPE StoreVirtual. I will highlight the differences between our implementation of vMSC and the VMware vSphere Metro Storage Cluster Recommended Practices.
This blog will consist of two parts. In part 1 I will focus on the reason behind this blog, give an overview of our implementation and provide details about HPE StoreVirtual Multi-Site. In part 2 I will focus on the details of vMSC, what happens when site to site communication is no longer possible, the issues we faced, and the documentation used.
Why this blog?
First of all, let me explain why I felt I needed to write this blog about vSphere Metro Storage Cluster (vMSC) in combination with HPE StoreVirtual (LeftHand). Currently I’m working at a customer site and they have built a highly available vSphere environment to support the services they deliver to their customers. In this situation the vSphere cluster runs in one datacenter and the ESXi and storage nodes are spread across two different racks.
At the time this environment was designed and built there was just one datacenter available. A couple of months ago a second datacenter became available. One of the clients of my customer (are you still with me :D) requested a new service and required this service to remain available if one of the datacenters were lost due to a disaster. This gave me the opportunity to design and build a vSphere environment that spans both sites. vMSC is the VMware certified solution for stretched storage cluster configurations. This solution is basically a stretched vSphere cluster where the storage is also stretched and available in multiple sites. HPE calls this HPE StoreVirtual Multi-Site (or LeftHand Multi-Site).
HPE has no up-to-date documentation about vMSC in combination with StoreVirtual. The latest documentation they have covers vSphere 5.0. I opened a support call and got confirmation that there is no documentation for vSphere 6.x. In the documentation that is available HPE mentions:
- the requirement to have redundant physical links between datacenters
- if the link between sites is lost, the storage cluster becomes partitioned and only one site maintains access (which site can be configured through the primary site designation)
The above information is exactly the reason for this blog. Because the storage solution is truly active-active and there is no way to designate a particular LUN to a site (so no site locality), the failure scenario where the site to site link is lost does not behave as described in the VMware vMSC documentation. HPE describes 13 tested failure scenarios, but losing communication between sites is not one of them.
This blog will provide information for setting up a vMSC with HPE StoreVirtual using vSphere 6.0 Update 2 and HPE LeftHand OS 12.6. It will also describe the issues we faced while designing, building and testing the environment. Thank you Duncan Epping and VMware support for working with us on these issues.
vMSC, HPE StoreVirtual and Networking Overview
Now that we have discussed the current setup of the environment and the reason behind this blog, it is time to provide an overview of the solution we have built. After this I will go into the details of HPE StoreVirtual and vMSC.
vMSC and StoreVirtual overview
The picture below shows the vMSC solution. There are two sites (site01 and site02) which run ESXi nodes and form one vSphere cluster. VMs can run in both datacenters. The ESXi hosts are connected to an HPE StoreVirtual Multi-Site cluster. This cluster provides active/active storage access. We have not talked about the third site (site03) yet, so let me explain why it is in this picture. The third site hosts a Failover Manager (FOM). This FOM is part of the HPE Multi-Site solution and is used in the event of a failure. I will go into more detail about the storage solution shortly.
Network Overview
In the “HPE StoreVirtual Storage Multi-Site Configuration Guide” HPE describes the design considerations for the network. The picture below shows our network. I know there are a lot of different ways to configure the network and I’m not saying ours is the best way, but we had to deal with a couple of constraints and this is the network currently in place as designed and built by our networking team.
Each site has two switches. The ESXi servers and the StoreVirtual nodes are connected to both switches. The picture below only shows how they are connected logically.
In this network design there are two 10Gbps links between Switch01 and Switch03. These links are dark fibres and are bundled; they are also the primary links. The connection between Switch02 and Switch04 is configured as passive/standby. These are not dark fibres; this is an L2 VPN connection. The L2 VPN connection is only used if the connection between Switch01 and Switch03 is lost. Latency is extremely important and should not exceed 1 ms RTT. The requirements for the site hosting the FOM are of course different: this can be a 100 Mbps connection with 50 ms RTT. Our network team tested the latency for the dark fibres and the L2 VPN connection and it does not exceed 1 ms. With this configuration in place a situation where both sites are unable to communicate with each other should be a rare occurrence.
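If you want to double check the latency from the ESXi hosts themselves, vmkping can be used. A minimal sketch, assuming vmk2 is the iSCSI vmkernel port and 192.168.10.21 is a StoreVirtual node in the other site (both are placeholders, not our actual configuration):

# Send 20 pings from the iSCSI vmkernel port to a storage node in the other site
# and check the reported round-trip times against the 1 ms RTT requirement
vmkping -I vmk2 -c 20 192.168.10.21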
Details HPE StoreVirtual and vMSC
As promised, it is now time to provide you with the details. I will describe how we have built the HPE StoreVirtual and vMSC environment. In the vMSC configuration section I will highlight the differences between our implementation of vMSC based on HPE StoreVirtual Multi-Site and the VMware vSphere Metro Storage Cluster Recommended Practices. There are differences because with HPE StoreVirtual it is not possible to designate a particular LUN to a site (site locality), as explained earlier in this blog.
HPE StoreVirtual Multi-Site configuration
Let me start by explaining how we have built the HPE StoreVirtual Multi-Site solution. HPE StoreVirtual Multi-Site is a clustered iSCSI storage solution. Multi-Site is a feature of the LeftHand operating system (SAN/iQ). By configuring Network RAID 10 volumes, the LUNs are striped and mirrored across the StoreVirtual nodes. This provides high availability.
Software and version used
LeftHand OS 12.6 is used.
Bonding
The StoreVirtual nodes are configured with 4x1Gbps and 2x10Gbps interfaces. We use the 10Gbps interfaces for iSCSI traffic and two of the four 1Gbps interfaces for management traffic. By bonding the interfaces, teaming is enabled and one IP address can be used by multiple interfaces, so if one interface (or switch) fails the StoreVirtual maintains connectivity. We created two bonds: bond0 bundles the two 1Gbps interfaces and bond1 bundles the two 10Gbps interfaces. Of course the interfaces in each bond are connected to different switches. The MTU size for bond1 is configured as 9000 to support jumbo frames.
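For jumbo frames to actually be used, the MTU has to match end to end, so the switches and the ESXi iSCSI vmkernel ports need an MTU of 9000 as well. A minimal sketch with esxcli, assuming vSwitch1 carries the iSCSI vmkernel port vmk2 and 192.168.10.21 is a StoreVirtual node (names and addresses are placeholders):

# Set an MTU of 9000 on the vSwitch and the iSCSI vmkernel port (placeholder names)
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network ip interface set --interface-name=vmk2 --mtu=9000
# Verify end to end with an 8972 byte payload and the don't-fragment bit set
vmkping -I vmk2 -d -s 8972 192.168.10.21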
Load Balancing
Both bonds use different load balancing mechanisms. Let me explain the different load balancing policies first and then I will explain which policy is used for each bond and why.
There are three load balancing mechanisms:
- Active-Passive
With Active-Passive one interface is marked as preferred. Only the preferred interface is actively used; the other interface only takes over if the preferred one fails.
- Link aggregation dynamic mode
Both interfaces can be used simultaneously. If one fails, the other remains active. However, both interfaces must be connected to the same switch.
- Adaptive Load Balancing
Both interfaces can be used simultaneously. If one fails, the other remains active. The interfaces do not have to be connected to the same switch, as long as the two switches are connected to each other.
Adaptive Load Balancing is used for bond0 (management traffic). Link aggregation dynamic mode cannot be used because our switches are not stacked and we do not want to connect both interfaces to just one switch. With Adaptive Load Balancing we have increased bandwidth and can handle a single interface or switch failure.
Active-Passive is used for bond1 (iSCSI traffic). This is because the primary site to site connection runs between Switch01 and Switch03; keeping the iSCSI traffic on one path avoids additional latency and the so-called “Out of Sequence” symptoms.
Network traffic types
To separate the different types of traffic, StoreVirtual supports the following traffic types:
- LeftHand OS Interface
The LeftHand OS Interface is used for communication between storage nodes, Failover Managers and management groups.
- Management Interface
The Management Interface is used to manage the storage nodes, for example with the Centralized Management Console (CMC).
- iSCSI Interface
The iSCSI Interface is used for iSCSI communication between storage nodes.
The picture below shows the different types of traffic.
The LeftHand OS Interface and the Management Interface use bond0. The iSCSI Interface uses bond1. With this configuration the 10Gbps interfaces are used for storage traffic, “management” traffic uses the 1Gbps interfaces, and the traffic types are separated from each other.
Failover Manager (FOM)
As explained earlier, the FOM is used in the event of a failure. The FOM is a component of the HPE StoreVirtual Multi-Site solution and runs as a virtual machine in a third site. The FOM makes sure the volumes remain available in case of a failure such as a storage node failure, a site failure or a site to site connection failure.
HPE StoreVirtual Multi-Site clustering uses a quorum mechanism to determine consistency between storage nodes, and the FOM acts as the tie-breaking vote in that quorum. The FOM does not participate in storage traffic and contains no data of any volumes. The removal or failure of the FOM in a healthy environment will have no impact other than generating a warning.
Management Group, Clusters and Volumes
A Management Group is a collection of one or more storage systems; all storage systems are placed inside a management group.
All ESXi servers (from both sites) must be able to access the volumes. This means all initiators are added to the management group and configured with read/write access to the volumes. Within the management group, sites are configured. Site01 is configured as the primary site, which means that in case of a site to site connection failure the volumes will remain available in site01 and become unavailable in site02. It is important to make sure the storage nodes, ESXi servers and the FOM are added to the correct site.
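To add the ESXi servers as initiators you need the iSCSI initiator name (IQN) of each host. A minimal sketch of how to look this up with esxcli, assuming the software iSCSI adapter is used and shows up as vmhba64 (the adapter name will differ per host):

# List the iSCSI adapters on the host
esxcli iscsi adapter list
# Show the details (including the initiator IQN) of the software iSCSI adapter
esxcli iscsi adapter get --adapter=vmhba64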
Clusters are groups of storage nodes within a management group. A cluster forms a storage pool from which volumes are created. These volumes are configured with Network RAID 10 (2-way mirror) to make sure the data is mirrored between the sites.
The cluster is configured with a virtual IP address (VIP). This makes sure volumes are still accessible if one node fails. The ESXi servers are configured to connect to this VIP to gain access to the volumes.
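On the ESXi side this means the cluster VIP is added as a dynamic discovery (send targets) address on the iSCSI adapter. A minimal sketch with esxcli, assuming the software iSCSI adapter is vmhba64 and 192.168.10.100 is the cluster VIP (both are placeholders for our environment):

# Enable the software iSCSI adapter (if it is not already enabled)
esxcli iscsi software set --enabled=true
# Add the StoreVirtual cluster VIP as a send targets (dynamic discovery) address
esxcli iscsi adapter discovery sendtarget add --adapter=vmhba64 --address=192.168.10.100
# Rescan the adapter and check the discovered devices and paths
esxcli storage core adapter rescan --adapter=vmhba64
esxcli storage nmp device list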
This is the end of part 1. In part 2 I will focus on the details of vMSC, what happens when site to site communication is no longer possible, the issues we faced, and the documentation used. So stay tuned for part 2.