Achieving HA and DR with Azure Shared Disks
Azure shared disks is a new feature for Azure managed disks that enables you to attach a managed disk to multiple virtual machines (VMs) simultaneously. Attaching a managed disk to multiple VMs allows you to either deploy new or migrate existing clustered applications (e.g. Windows Server Failover Clustering) to Azure. Unfortunately, Azure Shared Disks lacks support for Azure Backup and Azure Site Recovery. This limitation prevents customers from using Azure Shared Disks for high availability (HA) and disaster recovery (DR).
Don’t be scared. Storage Replica comes to the rescue. Storage Replica is a Windows Server technology that enables replication of volumes between servers or clusters for disaster recovery. It also enables you to create stretch failover clusters that span two sites, with all nodes staying in sync.
In this blog I’ll be focussing on the “Stretch Cluster” scenario using Azure Shared Disks, i.e. two servers in the primary region and only one VM in the DR region.
Prerequisites
- 1 or 2 Windows Server 2019/2016 machines running Active Directory Domain Services.
- 2 Windows Server 2019/2016 Datacenter machines with the Storage-Replica, Failover-Clustering and FS-FileServer roles and features installed.
- Join the above servers to the domain and add them to your Windows failover cluster.
- Create one data disk and one log disk of type “Azure Shared Disk” in each region via the Azure Portal/CLI and attach them to the clustered servers. The disks should be the same size in both locations.
- Log and Data Disks must be initialized as GPT.
- The volumes must be formatted with NTFS or ReFS.
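The prerequisites above can be scripted roughly as follows. This is only a sketch under assumptions: the resource group name, disk names, disk size and SKU are placeholders that do not come from this walkthrough, the first part assumes the Az PowerShell module and an existing `Connect-AzAccount` session, and the last part formats every uninitialized disk it finds, so check the output of `Get-Disk` before running it.

```powershell
# --- Part 1: run from an admin workstation with the Az module ---
# Assumption: 'my-rg', the disk names and the 1024 GB Premium_LRS size are placeholders.
foreach ($region in 'eastus2', 'centralus') {
    foreach ($role in 'data', 'log') {
        $diskConfig = New-AzDiskConfig -Location $region -DiskSizeGB 1024 `
            -SkuName Premium_LRS -CreateOption Empty -MaxSharesCount 2
        New-AzDisk -ResourceGroupName 'my-rg' -DiskName "sr-$role-$region" -Disk $diskConfig
    }
}

# --- Part 2: run on each clustered server ---
# Installs the roles and features listed in the prerequisites, then reboots if needed.
Install-WindowsFeature -Name Storage-Replica, Failover-Clustering, FS-FileServer `
    -IncludeManagementTools -Restart

# --- Part 3: run once per site, on a node that has the shared disks attached ---
# Initializes every RAW disk as GPT and formats it with NTFS.
Get-Disk |
    Where-Object PartitionStyle -eq 'RAW' |
    Initialize-Disk -PartitionStyle GPT -PassThru |
    New-Partition -AssignDriveLetter -UseMaximumSize |
    Format-Volume -FileSystem NTFS -Confirm:$false
```

`-MaxSharesCount` is what turns a managed disk into a shared disk; a value of 2 matches the two-node primary site used here.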
Steps to enable replication via Failover Cluster Manager
1. Configure Stretch Cluster site awareness
New-ClusterFaultDomain -Name eastus2 -Type Site -Description "Primary" -Location "eastus2 Datacenter"
New-ClusterFaultDomain -Name centralus -Type Site -Description "Secondary" -Location "centralus Datacenter"
Set-ClusterFaultDomain -Name iscsvm0 -Parent eastus2
Set-ClusterFaultDomain -Name iscsvm1 -Parent eastus2
Set-ClusterFaultDomain -Name iscsdrvm -Parent centralus
(Get-Cluster).PreferredSite="eastus2"
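Before moving on, it may be worth confirming that the site awareness took effect. A minimal check, run on any cluster node, could look like this:

```powershell
# Lists the site fault domains and the nodes assigned to them.
# Each node should show its site as the parent.
Get-ClusterFaultDomain

# Shows the site the cluster prefers when placing roles; expected to be the primary site.
(Get-Cluster).PreferredSite
```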
2. Launch the failover cluster manager GUI
If you haven’t already, add the available storage to your cluster with:
Get-ClusterAvailableDisk -All | Add-ClusterDisk
3. I have 2 servers in the primary region (ISCSVM0 and ISCSVM1) and only 1 server in the DR region (ISCSDRVM). Refer to the screenshot below.
4. I have a pair of data and log disks (Cluster Disk 1 & 2) in the primary region and a matching pair in the DR region (Cluster Disk 3 & 4). Right-click Cluster Disk 2 and add it to “Cluster Shared Volumes”.
Right Click on Cluster Disk 2 >> Replication >> Enable
Select the destination data disk in the DR region.
Leave the Overwrite Volume value at Overwrite destination Volume if the destination volume does not contain a previous copy of the data from the source server. If the destination does contain similar data, from a recent backup or previous replication, select Seeded destination disk, and then click Next.
Leave the Consistency Group value at Highest Performance if you do not plan to use write ordering later with additional disk pairs in the replication group. If you plan to add further disks to this replication group and you require guaranteed write ordering, select Enable Write Ordering, and then click Next.
You have successfully configured Storage Replica from Primary to DR region :-)
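The wizard steps above can also be expressed in PowerShell. The following is a sketch rather than the exact commands behind the GUI: the replication group names (`rg01`, `rg02`) and the drive letters for the source log volume and the destination volumes are assumptions, so substitute the values from your own cluster.

```powershell
# Add the source data disk to Cluster Shared Volumes (same as step 4).
Add-ClusterSharedVolume -Name "Cluster Disk 2"

# Make sure the destination disks (Available Storage) are owned by the DR node.
Move-ClusterGroup -Name "Available Storage" -Node ISCSDRVM

# Assumption: L:, D: and E: are placeholder drive letters; rg01/rg02 are placeholder group names.
$partnership = @{
    SourceComputerName       = 'ISCSVM0'
    SourceRGName             = 'rg01'
    SourceVolumeName         = 'C:\ClusterStorage\Volume1'
    SourceLogVolumeName      = 'L:'
    DestinationComputerName  = 'ISCSDRVM'
    DestinationRGName        = 'rg02'
    DestinationVolumeName    = 'D:'
    DestinationLogVolumeName = 'E:'
}
New-SRPartnership @partnership

# Review the resulting replication groups and partnership.
Get-SRGroup
Get-SRPartnership
```

Splatting the parameters keeps the source and destination halves of the partnership easy to compare side by side.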
Validate the Storage Replica
- On the source VM, go to C:\ClusterStorage\Volume1 and create some folders and files.
Alternatively, launch eventvwr.exe on the source VM.
i) Navigate to Applications and Services \ Microsoft \ Windows \ StorageReplica \ Admin and examine events 5015, 5002, 5004, 1237, 5001, and 2200.
ii) On the destination VM, navigate to Applications and Services \ Microsoft \ Windows \ StorageReplica \ Operational and wait for event 1215. This event states the number of copied bytes and the time taken.
iii) On the destination server, navigate to Applications and Services \ Microsoft \ Windows \ StorageReplica \ Admin and examine events 5009, 1237, 5001, 5015, 5005, and 2200 to understand the processing progress. There should be no warnings or errors in this sequence. There will be many 1237 events; these indicate progress.
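The same checks can be done from PowerShell instead of Event Viewer. A minimal sketch, assuming the Storage Replica module is available on the node where you run it:

```powershell
# Replication status and bytes still to be copied for each replicated volume.
(Get-SRGroup).Replicas |
    Select-Object DataVolume, ReplicationStatus, NumOfBytesRemaining

# The 20 most recent Storage Replica events on this node.
Get-WinEvent -ProviderName Microsoft-Windows-StorageReplica -MaxEvents 20

# On the destination node: show event 1215 (copied bytes and time taken), if logged yet.
Get-WinEvent -ProviderName Microsoft-Windows-StorageReplica |
    Where-Object { $_.Id -eq 1215 } |
    Format-List
```

When the initial sync has finished, `NumOfBytesRemaining` should drop to 0.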
Conclusion
In this blog I have demonstrated how Azure Shared Disks combined with Storage Replica can be used to achieve DR across Azure regions.
References
Storage Replica Overview | Microsoft Docs
Cluster to Cluster Storage Replica cross region in Azure | Microsoft Docs
Known issues with Storage Replica | Microsoft Docs
Frequently asked questions about Storage Replica | Microsoft Docs
Stretch Cluster Replication Using Shared Storage | Microsoft Docs