When it comes to disaster recovery, one of the most overlooked areas in most environments is testing. Disaster recovery is only as good as your ability to restore your data, yet all too often organizations don’t take the time to properly test restoration. This includes fully testing site recovery plans. Site replication and recovery involve many more moving parts than simply restoring a file or folder. With site recovery, virtual machines are replicated to an entirely separate environment. Generally, the two environments use different virtual network port groups, and the protected site and the recovery site use different network subnets. An automated way to test the process ensures, first, that thorough testing is possible and, second, that it actually gets done. Let’s take a look at testing a disaster recovery plan with VMware Site Recovery Manager SRM 8.1.
VMware Site Recovery Manager SRM 8.1 Recovery Test Importance and Workflow
To reiterate why testing matters: testing the recovery plan verifies that the virtual machines are correctly recovered to the recovery site. If you don’t test recovery plans, then in a worst-case scenario you could encounter data loss during a real disaster. The great thing about the VMware Site Recovery Manager “test recovery plan” functionality is that it exercises nearly every aspect of the recovery plan, and it does so in a way that avoids disrupting ongoing operations at the protected and recovery sites. The one caveat is recovery plans that suspend local virtual machines at the recovery site: they suspend those VMs during tests as well as during actual recoveries. Other than this exception, running a test recovery does not disrupt replication or ongoing activities at either site.
The test mechanism also keeps replication of the protected VMs to the recovery site running. During a test, the vSphere Replication server creates redo logs on the VM disks at the recovery site so that synchronization of changes can continue normally. After the cleanup operation runs, the vSphere Replication server removes the redo logs from the recovery site VM disks and merges the changes into the virtual disks.
Array-based replication also continues during a test. The array creates a snapshot of the volumes hosting the virtual machines’ disk files at the recovery site, and array replication continues normally while the test is running. After the cleanup operation is run, the array removes the snapshots created as part of the test recovery process.
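To make the array-based behavior concrete, here is a toy Python model, assuming a simplified volume that is just a map of block numbers to data. `ReplicaVolume` and its methods are hypothetical names, not a storage array or SRM API, and a full dictionary copy stands in for the array snapshot (real arrays typically use copy-on-write or redirect-on-write). It only models the array snapshot case, not the vSphere Replication redo log merge. The point it shows: the test VM reads and writes a point-in-time copy, replication keeps updating the live replica, and cleanup throws the test copy away.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ReplicaVolume:
    """Toy model of an array-replicated volume at the recovery site.

    Hypothetical illustration only; not a storage array or SRM API.
    """

    blocks: Dict[int, bytes] = field(default_factory=dict)  # live replica
    _snapshot: Optional[Dict[int, bytes]] = None  # point-in-time copy used by the test

    def replicate(self, block: int, data: bytes) -> None:
        # Replication from the protected site keeps updating the live replica,
        # whether or not a test is running.
        self.blocks[block] = data

    def start_test(self) -> None:
        # A full copy stands in for the array snapshot (simplification).
        self._snapshot = dict(self.blocks)

    def test_write(self, block: int, data: bytes) -> None:
        # The test VM writes only to the snapshot, never to the live replica.
        if self._snapshot is None:
            raise RuntimeError("no test in progress")
        self._snapshot[block] = data

    def test_read(self, block: int) -> Optional[bytes]:
        if self._snapshot is None:
            raise RuntimeError("no test in progress")
        return self._snapshot.get(block)

    def cleanup(self) -> None:
        # Cleanup discards the snapshot and everything the test wrote to it.
        self._snapshot = None


vol = ReplicaVolume()
vol.replicate(0, b"v1")
vol.start_test()
vol.test_write(0, b"test-data")  # the test VM modifies its own view
vol.replicate(0, b"v2")          # replication continues during the test
print(vol.test_read(0))          # b'test-data'
print(vol.blocks[0])             # b'v2'
vol.cleanup()
```

The live replica never sees the test write, which is why a test can run without pausing replication.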
Test recoveries can be run as often as you want and can even be canceled. Regarding permissions, note that the permission to test a recovery plan does not include permission to run a recovery plan, and vice versa.
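The separation between test and run permissions can be sketched as two independent checks. The `Privilege` names and `is_allowed` helper below are hypothetical placeholders, not the actual SRM privilege identifiers; the sketch only illustrates that holding one privilege grants nothing toward the other operation.

```python
from enum import Enum, auto
from typing import AbstractSet, Dict


class Privilege(Enum):
    # Hypothetical names for illustration; the real SRM privilege
    # identifiers are defined in the SRM documentation.
    TEST_PLAN = auto()
    RUN_PLAN = auto()


# Each operation maps to exactly one required privilege.
REQUIRED_PRIVILEGE: Dict[str, Privilege] = {
    "test": Privilege.TEST_PLAN,
    "run": Privilege.RUN_PLAN,
}


def is_allowed(operation: str, granted: AbstractSet[Privilege]) -> bool:
    # Only the operation's own privilege is checked: TEST_PLAN does not
    # imply RUN_PLAN, and RUN_PLAN does not imply TEST_PLAN.
    return REQUIRED_PRIVILEGE[operation] in granted


dr_tester = {Privilege.TEST_PLAN}
print(is_allowed("test", dr_tester))  # True
print(is_allowed("run", dr_tester))   # False
```

In practice this means a role meant to perform real failovers needs the test permission granted explicitly if it should also run tests.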
Testing a Disaster Recovery Plan with VMware Site Recovery Manager SRM 8.1
Let’s run through the simple steps to test a disaster recovery plan with VMware Site Recovery Manager SRM 8.1. Launch the Site Recovery Manager Console and navigate to Recovery Plans. Select the radio button next to the recovery plan you want to test. Choose Test.
A quick note on the placeholder VM at the recovery site: notice that it is managed by Site Recovery Manager.
The Test dialog box opens for the recovery plan you are testing. On the Confirmation options screen, you can choose Replicate recent changes to recovery site. One point to consider is that a true DR scenario will most likely be unplanned, and in an unplanned failure you probably won’t have the chance to replicate recent changes to the recovery site. However, during a planned outage or maintenance window, you can replicate recent changes to ensure the recovery site has the most recent data.
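The planned-versus-unplanned distinction can be put in numbers with a simplified model, assuming that everything written after the last completed replication cycle is missing at the recovery site unless a final sync runs first with writes already stopped. `data_loss_window` is a hypothetical helper and the timestamps are illustrative, not SRM output.

```python
from datetime import datetime, timedelta


def data_loss_window(last_replicated: datetime, failure: datetime,
                     synced_before_failover: bool) -> timedelta:
    """Return how much recent change history is missing at the recovery site.

    Simplified model (an assumption, not SRM behavior): changes made after
    the last replication cycle are lost unless a final sync ran first.
    """
    if synced_before_failover:
        # Planned outage: recent changes were replicated before failover.
        return timedelta(0)
    # Unplanned failure: everything since the last replication cycle is missing.
    return max(failure - last_replicated, timedelta(0))


last_sync = datetime(2019, 1, 1, 10, 0)
outage = datetime(2019, 1, 1, 10, 12)
print(data_loss_window(last_sync, outage, synced_before_failover=False))  # 0:12:00
print(data_loss_window(last_sync, outage, synced_before_failover=True))   # 0:00:00
```

Leaving the option unchecked during a test is a closer rehearsal of an unplanned failure, because the recovered VMs come up with only the data replicated so far.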
That completes the two-step test configuration. Click Finish to begin the test.
The plan status changes to Test in progress.
In the vCenter Server tasks, you will see the virtual machine being reconfigured at both sites and the recovery site VM being powered on for the test.
For my simple test, I used a Linux VM that did not have VMware Tools installed. As you can see, the test operation checks whether VMware Tools is available. Errors like this one, along with any others encountered, appear in the Site Recovery Manager Dashboard.
The Test completes.
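Why a VM without VMware Tools produces an error while the test still completes can be illustrated with a wait-with-timeout pattern. This is a simplified model under stated assumptions: the heartbeat probes, the `wait_for` and `run_test_steps` helpers, the VM names, and the continue-on-error behavior are all hypothetical and do not reproduce SRM's internal logic or its configured timeouts.

```python
import time
from typing import Callable, Dict, List


def wait_for(probe: Callable[[], bool], timeout_s: float, poll_s: float = 0.05) -> bool:
    """Poll probe() until it returns True or timeout_s seconds have elapsed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


def run_test_steps(heartbeat_probes: Dict[str, Callable[[], bool]],
                   timeout_s: float) -> List[str]:
    """Wait for each VM's (hypothetical) Tools heartbeat; record failures and keep going."""
    errors: List[str] = []
    for vm_name, probe in heartbeat_probes.items():
        # A VM without VMware Tools never reports a heartbeat, so this times out.
        if not wait_for(probe, timeout_s):
            errors.append(f"{vm_name}: timed out waiting for VMware Tools heartbeat")
    # The remaining steps still run; errors are surfaced for review afterwards.
    return errors


probes = {
    "app-vm": lambda: True,     # Tools installed and running
    "linux-vm": lambda: False,  # no VMware Tools, as in the test above
}
for err in run_test_steps(probes, timeout_s=0.2):
    print(err)
```

Collecting errors instead of aborting matches what the walkthrough shows: the problem is reported, and the test still runs to completion.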
After the test completes, run the cleanup operation to return the environment to its state before testing began. Cleanup brings the plan back to the “Ready” state, and it must be performed before you can run another test or perform a failover.
What exactly does the Cleanup operation do?
Powers off the recovered virtual machines.
Replaces recovered virtual machines with placeholders, preserving their identity and configuration information.
Cleans up replicated storage snapshots that the recovered virtual machines used during the test.
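The test and cleanup lifecycle described above behaves like a small state machine. The sketch below is a hypothetical model, not SRM code: the state names other than “Ready” and “Test in progress” are illustrative labels, and the cleanup log simply records the three cleanup actions listed above. It enforces the rule that cleanup must run before another test or a failover.

```python
from enum import Enum
from typing import List


class PlanState(Enum):
    READY = "Ready"
    TEST_IN_PROGRESS = "Test in progress"
    TEST_COMPLETE = "Test complete"          # illustrative label
    RECOVERY_COMPLETE = "Recovery complete"  # illustrative label


class RecoveryPlan:
    """Hypothetical model of the test/cleanup lifecycle of a recovery plan."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = PlanState.READY
        self.log: List[str] = []

    def _require(self, expected: PlanState, action: str) -> None:
        if self.state is not expected:
            raise RuntimeError(
                f"cannot {action} while plan is {self.state.value!r}; "
                f"expected {expected.value!r}")

    def start_test(self) -> None:
        self._require(PlanState.READY, "start a test")
        self.state = PlanState.TEST_IN_PROGRESS

    def finish_test(self) -> None:
        self._require(PlanState.TEST_IN_PROGRESS, "finish a test")
        self.state = PlanState.TEST_COMPLETE

    def cleanup(self) -> None:
        self._require(PlanState.TEST_COMPLETE, "run cleanup")
        # The three cleanup actions, recorded in order.
        self.log += [
            "power off recovered VMs",
            "replace recovered VMs with placeholders",
            "clean up test storage snapshots",
        ]
        self.state = PlanState.READY

    def run_recovery(self) -> None:
        # A real failover is only allowed from Ready, i.e. after cleanup.
        self._require(PlanState.READY, "run a recovery")
        self.state = PlanState.RECOVERY_COMPLETE


plan = RecoveryPlan("example-plan")
plan.start_test()
plan.finish_test()
try:
    plan.start_test()  # a second test before cleanup is rejected
except RuntimeError as exc:
    print(exc)
plan.cleanup()
print(plan.state.value)  # Ready
```

Modeling the plan this way makes the ordering constraint explicit: every path back to a new test or a real recovery goes through cleanup.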
To start cleanup, select the recovery plan and click Cleanup. This launches the Cleanup wizard, which is again a simple two-step process.
Click Finish to finalize the cleanup operation.
Again, you will see the vCenter tasks involved in the process.
Testing a disaster recovery plan with VMware Site Recovery Manager SRM 8.1 is a fairly straightforward process that lets you verify that the steps involved in a site failover work as expected. The great thing about the test recovery is that it performs essentially the same steps as a real recovery, with a few differences that ensure production workloads are not disrupted during the test.