Testing a Disaster Recovery Plan with VMware Site Recovery Manager SRM 8.1

When it comes to disaster recovery, testing is one of the most overlooked areas in most environments. Disaster recovery is only as good as your ability to restore your data, yet all too often organizations don’t take the time to properly test restoration, including full tests of their site recovery plans. Site replication and recovery involves many more moving parts than simply restoring a file or folder. With site recovery, virtual machines are replicated to a different environment altogether. Generally, the protected site and the recovery site use different virtual network port groups and different network subnets. An automated way to test the process makes thorough testing possible and, just as importantly, makes it more likely that the testing actually gets done. Let’s take a look at testing a disaster recovery plan with VMware Site Recovery Manager (SRM) 8.1.

VMware Site Recovery Manager SRM 8.1 Recovery Test Importance and Workflow

To reiterate, testing a recovery plan verifies that the virtual machines are correctly recovered at the recovery site. If you don’t test recovery plans, you could, in a worst-case scenario, encounter data loss in a true disaster recovery situation. The great thing about the Site Recovery Manager “test recovery plan” functionality is that it exercises nearly every aspect of the recovery plan, and it does so in a way that avoids disrupting ongoing operations at the protected and recovery sites. There is one caveat: recovery plans that suspend local virtual machines do so for tests as well as for actual recoveries. Other than this exception, running a test recovery does not disrupt replication or ongoing activities at either site.

With vSphere Replication, replication of the protected VMs to the recovery site continues while a test is running. During the test, the vSphere Replication server creates redo logs on the VM disks at the recovery site so that synchronization of changes can continue normally. When the cleanup operation is performed, the vSphere Replication server removes the redo logs from the recovery site VM disks and merges the changes into the virtual disks.

Array-based replication continues as well. During the test operation, the array creates a snapshot of the volumes hosting the virtual machines’ disk files at the recovery site, and array replication continues normally while the test is running. After the cleanup operation is run, the array removes the snapshots created as part of the test recovery process.

Test recoveries can be run as often as you want and can even be canceled. One point to note regarding permissions: the permission to test a recovery plan does not include permission to run a recovery plan, and vice versa.
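The separation between the two permissions can be pictured with a small Python sketch. This is only an illustration of the idea, not the Site Recovery Manager privilege model or API: the names `PlanPrivilege`, `can_test`, and `can_run` are hypothetical and exist only for this example.

```python
from enum import Flag, auto


class PlanPrivilege(Flag):
    """Hypothetical privilege flags for a recovery plan (illustrative only)."""
    NONE = 0
    TEST = auto()  # may test the plan
    RUN = auto()   # may run the plan for real


def can_test(granted: PlanPrivilege) -> bool:
    # Testing requires the TEST flag; holding RUN alone is not enough.
    return bool(granted & PlanPrivilege.TEST)


def can_run(granted: PlanPrivilege) -> bool:
    # Running requires the RUN flag; holding TEST alone is not enough.
    return bool(granted & PlanPrivilege.RUN)


# A user who holds only the test privilege can test but cannot run the plan.
tester = PlanPrivilege.TEST
print(can_test(tester), can_run(tester))
```

Because neither flag implies the other, a user who needs to do both has to be granted both explicitly, for example `PlanPrivilege.TEST | PlanPrivilege.RUN` in this sketch.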

Testing a Disaster Recovery Plan with VMware Site Recovery Manager SRM 8.1

Let’s run through the simple steps to test a disaster recovery plan with VMware Site Recovery Manager SRM 8.1.  Launch the Site Recovery Manager Console and navigate to Recovery Plans.  Select the radio button next to the recovery plan you want to test.  Choose Test.

Testing a Recovery Plan in VMware Site Recovery Manager 8.1

A quick note on the placeholder VM that exists at the recovery site: notice that it is managed by Site Recovery Manager.

Placeholder VM created in the recovery site from the protected site

The Test dialog box opens for the recovery plan you are testing. On the Confirmation options screen, you can choose to Replicate recent changes to recovery site. Keep in mind that a true DR scenario will most likely be unplanned, and in an unplanned failure you most likely won’t have the chance to replicate recent changes to the recovery site. In a planned outage or maintenance period, however, you could replicate recent changes to ensure the recovery site has the most recent data.

Choosing whether or not to replicate recent changes from the protected site to the recovery site

The simple two-step test configuration is complete.  Simply click Finish to begin the test.

Testing a Disaster Recovery Plan with VMware Site Recovery Manager SRM 8.1

You will note a Test in progress Status.

Disaster Recovery Test begins and is in progress in VMware Site Recovery Manager SRM 8.1

In the vCenter Server tasks, you will see the virtual machine being reconfigured on both sides, and you will see the recovery site VM get powered on for the test.

Recovery virtual machine is powered on and tested

For my simple test, I used a Linux VM that did not have VMware Tools installed. As you can see, the test operation checks whether VMware Tools is available. You will see this and any other errors encountered in the Site Recovery Manager dashboard.

VMware Tools Error during the VMware Site Recovery Manager test recovery plan process

The Test completes.

Test of the VMware Site Recovery Manager SRM 8.1 recovery completes

After the test is complete, we can run the cleanup operation to return the environment to the state it was in before testing began and bring the plan back to a “Ready” state. The cleanup operation must be performed before you can run another test or perform a failover.

What exactly does the Cleanup operation do?

  1. Powers off the recovered virtual machines.

  2. Replaces recovered virtual machines with placeholders, preserving their identity and configuration information.

  3. Cleans up replicated storage snapshots that the recovered virtual machines used during the test.
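The rule that cleanup must run before another test or a failover can be sketched as a small state machine. The Python below is a toy model for illustration only, not the Site Recovery Manager API: `RecoveryPlanModel`, `PlanState`, and all method names are hypothetical, and the state labels simply mirror the statuses mentioned in this post.

```python
from enum import Enum

# The three cleanup actions listed above, recorded as plain strings.
CLEANUP_STEPS = (
    "power off recovered virtual machines",
    "replace recovered virtual machines with placeholders",
    "clean up replicated storage snapshots used during the test",
)


class PlanState(Enum):
    READY = "Ready"
    TEST_IN_PROGRESS = "Test in progress"
    TEST_COMPLETE = "Test complete"


class RecoveryPlanModel:
    """Toy model of a recovery plan's test lifecycle (hypothetical, not SRM)."""

    def __init__(self, name):
        self.name = name
        self.state = PlanState.READY
        self.log = []  # human-readable record of what the model did

    def start_test(self, replicate_recent_changes=False):
        # A new test is only allowed from the Ready state.
        if self.state is not PlanState.READY:
            raise RuntimeError("run cleanup before starting another test")
        if replicate_recent_changes:
            self.log.append("replicate recent changes to recovery site")
        self.log.append("power on recovery site VMs for the test")
        self.state = PlanState.TEST_IN_PROGRESS

    def finish_test(self):
        if self.state is not PlanState.TEST_IN_PROGRESS:
            raise RuntimeError("no test is in progress")
        self.state = PlanState.TEST_COMPLETE

    def cleanup(self):
        # Cleanup returns a completed test to the Ready state.
        if self.state is not PlanState.TEST_COMPLETE:
            raise RuntimeError("there is no completed test to clean up")
        self.log.extend(CLEANUP_STEPS)
        self.state = PlanState.READY

    def failover(self):
        # A failover is refused until a finished test has been cleaned up.
        if self.state is not PlanState.READY:
            raise RuntimeError("run cleanup before performing a failover")
        self.log.append("failover started")
```

In this model, calling `start_test()` or `failover()` while the plan is in the `TEST_COMPLETE` state raises an error, and both become available again once `cleanup()` has returned the plan to `READY`.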

Running cleanup job after testing a recovery plan in VMware Site Recovery Manager 8.1

Starting the cleanup launches the Cleanup wizard. Again, this is a simple two-step process.

Running the cleanup operation in VMware Site Recovery Manager SRM 8.1

Click Finish to finalize the cleanup operation.

Ready to complete the cleanup operation in VMware Site Recovery Manager SRM 8.1

Again, you will see the vCenter tasks involved in the process.

Cleanup operation deleting networking and reconfiguring the recovery site virtual machine

Takeaways

Testing a disaster recovery plan with VMware Site Recovery Manager (SRM) 8.1 is a fairly straightforward process, and it verifies that the steps involved in a site failover work as expected. The great thing about the test recovery is that it performs nearly all the same steps as a real recovery, with a few differences that ensure production workloads are not disrupted during testing.