Most virtualization admins who have spent any time in the trenches administering virtual environments regard snapshots as a bad idea for most use cases, especially production workloads. Most of us have been in situations where snapshots caused major issues, either with performance or with the underlying virtual disk infrastructure. For years the rule of thumb has been that snapshots are not supported in production. Windows Server 2016 Hyper-V introduced a new type of snapshot that changes the picture when it comes to snapshots in production. We use the term “snapshot” loosely here, since it is the term most people know conceptually from the VMware side of things; in Hyper-V, Microsoft calls snapshots “checkpoints”. Let’s take a look at how we can use snapshots in production with Hyper-V Production Checkpoints.
Use Snapshots in Production with Hyper-V Production Checkpoints
To give a little background on checkpoints/snapshots, prior to Windows Server 2016 these were created as simple standard checkpoints. When a standard checkpoint is created, the guest operating system has no knowledge that a checkpoint is being taken. In other words, there is no interaction with the Volume Shadow Copy Service (VSS) to make the guest operating system aware of the point in time at which the checkpoint is taken. When a standard checkpoint is rolled back, the virtual machine is shifted back in time without any reference point that it is aware of from a VSS standpoint. This can lead to very bad things happening, especially in production workloads, since standard checkpoints are not application consistent.
With backup software, we are familiar with application-aware processing. When backup jobs are “application-aware”, they interact with VSS to flush in-memory data and pending disk I/O operations so that the VSS operation captures all application-specific transactions consistently. With applications such as databases that require transactional consistency, it is extremely important that VSS, and specifically the VSS writers, are enlisted to work with the application to flush data properly. This results in application-consistent backups.
Hyper-V Production Checkpoints borrow from this proven backup technology. When a production checkpoint is created, the checkpoint creation process interacts with VSS in the guest operating system running in the Hyper-V environment. In Linux guests, a file system freeze is used instead. Since a production checkpoint uses VSS to create the checkpoint, the operation looks the same to the virtual machine as a VSS-aware backup being taken. Application-specific data living in memory is flushed to disk. This means the production checkpoint is a snapshot of the virtual machine in an application-consistent state, which is why it earns the name “production checkpoint”.
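To make the difference concrete, here is a minimal toy model in Python. It is only a sketch of the idea described above, not how Hyper-V or VSS is implemented; the `ToyGuest` class, its fields, and the sample records are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGuest:
    """Toy stand-in for a guest OS: data on disk plus writes still held in memory."""
    disk: list = field(default_factory=list)
    pending: list = field(default_factory=list)

    def write(self, record):
        # The application has accepted the write, but it is not on disk yet.
        self.pending.append(record)

    def flush(self):
        # The kind of step the guest is asked to perform before a consistent snapshot.
        self.disk.extend(self.pending)
        self.pending.clear()


def standard_checkpoint(guest):
    # The guest is not involved: disk and memory are captured exactly as they are.
    return {"disk": list(guest.disk), "memory": list(guest.pending)}


def production_checkpoint(guest):
    # The guest flushes first; memory is not saved with the checkpoint.
    guest.flush()
    return {"disk": list(guest.disk), "memory": None}


guest = ToyGuest()
guest.write("order-1")
guest.write("order-2")

std = standard_checkpoint(guest)
prod = production_checkpoint(guest)
print(std)   # the disk image is missing both orders; they exist only in saved memory
print(prod)  # the disk image contains both orders; no memory is saved
```

In the toy model, the standard checkpoint's disk image is only usable together with the saved memory, while the production checkpoint's disk image is complete on its own.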
Configuring Production Hyper-V Checkpoints
With Windows Server 2016, as mentioned, the options to enable production checkpoints are found when looking at the settings of a Hyper-V virtual machine. Right-click your Hyper-V virtual machine and choose Settings.
Once we get to the Hyper-V virtual machine settings, we navigate to the Management section and then to Checkpoints. Notice the options we have under the Checkpoint Type section.
- Enable Checkpoints checkbox – We can either check or uncheck this option to enable or disable checkpoints altogether. By default, checkpoints are enabled.
- Production Checkpoints – This uses backup technology in the guest operating system to create data-consistent checkpoints that don’t include memory.
- There is a fallback checkbox – “create standard checkpoints if it’s not possible to create a production checkpoint” – that, when checked, falls back to a standard checkpoint.
- Standard Checkpoints – These are standard snapshots that capture data and memory in an exact state. Because they capture running memory as-is rather than coordinating with VSS, they preserve the exact running state of applications but, as noted above, are not application consistent in the backup sense.
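The same setting can also be scripted. The Hyper-V PowerShell module's `Set-VM` cmdlet has a `-CheckpointType` parameter that accepts `Disabled`, `Production` (production with fallback to standard), `ProductionOnly` (no fallback), and `Standard`. Below is a small Python sketch that builds that call; the helper names and the VM name `web01` are hypothetical, and actually running the command assumes a Hyper-V host with an elevated PowerShell available.

```python
import subprocess

# Values accepted by the -CheckpointType parameter of Set-VM.
CHECKPOINT_TYPES = {"Disabled", "Production", "ProductionOnly", "Standard"}


def set_checkpoint_type_command(vm_name, checkpoint_type):
    """Return the argument list for a PowerShell call that sets a VM's checkpoint type."""
    if checkpoint_type not in CHECKPOINT_TYPES:
        raise ValueError(f"unknown checkpoint type: {checkpoint_type}")
    quoted = vm_name.replace("'", "''")  # escape single quotes for a PowerShell literal
    script = f"Set-VM -Name '{quoted}' -CheckpointType {checkpoint_type}"
    return ["powershell.exe", "-NoProfile", "-Command", script]


def apply_checkpoint_type(vm_name, checkpoint_type):
    # Only meaningful on a Hyper-V host; raises if PowerShell reports a failure.
    subprocess.run(set_checkpoint_type_command(vm_name, checkpoint_type), check=True)


# "web01" is a made-up VM name used only for illustration.
command = set_checkpoint_type_command("web01", "ProductionOnly")
print(command[-1])
```

Choosing `ProductionOnly` here corresponds to selecting Production Checkpoints with the fallback checkbox cleared.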
Creating a Hyper-V Production Checkpoint
Creating a production checkpoint once you have the options configured as shown above is as simple as right-clicking on the virtual machine and creating a checkpoint.
Hyper-V confirms in the resulting pop-up that a “Production checkpoint” was created. As the message shows, the production checkpoint uses “backup technology” in the guest operating system (that is, VSS) to create the consistent backup/checkpoint.
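For those who prefer scripting over right-clicking, the equivalent cmdlet is `Checkpoint-VM`, which takes the VM name via `-Name` and an optional checkpoint name via `-SnapshotName`. The Python sketch below only builds the PowerShell command text; the helper names, the VM name, and the checkpoint name are hypothetical. Which kind of checkpoint gets created still depends on the checkpoint type configured on the VM.

```python
def ps_quote(value):
    """Wrap a value as a PowerShell single-quoted literal."""
    return "'" + value.replace("'", "''") + "'"


def create_checkpoint_script(vm_name, checkpoint_name):
    """Return the PowerShell text that creates a named checkpoint for a VM."""
    return (
        f"Checkpoint-VM -Name {ps_quote(vm_name)} "
        f"-SnapshotName {ps_quote(checkpoint_name)}"
    )


# "web01" and "before-patching" are made-up names used only for illustration.
script = create_checkpoint_script("web01", "before-patching")
print(script)
```

The resulting string can be pasted into an elevated PowerShell session on the Hyper-V host, or passed to `powershell.exe -Command`.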
Applying a Hyper-V Production Checkpoint
Applying a Hyper-V Production Checkpoint is equally simple. You right-click on the checkpoint and click Apply.
We are given the option to create another checkpoint first to capture the current state, or to simply Apply, which discards the current state and reverts to the production checkpoint.
After applying, we see the Applying Checkpoint status appear.
One difference to note when applying a production checkpoint is that the virtual machine will power off. This is a sign that the production checkpoint method was truly used: as we recall, memory is not captured in a production checkpoint, so there is no running state to resume and the VM has to be started again. This is much like restoring a server from a VSS-aware backup, where the server is rebooted. Here the VM simply powers off, and when you power it back on, you are sitting at the application-consistent state of the production checkpoint.
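The apply-then-power-on sequence can be sketched as a script too. `Restore-VMCheckpoint` applies a checkpoint identified by `-Name` and `-VMName`, and `Start-VM` powers the VM back on afterwards. As before, this Python sketch only assembles the PowerShell text; the helper names and the example VM and checkpoint names are hypothetical.

```python
def ps_quote(value):
    """Wrap a value as a PowerShell single-quoted literal."""
    return "'" + value.replace("'", "''") + "'"


def apply_checkpoint_script(vm_name, checkpoint_name, start_after=True):
    """Return PowerShell text that applies a checkpoint and optionally starts the VM."""
    steps = [
        f"Restore-VMCheckpoint -Name {ps_quote(checkpoint_name)} "
        f"-VMName {ps_quote(vm_name)} -Confirm:$false"
    ]
    if start_after:
        # A production checkpoint holds no memory state, so the VM is left powered off.
        steps.append(f"Start-VM -Name {ps_quote(vm_name)}")
    return "; ".join(steps)


# "web01" and "before-patching" are made-up names used only for illustration.
script = apply_checkpoint_script("web01", "before-patching")
print(script)
```

Note that `-Confirm:$false` suppresses the confirmation prompt, so this path corresponds to choosing Apply without first capturing the current state.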
There are still times when you may want to use a “standard checkpoint”. In a development environment, for example, if you want the ability to revert to the “exact state” of a virtual machine, including memory, then a standard checkpoint is the right choice.
We can now use snapshots in production with Hyper-V Production Checkpoints. The new production checkpoint utilizes “backup technology” and is supported in production environments, which is definitely a cool feature for production workloads. As with any tool, a word of caution: just because we have the ability to use something doesn’t mean we need to or should. In my humble opinion, you should still be very judicious in using any type of snapshot/checkpoint in production, as it adds to the underlying complexity of the virtual infrastructure and gives more entry points for something to go wrong.