Changed Block Tracking bug in ESXi 8.0 U2 – fixed


This is a guest post from Maximilian Maier

Last Thursday VMware published Update 2b for ESXi 8.0, which seems to include the fix for the Changed Block Tracking bug. This critical bug was discovered by Veeam in December and, for affected VMs, could result in data loss. Make sure that you understand all the details, install the hotfix and repair affected VMs.

At the bottom of this blog post I have included the links to VMware’s KB article, the release notes and the discussion in the Veeam R&D forums.

What’s Changed Block Tracking?

Changed Block Tracking or CBT is a technology which keeps track of all changes on a volume or disk at block-level. Backup solutions, for example, can benefit from CBT during incremental runs as they don’t need to scan a whole disk for changes and instead only read blocks which were tracked as changed. This makes backups much faster and is essential for today’s data protection.
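
To make the idea more concrete, here is a minimal sketch of block-level change tracking in Python. It is a toy model only and not how ESXi or Hyper-V implement CBT: the `TrackedDisk` class, the 4-byte block size and the function names are all made up for illustration.

```python
BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example stays readable


class TrackedDisk:
    """Toy disk that records which blocks were written since the last backup."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]
        self.changed = set()  # indices of blocks written since the last backup

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)  # this bookkeeping is the "tracking" part


def full_backup(disk):
    """Copy every block and start tracking from a clean state."""
    backup = list(disk.blocks)
    disk.changed.clear()
    return backup


def incremental_backup(disk, backup):
    """Copy only the blocks reported as changed; return how many were read."""
    blocks_read = 0
    for index in sorted(disk.changed):
        backup[index] = disk.blocks[index]
        blocks_read += 1
    disk.changed.clear()
    return blocks_read


disk = TrackedDisk(num_blocks=1000)
backup = full_backup(disk)

disk.write(7, b"AAAA")
disk.write(512, b"BBBB")
disk.write(7, b"CCCC")  # same block again, still only one tracked entry

blocks_read = incremental_backup(disk, backup)
print(blocks_read, backup == disk.blocks)  # prints: 2 True
```

The incremental run reads 2 blocks instead of 1000 and the backup still matches the disk. That only holds as long as every write ends up in the tracked set.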

In virtualization, CBT is often a hypervisor feature. Both Microsoft’s Hyper-V (since 2016) and VMware’s ESXi provide CBT for virtual disks.
An operating system itself doesn’t offer any CBT functionality, which is why Veeam Agent for Windows comes with an optional CBT driver from Veeam.

CBT can also be used for other tasks but let’s keep it focused on the backup part.

How critical are CBT bugs?

As backup vendors rely on CBT providing correct information about changes in backed-up systems, bugs can have a very critical impact. In the current case, not all changes get tracked for affected VMs, so CBT returns incomplete information. This means backup software doesn’t get all changed blocks, and the resulting backups may be incomplete or even corrupted. As this happens at block level, the results can be catastrophic. Just think about a large database where some parts are missing: the whole database will be corrupted and can’t be restored successfully.
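
The effect of a single missed block can be shown with a few lines of Python. This is a hypothetical illustration, not a reproduction of the actual ESXi bug: the block contents and the set `reported_changed` are invented, and the "tracker" simply leaves out one of the two blocks that really changed.

```python
import hashlib


def checksum(blocks):
    """SHA-256 over all blocks, used here to compare disk and restore."""
    return hashlib.sha256(b"".join(blocks)).hexdigest()


disk = [b"row0", b"row1", b"row2", b"row3"]
backup = list(disk)  # state after a full backup

# Two blocks change on the production disk.
disk[1] = b"ROW1"
disk[3] = b"ROW3"

# Hypothetical faulty tracker: it reports block 1 but misses block 3.
reported_changed = {1}

# The incremental backup trusts the tracker and copies only what was reported.
for index in reported_changed:
    backup[index] = disk[index]

restored = list(backup)
print(restored)                             # block 3 still holds the old b"row3"
print(checksum(restored) == checksum(disk))  # prints: False
```

The backup job itself finishes without any error here. The mismatch only shows up when the restored data is compared with the source or actually used.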

Therefore any bug in CBT should be considered critical and taken seriously.
Fortunately such bugs are rather rare, but they have happened before.

Is installing the fix enough to remediate this bug?

Unfortunately the answer is no.
While the fix addresses the cause, affected VMs will already have incomplete CBT information, and therefore their backups, both existing and future ones, may be corrupted.

There’s no easy way of finding out whether a VM is affected. As far as is currently known, only disks that were resized while the VM was online (hot-extended) on vSphere 8.0 U2 might have corrupted CBT data. Apart from searching through vSphere logs, which may already have been purged, or using Veeam ONE for change tracking in vSphere, there’s no way of identifying such VMs.

For affected VMs, CBT should be reset and a fresh full backup should be created. All previous backups might be affected and should be checked/tested.
For all other systems where you’re unsure, CBT should at least be reset and, if you want to be on the safe side, a full backup should be created as well.
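
If you keep some kind of VM inventory, the decision described above can be written down as a small triage function. This is only a sketch under assumptions: `VmRecord` and its fields are hypothetical and not vSphere or Veeam API names, and returning no action for VMs that were definitely never hot-extended is my reading of the currently known scope of the bug.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VmRecord:
    # All fields are hypothetical inventory attributes, not vSphere API names.
    name: str
    ran_on_80u2: bool             # ran on ESXi 8.0 U2 before the fix was installed
    cbt_enabled: bool
    hot_extended: Optional[bool]  # True/False if known, None if you are unsure


def remediation_steps(vm: VmRecord) -> List[str]:
    """Return the recommended steps for one VM, following the text above."""
    if not (vm.ran_on_80u2 and vm.cbt_enabled):
        return []  # outside the known scope of this bug
    if vm.hot_extended is True:
        return ["reset CBT", "create full backup", "test previous backups"]
    if vm.hot_extended is None:
        return ["reset CBT", "create full backup (optional, to be safe)"]
    return []  # assumption: known not to be hot-extended, so no action


inventory = [
    VmRecord("sql01", ran_on_80u2=True, cbt_enabled=True, hot_extended=True),
    VmRecord("web01", ran_on_80u2=True, cbt_enabled=True, hot_extended=None),
    VmRecord("old01", ran_on_80u2=False, cbt_enabled=True, hot_extended=True),
]

for vm in inventory:
    print(vm.name, remediation_steps(vm))
```

How the CBT reset and the full backup are actually triggered depends on your backup product, so the function only returns the steps as plain text.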

What else can be done?

While CBT bugs can cause corrupted backups, there are many other bad things which can affect your backups: software or hardware bugs, OS/application updates, user errors, …
So in any case, you should regularly test your backups. Fortunately Veeam has a very nice feature called SureBackup. With SureBackup you can automatically test all your backups in an isolated environment and be sure not only that the backup files are OK but also that the system boots and the applications start successfully.

Staying with Veeam: each time you create a full backup, Veeam resets the CBT information of VMware VMs by default. This was implemented as a countermeasure after a previous CBT bug.

Summary

So, to summarize all the details from above:

CBT is an essential functionality for incremental backups. Under normal circumstances it can be trusted and should be in use for incremental backups.

CBT of affected VMs needs to be reset and a new full backup should be created afterwards. All previous backups may be corrupted.

Test your backups regularly to be sure they can be recovered successfully. If you’re using Veeam, set up scheduled SureBackup jobs.

If you have any questions, feel free to ask them here or in the corresponding Veeam R&D forums topic, or contact your backup vendor for details.

Links

KB article from VMware: https://kb.vmware.com/s/article/95940

ESXi 8.0 U2b release notes: https://docs.vmware.com/en/VMware-vSphere/8.0/rn/vsphere-esxi-80u2b-release-notes/index.html

Details and Discussion in the Veeam R&D forums: https://forums.veeam.com/post505571.html#p505571
