I've been investigating this one for some time but haven't been able to find a solution, so I'm hoping someone can point me in the right direction or tell me whether they've seen the same thing. I'm also trying to replicate the issue in my lab, but so far I haven't managed to.
In a production setup I have quite a few VMs that back up nightly to a very fast TrueNAS CORE server. The backups work well, but every once in a while a VM backup reports as failed with the errors below. It's almost always just one VM, and after 3 or 4 additional backups the error goes away (despite retention being 7 and the full backup interval being 30 days). Also, if I wipe that VM's directory on the TrueNAS box, its next backup succeeds.
VHD Check Error
Parent VHD is Missing
Under the remote logs: VHD Check Error
EBUSY: resource busy or locked, unlink (VHD path)
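For anyone unfamiliar with the error: EBUSY on an unlink means the operating system refused to delete the file because something else still had it open or locked. The snippet below is only an illustrative sketch of how a client could retry such a delete; it is not how XO handles it, and the `unlink_with_retry` helper and its parameters are hypothetical names made up for this example.

```python
import errno
import os
import time


def unlink_with_retry(path, attempts=5, delay=1.0, unlink=os.unlink, sleep=time.sleep):
    """Delete `path`, retrying while the OS reports EBUSY.

    Returns the number of attempts used. Re-raises the EBUSY error if the
    file is still locked after the last attempt, and re-raises any other
    OSError immediately.
    """
    for attempt in range(1, attempts + 1):
        try:
            unlink(path)
            return attempt
        except OSError as exc:
            # Only a busy/locked file is worth retrying.
            if exc.errno != errno.EBUSY or attempt == attempts:
                raise
            # Linear backoff: wait a little longer after each failure.
            sleep(delay * attempt)
```

If a delete like this only succeeds after a few retries, that points to a short-lived lock held by another process rather than a permanently broken VHD chain.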
It's worth noting that these errors always seem to come up while the TrueNAS machine is backing up its directory to a cloud provider. That would make sense if TrueNAS were working with the VHD that XCP-ng was trying to access. However, TrueNAS is set up to snapshot first, and my understanding is that TrueNAS ONLY touches the snapshot during the backup process, so the file shouldn't be locked. I may be wrong, but long ago I did NOT have TrueNAS set to snapshot before cloud backups and I got this same EBUSY error ALL the time; the issue (mostly) went away when I enabled "snapshot first".
For reference, this Reddit post talks about the "snapshot first" feature: https://www.reddit.com/r/freenas/comments/gpz701/clarity_on_take_snapshot_for_cloud_sync_tasks/
In short, it appears TrueNAS should snapshot the directory, back up that snapshot, and then remove it, so that "live" data isn't affected or written to during the backup.
Also, my TrueNAS machine starts its backups BEFORE XO does, so the snapshot shouldn't be taken at the same time XO tries to access the directory. The backup of this directory usually takes several hours, so the snapshot isn't being deleted while XO backs up either.
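To confirm the timing correlation rather than eyeball it, the failed backup timestamps can be compared against the cloud sync windows. This is a minimal sketch that assumes you copy the start/end times out of the TrueNAS task history and the failure times out of the XO backup logs by hand; the function names are made up for illustration and nothing here talks to either system.

```python
from datetime import datetime


def overlaps(start_a, end_a, start_b, end_b):
    """Return True if the half-open intervals [start_a, end_a) and
    [start_b, end_b) share any moment in time."""
    return start_a < end_b and start_b < end_a


def failures_during_sync(sync_windows, failure_times):
    """Return the failure timestamps that fall inside any sync window.

    sync_windows: list of (start, end) datetime pairs for cloud sync runs.
    failure_times: list of datetimes at which a VM backup failed.
    """
    return [
        t for t in failure_times
        if any(start <= t < end for start, end in sync_windows)
    ]
```

If every failure lands inside a sync window and none land outside, that strengthens the case that the cloud sync (or the snapshot it takes) is what holds the file.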
It's entirely possible this is more of a TrueNAS issue than an XCP-ng/XO thing, but I wanted to post about it anyway.
Has anyone else seen this with large SMB VM backups?
I'll keep trying to replicate in my lab too and report back if I can duplicate the issue.
This isn't urgent (which is why I'm just posting and not filling out a ticket haha), since I have the same VMs backed up directly to a cloud provider, so it isn't a data resilience issue.