Backing up Hyper-V VM's that live on PetaSAN iSCSI Cluster Shared Volumes

Hi,

We recently set up a 3-node PetaSAN cluster, all SSD, and it's working great. We followed the guide on how to set it up with some Hyper-V servers with iSCSI multipathing. Everything is working great except one thing: backups! When our backup tool (Arcserve UDP, which just uses the standard MS Software VSS Writer) goes to snapshot one of the VMs running on the Cluster Shared Volume, it just errors out. I've been doing a lot of digging on this, and the big thing that stands out to me is that the PetaSAN RBD Multi-Path disk device listed under Disk Management on each of the Hyper-V nodes doesn't actually have the "Shadow Copies" tab.

Does this mean PetaSAN iSCSI storage is incompatible with using MS VSS for backing things up?

 

Also, we did create a 64TB volume, and we are aware of the 64TB volume size limit for VSS.

Quote from admin on January 28, 2022, 9:45 am

Happy things are working well. We currently do not support VSS. It is on our roadmap and will internally use Ceph RBD snapshots, but we are not there yet. We are also planning built-in backup features.

So just to confirm: tools such as Veeam and Arcserve, essentially anything that can perform agentless backup of Hyper-V VMs, will not work if PetaSAN iSCSI is used as the storage backend for Cluster Shared Volumes, at least at present?

Quote from admin on January 28, 2022, 6:59 pm

My understanding is that it will work, but it will not be optimized. VSS is a feature implemented in software at the Windows OS/filesystem level, and it does all the snapshot management in software; PetaSAN disks behave like physical disks and are unaware of such snapshots. If the disks are being written to at a high rate while a backup is happening, there will be a lot of data movement, which is inefficient. By implementing a "VSS Hardware Provider", the storage can offload this snapshot management from the OS to the storage "hardware", which is much more optimized. Again, this is only my understanding, as we have not yet tackled this.
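
To make the "data movement" point concrete, here is a toy sketch of how a software copy-on-write snapshot behaves. It is only an illustration and not how volsnap or PetaSAN is actually implemented; the class `CowSnapshotVolume` and all of its names are made up for this example. The idea is that the first write to each block after the snapshot forces the old content to be copied aside, so a busy volume generates extra copies during a backup.

```python
class CowSnapshotVolume:
    """Toy block device with one copy-on-write snapshot (illustrative only)."""

    def __init__(self, blocks):
        self.blocks = list(blocks)    # live volume contents
        self.diff_area = {}           # block index -> content at snapshot time
        self.snapshot_active = False
        self.extra_copies = 0         # copies made only because of the snapshot

    def create_snapshot(self):
        self.diff_area = {}
        self.snapshot_active = True

    def write(self, index, data):
        # First write to a block after the snapshot: preserve the old content.
        if self.snapshot_active and index not in self.diff_area:
            self.diff_area[index] = self.blocks[index]
            self.extra_copies += 1
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Snapshot view: the preserved copy if the block changed, else the live block.
        return self.diff_area.get(index, self.blocks[index])


vol = CowSnapshotVolume(["a", "b", "c", "d"])
vol.create_snapshot()
vol.write(0, "A")
vol.write(0, "AA")    # second write to the same block needs no further copy
vol.write(2, "C")
snapshot_view = [vol.read_snapshot(i) for i in range(4)]
print(snapshot_view, vol.blocks, vol.extra_copies)
```

A hardware provider would move this bookkeeping out of the Windows host and into the storage layer, which is the offload described above.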

I see, I just wanted to be sure! I was thinking there should be no reason for the software VSS writer not to work, but in our case it doesn't. We may have to get with the backup vendor and see if maybe it's something on their end. The data movement is understandable; we actually planned for this. I do hope you guys figure out the hardware provider though, that would be sweet!

The whole reason I made this post was to be certain it wasn't PetaSAN; I'm not completely familiar with the ins and outs of VSS 🙂 Thanks for your help!

For anyone else who comes across this thread: we are using Server 2019, and we keep encountering volsnap errors with ID 5 and failover cluster errors with ID 5217 when we try to back up VMs agentless.

volsnap error: The shadow copy of volume  could not be created due to insufficient non-paged memory pool for a bitmap structure.
failover cluster error: Software snapshot creation on Cluster Shared Volume(s) ('\\?\Volume{390546eb-8d1a-4d7b-a58b-0c78e92fc493}\') with snapshot set id '66f42f94-b9ce-4c30-9ad4-0623b03180b8' failed with error 'HrError(0x80042306)(2147754758)'. Please check the state of the CSV resources and the system events of the resource owner nodes.
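
A small aside on reading that failover cluster message: the two numbers in `HrError(0x80042306)(2147754758)` are the same HRESULT printed in hex and in decimal. The sketch below splits it into the standard HRESULT bit fields; the helper name `decode_hresult` is made up for this example. Facility 4 is the interface-specific facility (FACILITY_ITF), and as far as I know the VSS headers name this particular value VSS_E_PROVIDER_VETO, i.e. the shadow copy provider itself refused the operation.

```python
def decode_hresult(value):
    """Split a 32-bit HRESULT into its standard bit fields."""
    value &= 0xFFFFFFFF
    return {
        "failure": bool(value >> 31),       # severity bit: set means failure
        "facility": (value >> 16) & 0x7FF,  # 11-bit facility code
        "code": value & 0xFFFF,             # 16-bit error code
    }


hr = 0x80042306
same_value = (hr == 2147754758)   # hex and decimal forms from the event text
fields = decode_hresult(hr)
print(hex(hr), same_value, fields)
```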

We tried a bunch of different things: adding the registry keys listed here, trying different Windows updates, and trying to use vssadmin to change the snapshot volume for the clustered volume (which I couldn't actually do, as it seems you can't use a disk GUID with the command for this). The non-paged memory pool should also have plenty free, since in testing we have one little VM and the host was using about 20 out of 256GB of RAM, 2.2 of which was non-paged. So a little confused there. Nothing has worked yet, of course.

Will post a solution here if I come across one in case anyone else runs into this.

After a week of trying to find solutions for this, we found out that Microsoft's wording, "Volumes Larger Than 64TB", is wrong.

If you create a 64TB disk, then backing up your VMs will not work. I created a 63TB disk using the instructions here on this forum to make a custom disk size: http://www.petasan.org/forums/?view=thread&id=774

I set up all the iSCSI MPIO madness (we maxed out the paths at 8 because this cluster is going to grow to at least 10 nodes soon, so it takes a little while haha), then made a Cluster Shared Volume out of the new 63TB disk, moved a VM over, tried a backup, and BAM! Working perfectly now.
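
For anyone sizing a disk with this in mind, the finding above can be sketched as a simple pre-check. This is only a sketch under assumptions: it treats "TB" as binary units (1024^4 bytes) and treats the limit as "strictly below 64TB", which matches what was observed here (64TB failed, 63TB worked) but was not tested for sizes in between. The function name `safe_for_vss` is made up for this example.

```python
TIB = 1024 ** 4               # assumption: "TB" in this thread means binary units
VSS_LIMIT_BYTES = 64 * TIB    # the documented 64TB figure


def safe_for_vss(size_bytes):
    """Return True if the disk is strictly smaller than the 64TB figure.

    Based only on this thread's observation: exactly 64TB failed, 63TB worked.
    """
    return size_bytes < VSS_LIMIT_BYTES


ok_63 = safe_for_vss(63 * TIB)
ok_64 = safe_for_vss(64 * TIB)
print(ok_63, ok_64)
```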

So lesson learned, don't trust how Microsoft words things 🙂 haha! Thanks for the help