iSCSI multi-client access to disk


Hi there,

I'm trying to configure a Hyper-V cluster, but it seems only one host has access to the iSCSI disk at any given time. Is there a setting I need to change on PetaSAN somewhere to allow multiple hosts to access each iSCSI disk/LUN at once? I know on our old EQL SANs, this was a checkbox somewhere.

Thanks!

This appears to be an issue with persistent reservations on the iSCSI LUN. Here's the error Microsoft presents:

Failure issuing call to Persistent Reservation REGISTER AND IGNORE EXISTING on Test Disk 1 from node BD-E7k-HV-CN1.testing.net when the disk has no existing registration. It is expected to succeed. The device is not ready.

Test Disk 1 does not provide Persistent Reservations support for the mechanisms used by failover clusters. Some storage devices require specific firmware versions or settings to function properly with failover clusters. Please contact your storage administrator or storage vendor to check the configuration of the storage to allow it to function properly with failover clusters

Not seeing anything in PetaSAN logs. Here's some interesting output from dmesg:

[14352.489656] PR register with aptpl unset. Treating as aptpl=1
[14352.492348] PR register with aptpl unset. Treating as aptpl=1
[14352.495207] PR register with aptpl unset. Treating as aptpl=1
[14498.661779] PR register with aptpl unset. Treating as aptpl=1
[14498.661783] PR info not present, initializing
[14498.668197] PR register with aptpl unset. Treating as aptpl=1
[14498.686007] PR register with aptpl unset. Treating as aptpl=1
[14499.411722] PR register with aptpl unset. Treating as aptpl=1
[14499.431857] PR register with aptpl unset. Treating as aptpl=1
[14499.435080] PR register with aptpl unset. Treating as aptpl=1
[14499.446180] SPC-3 PR: Attempted RESERVE from iqn.2018-03.net.testing.internal:bd-e7k-hv-cn1,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x1 while reservation already held by iqn.2018-03.net.testing.internal:bd-e7k-hv-cn2,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x2, returning RESERVATION_CONFLICT
[14499.446456] SPC-3 PR: Attempted RESERVE from iqn.2018-03.net.testing.internal:bd-e7k-hv-cn1,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x1 while reservation already held by iqn.2018-03.net.testing.internal:bd-e7k-hv-cn2,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x2, returning RESERVATION_CONFLICT
[14499.474500] PR register with aptpl unset. Treating as aptpl=1
[14499.482617] SPC-3 PR: Attempted RESERVE from iqn.2018-03.net.testing.internal:bd-e7k-hv-cn2,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x1 while reservation already held by iqn.2018-03.net.testing.internal:bd-e7k-hv-cn1,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x1, returning RESERVATION_CONFLICT
[14499.490455] PR register with aptpl unset. Treating as aptpl=1

Let me know what other info might help here.

We do support Persistent Reservations and do pass both the Win 2012 R2 and Win 2016 storage and pr tests.

From the error, it appears PetaSAN thinks there is an existing reservation on the disk but the Windows cluster does not. This could be due to the Windows cluster being completely shut down and restarted, though in our tests, if we shut down the entire Windows cluster, it will "pre-empt" any existing reservation and start over without issue.

It is possible to manually clear the reservation yourself as per https://docs.microsoft.com/en-us/powershell/module/failoverclusters/clear-clusterdiskreservation?view=win10-ps

I would be interested to know what steps you did to get this error.

I attempted to clear the reservation using the clear-clusterdiskreservation command, but this did not help.

I have not created the cluster yet. This is just when trying to "Validate Configuration" prior to actually creating the cluster.

It seems like PetaSAN is not clearing the reservation quickly enough for the test to succeed. If you look at the errors, the third one states the reservation is held by a different node than the first two errors, and this changes when I rerun the validation test.

I went ahead and tried creating the cluster anyway. This did not go well. When trying to fail over the PetaSAN iSCSI disk to another cluster node, the disk role will not come online and is not accessible. Also, if you browse to the disk via Explorer, none of the data that was written to the disk from one node is visible from the other. I tested this, and it works just fine with other iSCSI disks on the same Hyper-V cluster (EqualLogic, FreeNAS).
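For anyone comparing these entries across validation runs, a small script can pull out which initiator attempted the RESERVE and which one held the reservation at that moment. This is only a minimal sketch in Python, assuming the dmesg line format stays exactly as pasted above. The function name `parse_conflicts` and the field labels are made up for illustration, and reading the trailing `t,0x…` value as a target-side tag is an assumption, not something confirmed in this thread.

```python
import re


def _nexus(prefix):
    # One "iqn,i,isid,target-iqn,t,tag" group as printed in the log lines.
    # The meaning of the last field (a target-side tag) is an assumption.
    return (rf"(?P<{prefix}_iqn>\S+?),i,(?P<{prefix}_isid>0x[0-9a-fA-F]+),"
            rf"(?P<{prefix}_target>\S+?),t,(?P<{prefix}_tag>0x[0-9a-fA-F]+)")


CONFLICT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+SPC-3 PR: Attempted (?P<op>\w+) from "
    + _nexus("req")
    + r" while reservation already held by "
    + _nexus("holder")
    + r", returning RESERVATION_CONFLICT"
)


def parse_conflicts(dmesg_text):
    """Return one dict per RESERVATION_CONFLICT line found in dmesg_text."""
    events = []
    for m in CONFLICT_RE.finditer(dmesg_text):
        events.append({
            "time": float(m["ts"]),
            # Keep only the part after the last ':' of the initiator IQN.
            "requester": m["req_iqn"].rsplit(":", 1)[-1],
            "requester_tag": m["req_tag"],
            "holder": m["holder_iqn"].rsplit(":", 1)[-1],
            "holder_tag": m["holder_tag"],
        })
    return events


# Two lines copied from the dmesg output earlier in this thread.
SAMPLE = """\
[14499.446456] SPC-3 PR: Attempted RESERVE from iqn.2018-03.net.testing.internal:bd-e7k-hv-cn1,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x1 while reservation already held by iqn.2018-03.net.testing.internal:bd-e7k-hv-cn2,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x2, returning RESERVATION_CONFLICT
[14499.482617] SPC-3 PR: Attempted RESERVE from iqn.2018-03.net.testing.internal:bd-e7k-hv-cn2,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x1 while reservation already held by iqn.2018-03.net.testing.internal:bd-e7k-hv-cn1,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x1, returning RESERVATION_CONFLICT
"""

for event in parse_conflicts(SAMPLE):
    print(f'{event["time"]:.6f}: {event["requester"]} ({event["requester_tag"]}) '
          f'blocked by {event["holder"]} ({event["holder_tag"]})')
```

Running it against a full `dmesg` dump from each run makes it easier to see whether the holder really flips between the two Hyper-V nodes, as described above.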

Tried destroying the cluster and starting over with the same results.

 

  • Are you using Win2012 R2 or Win2016? How many nodes?
  • If you create all new iSCSI disks and run the validation tests on these new disks, does it fail? If so, can you show a screenshot of the Windows report detail? Also show the dmesg logs on the PetaSAN iSCSI target nodes.
  • Can you install this kernel on the nodes and see if it solves this:
    https://drive.google.com/open?id=1LxizxXz4WKsIkcXUigQ-kv9aUVYo2oVZ
    dpkg -i linux-image-4.4.38-petasan_amd64.deb
    reboot
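After the reboot it is worth confirming that the node actually booted into the new kernel before rerunning the validation. A rough sketch in Python follows; the expected version string `4.4.38` is taken from the .deb file name above, and the exact release string the kernel reports is an assumption, so adjust the comparison if it differs on your nodes.

```python
import platform

# Assumption: the version prefix is taken from the .deb file name in this
# post; the full release string reported by the kernel may differ.
EXPECTED_VERSION = "4.4.38"


def is_expected_kernel(release, expected=EXPECTED_VERSION):
    """True if the release string's version part (before the first '-') matches."""
    return release.split("-", 1)[0] == expected


running = platform.release()  # same value as `uname -r` on Linux
print(running, "OK" if is_expected_kernel(running) else "not the test kernel")
```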

Thanks for the response, sorry for delay.

This is with Server 2016, and I am now just using two nodes to test with. Both fresh installs of Windows, with all cumulative updates installed.

Tried deleting all PetaSAN iSCSI disks first, and creating two new ones. Same issue, here's the screenshot of the error:

[Screenshot: Windows Failover Cluster iSCSI Testing Failure]

And here's some output from dmesg on one of the nodes:

[ 478.200940] PR register with aptpl unset. Treating as aptpl=1
[ 478.200980] SPC-3 PR: Attempted RESERVE from iqn.2018-03.net.testing.internal:bd-e7k-hv-cn2,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x2 while reservation already held by iqn.2018-03.net.testing.internal:bd-e7k-hv-cn1,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x2, returning RESERVATION_CONFLICT
[ 478.205458] PR register with aptpl unset. Treating as aptpl=1
[ 478.210698] PR register with aptpl unset. Treating as aptpl=1
[ 478.220384] PR register with aptpl unset. Treating as aptpl=1
[ 478.234342] PR register with aptpl unset. Treating as aptpl=1
[ 478.234493] PR info too large for encoding: 8673
[ 478.234494] failed to encode PR xattr: -22
[ 478.234494] atomic PR info update failed: -22
[ 478.786503] PR register with aptpl unset. Treating as aptpl=1
[ 478.803247] PR register with aptpl unset. Treating as aptpl=1
[ 478.817486] PR register with aptpl unset. Treating as aptpl=1
[ 478.827918] PR register with aptpl unset. Treating as aptpl=1
[ 478.828083] PR info too large for encoding: 8673
[ 478.828084] failed to encode PR xattr: -22
[ 478.828085] atomic PR info update failed: -22
[ 479.317857] PR register with aptpl unset. Treating as aptpl=1
[ 479.327779] PR register with aptpl unset. Treating as aptpl=1
[ 479.338941] PR register with aptpl unset. Treating as aptpl=1
[ 479.349230] PR register with aptpl unset. Treating as aptpl=1
[ 479.349478] PR info too large for encoding: 8673
[ 479.349480] failed to encode PR xattr: -22
[ 479.349480] atomic PR info update failed: -22
[ 479.849099] PR register with aptpl unset. Treating as aptpl=1
[ 479.859554] PR register with aptpl unset. Treating as aptpl=1
[ 479.870360] PR register with aptpl unset. Treating as aptpl=1
[ 479.880972] PR register with aptpl unset. Treating as aptpl=1
[ 479.881165] PR info too large for encoding: 8673
[ 479.881166] failed to encode PR xattr: -22
[ 479.881166] atomic PR info update failed: -22
[ 480.380570] PR register with aptpl unset. Treating as aptpl=1
[ 480.390800] PR register with aptpl unset. Treating as aptpl=1
[ 480.400858] PR register with aptpl unset. Treating as aptpl=1
[ 480.410973] PR register with aptpl unset. Treating as aptpl=1
[ 480.411120] PR info too large for encoding: 8673
[ 480.411121] failed to encode PR xattr: -22
[ 480.411122] atomic PR info update failed: -22
[ 480.431682] PR register with aptpl unset. Treating as aptpl=1
[ 480.441739] PR register with aptpl unset. Treating as aptpl=1
[ 480.450811] PR register with aptpl unset. Treating as aptpl=1
[ 480.455146] PR register with aptpl unset. Treating as aptpl=1
[ 480.464050] PR register with aptpl unset. Treating as aptpl=1
[ 480.500401] PR register with aptpl unset. Treating as aptpl=1
[ 480.508086] PR register with aptpl unset. Treating as aptpl=1
[ 480.512111] PR register with aptpl unset. Treating as aptpl=1
[ 480.517052] PR register with aptpl unset. Treating as aptpl=1
[ 480.526109] PR register with aptpl unset. Treating as aptpl=1
[ 480.541659] PR register with aptpl unset. Treating as aptpl=1
[ 480.553608] PR register with aptpl unset. Treating as aptpl=1
[ 480.579284] PR register with aptpl unset. Treating as aptpl=1
[ 480.592389] SPC-3 PR: Attempted RESERVE from iqn.2018-03.net.testing.internal:bd-e7k-hv-cn2,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x2 while reservation already held by iqn.2018-03.net.testing.internal:bd-e7k-hv-cn1,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x2, returning RESERVATION_CONFLICT
[ 480.622925] PR register with aptpl unset. Treating as aptpl=1
[ 480.636298] SPC-3 PR: Attempted RESERVE from iqn.2018-03.net.testing.internal:bd-e7k-hv-cn1,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x2 while reservation already held by iqn.2018-03.net.testing.internal:bd-e7k-hv-cn2,i,0x3430303030313337,iqn.2018-05.net.testing.internal:00001,t,0x2, returning RESERVATION_CONFLICT
[ 480.640564] PR register with aptpl unset. Treating as aptpl=1
[ 480.693412] PR register with aptpl unset. Treating as aptpl=1
[ 480.727340] PR register with aptpl unset. Treating as aptpl=1
[ 480.739020] PR register with aptpl unset. Treating as aptpl=1
[ 480.751663] PR register with aptpl unset. Treating as aptpl=1
[ 480.772253] PR register with aptpl unset. Treating as aptpl=1
[ 480.807250] PR register with aptpl unset. Treating as aptpl=1
[ 480.820105] PR register with aptpl unset. Treating as aptpl=1
[ 480.832997] PR register with aptpl unset. Treating as aptpl=1
[ 480.855080] PR register with aptpl unset. Treating as aptpl=1
[ 480.886481] PR register with aptpl unset. Treating as aptpl=1
[ 480.898644] PR register with aptpl unset. Treating as aptpl=1

There is quite a bit more of the same content in dmesg.

The issue persists with the 4.4.38 kernel you posted. (The dmesg above is from that kernel)
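Since the log is long and repetitive, a tally of the message types may be easier to compare between kernels than the raw output. Below is a minimal sketch in Python, assuming the message wording matches the lines pasted above. The category names and the `summarize` function are illustrative only. The `-22` return value corresponds to EINVAL in Linux errno numbering, but the script does not try to explain why the encoding fails.

```python
import errno
import re
from collections import Counter

# Message patterns copied from the dmesg output in this thread.
# The category names on the left are illustrative.
PATTERNS = {
    "pr_register": re.compile(r"PR register with aptpl unset"),
    "info_too_large": re.compile(r"PR info too large for encoding: (\d+)"),
    "xattr_encode_failed": re.compile(r"failed to encode PR xattr: (-?\d+)"),
    "atomic_update_failed": re.compile(r"atomic PR info update failed: (-?\d+)"),
    "reservation_conflict": re.compile(r"returning RESERVATION_CONFLICT"),
}


def summarize(dmesg_text):
    """Count each message type; collect reported sizes and error codes."""
    counts, sizes, codes = Counter(), set(), set()
    for line in dmesg_text.splitlines():
        for name, pattern in PATTERNS.items():
            m = pattern.search(line)
            if not m:
                continue
            counts[name] += 1
            if name == "info_too_large":
                sizes.add(int(m.group(1)))
            elif m.groups():
                codes.add(int(m.group(1)))
            break  # a line belongs to at most one category
    return counts, sizes, codes


# A few lines copied from the output above.
SAMPLE = """\
[  478.234342] PR register with aptpl unset. Treating as aptpl=1
[  478.234493] PR info too large for encoding: 8673
[  478.234494] failed to encode PR xattr: -22
[  478.234494] atomic PR info update failed: -22
[  478.786503] PR register with aptpl unset. Treating as aptpl=1
"""

counts, sizes, codes = summarize(SAMPLE)
for name, n in counts.most_common():
    print(f"{name}: {n}")
print("PR info sizes reported:", sorted(sizes))
for code in sorted(codes):
    print(f"error {code}: {errno.errorcode.get(-code, 'unknown')}")
```

Feeding it the full `dmesg` output from each target node should show how often the "PR info too large" failure occurs relative to the registrations and conflicts.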

Thanks very much for the error detail.

We will look into this. We test a lot with 2012 and 2016, but maybe one of the recent updates is causing this. If we cannot reproduce it, I will send you a newer kernel with extensive debug logging, which will help us solve this. It will take a couple of days, and I will get back to you.

Again thanks for all the effort in this. 🙂

 

Thanks, look forward to seeing what you find!

Just checking in to see if you were able to reproduce this? Thanks!

We ran several tests but could not reproduce it. We are using Windows Server 2016 version 1607, build 14393.0, release date 10/12/2016, with all updates applied. We are testing by running the Windows cluster validation suite of tests. We also tried various configurations: 2 nodes to 2 nodes, 2 nodes to 1 node, sharing paths or using different paths, etc.

Can you check the build and version number of Windows and let us know? Also, can you give more detail on your configuration: is it 2 Hyper-V nodes talking to 2 (same/different) paths, each on a different node? If possible, can you also give us the node names and initiator IQN names so we can match your environment exactly?

We are currently building a new kernel with extra logging for you to test. I will post the link when it is done.
