iSCSI multi-client access to disk
protocol6v
85 Posts
May 29, 2018, 12:35 pm
I have Windows Server 2016 Datacenter, build 14393.2273, on two Supermicro AMD EPYC 7351 nodes. I'm trying to free up a couple of Intel nodes to run the same tests, but I'm not sure when I'll have them available.
There are two PetaSAN iSCSI disks connected to each node. LUN 0001 is a 20GB "quorum" disk with 2 paths. LUN 0002 is a 30TB VM data disk with 8 paths.
HV node IQNs are: iqn.2018-03.net.testing.internal:bd-e7k-hv-cn1 and iqn.2018-03.net.testing.internal:bd-e7k-hv-cn2
I'm using IQN ACLs on the PetaSAN disks for security. When configuring the iSCSI disks on the HV nodes, I enabled multipath, chose "round robin with subset", and set all paths to active.
The iSCSI paths are automatically assigned among the 4 PetaSAN nodes. I'll try moving all paths to one PS node and see if that helps.
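For reference, the multipath policy can also be verified from an elevated command prompt on the HV nodes with mpclaim, which ships with the Windows MPIO feature. This is only a sketch; the disk number 2 below is a placeholder, so list the MPIO disks first to find the right one:
mpclaim -s -d (lists MPIO disks and their current load-balance policies)
mpclaim -s -d 2 (shows the paths and policy for MPIO disk 2)
mpclaim -l -d 2 3 (sets MPIO disk 2 to policy 3, Round Robin With Subset)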
protocol6v
85 Posts
May 29, 2018, 12:56 pm
Moving all paths to one PS node did not help.
admin
2,930 Posts
May 29, 2018, 1:47 pm
Thanks for the info; we will try it.
Can you try the 30TB disk with 2 paths rather than 8, so the 2 HV nodes each connect to the same 2 paths?
FYI, one of the earlier kernel logs you posted showed a PR encoding failure due to insufficient buffer size ("PR info too large for encoding: 8673"). This may be a clue that we need to increase it.
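If it helps while testing, the Linux kernel target exposes the current persistent reservation state through configfs on the PetaSAN node serving the path. This is a sketch; the backstore path below assumes an RBD-backed LUN, and the exact directory layout may differ on your build:
cat /sys/kernel/config/target/core/rbd_*/*/pr/res_pr_registered_i_pts
That attribute lists the registered initiator ports, and the count should grow with the number of paths each client logs into.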
Last edited on May 29, 2018, 1:51 pm by admin · #13
protocol6v
85 Posts
May 29, 2018, 2:43 pm
Tested with 2 paths only; validation succeeded.
Where can I go from here to determine the issue with using more paths?
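One avenue worth trying while re-running cluster validation is to watch the kernel log on the PetaSAN nodes for the buffer message quoted above (a generic diagnostic sketch, not a confirmed procedure):
dmesg -w | grep -i "PR info"
If that line appears only at higher path counts, it points at the target-side encoding buffer rather than anything on the Windows hosts.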
admin
2,930 Posts
May 29, 2018, 3:13 pm
This is good news; at least there are no Windows version differences between us. We were not testing persistent reservations with 8 paths; we will do that tomorrow, and it will probably fail as yours did.
If you can, please also test with 4 paths; this will help.
Of course we will be doing this as well, since we now know it depends on the path count. What we suspect is that we currently allocate an 8K buffer to hold the persistent reservation data; with 8 paths this may not be enough and may need to be increased. But we need to reproduce this first.
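As a rough back-of-the-envelope check (the per-registration size is inferred from the log, not a confirmed figure): with 2 initiators each registering over 8 paths, the target tracks 2 × 8 = 16 registrations. The logged "PR info too large for encoding: 8673" then implies roughly 8673 / 16 ≈ 542 bytes of encoded state per registration, so a fixed 8192-byte buffer would overflow at around 15 registrations. That matches 8 paths failing while 2 and 4 paths (4 and 8 registrations) fit comfortably.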
admin
2,930 Posts
May 30, 2018, 2:33 pm
We can reproduce it with 8 paths per disk and are now working to solve it. For now you can use 4 paths per disk. I will update you soon.
protocol6v
85 Posts
May 31, 2018, 1:23 pm
If I set a disk up with 4 paths now, how do I add paths later? I don't seem to be able to edit that after creation.
Never mind; I found that if you detach the disk, you can add paths on re-attach.
Last edited on May 31, 2018, 1:26 pm by protocol6v · #17
admin
2,930 Posts
June 4, 2018, 11:37 am
We fixed the Windows 2016 Persistent Reservations limit in a new kernel:
linux-image-4.4.126-03-petasan_amd64.deb
It will now allow up to 160 different client connections per disk.
You can download it from:
You should install it on all (iSCSI Server) nodes via:
dpkg -i linux-image-4.4.126-03-petasan_amd64.deb
reboot
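After the reboot, a quick sanity check that each node actually booted the new kernel (assuming the running version string matches the package name):
uname -r
This should report 4.4.126-03-petasan on every iSCSI server node.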
Last edited on June 4, 2018, 11:41 am by admin · #18
protocol6v
85 Posts
June 4, 2018, 11:39 am
Awesome! I'll give this a try over the next few days. Thanks!
protocol6v
85 Posts
June 6, 2018, 4:24 pm
I've been testing the new kernel, and all was well until I simulated a disk failure by removing one of the disks.
I physically removed the disk, then went to the node disk list and removed the down OSD. I waited for the cluster to finish rebalancing, then re-plugged the disk, went to the disk list again, and added the OSD. The system goes through the "adding" procedure, but then the OSD stays "down", with no status indicator in the disk list. I have the option to remove the disk again (the button with the X in the disk list). It does have an OSD number associated, but it will not come online.
Am I missing a step in replacing the disk?
EDIT: I found in other threads some commands to start the OSD, which resulted in this:
root@BD-Ceph-SD4:~# /usr/lib/ceph/ceph-osd-prestart.sh --cluster BD-Ceph-Cluster1 --id 37
root@BD-Ceph-SD4:~# /usr/bin/ceph-osd -f --cluster BD-Ceph-Cluster1 --id 37 --setuser ceph --setgroup ceph
2018-06-06 13:07:13.795924 7fe3017b6e00 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/BD-Ceph-Cluster1-37: (5) Input/output error
I have since attempted to remove the disk, reformat it, re-add it, and start the OSD in various ways, such as "service start ceph-osd osd.37", "systemctl start ceph-osd@37", and other methods from the Ceph documentation, none of which will bring the OSD up.
The only thing I haven't tried is flat-out rebooting the whole node, which wouldn't be ideal in a production environment, so I'd like to figure this out properly.
Thanks!
Last edited on June 6, 2018, 5:20 pm by protocol6v · #20
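For anyone hitting the same superblock error: a common cleanup sequence before re-adding a pulled disk is to purge the old OSD entry and wipe the drive so no stale Ceph metadata survives. This is a generic Luminous-era sketch rather than PetaSAN-specific guidance; /dev/sdX is a placeholder for the replaced drive, and the cluster name is taken from the commands above:
systemctl stop ceph-osd@37
ceph --cluster BD-Ceph-Cluster1 osd out 37
ceph --cluster BD-Ceph-Cluster1 osd purge 37 --yes-i-really-mean-it
ceph-disk zap /dev/sdX
It is also worth checking dmesg for I/O errors on the drive after re-plugging it; a persistent EIO on the superblock read often means the disk, cable, or backplane port is bad rather than anything in Ceph.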