ISCSI reconnect time
jansz0
2 Posts
April 26, 2021, 8:20 am
Hi All,
So I set up a PetaSAN storage cluster with 3 nodes and connected it to 2 Windows Server 2019 servers to act as a file server. I set the iSCSI initiator load-balance policy to "Round Robin", as I read in the documentation. When I tested the reconnection time, it was sometimes more than 1 minute, which I think is too much. Can anybody help me configure the initiators to get a better reconnect time?
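For reference, the Round Robin policy can be chosen per device in the MPIO tab of the device properties or set as the default for new MPIO devices from PowerShell (a minimal sketch, assuming the MPIO feature and its PowerShell module are installed on the Windows hosts):
# show the current MPIO default load-balance policy
Get-MSDSMGlobalDefaultLoadBalancePolicy
# make Round Robin the default policy for newly claimed MPIO devices
Set-MSDSMGlobalDefaultLoadBalancePolicy -Policy RR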
Thanks
jansz0
admin
2,930 Posts
April 26, 2021, 3:48 pm
The default values should give you about 30 sec failover time.
You may get higher times if the hardware is very busy, with CPU or disk % busy near 100%, or if you are running virtual.
It also depends on whether you shut down a storage node with OSDs or an iSCSI gateway that is not serving any OSD storage: the first case should give you about 30 sec, the second case around 20 sec.
You can change these values from the defaults, but it is not recommended.
The first case is affected by the following Ceph timeout settings:
osd_heartbeat_grace = 20
osd_heartbeat_interval = 5
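For illustration only (and, as noted below, not recommended for production), these are standard Ceph options, so if you did want to experiment they could be lowered either under the [osd] section of ceph.conf or at runtime with the Ceph CLI, for example:
# runtime change via the Ceph config database (values shown are the defaults above)
ceph config set osd osd_heartbeat_grace 20
ceph config set osd osd_heartbeat_interval 5
Lowering them shortens OSD failure detection at the cost of more false positives.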
The second case is affected by the following settings (assuming you use MPIO):
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance_Number>\Parameters
LinkDownTime = 15
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters:
TcpMaxDataRetransmissions
TCPInitialRtt
TcpMaxConnectRetransmissions
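As an illustration only (a sketch, not a recommendation), these could be checked and changed from an elevated PowerShell prompt on the Windows hosts, for example:
# read the current iSCSI link-down timeout; replace <Instance_Number> with the instance
# that corresponds to the Microsoft iSCSI initiator on that host
reg query "HKLM\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance_Number>\Parameters" /v LinkDownTime
# example value only: set a lower TCP retransmission count (REG_DWORD, typically needs a reboot)
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters" /v TcpMaxDataRetransmissions /t REG_DWORD /d 5 /f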
In our tests we could get MPIO to switch in 1 sec in the case of a pure gateway (no local storage) by setting all values to 1, but this is for testing only and is not for production.
Again, it is not recommended for production: you can lower the Ceph OSD heartbeat settings, but creating an overly sensitive system will make it unstable and can respond to false positives. Everyone wants super fast failover, but stability is a key factor, and the default values are there for a reason. Hope this helps 🙂
Last edited on April 26, 2021, 3:50 pm by admin · #2
jansz0
2 Posts
April 27, 2021, 7:45 am
First of all, thank you for the answer.
It is just a sandbox before we build the file server.
So in our case we've got 3 physical nodes with OSDs and iSCSI gateways, and 2 virtual Windows servers. They communicate over a 10G bond. I created two iSCSI disks as I read in the documentation, a 1GB and a 100GB, each with 4 iSCSI paths.
The test was the following: we write continuously to the shared storage while restarting one of the nodes. Reconnection takes 30-40 seconds.
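For context, a continuous-write test of this kind can be as simple as the following loop (a hypothetical sketch, not our exact script; E:\ stands for the volume on the iSCSI disk). The gap in timestamps during a node restart shows roughly how long I/O stalled:
# PowerShell: append a timestamp once per second to a file on the iSCSI-backed volume
while ($true) {
    Get-Date -Format o | Out-File -FilePath E:\failover-test.log -Append
    Start-Sleep -Seconds 1
}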
So if I understood your answer correctly, this is normal operation and I cannot get a lower time.