Forums - PetaSAN

ISCSI reconnect time

jansz0
2 Posts

April 26, 2021, 8:20 am
Hi All,

I built a PetaSAN cluster with 3 nodes and connected it to 2 Windows Server 2019 machines that will act as file servers. I set the iSCSI initiator load balance policy to "Round Robin", as described in the documentation. When I tested the reconnection time, it was sometimes more than 1 minute, which I think is too long. Can anybody help me configure the initiators for a faster reconnect?
Thanks

jansz0


#1

admin
2,969 Posts

April 26, 2021, 3:48 pm
With the default values you should get a failover time of about 30 sec.
You may see higher times if the hardware is very busy, with CPU or disk utilization near 100%, or if you are running virtualized.

It also depends on whether you shut down a storage node with OSDs or an iSCSI gateway that is not serving any OSD storage. The first case should give you about 30 sec, the second around 20 sec.

You can change these values from their defaults, but this is not recommended.

The first case is affected by the following Ceph timeout settings:

osd_heartbeat_grace = 20
osd_heartbeat_interval = 5
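
To see whether a node overrides these two settings in its config file, a small check along the following lines may help. This is only a sketch: `read_heartbeat_settings` and `POST_VALUES` are names made up for the example, the fallback numbers are simply the ones quoted above, and it only looks at ceph.conf-style text passed to it (settings stored elsewhere, for example in the cluster's central config, would not show up).

```python
import configparser

# Values quoted in the post above, used as fallbacks when the file says nothing.
POST_VALUES = {"osd_heartbeat_grace": 20, "osd_heartbeat_interval": 5}


def read_heartbeat_settings(conf_text):
    """Return the two heartbeat settings found in ceph.conf-style text."""
    parser = configparser.ConfigParser(
        strict=False,
        interpolation=None,
        inline_comment_prefixes=("#", ";"),
    )
    parser.read_string(conf_text)
    found = dict(POST_VALUES)
    # A later section wins, so [osd] overrides [global] in this sketch.
    for section in ("global", "osd"):
        if not parser.has_section(section):
            continue
        for key, value in parser.items(section):
            # Option names may be written with spaces, dashes or underscores.
            name = key.strip().replace(" ", "_").replace("-", "_")
            if name in found:
                found[name] = int(value)
    return found


# Hypothetical config text, for illustration only.
SAMPLE = """
[global]
osd_heartbeat_interval = 5

[osd]
osd heartbeat grace = 10
"""

print(read_heartbeat_settings(SAMPLE))
```

Passing the contents of the node's own config file instead of `SAMPLE` shows which of the two values, if any, have been changed from the numbers above.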

The second case is affected by the following settings (assuming you use MPIO):

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance_Number>\Parameters

LinkDownTime = 15

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters :
TcpMaxDataRetransmissions
TCPInitialRtt
TcpMaxConnectRetransmissions
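
Before changing anything it is worth looking at what is currently set. The sketch below only builds `reg query` command lines for the values listed above, so they can be pasted into an elevated command prompt on the Windows initiator. `query_commands` is a name invented for this example, the key locations are copied from this post as-is, and `<Instance_Number>` is left as a placeholder because it differs per machine.

```python
# Registry locations as given in the post above (HKLM = HKEY_LOCAL_MACHINE).
CLASS_KEY = (
    r"HKLM\SYSTEM\CurrentControlSet\Control\Class"
    r"\{4D36E97B-E325-11CE-BFC1-08002BE10318}"
)
TCPIP_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
TCP_VALUES = (
    "TcpMaxDataRetransmissions",
    "TCPInitialRtt",
    "TcpMaxConnectRetransmissions",
)


def query_commands(instance):
    """Build read-only `reg query` commands for the timeout values."""
    iscsi_key = CLASS_KEY + "\\" + instance + r"\Parameters"
    commands = [f'reg query "{iscsi_key}" /v LinkDownTime']
    commands += [f'reg query "{TCPIP_KEY}" /v {name}' for name in TCP_VALUES]
    return commands


# The instance number is machine-specific; the placeholder is kept on purpose.
for command in query_commands("<Instance_Number>"):
    print(command)
```

The commands only read values. If `reg query` reports that a value cannot be found, that value has not been set explicitly on that machine.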

In our tests we could get MPIO to switch in 1 sec for a pure gateway (no local storage) by setting all of these values to 1. This is for testing only, not for production.

Again, this is not recommended for production. You can lower the Ceph OSD heartbeat settings, but an overly sensitive system becomes unstable and can react to false positives. Everyone wants super fast failover, but stability is a key factor, and the default values are there for a reason. Hope this helps 🙂


Last edited on April 26, 2021, 3:50 pm by admin · #2

jansz0
2 Posts

April 27, 2021, 7:45 am
First of all, thank you for the answer.

This is just a sandbox before we build the file server.
In our case we have 3 physical nodes with OSDs and iSCSI gateways, and 2 virtual Windows servers. They communicate over a 10G bond. Following the documentation, I created two iSCSI disks, one 1 GB and one 100 GB, with 4 iSCSI paths.

The test was the following: we write constantly to the shared storage while restarting one of the nodes. The reconnect takes 30-40 seconds.
So if I understood your answer correctly, this is normal operation and I cannot get a lower time.
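
A test like the one described can be made repeatable with a small probe that keeps writing to the iSCSI disk and records the longest pause between two acknowledged writes. This is a rough sketch, not the tool used in this thread: `measure_write_stall` and the example path are invented for illustration, and it assumes that writes block during failover instead of returning an error (an I/O error would stop the loop with an exception).

```python
import os
import time


def measure_write_stall(path, duration_s=60.0, block=b"x" * 4096, pause_s=0.05):
    """Write and fsync one block repeatedly; return (writes, longest_gap_s)."""
    writes = 0
    longest_gap = 0.0
    start = last_ok = time.monotonic()
    with open(path, "wb") as f:
        while time.monotonic() - start < duration_s:
            f.seek(0)             # overwrite the same block so the file stays small
            f.write(block)
            f.flush()
            os.fsync(f.fileno())  # returns only once the OS reports the data as written
            now = time.monotonic()
            longest_gap = max(longest_gap, now - last_ok)
            last_ok = now
            writes += 1
            time.sleep(pause_s)
    return writes, longest_gap


# Example call (hypothetical drive letter for the iSCSI disk):
#   writes, gap = measure_write_stall(r"E:\probe.bin", duration_s=120)
#   print(f"{writes} writes, longest stall {gap:.1f} s")
```

Start the probe, restart a node while it runs, and compare the reported longest stall between runs. The figure includes the short `pause_s` sleep between writes, so it slightly overstates the real stall.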


#3
