ISCSI reconnect time

Hi All,

I set up a PetaSAN storage cluster with 3 nodes and connected it to 2 Windows Server 2019 machines that will act as a file server. I set the iSCSI initiator load balance policy to "Round Robin", as described in the documentation. When I tested the reconnection time, it was sometimes more than 1 minute, which I think is too long. Can anybody help me set up the initiators to get a better reconnect time?
Thanks

 

jansz0

The default values should give you a failover time of about 30 sec.
You may see higher times if the hardware is very busy, with CPU or disk utilization near 100%, or possibly if you are running virtualized.

It also depends on whether you shut down a storage node with OSDs or an iSCSI gateway that is not serving any OSD storage. The first case should give you about 30 sec, the second case around 20 sec.

You can change these values from their defaults, but this is not recommended.

The first case is affected by the following Ceph timeout settings:

osd_heartbeat_grace = 20
osd_heartbeat_interval = 5

The second case is affected by the following settings (assuming you use MPIO):

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E97B-E325-11CE-BFC1-08002BE10318}\<Instance_Number>\Parameters

LinkDownTime = 15

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters :
TcpMaxDataRetransmissions
TCPInitialRtt
TcpMaxConnectRetransmissions

In our tests we could get MPIO to switch in 1 sec in the pure gateway case (no local storage) by setting all of these values to 1. This is for testing only and is not for production.

Again, this is not recommended for production: you can lower the Ceph OSD heartbeat settings, but an overly sensitive system will be unstable and can react to false positives. Everyone wants super fast failover, but stability is a key factor, and the default values were chosen for a reason. Hope this helps 🙂

First of all, thank you for the answer.

It is just a sandbox before we build the file server.
In our case we have 3 physical nodes with OSDs and iSCSI gateways, and 2 virtual Windows servers. They communicate over a 10G bond. I created two iSCSI disks as described in the documentation, a 1GB and a 100GB one, with 4 iSCSI paths.

The test was the following: we write constantly to the shared storage while restarting one of the nodes. The interruption lasts 30-40 seconds.
So if I understood your answer correctly, this is normal operation and I cannot get a lower time.
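
For anyone who wants to repeat this kind of write test and put a number on the interruption, here is a minimal Python sketch. It is not the tool used in this thread; the function names `write_probe` and `longest_stall` and all parameter values are made up for illustration. It assumes that writes block during the failover (as they normally would behind MPIO) rather than returning an error; if a write raises, the script simply stops.

```python
import os
import time


def longest_stall(timestamps):
    """Return the largest gap in seconds between consecutive completed writes."""
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return max(gaps, default=0.0)


def write_probe(path, duration_s=120.0, interval_s=0.1):
    """Append a small record to `path` repeatedly and time each completed write.

    Assumption: I/O blocks while the path fails over, so a long failover
    shows up as a long gap between two consecutive timestamps.
    """
    completed = []
    deadline = time.monotonic() + duration_s
    with open(path, "ab") as handle:
        while time.monotonic() < deadline:
            handle.write(b"x" * 512)
            handle.flush()
            os.fsync(handle.fileno())  # wait until the OS reports the data as written
            completed.append(time.monotonic())
            time.sleep(interval_s)
    return completed
```

To use it, call `write_probe` with a file path on the iSCSI-backed volume (the path is whatever your setup uses), restart a node while it runs, then pass the returned list to `longest_stall`. The result includes the normal `interval_s` pause, so a value close to `interval_s` means no visible interruption.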