Windows iSCSI initiator randomly freezes during VM snapshot

Hi,

We have been trying to narrow down this issue for several days. We have one VM that uses a PetaSAN disk through the Windows software iSCSI initiator. The problem appears when a VMware snapshot is created, which momentarily "freezes" the VM. After this, the iSCSI initiator stops responding, and the only way to recover is to cold reboot the VM.

On the PetaSAN nodes we see these logs:

Jul 11 14:09:23 CEPH-11 kernel: [5965880.208375] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:10:42 CEPH-11 kernel: [5965959.312575] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:11:04 CEPH-11 kernel: [5965981.328617] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:26:18 CEPH-11 kernel: [5966895.250820] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:26:21 CEPH-11 kernel: [5966898.322829] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:27:02 CEPH-11 kernel: [5966939.538927] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:28:02 CEPH-11 kernel: [5966999.699073] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:28:43 CEPH-11 kernel: [5967040.659173] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:29:06 CEPH-11 kernel: [5967063.443225] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:29:47 CEPH-11 kernel: [5967104.403326] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:30:28 CEPH-11 kernel: [5967145.363425] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:30:50 CEPH-11 kernel: [5967167.379476] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:31:31 CEPH-11 kernel: [5967208.595578] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:32:12 CEPH-11 kernel: [5967249.555679] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:32:53 CEPH-11 kernel: [5967290.771772] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:33:15 CEPH-11 kernel: [5967312.787823] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:34:16 CEPH-11 kernel: [5967372.947964] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:34:57 CEPH-11 kernel: [5967413.908064] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:35:38 CEPH-11 kernel: [5967454.868163] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:39:10 CEPH-11 kernel: [5967667.092669] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:39:51 CEPH-11 kernel: [5967708.308771] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:39:54 CEPH-11 kernel: [5967711.380774] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:39:57 CEPH-11 kernel: [5967714.452782] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:41:35 CEPH-11 kernel: [5967812.501018] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:41:57 CEPH-11 kernel: [5967834.517069] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:43:35 CEPH-11 kernel: [5967932.565302] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:43:57 CEPH-11 kernel: [5967954.581355] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:44:00 CEPH-11 kernel: [5967957.653362] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:44:03 CEPH-11 kernel: [5967960.725369] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:44:25 CEPH-11 kernel: [5967982.741426] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:45:06 CEPH-11 kernel: [5968023.701521] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
Jul 11 14:45:09 CEPH-11 kernel: [5968026.773529] Unable to recover from DataOut timeout while in ERL=0, closing iSCSI connection for I_T Nexus iqn.1991-05.com.microsoft:veeam-isp,i,0x400001370007,iqn.2016-05.com.petasan:00005,t,0x02
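
To see whether the timeouts follow a pattern, it can help to compute the gap between consecutive events. The following is only a minimal sketch: the function name `dataout_timeout_gaps` and the idea of reading the kernel uptime stamp in square brackets are my own assumptions for illustration, not anything provided by PetaSAN or LIO. The sample lines are shortened copies of the first three entries above.

```python
import re

# Matches the kernel uptime stamp, e.g. "[5965880.208375]",
# but only on lines that report a DataOut timeout.
PATTERN = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+Unable to recover from DataOut timeout"
)


def dataout_timeout_gaps(lines):
    """Return the seconds elapsed between consecutive DataOut timeout events.

    Hypothetical helper for illustration; lines that do not match are skipped.
    """
    stamps = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            stamps.append(float(match.group(1)))
    # Pair each stamp with the next one and take the difference.
    return [round(later - earlier, 1) for earlier, later in zip(stamps, stamps[1:])]


# First three log entries from this post, shortened after "ERL=0".
sample = [
    "Jul 11 14:09:23 CEPH-11 kernel: [5965880.208375] Unable to recover from DataOut timeout while in ERL=0",
    "Jul 11 14:10:42 CEPH-11 kernel: [5965959.312575] Unable to recover from DataOut timeout while in ERL=0",
    "Jul 11 14:11:04 CEPH-11 kernel: [5965981.328617] Unable to recover from DataOut timeout while in ERL=0",
]

print(dataout_timeout_gaps(sample))  # [79.1, 22.0]
```

Run over the full excerpt, many of the gaps land near 41 s and 22 s. That may hint at a fixed retry cycle on the initiator side rather than load spikes, but this is a guess from the timestamps, not a diagnosis.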

 

Any hints on this?

Thanks!

For snapshots we need to flush all outstanding/in-flight writes before we can proceed; this may be what you observe as a temporary freeze. Can you check whether this is load related, meaning: if you are not doing heavy writing, does it work? How long a freeze do you observe? Is this a custom snapshot or the built-in PetaSAN replication?

Well, the freeze is in fact permanent: once this happens the target never recovers, and we have to reboot the Windows VM to allow the initiator to reconnect.

As for load, the cluster is scrubbing, but the write load is really low at that time.

It's a VMware snapshot, which affects a VM that runs the Windows iSCSI client.

Thanks!

 

Can you look at the disk % util charts and see if utilization was high?

You can also try lowering the iSCSI performance tuning:

cp /opt/petasan/config/tuning/templates/Generic\ Entry\ Level\ Hardware/lio_tunings /opt/petasan/config/tuning/current/

and see if this helps. You will need to restart the iSCSI nodes (or move paths away and then back) so that new paths use the new tunings.

 

Hi,

I checked the load at the moment the issue happened, and the load on all nodes was evenly low.

Would you recommend trying the entry-level tuning template anyway?

Thanks!