Crash on one LUN while creating new LUNs
therm
121 Posts
June 12, 2020, 11:03 am
Quote from therm on June 12, 2020, 11:03 am
Hi PetaSAN team,
I recently upgraded our cluster to 2.5.3. Because I now have more space (thanks to the upmap balancer) I created some new disks. Shortly after that, one old LUN lost two paths:
naa.60014050000100000000000000000000 : PETASAN iSCSI Disk (naa.60014050000100000000000000000000)
vmhba33:C0:T0:L0 LUN:0 state:dead iscsi Adapter: Unavailable Target: Unavailable
vmhba33:C1:T0:L0 LUN:0 state:active iscsi Adapter: iqn.1998-01.com.vmware:bl460-6-1882a8cd Target: IQN=iqn.2016-05.com.petasan:00001 Alias= Session=00023d000004 PortalTag=2
vmhba33:C2:T0:L0 LUN:0 state:active iscsi Adapter: iqn.1998-01.com.vmware:bl460-6-1882a8cd Target: IQN=iqn.2016-05.com.petasan:00001 Alias= Session=00023d000005 PortalTag=3
vmhba33:C3:T0:L0 LUN:0 state:dead iscsi Adapter: Unavailable Target: Unavailable
Jun 12 11:52:29 ceph-node-mro-5 kernel: Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00001
Jun 12 11:52:29 ceph-node-mro-5 kernel: iSCSI Login negotiation failed.
All VMs on this LUN crashed. After migrating the paths between iSCSI nodes they reappeared.
This leads to two questions:
- Is there something that could cause path loss when adding several LUNs one after the other without much time in between?
- Is round-robin path selection (VMware) really a good idea when it comes to availability?
Regards,
Dennis
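For reference, the path selection policy can be checked and changed per device from the ESXi shell. A hedged example using the device ID shown above (stock esxcli commands; exact output format varies between ESXi versions):
# Show the current path selection policy (PSP) and path states for the PetaSAN device
esxcli storage nmp device list -d naa.60014050000100000000000000000000
# Temporarily switch this one device from round-robin to fixed while testing whether
# the PSP plays a role (VMW_PSP_RR / VMW_PSP_FIXED are the stock ESXi plugin names)
esxcli storage nmp device set -d naa.60014050000100000000000000000000 --psp VMW_PSP_FIXED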
Last edited on June 12, 2020, 11:05 am by therm · #1
therm
121 Posts
June 12, 2020, 1:36 pm
Quote from therm on June 12, 2020, 1:36 pm
I also noticed that when moving a path it says it will move 2 paths. When moving 2 paths it says it moves 4 paths.
Is there some load balancing when manually moving paths?
therm
121 Posts
June 12, 2020, 2:23 pm
Quote from therm on June 12, 2020, 2:23 pm
Hmm, it might be something different. Now all 4 paths of this LUN are on one iSCSI server and 2 of them are down: one on nic3 and another on nic4. No other path is down. It was fine when I moved the paths, but 20 minutes later the paths were down again. It seems that the paths (only for this LUN) go up and down. Now, about 10 minutes later, one of the two paths is back again. Really strange.
It is flapping. Here are the dmesg logs of iscsi-node1:
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:09:07 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:09:07 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:09:07 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:12:05 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:12:05 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:12:05 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:13:07 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:13:07 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:13:07 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:15:27 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:15:27 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:15:27 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:20:15 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:20:15 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:20:15 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:22:08 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:22:08 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:22:08 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:29:59 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:29:59 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:29:59 2020] COMPARE_AND_WRITE: miscompare at offset 0
and iscsi-node2:
[Fri Jun 12 16:04:59 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-5-51b6a8ca,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:05:01 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-13-2168e5f9,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:05:07 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-7-5f0a06c0,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
[Fri Jun 12 16:05:09 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-12-1ce696ba,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
[Fri Jun 12 16:05:09 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-10-58cf050b,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:12:55 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:12:55 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:12:55 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:13:04 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:13:04 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:13:04 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:13:25 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:13:25 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:13:25 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:20:35 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:20:35 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:20:35 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:24:53 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-8-30424947,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:24:55 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-7-5f0a06c0,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:24:55 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-12-1ce696ba,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:25:05 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-6-1882a8cd,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
Any idea?
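The "Unable to locate Target Portal Group" messages suggest the kernel target is receiving logins for a target/portal combination it does not currently have configured. A hedged way to see what the LIO target on the PetaSAN node actually exposes at that moment is the standard LIO configfs tree (plain kernel interface, not a PetaSAN-specific tool):
# List the iSCSI targets the LIO kernel target currently has configured (run on the iSCSI node)
ls /sys/kernel/config/target/iscsi/
# For the affected target, list its target portal groups and the network portals (IP:port) bound to each
ls /sys/kernel/config/target/iscsi/iqn.2016-05.com.petasan:00001/
ls /sys/kernel/config/target/iscsi/iqn.2016-05.com.petasan:00001/tpgt_*/np/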
Last edited on June 12, 2020, 2:31 pm by therm · #3
therm
121 Posts
June 12, 2020, 3:18 pm
Quote from therm on June 12, 2020, 3:18 pm
[Fri Jun 12 17:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 17:08:36 2020] iSCSI Login timeout on Network Portal 192.168.3.21:3260
[Fri Jun 12 17:08:36 2020] rx_data returned -512, expecting 48.
[Fri Jun 12 17:08:36 2020] iSCSI Login negotiation failed.
therm
121 Posts
June 12, 2020, 3:43 pm
Quote from therm on June 12, 2020, 3:43 pm
and this is from one of the ESXi hosts:
2020-06-12T15:42:04.353Z cpu42:33587)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x4308fcc8df20 network resource pool netsched.pools.persist.iscsi associated
2020-06-12T15:42:04.353Z cpu42:33587)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x4308fcc8df20 network tracker id 16768 tracker.iSCSI.192.168.3.21 associated
2020-06-12T15:42:04.355Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:2 T:1 CN:0: Failed to receive data: Connection closed by peer
2020-06-12T15:42:04.355Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:04.355Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 192.168.3.5:21602 R: 192.168.3.21:3260]
2020-06-12T15:42:04.355Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: vmhba33:CH:2 T:1 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
2020-06-12T15:42:04.355Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:04.355Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Conn [CID: 0 L: 192.168.3.5:21602 R: 192.168.3.21:3260]
2020-06-12T15:42:04.607Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba33:CH:2 T:1 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2020-06-12T15:42:04.607Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:04.607Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 192.168.3.5:21602 R: 192.168.3.21:3260]
2020-06-12T15:42:07.367Z cpu42:33587)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x4308fcad5ea0 network resource pool netsched.pools.persist.iscsi associated
2020-06-12T15:42:07.367Z cpu42:33587)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x4308fcad5ea0 network tracker id 16768 tracker.iSCSI.192.168.3.21 associated
2020-06-12T15:42:07.369Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:2 T:1 CN:0: Failed to receive data: Connection closed by peer
2020-06-12T15:42:07.369Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:07.369Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 192.168.3.5:30998 R: 192.168.3.21:3260]
2020-06-12T15:42:07.369Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: vmhba33:CH:2 T:1 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
2020-06-12T15:42:07.369Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:07.369Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Conn [CID: 0 L: 192.168.3.5:30998 R: 192.168.3.21:3260]
2020-06-12T15:42:07.620Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba33:CH:2 T:1 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2020-06-12T15:42:07.620Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:07.620Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 192.168.3.5:30998 R: 192.168.3.21:3260]
therm
121 Posts
June 13, 2020, 8:39 am
Quote from therm on June 13, 2020, 8:39 am
I found the root cause! One of my new LUNs got the same IP addresses as the LUN with the problems! (In the meantime I have migrated the VMs, unmounted the datastore and unconfigured the IPs.)
Please check how that could happen! The logs are attached. Ceph-Solaris-xbrs-51 is the one with the duplicate addresses, Ceph-ESX-1 the original one.
12/06/2020 11:37:27 INFO include_wwn_fsid_tag() is true
12/06/2020 11:37:27 INFO add disk wwn is c352a56200044
12/06/2020 11:37:33 INFO Disk Ceph-Solaris-xwis-44 created
12/06/2020 11:37:33 INFO Successfully created key 00044 for new disk.
12/06/2020 11:37:33 INFO Successfully created key /00044/1 for new disk.
12/06/2020 11:37:33 INFO Successfully created key /00044/2 for new disk.
12/06/2020 11:37:33 INFO Successfully created key /00044/3 for new disk.
12/06/2020 11:37:33 INFO Successfully created key /00044/4 for new disk.
12/06/2020 11:39:47 INFO include_wwn_fsid_tag() is true
12/06/2020 11:39:48 INFO add disk wwn is c352a56200045
12/06/2020 11:39:53 INFO Disk Ceph-Solaris-xrow-45 created
12/06/2020 11:39:53 INFO Successfully created key 00045 for new disk.
12/06/2020 11:39:53 INFO Successfully created key /00045/1 for new disk.
12/06/2020 11:39:53 INFO Successfully created key /00045/2 for new disk.
12/06/2020 11:39:53 INFO Successfully created key /00045/3 for new disk.
12/06/2020 11:39:53 INFO Successfully created key /00045/4 for new disk.
12/06/2020 11:40:49 INFO Successfully created key 00044 for new disk.
12/06/2020 11:41:30 INFO Successfully created key 00044 for new disk.
12/06/2020 11:41:30 INFO Successfully created key /00044/1 for new disk.
12/06/2020 11:41:30 INFO Successfully created key /00044/2 for new disk.
12/06/2020 11:41:30 INFO Successfully created key /00044/3 for new disk.
12/06/2020 11:41:30 INFO Successfully created key /00044/4 for new disk.
12/06/2020 11:43:28 INFO include_wwn_fsid_tag() is true
12/06/2020 11:43:28 INFO add disk wwn is c352a56200046
12/06/2020 11:43:34 INFO Disk Ceph-Solaris-xpot-46 created
12/06/2020 11:43:34 INFO Successfully created key 00046 for new disk.
12/06/2020 11:43:34 INFO Successfully created key /00046/1 for new disk.
12/06/2020 11:43:34 INFO Successfully created key /00046/2 for new disk.
12/06/2020 11:43:34 INFO Successfully created key /00046/3 for new disk.
12/06/2020 11:43:34 INFO Successfully created key /00046/4 for new disk.
12/06/2020 11:44:34 INFO include_wwn_fsid_tag() is true
12/06/2020 11:44:34 INFO add disk wwn is c352a56200047
12/06/2020 11:44:40 INFO Disk Ceph-Solaris-xlue-47 created
12/06/2020 11:44:40 INFO Successfully created key 00047 for new disk.
12/06/2020 11:44:40 INFO Successfully created key /00047/1 for new disk.
12/06/2020 11:44:40 INFO Successfully created key /00047/2 for new disk.
12/06/2020 11:44:40 INFO Successfully created key /00047/3 for new disk.
12/06/2020 11:44:40 INFO Successfully created key /00047/4 for new disk.
12/06/2020 11:45:49 INFO include_wwn_fsid_tag() is true
12/06/2020 11:45:49 INFO add disk wwn is c352a56200048
12/06/2020 11:45:55 INFO Disk Ceph-Solaris-xhbr-48 created
12/06/2020 11:45:55 INFO Successfully created key 00048 for new disk.
12/06/2020 11:45:55 INFO Successfully created key /00048/1 for new disk.
12/06/2020 11:45:55 INFO Successfully created key /00048/2 for new disk.
12/06/2020 11:45:55 INFO Successfully created key /00048/3 for new disk.
12/06/2020 11:45:55 INFO Successfully created key /00048/4 for new disk.
12/06/2020 11:47:25 INFO include_wwn_fsid_tag() is true
12/06/2020 11:47:25 INFO add disk wwn is c352a56200049
12/06/2020 11:47:31 INFO Disk Ceph-Solaris-xerf-49 created
12/06/2020 11:47:31 INFO Successfully created key 00049 for new disk.
12/06/2020 11:47:31 INFO Successfully created key /00049/1 for new disk.
12/06/2020 11:47:31 INFO Successfully created key /00049/2 for new disk.
12/06/2020 11:47:31 INFO Successfully created key /00049/3 for new disk.
12/06/2020 11:47:31 INFO Successfully created key /00049/4 for new disk.
12/06/2020 11:49:58 INFO include_wwn_fsid_tag() is true
12/06/2020 11:49:58 INFO add disk wwn is c352a56200050
12/06/2020 11:50:04 INFO Disk Ceph-Solaris-xclz-50 created
12/06/2020 11:50:04 INFO Successfully created key 00050 for new disk.
12/06/2020 11:50:04 INFO Successfully created key /00050/1 for new disk.
12/06/2020 11:50:04 INFO Successfully created key /00050/2 for new disk.
12/06/2020 11:50:04 INFO Successfully created key /00050/3 for new disk.
12/06/2020 11:50:04 INFO Successfully created key /00050/4 for new disk.
12/06/2020 11:51:55 INFO include_wwn_fsid_tag() is true
12/06/2020 11:51:55 INFO add disk wwn is c352a56200051
12/06/2020 11:52:01 INFO Disk Ceph-Solaris-xbrs-51 created
12/06/2020 11:52:01 INFO Successfully created key 00051 for new disk.
12/06/2020 11:52:01 INFO Successfully created key /00051/1 for new disk.
12/06/2020 11:52:01 INFO Successfully created key /00051/2 for new disk.
12/06/2020 11:52:01 INFO Successfully created key /00051/3 for new disk.
12/06/2020 11:52:01 INFO Successfully created key /00051/4 for new disk.
12/06/2020 11:54:46 INFO include_wwn_fsid_tag() is true
12/06/2020 11:54:46 INFO add disk wwn is c352a56200052
12/06/2020 11:54:52 INFO Disk Ceph-Solaris-xanh-51 created
12/06/2020 11:54:52 INFO Successfully created key 00052 for new disk.
12/06/2020 11:54:52 INFO Successfully created key /00052/1 for new disk.
12/06/2020 11:54:52 INFO Successfully created key /00052/2 for new disk.
12/06/2020 11:54:52 INFO Successfully created key /00052/3 for new disk.
12/06/2020 11:54:52 INFO Successfully created key /00052/4 for new disk.
12/06/2020 12:26:24 INFO call get assignments stats function.
12/06/2020 12:26:35 INFO call get assignments stats function.
12/06/2020 12:26:45 INFO call get assignments stats function.
12/06/2020 12:26:47 INFO call get assignments stats function.
12/06/2020 12:26:54 INFO call search by name function.
12/06/2020 12:26:54 INFO call search by name function.
12/06/2020 12:27:17 INFO User starts manual assignments.
12/06/2020 12:27:17 INFO User selected path 00001 Ceph-ESX-1 ceph-node-mro-5.
12/06/2020 12:27:17 INFO User selected manual option in assignment.
12/06/2020 12:27:26 INFO User start manual reassignment paths for selected paths.
12/06/2020 12:27:26 INFO Set new assignment.
12/06/2020 12:27:26 INFO Delete old assignments.
12/06/2020 12:27:26 INFO Lock assignment root.
12/06/2020 12:27:26 INFO {}
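A hedged way to look for this kind of duplicate from the command line is to dump the metadata stored on every RBD image in the pool and grep for IPv4 addresses, so the same address showing up under two images stands out. The pool name "rbd" and the assumption that the assigned IPs appear in the image metadata are assumptions, not confirmed PetaSAN internals:
# Dump the key/value metadata of every rbd image in the pool and highlight IPv4 addresses
POOL=rbd
for img in $(rbd ls "$POOL"); do
    echo "== $POOL/$img =="
    rbd image-meta list "$POOL/$img" 2>/dev/null
done | grep -E '^==|([0-9]{1,3}\.){3}[0-9]{1,3}'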
Last edited on June 13, 2020, 8:59 am by therm · #6
admin
2,930 Posts
June 13, 2020, 10:54 am
Quote from admin on June 13, 2020, 10:54 am
Thanks for the feedback. It is not something we have seen before, but we will look into how this could happen.
As I understand it, you created a new disk, chose auto IP, and the newly created disk got (some/all?) IPs identical to those of an existing disk? And you created the new disk after upgrading?
therm
121 Posts
June 13, 2020, 11:02 am
Quote from therm on June 13, 2020, 11:02 am
Yes, I created multiple disks using auto IP and one disk got all 4 IPs of the very first disk.
I upgraded from 2.3.0 to 2.3.1 using the image installer and then did an online update to 2.5.3. Then I let the balancer run for a day with something like this in cron:
*/30 * * * * /usr/bin/ceph -s|/bin/grep -q '6444 active+clean' && /usr/bin/ceph balancer on && sleep 300 && /usr/bin/ceph balancer off
But there was no real load on it due to recovery_sleep. After that day I added a bunch of new PetaSAN disks, which led to the crash of disk-1. Was I too fast creating new disks?
Last edited on June 13, 2020, 11:02 am by therm · #8
admin
2,930 Posts
June 13, 2020, 11:23 am
Quote from admin on June 13, 2020, 11:23 am
Was I too fast creating new disks?
Could be; it is the only thing I can think of right now, but we will look into it. The auto IP assignment works by reading the metadata from all Ceph RBD images to see which addresses in the auto IP range are still available, then saving the new IPs to the new image's RBD metadata, so maybe.
We do have automated scripts that create large numbers of disks for load and failover testing and we have not seen this before, but again we will look into it.
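If two disk creations overlap, that read-then-write sequence could in principle hand out the same "free" address twice. A minimal shell illustration of the race, purely hypothetical (made-up IP range, not PetaSAN code):
# Both creations snapshot the existing assignments before either one writes its choice back
POOL="192.168.3.20 192.168.3.21 192.168.3.22"   # assumed auto-IP range
ASSIGNED=""                                     # stands in for IPs already stored in rbd metadata
pick_free() {                                   # print the first IP in POOL not contained in $1
    for ip in $POOL; do
        case " $1 " in *" $ip "*) ;; *) echo "$ip"; return ;; esac
    done
}
SNAP_A="$ASSIGNED"                              # disk A reads the current metadata
SNAP_B="$ASSIGNED"                              # disk B reads it too, before A has written
IP_A=$(pick_free "$SNAP_A")                     # 192.168.3.20
IP_B=$(pick_free "$SNAP_B")                     # also 192.168.3.20 -> duplicate assignment
ASSIGNED="$ASSIGNED $IP_A $IP_B"
echo "disk A: $IP_A  disk B: $IP_B"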
therm
121 Posts
June 13, 2020, 11:52 am
Quote from therm on June 13, 2020, 11:52 am
It is getting better:
The server ceph-node-mro-6 is meant to be an OSD-only server. In the "iSCSI Disks" view I can see that this server serves IPs (and the host really does serve these IPs), but the server is listed neither in the "Nodes List" nor in the "Path Assignment" list. How do I move these IPs?
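A hedged way to confirm from the node itself which iSCSI-subnet addresses ceph-node-mro-6 is currently holding (plain Linux tooling, not a PetaSAN command; the 192.168.3.x subnet is taken from the logs above):
# List the IPv4 addresses bound on ceph-node-mro-6 that fall in the iSCSI subnet seen in the logs
ssh ceph-node-mro-6 "ip -4 -o addr show | grep '192\.168\.3\.'"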
Last edited on June 13, 2020, 11:53 am by therm · #10
Pages: 1 2
Crash on one LUN while creating new LUNs
therm
121 Posts
Quote from therm on June 12, 2020, 11:03 amHi Petasan team,
I have recently upgraded our cluster to 2.5.3. Because I now have more space (due to the upmap balancer) I created some new disks. Shortly after that one old LUN lost two paths:
naa.60014050000100000000000000000000 : PETASAN iSCSI Disk (naa.60014050000100000000000000000000)
vmhba33:C0:T0:L0 LUN:0 state:dead iscsi Adapter: Unavailable Target: Unavailable
vmhba33:C1:T0:L0 LUN:0 state:active iscsi Adapter: iqn.1998-01.com.vmware:bl460-6-1882a8cd Target: IQN=iqn.2016-05.com.petasan:00001 Alias= Session=00023d000004 PortalTag=2
vmhba33:C2:T0:L0 LUN:0 state:active iscsi Adapter: iqn.1998-01.com.vmware:bl460-6-1882a8cd Target: IQN=iqn.2016-05.com.petasan:00001 Alias= Session=00023d000005 PortalTag=3
vmhba33:C3:T0:L0 LUN:0 state:dead iscsi Adapter: Unavailable Target: Unavailable
Jun 12 11:52:29 ceph-node-mro-5 kernel: Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00001
Jun 12 11:52:29 ceph-node-mro-5 kernel: iSCSI Login negotiation failed.All VMs on this LUN crashed. After migrating paths between ISCSI-Nodes they reappeard.
This leads to two questions:
- Is there something that could cause path lost when adding several LUNs one after the other without much time in between?
- Is roundrobin path selection (VMware) really a good idea when it comes to availability?
Regards,
Dennis
Hi Petasan team,
I have recently upgraded our cluster to 2.5.3. Because I now have more space (due to the upmap balancer) I created some new disks. Shortly after that one old LUN lost two paths:
naa.60014050000100000000000000000000 : PETASAN iSCSI Disk (naa.60014050000100000000000000000000)
vmhba33:C0:T0:L0 LUN:0 state:dead iscsi Adapter: Unavailable Target: Unavailable
vmhba33:C1:T0:L0 LUN:0 state:active iscsi Adapter: iqn.1998-01.com.vmware:bl460-6-1882a8cd Target: IQN=iqn.2016-05.com.petasan:00001 Alias= Session=00023d000004 PortalTag=2
vmhba33:C2:T0:L0 LUN:0 state:active iscsi Adapter: iqn.1998-01.com.vmware:bl460-6-1882a8cd Target: IQN=iqn.2016-05.com.petasan:00001 Alias= Session=00023d000005 PortalTag=3
vmhba33:C3:T0:L0 LUN:0 state:dead iscsi Adapter: Unavailable Target: Unavailable
Jun 12 11:52:29 ceph-node-mro-5 kernel: Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00001
Jun 12 11:52:29 ceph-node-mro-5 kernel: iSCSI Login negotiation failed.
All VMs on this LUN crashed. After migrating paths between ISCSI-Nodes they reappeard.
This leads to two questions:
- Is there something that could cause path lost when adding several LUNs one after the other without much time in between?
- Is roundrobin path selection (VMware) really a good idea when it comes to availability?
Regards,
Dennis
therm
121 Posts
Quote from therm on June 12, 2020, 1:36 pmI also noticed that when moving a path it says it will move 2 paths. When moving 2 Paths it says it moves 4 Paths.
Is there some load balancing when manually moving paths?
I also noticed that when moving a path it says it will move 2 paths. When moving 2 Paths it says it moves 4 Paths.
Is there some load balancing when manually moving paths?
therm
121 Posts
Quote from therm on June 12, 2020, 2:23 pmMhh might be something different. Now there are all 4 paths of this Lun on one ISCSI-Server, 2 paths are down. One on nic3 and another on nic4. No other path is down. It was fine when I moved the paths but 20 minutes later the paths were down again. It seems that the paths (only for this LUN) go up and down. Now about 10 min later one of the two paths is back again. Really strange.
It is flapping. Here are the demsg logs of iscsi-node1:
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:09:07 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:09:07 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:09:07 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:12:05 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:12:05 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:12:05 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:13:07 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:13:07 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:13:07 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:15:27 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:15:27 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:15:27 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:20:15 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:20:15 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:20:15 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:22:08 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:22:08 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:22:08 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:29:59 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:29:59 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:29:59 2020] COMPARE_AND_WRITE: miscompare at offset 0and iscsi-node2:
[Fri Jun 12 16:04:59 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-5-51b6a8ca,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:05:01 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-13-2168e5f9,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:05:07 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-7-5f0a06c0,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
[Fri Jun 12 16:05:09 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-12-1ce696ba,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
[Fri Jun 12 16:05:09 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-10-58cf050b,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:12:55 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:12:55 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:12:55 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:13:04 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:13:04 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:13:04 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:13:25 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:13:25 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:13:25 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:20:35 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:20:35 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:20:35 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:24:53 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-8-30424947,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:24:55 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-7-5f0a06c0,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:24:55 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-12-1ce696ba,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:25:05 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-6-1882a8cd,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
Any idea?
Mhh might be something different. Now there are all 4 paths of this Lun on one ISCSI-Server, 2 paths are down. One on nic3 and another on nic4. No other path is down. It was fine when I moved the paths but 20 minutes later the paths were down again. It seems that the paths (only for this LUN) go up and down. Now about 10 min later one of the two paths is back again. Really strange.
It is flapping. Here are the demsg logs of iscsi-node1:
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:13 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:13 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:09:07 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:09:07 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:09:07 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:12:05 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:12:05 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:12:05 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:13:07 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:13:07 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:13:07 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:15:27 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:15:27 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:15:27 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:20:15 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:20:15 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:20:15 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:22:08 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:22:08 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:22:08 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:29:59 2020] rbd: rbd14: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:29:59 2020] rbd: rbd14: result -4108 xferred 200
[Fri Jun 12 16:29:59 2020] COMPARE_AND_WRITE: miscompare at offset 0
and iscsi-node2:
[Fri Jun 12 16:04:59 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-5-51b6a8ca,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:05:01 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-13-2168e5f9,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:05:07 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-7-5f0a06c0,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
[Fri Jun 12 16:05:09 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-12-1ce696ba,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
[Fri Jun 12 16:05:09 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-10-58cf050b,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:08:21 2020] Unable to locate Target Portal Group on iqn.2016-05.com.petasan:00020
[Fri Jun 12 16:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 16:12:55 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:12:55 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:12:55 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:13:04 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:13:04 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:13:04 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:13:25 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:13:25 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:13:25 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:20:35 2020] rbd: rbd34: compare-and-write 200 at 71dd000 (1dd000)
[Fri Jun 12 16:20:35 2020] rbd: rbd34: result -4108 xferred 200
[Fri Jun 12 16:20:35 2020] COMPARE_AND_WRITE: miscompare at offset 0
[Fri Jun 12 16:24:53 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-8-30424947,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:24:55 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-7-5f0a06c0,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:24:55 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-12-1ce696ba,i,0x00023d000005,iqn.2016-05.com.petasan:00001,t,0x03
[Fri Jun 12 16:25:05 2020] Did not receive response to NOPIN on CID: 0, failing connection for I_T Nexus iqn.1998-01.com.vmware:bl460-6-1882a8cd,i,0x00023d000004,iqn.2016-05.com.petasan:00001,t,0x02
Any idea?
therm
121 Posts
Quote from therm on June 12, 2020, 3:18 pm[Fri Jun 12 17:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 17:08:36 2020] iSCSI Login timeout on Network Portal 192.168.3.21:3260
[Fri Jun 12 17:08:36 2020] rx_data returned -512, expecting 48.
[Fri Jun 12 17:08:36 2020] iSCSI Login negotiation failed.
[Fri Jun 12 17:08:21 2020] iSCSI Login negotiation failed.
[Fri Jun 12 17:08:36 2020] iSCSI Login timeout on Network Portal 192.168.3.21:3260
[Fri Jun 12 17:08:36 2020] rx_data returned -512, expecting 48.
[Fri Jun 12 17:08:36 2020] iSCSI Login negotiation failed.
therm
121 Posts
Quote from therm on June 12, 2020, 3:43 pmand this is from one of the esxi`s:
2020-06-12T15:42:04.353Z cpu42:33587)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x4308fcc8df20 network resource pool netsched.pools.persist.iscsi associated
2020-06-12T15:42:04.353Z cpu42:33587)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x4308fcc8df20 network tracker id 16768 tracker.iSCSI.192.168.3.21 associated
2020-06-12T15:42:04.355Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:2 T:1 CN:0: Failed to receive data: Connection closed by peer
2020-06-12T15:42:04.355Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:04.355Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 192.168.3.5:21602 R: 192.168.3.21:3260]
2020-06-12T15:42:04.355Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: vmhba33:CH:2 T:1 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
2020-06-12T15:42:04.355Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:04.355Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Conn [CID: 0 L: 192.168.3.5:21602 R: 192.168.3.21:3260]
2020-06-12T15:42:04.607Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba33:CH:2 T:1 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2020-06-12T15:42:04.607Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:04.607Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 192.168.3.5:21602 R: 192.168.3.21:3260]
2020-06-12T15:42:07.367Z cpu42:33587)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x4308fcad5ea0 network resource pool netsched.pools.persist.iscsi associated
2020-06-12T15:42:07.367Z cpu42:33587)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x4308fcad5ea0 network tracker id 16768 tracker.iSCSI.192.168.3.21 associated
2020-06-12T15:42:07.369Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: vmhba33:CH:2 T:1 CN:0: Failed to receive data: Connection closed by peer
2020-06-12T15:42:07.369Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:07.369Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_ConnReceiveAtomic: Conn [CID: 0 L: 192.168.3.5:30998 R: 192.168.3.21:3260]
2020-06-12T15:42:07.369Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: vmhba33:CH:2 T:1 CN:0: Connection rx notifying failure: Failed to Receive. State=Bound
2020-06-12T15:42:07.369Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:07.369Z cpu42:33587)iscsi_vmk: iscsivmk_ConnRxNotifyFailure: Conn [CID: 0 L: 192.168.3.5:30998 R: 192.168.3.21:3260]
2020-06-12T15:42:07.620Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba33:CH:2 T:1 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2020-06-12T15:42:07.620Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d000005 TARGET: iqn.2016-05.com.petasan:00001 TPGT: 3 TSIH: 0]
2020-06-12T15:42:07.620Z cpu42:33587)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 192.168.3.5:30998 R: 192.168.3.21:3260]
therm
121 Posts
Quote from therm on June 13, 2020, 8:39 amI found the root cause! One of my new LUNs got the same IP addresses as the LUN with the problems! (In the meantime I have migrated the VMs, unmounted the datastore and unconfigured the IPs.)
Please check how this could happen! The logs are attached; Ceph-Solaris-xbrs-51 is the disk with the duplicate addresses, Ceph-ESX-1 the original one. (A quick check for such duplicates is sketched right after the logs.)
12/06/2020 11:37:27 INFO include_wwn_fsid_tag() is true
12/06/2020 11:37:27 INFO add disk wwn is c352a56200044
12/06/2020 11:37:33 INFO Disk Ceph-Solaris-xwis-44 created
12/06/2020 11:37:33 INFO Successfully created key 00044 for new disk.
12/06/2020 11:37:33 INFO Successfully created key /00044/1 for new disk.
12/06/2020 11:37:33 INFO Successfully created key /00044/2 for new disk.
12/06/2020 11:37:33 INFO Successfully created key /00044/3 for new disk.
12/06/2020 11:37:33 INFO Successfully created key /00044/4 for new disk.
12/06/2020 11:39:47 INFO include_wwn_fsid_tag() is true
12/06/2020 11:39:48 INFO add disk wwn is c352a56200045
12/06/2020 11:39:53 INFO Disk Ceph-Solaris-xrow-45 created
12/06/2020 11:39:53 INFO Successfully created key 00045 for new disk.
12/06/2020 11:39:53 INFO Successfully created key /00045/1 for new disk.
12/06/2020 11:39:53 INFO Successfully created key /00045/2 for new disk.
12/06/2020 11:39:53 INFO Successfully created key /00045/3 for new disk.
12/06/2020 11:39:53 INFO Successfully created key /00045/4 for new disk.
12/06/2020 11:40:49 INFO Successfully created key 00044 for new disk.
12/06/2020 11:41:30 INFO Successfully created key 00044 for new disk.
12/06/2020 11:41:30 INFO Successfully created key /00044/1 for new disk.
12/06/2020 11:41:30 INFO Successfully created key /00044/2 for new disk.
12/06/2020 11:41:30 INFO Successfully created key /00044/3 for new disk.
12/06/2020 11:41:30 INFO Successfully created key /00044/4 for new disk.
12/06/2020 11:43:28 INFO include_wwn_fsid_tag() is true
12/06/2020 11:43:28 INFO add disk wwn is c352a56200046
12/06/2020 11:43:34 INFO Disk Ceph-Solaris-xpot-46 created
12/06/2020 11:43:34 INFO Successfully created key 00046 for new disk.
12/06/2020 11:43:34 INFO Successfully created key /00046/1 for new disk.
12/06/2020 11:43:34 INFO Successfully created key /00046/2 for new disk.
12/06/2020 11:43:34 INFO Successfully created key /00046/3 for new disk.
12/06/2020 11:43:34 INFO Successfully created key /00046/4 for new disk.
12/06/2020 11:44:34 INFO include_wwn_fsid_tag() is true
12/06/2020 11:44:34 INFO add disk wwn is c352a56200047
12/06/2020 11:44:40 INFO Disk Ceph-Solaris-xlue-47 created
12/06/2020 11:44:40 INFO Successfully created key 00047 for new disk.
12/06/2020 11:44:40 INFO Successfully created key /00047/1 for new disk.
12/06/2020 11:44:40 INFO Successfully created key /00047/2 for new disk.
12/06/2020 11:44:40 INFO Successfully created key /00047/3 for new disk.
12/06/2020 11:44:40 INFO Successfully created key /00047/4 for new disk.
12/06/2020 11:45:49 INFO include_wwn_fsid_tag() is true
12/06/2020 11:45:49 INFO add disk wwn is c352a56200048
12/06/2020 11:45:55 INFO Disk Ceph-Solaris-xhbr-48 created
12/06/2020 11:45:55 INFO Successfully created key 00048 for new disk.
12/06/2020 11:45:55 INFO Successfully created key /00048/1 for new disk.
12/06/2020 11:45:55 INFO Successfully created key /00048/2 for new disk.
12/06/2020 11:45:55 INFO Successfully created key /00048/3 for new disk.
12/06/2020 11:45:55 INFO Successfully created key /00048/4 for new disk.
12/06/2020 11:47:25 INFO include_wwn_fsid_tag() is true
12/06/2020 11:47:25 INFO add disk wwn is c352a56200049
12/06/2020 11:47:31 INFO Disk Ceph-Solaris-xerf-49 created
12/06/2020 11:47:31 INFO Successfully created key 00049 for new disk.
12/06/2020 11:47:31 INFO Successfully created key /00049/1 for new disk.
12/06/2020 11:47:31 INFO Successfully created key /00049/2 for new disk.
12/06/2020 11:47:31 INFO Successfully created key /00049/3 for new disk.
12/06/2020 11:47:31 INFO Successfully created key /00049/4 for new disk.
12/06/2020 11:49:58 INFO include_wwn_fsid_tag() is true
12/06/2020 11:49:58 INFO add disk wwn is c352a56200050
12/06/2020 11:50:04 INFO Disk Ceph-Solaris-xclz-50 created
12/06/2020 11:50:04 INFO Successfully created key 00050 for new disk.
12/06/2020 11:50:04 INFO Successfully created key /00050/1 for new disk.
12/06/2020 11:50:04 INFO Successfully created key /00050/2 for new disk.
12/06/2020 11:50:04 INFO Successfully created key /00050/3 for new disk.
12/06/2020 11:50:04 INFO Successfully created key /00050/4 for new disk.
12/06/2020 11:51:55 INFO include_wwn_fsid_tag() is true
12/06/2020 11:51:55 INFO add disk wwn is c352a56200051
12/06/2020 11:52:01 INFO Disk Ceph-Solaris-xbrs-51 created
12/06/2020 11:52:01 INFO Successfully created key 00051 for new disk.
12/06/2020 11:52:01 INFO Successfully created key /00051/1 for new disk.
12/06/2020 11:52:01 INFO Successfully created key /00051/2 for new disk.
12/06/2020 11:52:01 INFO Successfully created key /00051/3 for new disk.
12/06/2020 11:52:01 INFO Successfully created key /00051/4 for new disk.
12/06/2020 11:54:46 INFO include_wwn_fsid_tag() is true
12/06/2020 11:54:46 INFO add disk wwn is c352a56200052
12/06/2020 11:54:52 INFO Disk Ceph-Solaris-xanh-51 created
12/06/2020 11:54:52 INFO Successfully created key 00052 for new disk.
12/06/2020 11:54:52 INFO Successfully created key /00052/1 for new disk.
12/06/2020 11:54:52 INFO Successfully created key /00052/2 for new disk.
12/06/2020 11:54:52 INFO Successfully created key /00052/3 for new disk.
12/06/2020 11:54:52 INFO Successfully created key /00052/4 for new disk.
12/06/2020 12:26:24 INFO call get assignments stats function.
12/06/2020 12:26:35 INFO call get assignments stats function.
12/06/2020 12:26:45 INFO call get assignments stats function.
12/06/2020 12:26:47 INFO call get assignments stats function.
12/06/2020 12:26:54 INFO call search by name function.
12/06/2020 12:26:54 INFO call search by name function.
12/06/2020 12:27:17 INFO User starts manual assignments.
12/06/2020 12:27:17 INFO User selected path 00001 Ceph-ESX-1 ceph-node-mro-5.
12/06/2020 12:27:17 INFO User selected manual option in assignment.
12/06/2020 12:27:26 INFO User start manual reassignment paths for selected paths.
12/06/2020 12:27:26 INFO Set new assignment.
12/06/2020 12:27:26 INFO Delete old assignments.
12/06/2020 12:27:26 INFO Lock assignment root.
12/06/2020 12:27:26 INFO {}
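For anyone hitting the same thing, a rough way to check whether two PetaSAN iSCSI disks ended up with the same path IPs is to dump the metadata attached to each RBD image and flag any address that shows up under more than one image. This is only a sketch: the pool name "rbd" and the exact metadata layout are assumptions, so inspect the output of rbd image-meta list <pool>/<image> on your own cluster first.
#!/bin/bash
# Sketch: list every IP found in each image's metadata, then report IPs
# that appear under more than one image. Pool name and metadata layout
# are assumptions -- adjust for your cluster.
POOL=rbd
for img in $(rbd ls "$POOL"); do
    rbd image-meta list "$POOL/$img" 2>/dev/null \
      | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' \
      | sed "s|^|$img |"
done | awk '{imgs[$2] = imgs[$2] " " $1}
            END {for (ip in imgs) if (split(imgs[ip], a, " ") > 1)
                     print "duplicate " ip ":" imgs[ip]}'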
admin
2,930 Posts
Quote from admin on June 13, 2020, 10:54 amThanks for the feedback. It is not something we have seen before, but we will look into how this could happen.
As I understand it, you created a new disk, chose auto IP, and the newly created disk got some or all IPs identical to an existing disk? Did you create the new disk after upgrading?
therm
121 Posts
Quote from therm on June 13, 2020, 11:02 amYes, I created multiple disks using auto IP and one disk got all 4 IPs of the very first disk.
I upgraded from 2.3.0 to 2.3.1 using the image installer and then did the online update to 2.5.3. Then I let the balancer run for a day with something like this in cron (a commented sketch follows below):
*/30 * * * * /usr/bin/ceph -s|/bin/grep -q '6444 active+clean' && /usr/bin/ceph balancer on && sleep 300 && /usr/bin/ceph balancer off
But there was no real load on it due to recovery_sleep. After that day I added a bunch of new PetaSAN disks, which led to the crash of disk-1. Was I too fast creating new disks?
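For reference, here is the same idea written out as a commented script instead of a one-liner: it only enables the balancer for a bounded window while every PG is active+clean. The jq field names are assumptions that may differ between Ceph releases, so treat it as a sketch rather than a drop-in replacement.
#!/bin/bash
# Run the Ceph balancer only while all PGs are active+clean, and only for
# a bounded window, so rebalancing traffic stays limited.
# Assumes jq is installed; JSON field names may differ between releases.
TOTAL=$(ceph -s -f json | jq '.pgmap.num_pgs')
CLEAN=$(ceph -s -f json | jq '[.pgmap.pgs_by_state[]
                               | select(.state_name == "active+clean")
                               | .count] | add // 0')
if [ -n "$TOTAL" ] && [ "$TOTAL" -gt 0 ] && [ "$TOTAL" -eq "$CLEAN" ]; then
    ceph balancer on     # let the balancer generate and execute plans
    sleep 300            # ...for a 5-minute window
    ceph balancer off    # then stop it again
fi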
admin
2,930 Posts
Quote from admin on June 13, 2020, 11:23 amWas I too fast creating new disks?
Could be; it is the only thing I can think of right now, but we will look into it. Auto IP assignment works by reading the metadata of all Ceph RBD images, seeing which addresses in the auto pool range are still free, and then saving the chosen IPs to the new image's RBD metadata, so maybe (a conceptual sketch of that race follows below).
We do have automated scripts that create a large number of disks for load and failover testing and we have not seen this before, but again we will look into it.
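To illustrate the kind of race a read-then-write scheme like that is exposed to, here is a tiny conceptual sketch (not PetaSAN code, and the addresses are made up): if two disk creations both read the used-IP set before either has written its choice back, they will pick the same "free" address.
#!/bin/bash
# Conceptual sketch of the auto-IP race: both creations read the same
# snapshot of already-used IPs, so both pick the same "free" address.
USED="192.168.3.100 192.168.3.101"   # snapshot of IPs already stored in metadata

pick_auto_ip() {                     # first address in an example range not in $1
    for last in $(seq 100 115); do
        ip="192.168.3.$last"
        case " $1 " in *" $ip "*) continue ;; esac
        echo "$ip"; return
    done
}

ip_a=$(pick_auto_ip "$USED")         # creation A picks 192.168.3.102
ip_b=$(pick_auto_ip "$USED")         # creation B read the same snapshot: also .102
echo "A=$ip_a  B=$ip_b"              # duplicate assignment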
therm
121 Posts
Quote from therm on June 13, 2020, 11:52 amIt is getting better:
The server ceph-node-mro-6 is meant to be an OSD-only server. In the iSCSI Disks view I can see that this server serves IPs (and the host really does have these IPs configured), but the server is not listed in the node list nor in the path assignment list. How do I move these IPs?
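A quick, generic way to confirm which node actually has a given path IP configured right now (plain Linux commands, nothing PetaSAN-specific; the node names and the IP are just examples):
for node in ceph-node-mro-5 ceph-node-mro-6; do
    echo "== $node =="
    # rough check: list interfaces and look for the path IP
    ssh "$node" "ip -o addr show | grep -w 192.168.3.21 || true"
done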