Datastore is often disconnected from VMware
admin
2,930 Posts
July 1, 2022, 9:36 am
Can you clarify the current status on VMware: are all datastores down, or some down and others up? Is it all the time or only sometimes? Random or under load? Or does it happen when you add a new datastore or a new VMDK disk and initialize it, or when you format it from the guest OS?
Are you still continuously getting the dmesg errors:
iSCSI Login negotiation failed.
Unable to locate Target Portal Group on iqn....
Are you still getting the latency entries in the OSD logs? Do they happen on all OSDs, or just 50, 51 and 53? Do they happen on all hosts, or only on the OSDs of one host? In the majority of cases these correlate with load on the disk; can you confirm that load is not the cause of these logs, using the atop command, which samples every 1 or 2 seconds? In less common cases they can be due to hardware issues or metadata database (RocksDB) fragmentation.
As stated earlier, Windows has larger timeouts than VMware, so it may not report issues.
Last edited on July 1, 2022, 9:37 am by admin · #11
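To make the dmesg check above easy to repeat on the storage nodes, here is a minimal sketch; the sample file stands in for live `dmesg -T` output, and the grep pattern is an assumption based on the error wording quoted above:

```shell
# Scan the kernel log on a PetaSAN node for iSCSI login failures.
# A saved sample stands in here for live `dmesg -T` output.
cat > /tmp/dmesg.sample <<'EOF'
[Sun Jul  3 23:15:07 2022] Unable to locate Target IQN: iqn.2022-07.com.petahdd:00001 in Storage Node
[Sun Jul  3 23:15:07 2022] iSCSI Login negotiation failed.
EOF

# On a live node this would be: dmesg -T | grep -Ei 'iscsi login|target (iqn|portal)'
grep -Ei 'iscsi login|target (iqn|portal)' /tmp/dmesg.sample
```

Running this on each node (and noting the timestamps) shows whether the login failures are ongoing or stopped at some point.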
Booop!
8 Posts
July 1, 2022, 10:13 am
Now VMware keeps disconnecting the HDD-based datastore. Other PetaSAN datastores from the other cluster keep working properly.
Disks from the HDD-based PetaSAN cluster initiated from Windows or Linux work well, but not from VMware.
It occurs all the time now, with or without load.
Last night we detached and re-attached the disk and ran discovery; all paths are visible, and we tested changing paths from the PetaSAN GUI, which works fine too.
VMware is still failing with heartbeat timeouts:
2022-07-01T09:58:16.807Z cpu9:2100098)HBX: 3041: 'HDD-HA': HB at offset 3772416 - Waiting for timed out HB:
2022-07-01T09:58:16.807Z cpu9:2100098) [HB state abcdef02 offset 3772416 gen 1679 stampUS 29158292752 uuid 62be53de-ad2c72ca-bfed-5cb9019bce9c jrnl <FB 16777230> drv 24.82 lockImpl 4 ip 192.168.77.63]
2022-07-01T09:58:18.880Z cpu2:2113265)HBX: 3041: 'HDD-HA': HB at offset 3772416 - Waiting for timed out HB:
2022-07-01T09:58:18.880Z cpu2:2113265) [HB state abcdef02 offset 3772416 gen 1679 stampUS 29158292752 uuid 62be53de-ad2c72ca-bfed-5cb9019bce9c jrnl <FB 16777230> drv 24.82 lockImpl 4 ip 192.168.77.63]
No latency entries in the OSD logs.
Load is extremely low because this datastore is used for backup purposes; we disabled the Veeam backup jobs and it keeps failing.
How can we check RocksDB?
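As a side note, one way to quantify how often the ESXi heartbeat timeouts above occur per datastore is to count them in a copy of the host's vmkernel log. A minimal sketch; the sample lines are the ones quoted above, standing in for `/var/log/vmkernel.log`:

```shell
# Count "Waiting for timed out HB" events per datastore in an ESXi vmkernel log.
# Two sample lines (copied from the post above) stand in for /var/log/vmkernel.log.
cat > /tmp/vmkernel.sample <<'EOF'
2022-07-01T09:58:16.807Z cpu9:2100098)HBX: 3041: 'HDD-HA': HB at offset 3772416 - Waiting for timed out HB:
2022-07-01T09:58:18.880Z cpu2:2113265)HBX: 3041: 'HDD-HA': HB at offset 3772416 - Waiting for timed out HB:
EOF

# Extract the datastore name between the quotes and tally occurrences.
grep "Waiting for timed out HB" /tmp/vmkernel.sample \
  | sed "s/.*HBX: [0-9]*: '\([^']*\)'.*/\1/" \
  | sort | uniq -c
# -> 2 HDD-HA  (timeout count per datastore name)
```

If the count keeps climbing while the cluster is idle, that supports the observation that load is not the trigger.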
admin
2,930 Posts
July 1, 2022, 10:51 am
What about dmesg?
No need to check RocksDB if there are no errors in the PetaSAN logs.
"Other PetaSAN datastores from the other cluster keep working properly."
You have another PetaSAN cluster? Does it have a different IQN base?
Last edited on July 1, 2022, 10:54 am by admin · #13
Booop!
8 Posts
July 4, 2022, 8:45 am
dmesg on the Ceph nodes:
[Sun Jul 3 23:15:07 2022] Unable to locate Target IQN: iqn.2022-07.com.petahdd:00001 in Storage Node
[Sun Jul 3 23:15:07 2022] iSCSI Login negotiation failed.
The two PetaSAN clusters have different IQN bases:
- iqn.2022-07.com.petahdd:000XX
- iqn.2016-05.com.petasan:000XX
For now we have changed the backup datastore protocol to NFS, but we want to find out why iSCSI is failing so we can fix it in the future.
Many thanks,
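Since the two clusters use different IQN bases, a quick sanity check is to pull the base out of the dmesg error and confirm it matches the cluster the ESXi host should be logging in to. A minimal shell sketch using the error line above:

```shell
# Extract the full IQN and its base (everything before the last ':') from the
# dmesg error, to confirm which cluster's base the initiator is requesting.
line='[Sun Jul  3 23:15:07 2022] Unable to locate Target IQN: iqn.2022-07.com.petahdd:00001 in Storage Node'
iqn=$(echo "$line" | grep -o 'iqn\.[^ ]*')
echo "full IQN : $iqn"       # iqn.2022-07.com.petahdd:00001
echo "IQN base : ${iqn%:*}"  # iqn.2022-07.com.petahdd
```

If the base in the error belongs to the wrong cluster, the ESXi host is still holding static discovery records for the other cluster's targets.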