
Issue: iSCSI disk stuck in "stopping" state


I have a PetaSAN cluster with 4 nodes and 127 OSDs backed by HDD drives.
Each node has plenty of hardware resources: 2x Xeon E5-2640 CPUs and 384 GB of RAM, with one bonded (2x 10G) network for the backend and 2x 10G for iSCSI services.

I was having an issue with one of the iSCSI disks presented to the VMware environment: VMware reported lost access to the volume and then, a second later, that access was successfully restored.

To clarify, I have 13 other disks presented to the same VMware cluster from the same PetaSAN, but only this one has the disconnection issue.

I think the problem is due to a previous volume in PetaSAN that used to have the same IP address as the volume I'm having issues with now. To test this theory, I planned to change the IP address of the iSCSI disk.
To do this, I disconnected the volume in VMware, but when I tried to stop the iSCSI disk in PetaSAN it stayed in a "stopping" state.
How can I force the system to stop the disk?

ceph status and consul members output below. Thank you for the help.

# ceph -s
  cluster:
    id:     2f13efd3-d9f0-4260-be33-e626cf23d8c3
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum us-petasan-05,us-petasan-04,us-petasan-06 (age 2d)
    mgr: us-petasan-05(active, since 2d), standbys: us-petasan-04, us-petasan-06
    osd: 127 osds: 127 up (since 2h), 127 in (since 2w)

  data:
    pools:   4 pools, 2337 pgs
    objects: 8.78M objects, 33 TiB
    usage:   130 TiB used, 647 TiB / 777 TiB avail
    pgs:     2337 active+clean

  io:
    client: 8.0 MiB/s rd, 32 MiB/s wr, 650 op/s rd, 1.42k op/s wr

# consul members
Node           Address        Status  Type    Build  Protocol  DC       Segment
us-petasan-04  10.5.2.4:8301  alive   server  1.5.2  2         petasan  <all>
us-petasan-05  10.5.2.5:8301  alive   server  1.5.2  2         petasan  <all>
us-petasan-06  10.5.2.6:8301  alive   server  1.5.2  2         petasan  <all>
us-petasan-03  10.5.2.3:8301  alive   client  1.5.2  2         petasan  <default>

I forgot to add: the cluster is running version 3.2.1.

You can stop the disk from the command line; for disk 00001:

consul kv delete -recurse  PetaSAN/Disks/00001
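
If you want to check what is stored under that key before removing it, a recursive read shows it (same key layout, just reading instead of deleting):

consul kv get -recurse PetaSAN/Disks/00001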

Are you using 3.2.1, or 3.2.0 as in your other post?


Hi, I'm using 3.2.1. In my other post I just meant to note that the issue has also been seen since version 3.2.0.

Thank you for the command; after executing it, the GUI now shows the disk as stopped.

Well, I'm not sure if something else is going on with the cluster. I detached the iSCSI disk after it had completely stopped.

I configured the new IP address for the iSCSI disk, then hit start, and now the disk is stuck in the "starting" state.

It has been 10 minutes and the state has not changed.

The only logs that I'm seeing after hitting start on the disk are the following:

22/08/2023 06:25:22 INFO GlusterFS mount attempt

23/08/2023 06:25:25 INFO GlusterFS mount attempt

Do you see errors in the log files on any of your nodes?
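
For example, something along these lines on each node (the /opt/petasan/log/PetaSAN.log path is the usual location; adjust if your install differs):

grep -iE "error|traceback" /opt/petasan/log/PetaSAN.log | tail -n 50

Given the GlusterFS mount attempts above, it is also worth confirming the shared filesystem is actually mounted (a generic check, not PetaSAN-specific):

mount | grep -i gluster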

I executed the command "consul kv delete -recurse PetaSAN/Disks/00019" once again to stop the iSCSI disk, and after that I noticed the following messages in the PetaSAN logs.

I then tried to start the iSCSI disk, but saw the same behavior: the disk remained in the "starting" state.

23/08/2023 20:10:59 INFO Call "get assignments stats" function.
23/08/2023 20:11:06 ERROR ConsulAPI error in "is_path_locked", could not check if path "00019" is locked.
23/08/2023 20:11:06 ERROR 'NoneType' object is not iterable
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/PetaSAN/core/consul/api.py", line 670, in is_path_locked
    for i in data:
TypeError: 'NoneType' object is not iterable
23/08/2023 20:11:09 INFO Call "get assignments stats" function.
23/08/2023 20:11:20 INFO Call "get assignments stats" function.
23/08/2023 20:11:30 INFO Call "get assignments stats" function.
23/08/2023 20:11:40 INFO Call "get assignments stats" function.
23/08/2023 20:11:50 INFO Call "get assignments stats" function.
23/08/2023 20:11:55 ERROR ConsulAPI error in "is_path_locked", could not check if path "00019" is locked.
23/08/2023 20:11:55 ERROR 'NoneType' object is not iterable
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/PetaSAN/core/consul/api.py", line 670, in is_path_locked
    for i in data:
TypeError: 'NoneType' object is not iterable
23/08/2023 20:12:00 INFO Call "get assignments stats" function.
23/08/2023 20:12:10 INFO Call "get assignments stats" function.
23/08/2023 20:12:20 INFO Call "get assignments stats" function.
23/08/2023 20:12:30 INFO Call "get assignments stats" function.
23/08/2023 20:12:40 INFO Call "get assignments stats" function.
23/08/2023 20:12:49 ERROR ConsulAPI error in "is_path_locked", could not check if path "00019" is locked.
23/08/2023 20:12:49 ERROR 'NoneType' object is not iterable
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/PetaSAN/core/consul/api.py", line 670, in is_path_locked
    for i in data:
TypeError: 'NoneType' object is not iterable
23/08/2023 20:12:50 INFO Call "get assignments stats" function.
23/08/2023 20:13:01 INFO Call "get assignments stats" function.
23/08/2023 20:13:11 INFO Call "get assignments stats" function.
23/08/2023 20:13:21 INFO Call "get assignments stats" function.
23/08/2023 20:13:31 INFO Call "get assignments stats" function.
23/08/2023 20:13:34 ERROR ConsulAPI error in "is_path_locked", could not check if path "00019" is locked.
23/08/2023 20:13:34 ERROR 'NoneType' object is not iterable
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/PetaSAN/core/consul/api.py", line 670, in is_path_locked
    for i in data:
TypeError: 'NoneType' object is not iterable
23/08/2023 20:13:41 INFO Call "get assignments stats" function.
23/08/2023 20:13:51 INFO Call "get assignments stats" function.
23/08/2023 20:13:55 INFO Successfully created key 00019 for new disk.
23/08/2023 20:13:55 INFO Successfully created key /00019/1 for new disk.
23/08/2023 20:13:55 INFO Successfully created key /00019/2 for new disk.
23/08/2023 20:14:01 INFO Call "get assignments stats" function.
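
Side note: since is_path_locked is apparently getting no data back from Consul for path "00019", a recursive read of the disk key (same layout as the delete command above) should show whether the re-created entries are actually present:

consul kv get -recurse PetaSAN/Disks/00019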


Just wondering: after the upgrade from 3.2.0 to 3.2.1, did you reboot the hosts to apply the new kernel?

Yes, I did.

This is the kernel I see on each node:

root@us-petasan-06:~# uname -srm
Linux 5.14.21-04-petasan x86_64

root@us-petasan-04:~# uname -srm
Linux 5.14.21-04-petasan x86_64

root@us-petasan-05:~# uname -srm
Linux 5.14.21-04-petasan x86_64

root@us-petasan-03:~# uname -srm
Linux 5.14.21-04-petasan x86_64

This is not something we can reproduce here.

Are the disks loaded?

From the disk %util chart, are they very busy?

From the OSD latency chart, what are the maximum latencies you see?

From the cluster IOPS chart, what is the cluster IOPS value?
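
If it is easier than the charts, a generic CLI spot-check of disk busyness is iostat from the sysstat package (not PetaSAN-specific; the %util column shows how saturated each device is):

apt install sysstat    # if not already installed
iostat -x 5 3          # extended stats, 5-second interval, 3 reports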
