
iSCSI disk goes away after node shutdown

Hello,

First of all, thank you for this product.

We have 3 PetaSAN nodes.

If we shut down one of them, the iSCSI disk disappears from the iSCSI disks list and all iSCSI paths are dead.

How can we solve this issue?

Can you do a:

ceph health detail

consul members


Thank you for the answer.

This is an intermittent problem and it has not appeared again.

If this problem appears again, I will write here.

Hello,

The problem reappeared.

We had problems with 1 of the 3 nodes (its Ceph services went down), and after this the iSCSI disk disconnected from VMware and disappeared from the iSCSI disks list.

root@ceph1-osd1:~# ceph health detail
HEALTH_WARN Reduced data availability: 48 pgs inactive, 11 pgs peering; Degraded data redundancy: 3877/45084 objects degraded (8.600%), 177 pgs degraded, 37 pgs undersized
PG_AVAILABILITY Reduced data availability: 48 pgs inactive, 11 pgs peering
pg 6.3 is stuck inactive for 109272.804827, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.10 is stuck inactive for 109641.699083, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.50 is stuck peering for 107337.716568, current state remapped+peering, last acting [3,4]
pg 6.5e is stuck peering for 441.860258, current state remapped+peering, last acting [3]
pg 6.63 is stuck inactive for 109644.032663, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.8f is stuck inactive for 109640.790513, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.b6 is stuck inactive for 109641.615669, current state undersized+degraded+remapped+backfill_wait+peered, last acting [12]
pg 6.ce is stuck inactive for 109272.806252, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.e9 is stuck inactive for 109640.782986, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.eb is stuck inactive for 109644.557507, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.ff is stuck peering for 441.863435, current state peering, last acting [3,9]
pg 6.100 is stuck inactive for 109640.793287, current state undersized+degraded+remapped+backfill_wait+peered, last acting [12]
pg 6.10e is stuck peering for 107337.717364, current state remapped+peering, last acting [3]
pg 6.155 is stuck inactive for 109641.542773, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.161 is stuck inactive for 109640.784150, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.186 is stuck peering for 107337.717729, current state peering, last acting [3,4]
pg 6.1a2 is stuck peering for 107337.715572, current state peering, last acting [3,4]
pg 6.1a7 is stuck inactive for 109641.538992, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.1c1 is stuck inactive for 109641.700097, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.1cb is stuck inactive for 109643.555015, current state undersized+degraded+remapped+backfill_wait+peered, last acting [12]
pg 6.207 is stuck inactive for 109272.823779, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.211 is stuck inactive for 109640.786922, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.212 is stuck peering for 441.863955, current state peering, last acting [3,9]
pg 6.227 is stuck peering for 441.867491, current state peering, last acting [3,9]
pg 6.248 is stuck inactive for 109640.794396, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.251 is stuck inactive for 109272.802719, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.26a is stuck inactive for 109272.809259, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.276 is stuck peering for 107337.717910, current state peering, last acting [3,4]
pg 6.2a7 is stuck inactive for 109644.557301, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.2d2 is stuck inactive for 109644.032902, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.2f8 is stuck inactive for 109643.552028, current state undersized+degraded+remapped+backfill_wait+peered, last acting [12]
pg 6.306 is stuck inactive for 109272.827586, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.314 is stuck inactive for 109640.792001, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.32b is stuck inactive for 109641.620426, current state undersized+degraded+remapped+backfill_wait+peered, last acting [12]
pg 6.355 is stuck peering for 107337.715524, current state remapped+peering, last acting [3,4]
pg 6.368 is stuck inactive for 109640.798170, current state undersized+degraded+remapped+backfill_wait+peered, last acting [12]
pg 6.37d is stuck inactive for 109640.791662, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.37e is stuck inactive for 109641.611130, current state undersized+degraded+remapped+backfill_wait+peered, last acting [12]
pg 6.384 is stuck inactive for 109640.797100, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.38e is stuck inactive for 109640.794047, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.396 is stuck inactive for 109640.787453, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.3af is stuck peering for 107337.715643, current state remapped+peering, last acting [3,4]
pg 6.3b6 is stuck inactive for 109640.799755, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.3bb is stuck inactive for 109640.803229, current state undersized+degraded+remapped+backfill_wait+peered, last acting [12]
pg 6.3d2 is stuck inactive for 109640.789208, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.3e5 is stuck inactive for 109641.541582, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.3e6 is stuck inactive for 109640.786788, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.3f2 is stuck inactive for 109641.808531, current state undersized+degraded+remapped+backfilling+peered, last acting [8]
PG_DEGRADED Degraded data redundancy: 3877/45084 objects degraded (8.600%), 177 pgs degraded, 37 pgs undersized
pg 6.190 is active+recovery_wait+degraded, acting [8,3]
pg 6.1a7 is stuck undersized for 440.019123, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.1a8 is active+recovery_wait+degraded, acting [3,8]
pg 6.1a9 is active+recovery_wait+degraded, acting [11,9]
pg 6.1aa is active+recovery_wait+degraded, acting [3,4]
pg 6.1ab is active+recovery_wait+degraded, acting [3,9]
pg 6.1c1 is stuck undersized for 441.003135, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.1c6 is active+recovery_wait+degraded, acting [11,5]
pg 6.1ca is active+recovery_wait+degraded, acting [3,13]
pg 6.1cb is stuck undersized for 440.016099, current state undersized+degraded+remapped+backfill_wait+peered, last acting [12]
pg 6.1d7 is active+recovery_wait+degraded, acting [3,9]
pg 6.1de is active+recovery_wait+degraded, acting [11,9]
pg 6.1e1 is active+recovery_wait+degraded, acting [12,4]
pg 6.1f0 is active+recovery_wait+degraded, acting [12,9]
pg 6.1fb is active+recovery_wait+degraded, acting [12,3]
pg 6.1fc is active+recovery_wait+degraded, acting [3,9]
pg 6.207 is stuck undersized for 108665.031689, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.211 is stuck undersized for 108721.097823, current state undersized+degraded+remapped+backfill_wait+peered, last acting [3]
pg 6.213 is active+recovery_wait+degraded, acting [3,9]
pg 6.222 is active+recovery_wait+degraded, acting [7,9]
pg 6.22a is active+recovery_wait+degraded, acting [11,2]
pg 6.22b is active+recovery_wait+degraded, acting [3,4]
pg 6.22d is active+recovery_wait+degraded, acting [12,9]
pg 6.22f is active+recovery_wait+degraded, acting [3,11]
pg 6.231 is active+recovery_wait+degraded, acting [3,9]
pg 6.236 is active+recovery_wait+degraded, acting [11,3]
pg 6.240 is active+recovery_wait+degraded, acting [3,9]
pg 6.242 is active+recovery_wait+degraded, acting [11,3]
pg 6.248 is stuck undersized for 108721.102478, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.24c is active+recovery_wait+degraded, acting [3,9]
pg 6.24d is active+recovery_wait+degraded, acting [12,4]
pg 6.251 is stuck undersized for 108664.058304, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.260 is active+recovery_wait+degraded, acting [11,4]
pg 6.262 is active+recovery_wait+degraded, acting [11,9]
pg 6.267 is active+recovery_wait+degraded, acting [3,9]
pg 6.269 is active+recovery_wait+degraded, acting [3,9]
pg 6.26a is stuck undersized for 108664.331804, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.275 is active+recovery_wait+degraded+remapped, acting [7,9]
pg 6.277 is active+recovery_wait+degraded+remapped, acting [3,9]
pg 6.27b is active+recovery_wait+degraded, acting [12,4]
pg 6.27c is active+recovery_wait+degraded, acting [11,3]
pg 6.27f is active+recovery_wait+degraded, acting [3,12]
pg 6.284 is active+recovery_wait+degraded, acting [12,3]
pg 6.290 is active+recovery_wait+degraded, acting [11,5]
pg 6.295 is active+recovery_wait+degraded, acting [11,4]
pg 6.2a0 is active+recovery_wait+degraded, acting [11,7]
pg 6.2a7 is stuck undersized for 441.031313, current state undersized+degraded+remapped+backfill_wait+peered, last acting [11]
pg 6.2a8 is active+recovery_wait+degraded, acting [12,4]
pg 6.2ad is active+recovery_wait+degraded, acting [3,12]
pg 6.2b7 is active+recovery_wait+degraded, acting [3,11]
pg 6.2c5 is active+recovery_wait+degraded+remapped, acting [7,9]

root@ceph1-osd1:~# consul members
Node        Address          Status  Type    Build  Protocol  DC       Segment
ceph1-osd1  100.64.1.1:8301  alive   server  1.5.2  2         petasan  <all>
ceph1-osd2  100.64.1.2:8301  alive   server  1.5.2  2         petasan  <all>
ceph1-osd3  100.64.1.3:8301  alive   server  1.5.2  2         petasan  <all>

root@ceph1-osd1:~# rbd ls
image-00003

After restarting the petasan-iscsi service, the disk comes back but in a stopped state.

If the Ceph layer is not OK, the iSCSI layer will not function correctly. So if you still have Ceph layer issues, focus on those; once the cluster is back to OK, the iSCSI layer will work just fine.

If you restart the entire cluster, all iSCSI disks will be stopped; you need to start them manually via the UI, or we have a script in /opt/petasan/scripts/util to start all disks.
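For reference, a quick way to find and run that helper on one of the nodes (a sketch; the exact script name is not spelled out here, so list the directory first):

# list the utility scripts shipped with PetaSAN
ls -l /opt/petasan/scripts/util
# then run the start-all-disks helper found there
# (placeholder name below, check the listing for the real one)
/opt/petasan/scripts/util/<start_all_disks_script>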

You should be able to tolerate a node failure without issues.

Lastly, it is better to have replica x3 rather than x2.
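For example, the pool's replica count can be checked and raised with standard Ceph commands (a sketch; the pool name rbd is assumed from the rbd ls output above, and raising size triggers extra backfill):

# show the current number of replicas
ceph osd pool get rbd size
# raise it to 3 copies
ceph osd pool set rbd size 3
# with 3 copies, min_size 2 keeps I/O running through a single node failure
ceph osd pool set rbd min_size 2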

Good afternoon PetaSAN team. Reynaldo Martinez from UBXCloud here!

We are having a similar situation to the one described in this post (latest PetaSAN release). Our disks are gone but the images are still there. We verified this with the commands:

rbd ls

disk_meta.py -image XXXX -pool XXX

Our cluster has 4 nodes. Today we added some SSD disks for the kernel caching module and removed some OSDs (in batches) from node 01 first (then re-added them with the caching disk, 4 OSDs x 1 cache), and after letting rebalancing/backfilling run long enough (until the degraded objects were below 24%) we did the same with node 02.

After doing node 02 we got some alarms in VMware about the datastores (both datastores gone). In the iSCSI disk list all disks are gone (and all paths too), but the images are still there in Ceph:

root@mts-b00101cep03:~# rbd ls
image-00001
image-00002

root@mts-b00101cep03:~# /opt/petasan/scripts/util/disk_meta.py read --image image-00001 --pool rbd
{
    "acl": "iqn.1998-01.com.vmware:mts-b00101esx01,iqn.1998-01.com.vmware:mts-b00101esx02",
    "create_date": "2019-12-12",
    "data_pool": "",
    "description": "",
    "disk_name": "VMW1",
    "id": "00001",
    "iqn": "iqn.2016-05.com.petasan:00001",
    "is_replication_target": false,
    "password": "",
    "paths": [
        {
            "eth": "eth3",
            "ip": "10.12.9.50",
            "locked_by": "",
            "subnet_mask": "255.255.255.0",
            "vlan_id": ""
        },
        {
            "eth": "eth4",
            "ip": "10.12.10.50",
            "locked_by": "",
            "subnet_mask": "255.255.255.0",
            "vlan_id": ""
        }
    ],
    "pool": "rbd",
    "replication_info": {},
    "size": 5120,
    "user": "",
    "wwn": ""
}


root@mts-b00101cep03:~# /opt/petasan/scripts/util/disk_meta.py read --image image-00002 --pool rbd
{
    "acl": "iqn.1998-01.com.vmware:mts-b00101esx01,iqn.1998-01.com.vmware:mts-b00101esx02",
    "create_date": "2019-12-12",
    "data_pool": "",
    "description": "",
    "disk_name": "VMW2",
    "id": "00002",
    "iqn": "iqn.2016-05.com.petasan:00002",
    "is_replication_target": false,
    "password": "",
    "paths": [
        {
            "eth": "eth3",
            "ip": "10.12.9.51",
            "locked_by": "",
            "subnet_mask": "255.255.255.0",
            "vlan_id": ""
        },
        {
            "eth": "eth4",
            "ip": "10.12.10.51",
            "locked_by": "",
            "subnet_mask": "255.255.255.0",
            "vlan_id": ""
        }
    ],
    "pool": "rbd",
    "replication_info": {},
    "size": 5120,
    "user": "",
    "wwn": ""
}


We tried to recover by restarting the iSCSI service on all nodes (one by one) with systemctl restart petasan-iscsi, but found the service hung (and a lot of crazy messages in dmesg) on nodes 02 and 03. We had to reboot each node, 20 minutes apart, so as not to affect the rebalancing.
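For anyone hitting the same hang, the service state and the kernel messages can be captured before rebooting (a sketch using standard systemd/kernel tools; petasan-iscsi is the unit name used above):

# check whether the unit is stuck activating/deactivating
systemctl status petasan-iscsi
# grab its recent journal entries
journalctl -u petasan-iscsi -n 200 --no-pager
# save the kernel messages for the support thread
dmesg -T | tail -n 200 > /tmp/petasan-iscsi-dmesg.txt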

More information:


root@mts-b00101cep03:~# ceph -w
  cluster:
    id:     acedfc20-23ed-446a-840a-4c558571af9d
    health: HEALTH_WARN
            Reduced data availability: 60 pgs inactive, 60 pgs incomplete
            Degraded data redundancy: 457703/2099210 objects degraded (21.804%), 461 pgs degraded, 461 pgs undersized

  services:
    mon: 3 daemons, quorum mts-b00101cep03,mts-b00101cep01,mts-b00101cep02 (age 7m)
    mgr: mts-b00101cep01(active, since 13m), standbys: mts-b00101cep03, mts-b00101cep02
    osd: 26 osds: 26 up (since 7m), 26 in (since 51m); 462 remapped pgs

  data:
    pools:   1 pools, 1024 pgs
    objects: 1.05M objects, 3.5 TiB
    usage:   5.6 TiB used, 121 TiB / 127 TiB avail
    pgs:     5.859% pgs not active
             457703/2099210 objects degraded (21.804%)
             4330/2099210 objects misplaced (0.206%)
             502 active+clean
             456 active+undersized+degraded+remapped+backfill_wait
             60  incomplete
             5   active+undersized+degraded+remapped+backfilling
             1   active+remapped+backfill_wait

  io:
    recovery: 62 MiB/s, 17 objects/s

2020-01-08 17:00:55.441074 mon.mts-b00101cep03 [WRN] Health check update: Degraded data redundancy: 457703/2099210 objects degraded (21.804%), 461 pgs degraded, 461 pgs undersized (PG_DEGRADED)
2020-01-08 17:00:58.072924 mon.mts-b00101cep03 [WRN] Health check update: Reduced data availability: 60 pgs inactive, 9 pgs peering, 51 pgs incomplete (PG_AVAILABILITY)
2020-01-08 17:01:00.441646 mon.mts-b00101cep03 [WRN] Health check update: Degraded data redundancy: 457601/2099210 objects degraded (21.799%), 460 pgs degraded, 460 pgs undersized (PG_DEGRADED)

Our backfill speed is set to medium. All disks are spinners connected to an LSI RAID card with BBU cache (each disk as an independent RAID 0). Our ceph osd tree:

root@mts-b00101cep03:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 126.76218 root default
-3 22.63199 host mts-b00101cep01
0 hdd 4.52640 osd.0 up 1.00000 1.00000
1 hdd 4.52640 osd.1 up 1.00000 1.00000
2 hdd 4.52640 osd.2 up 1.00000 1.00000
3 hdd 4.52640 osd.3 up 1.00000 1.00000
4 hdd 4.52640 osd.4 up 1.00000 1.00000
-5 22.63199 host mts-b00101cep02
5 hdd 4.52640 osd.5 up 1.00000 1.00000
6 hdd 4.52640 osd.6 up 1.00000 1.00000
7 hdd 4.52640 osd.7 up 1.00000 1.00000
8 hdd 4.52640 osd.8 up 1.00000 1.00000
9 hdd 4.52640 osd.9 up 1.00000 1.00000
-7 27.15234 host mts-b00101cep03
41 hdd 4.52539 osd.41 up 1.00000 1.00000
42 hdd 4.52539 osd.42 up 1.00000 1.00000
43 hdd 4.52539 osd.43 up 1.00000 1.00000
44 hdd 4.52539 osd.44 up 1.00000 1.00000
45 hdd 4.52539 osd.45 up 1.00000 1.00000
46 hdd 4.52539 osd.46 up 1.00000 1.00000
-9 54.34586 host mts-b00101cep04
47 hdd 5.43459 osd.47 up 1.00000 1.00000
48 hdd 5.43459 osd.48 up 1.00000 1.00000
49 hdd 5.43459 osd.49 up 1.00000 1.00000
50 hdd 5.43459 osd.50 up 1.00000 1.00000
51 hdd 5.43459 osd.51 up 1.00000 1.00000
52 hdd 5.43459 osd.52 up 1.00000 1.00000
53 hdd 5.43459 osd.53 up 1.00000 1.00000
54 hdd 5.43459 osd.54 up 1.00000 1.00000
55 hdd 5.43459 osd.55 up 1.00000 1.00000
56 hdd 5.43459 osd.56 up 1.00000 1.00000
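For reference, the throttle values behind that medium backfill setting can be read back from a running OSD (a sketch using the Ceph admin socket; run it on the node that hosts osd.0):

# effective backfill/recovery throttles of a running OSD
ceph daemon osd.0 config get osd_max_backfills
ceph daemon osd.0 config get osd_recovery_max_active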

So, how do we recover our iSCSI disks? The images are there but the PetaSAN UI is not showing them (and of course we have no active paths). Which commands do we need to run here? We had some VMs on those datastores (mostly development, as we are doing a lot of testing before adding PetaSAN back to full production service).

What do you think caused the issue with the disks? The backfill speed? Other issues? Wrong procedures?

Hello there,

The problem is with the 60 incomplete PGs; if this is fixed, the iSCSI layer and UI will be OK.

Even though you can list the images and read their metadata, it does not mean all is OK. Listing only accesses the PGs containing the directory and header objects, but any I/O to image sectors stored in the incomplete PGs will stall.
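To see exactly which PGs are holding things up, they can be listed by state and inspected (a sketch; the pool name rbd is taken from the output above, and <pgid> is a placeholder for a real id from ceph health detail):

# list only the incomplete pgs of the rbd pool
ceph pg ls-by-pool rbd incomplete
# show which OSDs one of them maps to, then query it for details
ceph pg map <pgid>
ceph pg <pgid> query | less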

Deleting OSDs on node 2 before waiting for full recovery from the deletion of OSDs on node 1 could be the problem. OSD deletions should be done with caution, especially across more than one node; this is why we do not support it in the UI. It is also better to set the weight to 0 and allow the OSD to drain before deletion.
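A safer removal sequence along those lines would look roughly like this (a sketch with standard Ceph commands; <id> is a placeholder for the OSD number, and backfill should fully finish before the next OSD is touched):

# drain the OSD gradually by removing its CRUSH weight
ceph osd crush reweight osd.<id> 0
# wait until backfill finishes and the cluster reports HEALTH_OK
ceph -s
# only then take it out, stop it and remove it
ceph osd out <id>
systemctl stop ceph-osd@<id>
ceph osd purge <id> --yes-i-really-mean-it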

So the focus is to get the PGs healthy again. Do a pg query to get more info on the error; if you get peering_blocked_by_history_les_bound, try enabling osd_find_best_info_ignore_history_les=true and see.
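A sketch of that check and workaround (the pg and OSD ids are placeholders; the central config command assumes a Nautilus-era cluster like the one shown above, otherwise the option can go under [osd] in ceph.conf before restarting the OSD):

# look for the blocking reason in the peering state of an incomplete pg
ceph pg <pgid> query | grep -i -B2 -A2 blocked
# if it shows peering_blocked_by_history_les_bound, enable the workaround
ceph config set osd osd_find_best_info_ignore_history_les true
# restart the pg's primary OSD so peering is retried with the new setting
systemctl restart ceph-osd@<primary-osd-id>
ceph -s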

Good luck.