
health after node update

Hello,

Did an upgrade from 2.3.1 to 2.6.2. Everything started back up, but when performing a health check prior to updating the next node, we get this:

ceph health detail
HEALTH_WARN Degraded data redundancy: 27/8708721 objects degraded (0.000%), 21 pgs degraded, 1 pg undersized; 2 pgs not deep-scrubbed in time
PG_DEGRADED Degraded data redundancy: 27/8708721 objects degraded (0.000%), 21 pgs degraded, 1 pg undersized
pg 1.1b is active+recovery_wait+degraded, acting [16,0,7]
pg 1.52 is active+recovery_wait+degraded, acting [12,2,6]
pg 1.59 is active+recovering+degraded, acting [14,7,3]
pg 1.62 is active+recovery_wait+degraded, acting [13,0,7]
pg 1.92 is active+recovery_wait+degraded, acting [13,4,7]
pg 1.c2 is active+recovery_wait+degraded, acting [17,7,3]
pg 1.e5 is active+recovery_wait+degraded, acting [14,7,1]
pg 1.12d is active+recovery_wait+degraded, acting [17,1,7]
pg 1.15e is active+recovery_wait+degraded, acting [12,7,5]
pg 1.17a is active+recovery_wait+degraded, acting [16,7,5]
pg 1.1bd is active+recovery_wait+degraded, acting [12,6,0]
pg 1.1c0 is active+recovery_wait+degraded, acting [17,7,1]
pg 1.1f5 is active+recovery_wait+degraded, acting [16,3,7]
pg 1.22e is active+recovery_wait+degraded, acting [12,7,3]
pg 1.25f is active+recovery_wait+degraded, acting [17,7,2]
pg 1.260 is active+recovery_wait+degraded, acting [12,7,1]
pg 1.301 is active+recovery_wait+degraded, acting [13,0,7]
pg 1.305 is active+recovery_wait+degraded, acting [14,7,0]
pg 1.320 is active+recovery_wait+degraded, acting [14,7,0]
pg 1.380 is active+recovery_wait+degraded, acting [16,3,7]
pg 1.3dd is stuck undersized for 168.195350, current state active+recovery_wait+undersized+degraded+remapped, last acting [17,5]
PG_NOT_DEEP_SCRUBBED 2 pgs not deep-scrubbed in time
pg 1.149 not deep-scrubbed since 2020-09-17 21:50:44.435585
pg 1.c1 not deep-scrubbed since 2020-09-17 20:53:51.529547

Update:

Looks like things are all back and working. Just for everyone's information, all of the node updates did the same thing; just wait until the system catches up before updating the next node.
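
In case it helps, here is a rough sketch of what I mean by waiting (the 60-second interval is arbitrary, adjust to taste):

# poll cluster health and continue only once it reports HEALTH_OK
while ! ceph health | grep -q HEALTH_OK; do
    echo "cluster still recovering, waiting..."
    sleep 60
done
echo "cluster healthy, safe to update the next node"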

Thanks to the folks at PetaSan,

Nothing too alarming; maybe an OSD was slow to restart. Check that the number of degraded PGs goes down over time; you can track the progress from the PG Status chart. In the Maintenance tab, set the recovery speed to slow for HDD or average for SSD disks, just in case it is currently set too slow.
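
If you prefer the command line to the chart, a quick way to watch the trend (just an example, adjust the interval as needed):

# one-line summary, includes the degraded object count
watch -n 30 ceph pg stat

# or log the degraded figure with a timestamp
while true; do
    echo "$(date +%T) $(ceph health detail | grep 'objects degraded')"
    sleep 30
done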

If things are not changing, I would restart OSD 7 with systemctl restart ceph-osd@7; you need to run it from its node.
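
If you are not sure which node hosts that OSD, something along these lines should do (a generic Ceph sketch, not PetaSAN specific):

# prints the host (and crush location) for osd.7
ceph osd find 7
# then, on that host:
systemctl restart ceph-osd@7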

The charts will work once you have updated the first 3 nodes, as we have changed them to use HTTPS via the reverse proxy.