health after node update
khopkins
96 Posts
September 30, 2020, 9:13 am
Hello,
Did an upgrade from 2.3.1 to 2.6.2. Everything started back up, but when performing a health check prior to updating the next node, we get this:
ceph health detail
HEALTH_WARN Degraded data redundancy: 27/8708721 objects degraded (0.000%), 21 pgs degraded, 1 pg undersized; 2 pgs not deep-scrubbed in time
PG_DEGRADED Degraded data redundancy: 27/8708721 objects degraded (0.000%), 21 pgs degraded, 1 pg undersized
pg 1.1b is active+recovery_wait+degraded, acting [16,0,7]
pg 1.52 is active+recovery_wait+degraded, acting [12,2,6]
pg 1.59 is active+recovering+degraded, acting [14,7,3]
pg 1.62 is active+recovery_wait+degraded, acting [13,0,7]
pg 1.92 is active+recovery_wait+degraded, acting [13,4,7]
pg 1.c2 is active+recovery_wait+degraded, acting [17,7,3]
pg 1.e5 is active+recovery_wait+degraded, acting [14,7,1]
pg 1.12d is active+recovery_wait+degraded, acting [17,1,7]
pg 1.15e is active+recovery_wait+degraded, acting [12,7,5]
pg 1.17a is active+recovery_wait+degraded, acting [16,7,5]
pg 1.1bd is active+recovery_wait+degraded, acting [12,6,0]
pg 1.1c0 is active+recovery_wait+degraded, acting [17,7,1]
pg 1.1f5 is active+recovery_wait+degraded, acting [16,3,7]
pg 1.22e is active+recovery_wait+degraded, acting [12,7,3]
pg 1.25f is active+recovery_wait+degraded, acting [17,7,2]
pg 1.260 is active+recovery_wait+degraded, acting [12,7,1]
pg 1.301 is active+recovery_wait+degraded, acting [13,0,7]
pg 1.305 is active+recovery_wait+degraded, acting [14,7,0]
pg 1.320 is active+recovery_wait+degraded, acting [14,7,0]
pg 1.380 is active+recovery_wait+degraded, acting [16,3,7]
pg 1.3dd is stuck undersized for 168.195350, current state active+recovery_wait+undersized+degraded+remapped, last acting [17,5]
PG_NOT_DEEP_SCRUBBED 2 pgs not deep-scrubbed in time
pg 1.149 not deep-scrubbed since 2020-09-17 21:50:44.435585
pg 1.c1 not deep-scrubbed since 2020-09-17 20:53:51.529547
Update:
Looks like things are all back and working. Just for everyone's information: all the node updates did the same thing, so just wait till the system catches up before updating the next node (see the quick check below).
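For anyone doing the same upgrade, something like the following should be enough to confirm the cluster has caught up before touching the next node. These are standard Ceph CLI commands, nothing PetaSAN-specific, so adjust to your own setup:

# watch recovery progress; the degraded object/pg counts should fall to zero
watch -n 10 ceph -s

# move on to the next node only once this reports HEALTH_OK
ceph health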
Thanks to the folks at PetaSan,
Last edited on September 30, 2020, 10:00 am by khopkins · #1
admin
2,930 Posts
September 30, 2020, 10:03 am
Nothing too alarming. Maybe an OSD was slow to restart. Check that the number of degraded PGs goes down over time; you can track the progress from the PG Status chart. In the Maintenance tab, set the recovery speed to slow for HDD disks or average for SSD disks, just in case it is currently too slow.
If things are not changing, I would restart OSD 7 with systemctl restart ceph-osd@7; you need to run it from its node.
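For example, roughly like this (assuming osd.7 is up on one of your storage nodes; ceph osd tree will show you which host it lives on):

# find which node hosts osd.7
ceph osd tree

# on that node, restart the daemon and watch recovery resume
systemctl restart ceph-osd@7
ceph -s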
The charts will work once you update the first 3 nodes, as we have changed them to use https in the reverse proxy.