health after node update
khopkins
96 Posts
September 30, 2020, 9:13 am
Hello,
Did an upgrade from 2.3.1 to 2.6.2. Everything started back up, but when performing a health check prior to updating the next node, we get this:
ceph health detail
HEALTH_WARN Degraded data redundancy: 27/8708721 objects degraded (0.000%), 21 pgs degraded, 1 pg undersized; 2 pgs not deep-scrubbed in time
PG_DEGRADED Degraded data redundancy: 27/8708721 objects degraded (0.000%), 21 pgs degraded, 1 pg undersized
pg 1.1b is active+recovery_wait+degraded, acting [16,0,7]
pg 1.52 is active+recovery_wait+degraded, acting [12,2,6]
pg 1.59 is active+recovering+degraded, acting [14,7,3]
pg 1.62 is active+recovery_wait+degraded, acting [13,0,7]
pg 1.92 is active+recovery_wait+degraded, acting [13,4,7]
pg 1.c2 is active+recovery_wait+degraded, acting [17,7,3]
pg 1.e5 is active+recovery_wait+degraded, acting [14,7,1]
pg 1.12d is active+recovery_wait+degraded, acting [17,1,7]
pg 1.15e is active+recovery_wait+degraded, acting [12,7,5]
pg 1.17a is active+recovery_wait+degraded, acting [16,7,5]
pg 1.1bd is active+recovery_wait+degraded, acting [12,6,0]
pg 1.1c0 is active+recovery_wait+degraded, acting [17,7,1]
pg 1.1f5 is active+recovery_wait+degraded, acting [16,3,7]
pg 1.22e is active+recovery_wait+degraded, acting [12,7,3]
pg 1.25f is active+recovery_wait+degraded, acting [17,7,2]
pg 1.260 is active+recovery_wait+degraded, acting [12,7,1]
pg 1.301 is active+recovery_wait+degraded, acting [13,0,7]
pg 1.305 is active+recovery_wait+degraded, acting [14,7,0]
pg 1.320 is active+recovery_wait+degraded, acting [14,7,0]
pg 1.380 is active+recovery_wait+degraded, acting [16,3,7]
pg 1.3dd is stuck undersized for 168.195350, current state active+recovery_wait+undersized+degraded+remapped, last acting [17,5]
PG_NOT_DEEP_SCRUBBED 2 pgs not deep-scrubbed in time
pg 1.149 not deep-scrubbed since 2020-09-17 21:50:44.435585
pg 1.c1 not deep-scrubbed since 2020-09-17 20:53:51.529547
Update:
Looks like things are all back and working. Just for everyone's information: all the node updates did the same thing, so just wait till the system catches up before updating the next node (see the quick check below).
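For anyone doing the same upgrade, something like the following should be enough to confirm the cluster has caught up before touching the next node. These are standard Ceph CLI commands, nothing PetaSAN-specific, so adjust to your own setup:

# watch recovery progress; the degraded object/pg counts should fall to zero
watch -n 10 ceph -s

# move on to the next node only once this reports HEALTH_OK
ceph health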
Thanks to the folks at PetaSan,
Last edited on September 30, 2020, 10:00 am by khopkins · #1
admin
2,930 Posts
September 30, 2020, 10:03 am
Nothing too alarming. Maybe an OSD was slow to restart. Check that the number of degraded PGs goes down over time; you can track the progress from the PG Status chart. In the Maintenance tab, set the recovery speed to slow for HDD disks or average for SSD disks, just in case it is currently too slow.
If things are not changing, I would restart OSD 7 with systemctl restart ceph-osd@7; you need to run it from its node.
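For example, roughly like this (assuming osd.7 is up on one of your storage nodes; ceph osd tree will show you which host it lives on):

# find which node hosts osd.7
ceph osd tree

# on that node, restart the daemon and watch recovery resume
systemctl restart ceph-osd@7
ceph -s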
The charts will work once you update the first 3 nodes, as we have changed them to use https in the reverse proxy.