New cluster not healthy
Ste
125 Posts
March 10, 2020, 10:54 am
Hello,
I finally set up my brand new cluster and updated to the latest 2.5.1 version, but the dashboard does not show it as healthy, and it does not get better with time. This is the message:
Ceph Health
I'm going through the Administration guide, but it doesn't seem that I messed anything up... What can I do to fix the situation?
Moreover, I don't think it is normal to have two inactive pools (see picture), is it?
Thanks.
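For anyone comparing symptoms: the dashboard message can be cross-checked from the shell of any node with standard Ceph commands (nothing PetaSAN-specific):
ceph status                     # overall health and PG states
ceph health detail              # expands each warning
ceph pg dump_stuck inactive     # lists the PGs that never became active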
Last edited on March 10, 2020, 10:55 am by Ste · #1
admin
2,930 Posts
March 10, 2020, 1:04 pm
No, this is not normal. If you have just installed it with no real data, I would recommend re-installing. Also, if this setup includes the kernel you built yourself (prior posts), that could be the issue.
Ste
125 Posts
March 10, 2020, 4:12 pm
Actually, I built and replaced only the "atlantic.ko" module; could this cause the issue? (To be more precise: I built and installed it on node 1, and copied only the atlantic.ko file to nodes 2 and 3.)
But even if I re-install all three nodes, I "must" use the new atlantic.ko module, otherwise the cluster would be useless...
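A quick sanity check, for what it's worth: confirming that all three nodes actually load the same rebuilt driver (the interface name below is a placeholder):
modinfo atlantic | grep -E 'filename|version'   # module file and version on disk
ethtool -i eth0                                 # driver version bound to the NIC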
Last edited on March 10, 2020, 4:23 pm by Ste · #3
admin
2,930 Posts
March 10, 2020, 5:50 pm
The default pools created too many PGs for your OSD disk count. Most probably, during cluster creation you specified a range of 15-50 disks while you had only 5.
To fix: manually delete the pools/filesystem and create new pools with a smaller number of PGs (256 PGs in total across all pools).
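For context, the usual rule of thumb behind that 256 figure: target about 100 PGs per OSD, divided by the replica count and rounded to a power of two, so 5 OSDs × 100 / 3 ≈ 167, which rounds up to 256. If the recreation were done from the shell rather than the UI, it would look roughly like this (the pool name "rbd" is an example; the actual PetaSAN pool names may differ):
ceph osd pool delete rbd rbd --yes-i-really-really-mean-it   # needs mon_allow_pool_delete=true
ceph osd pool create rbd 256 256 replicated                  # pg_num and pgp_num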
Ste
125 Posts
March 10, 2020, 6:36 pm
Quote from admin on March 10, 2020, 5:50 pm
Most probably during cluster creation you specified a range of 15-50 disks while you had only 5.
Correct! I chose up to 50 disks because I currently have 3 hosts that can hold up to 10 HDDs each; there are only 5 now, but I'm buying the other 25. Then, in the coming months, I want to add a 4th node, so in the end I'll have 40 spinning disks and 4 NVMe SSDs.
If I select a smaller number of disks now, will I be able to increase it in the future?
For now I'll follow your suggestion; there is no data in the cluster yet.
Thanks.
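On the question of growing later: Ceph does allow raising a pool's PG count after creation as OSDs are added, so starting small is safe. A minimal sketch (the pool name is again an example):
ceph osd pool set rbd pg_num 512
ceph osd pool set rbd pgp_num 512   # older releases need this set explicitly; Nautilus adjusts it gradually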
Ste
125 Posts
March 11, 2020, 12:03 pm
Fixed! Now there's only this warning left; is it serious or not?
BlueFS spillover detected on 2 OSD(s)
Thanks. Ste
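For background (general Ceph behaviour, not specific to this setup): this warning means the RocksDB metadata of those two OSDs has outgrown the fast journal/DB partition and part of it now lives on the slow data disk. It hurts performance but is not data loss. Which OSDs are affected, and by how much, shows up in:
ceph health detail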
admin
2,930 Posts
March 11, 2020, 12:16 pm
Do the online upgrade, then delete the journal and OSDs, then re-add the OSDs.
To delete the OSDs:
systemctl stop ceph-osd@X
delete it from the UI
re-add it from the UI
If this is not a new cluster, you need to delete the OSDs and journal one node at a time, and do not touch the next node until the cluster health is OK.
If this is a new cluster, you can download the latest ISO and re-install.
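Put together, a minimal sketch of that per-OSD cycle (OSD id 2 is an example; the delete and re-add steps themselves happen in the PetaSAN UI):
systemctl stop ceph-osd@2   # stop the OSD daemon on its node
# delete osd.2 from the UI, then re-add it from the UI
ceph -s                     # wait for HEALTH_OK before moving to the next node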
Last edited on March 11, 2020, 12:25 pm by admin · #7