Problems after adding OSD nodes
kpiti
23 Posts
November 26, 2024, 5:01 pm
Hi,
we had 5 nodes (3 mgr, 3 metadata, 5 OSD) with 6x 8TB disks/OSDs each, plus 2 NVMe cache devices and a journal on SSD. We decided to expand with 5 more nodes in the same configuration, and to add bigger disks to all nodes in the free slots, so each node also got an additional 4x 16TB disks.
I started adding nodes and decided I would just join them to the cluster and add/create the OSDs later, once everything was up and healthy, so I didn't mark any disks as OSD/cache/journal during the install. When all 5 new nodes were installed and joined, I went to the dashboard and saw them listed there, but all marked as down and with no actions available (disk list etc.).
At the same time I got a warning about some PGs not being deep scrubbed, health went to WARN, and I saw a lot of PG rebalancing going on. There was a warning about setting norebalance when I installed the new nodes, but I (foolishly) thought that since I wasn't marking any disks as OSDs, nothing would happen. After this rebalancing had been going on for a couple of days I started digging and found that we now have 60 OSDs instead of 30, and in the OSD tree I can see all the new nodes full of OSDs.
Now, for starters, I still can't see the new nodes properly in the GUI (they are listed but marked as Down). I also didn't want the new disks added automatically, because we want to make some changes and the big disks would help us move the data off the current ones so we could recreate the old pools from scratch. It seems the new nodes got automatically configured the same way as the old ones, the OSDs were activated, and rebalancing went into overdrive. I have set norebalance now, but I think a lot of data has already been transferred to the new OSDs.
The cluster looks like this at the moment:
# ceph -s
  cluster:
    id:     2e7a0a56-89a1-481d-b78b-7ed5a44f1881
    health: HEALTH_WARN
            noout,norebalance,norecover flag(s) set
            703 pgs not deep-scrubbed in time
            971 pgs not scrubbed in time
  services:
    mon: 3 daemons, quorum CEPH03,CEPH01,CEPH02 (age 10w)
    mgr: CEPH01(active, since 10w), standbys: CEPH02, CEPH03
    mds: 2/2 daemons up, 1 standby
    osd: 60 osds: 60 up (since 5d), 60 in (since 5d); 480 remapped pgs
         flags noout,norebalance,norecover
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 2113 pgs
    objects: 33.95M objects, 41 TiB
    usage:   127 TiB used, 313 TiB / 440 TiB avail
    pgs:     2193991/101839914 objects misplaced (2.154%)
             1633 active+clean
             479  active+remapped+backfill_wait
             1    active+remapped+backfilling
  io:
    client: 0 B/s rd, 6.6 KiB/s wr, 0 op/s rd, 1 op/s wr
  progress:
    Global Recovery Event (10d)
      [=====================.......] (remaining: 3d)
How can I get the cluster to use just the first 5 nodes (30 OSDs) and leave the new stuff clean for the time being?
I suppose the 5 new nodes/OSDs will show as active/Up in the node list once the cluster gets back to OK health, or is there already a problem?
Luckily people's work is largely uninterrupted; some just noticed that the free space/capacity was increasing.
Thanks, any help appreciated..
Cheers, Jure
admin
2,930 Posts
November 26, 2024, 7:08 pm
You can try to set the CRUSH weight of the new OSDs to 0, via the UI or via the CLI. Then, when things are stable, you can increase the weight gradually.
It is very strange that the OSDs were added automatically. Were these old OSDs that had been used before, or are they new drives?
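Coming back to the CRUSH weight suggestion, a minimal CLI sketch (osd.30 below is just a placeholder ID; read the actual IDs of the new OSDs from ceph osd tree):
# list OSDs per host with their current CRUSH weights
ceph osd tree
# take a new OSD out of data placement by setting its CRUSH weight to 0
ceph osd crush reweight osd.30 0
# later, when things are stable, raise the weight gradually
ceph osd crush reweight osd.30 0.5
CRUSH weights normally correspond roughly to the device size in TiB, so a 16TB disk would usually end up somewhere around 14.5 once fully weighted in.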
kpiti
23 Posts
November 26, 2024, 7:41 pm
Brand new boxes and drives. I did expect to have a say in configuring them and was surprised to see them all active. The extra, bigger drives do not seem to have been activated.
If I set the CRUSH weight to 0, do I have to re-enable rebalancing so the PGs get moved off them?
admin
2,930 Posts
November 26, 2024, 8:02 pm
Yes, you should clear the norebalance and noout flags again.
You can set the backfill speed to very slow and then increase it gradually; watch the charts for % disk utilisation as well as CPU and network, and make sure they are not stressed before increasing the speed.
We will try to reproduce your issue, but I doubt we will. If you have a virtual test environment, can you try to reproduce it there?
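If you do the flag and backfill-speed part from the CLI rather than the UI, a sketch with plain Ceph commands (these are conservative starting values, not PetaSAN defaults; note that on newer Ceph releases the mClock scheduler may override the backfill/recovery options unless you explicitly allow that):
# clear the flags so backfill can drain the new OSDs
ceph osd unset norebalance
ceph osd unset norecover
ceph osd unset noout
# throttle backfill/recovery so client IO is not starved
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
Raise those last two values step by step while watching disk %util, CPU and network on the charts.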
kpiti
23 Posts
November 26, 2024, 8:10 pm
Unfortunately this is a physical system. I am happy to assist you with the investigation if that is feasible (as long as we don't crash the cluster ;-) ).
What I expected, and what the end result should be, is the old, functioning cluster with its active OSDs and pools (5 boxes), plus 5 similar vanilla boxes with no data on them. I can send you some logs from the install if that is any help.
Thanks..
Jure
kpiti
23 Posts
November 26, 2024, 8:15 pm
By the way, I found this in the Ceph docs:
ADJUSTING OSD WEIGHT
Note
Under normal conditions, OSDs automatically add themselves to the CRUSH map with the correct weight when they are created. The command in this section is rarely needed.
But I'd expect that note to apply only *if* the OSDs are actually created first.
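For anyone following along, the weights the auto-created OSDs ended up with can be checked with the standard Ceph commands:
# CRUSH weight per OSD, grouped by host
ceph osd tree
# same tree plus per-OSD utilisation, PG count and variance
ceph osd df tree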
admin
2,930 Posts
November 28, 2024, 11:46 am
We did some tests and could not reproduce this.
Would it be possible for you to share the following, taken from any of the affected nodes (one node is enough)? Please share them via a shared storage link.
Contents of log files
/opt/petasan/log/PetaSAN.log
/opt/petasan/log/ceph-volume.log
Output of the following commands:
ceph-volume lvm list
consul members
consul kv get -recurse PetaSAN/Nodes
kpiti
23 Posts
November 29, 2024, 10:11 am
Hi, the logs are here - https://fl.forensis.si/logs.tgz
It seems consul is not running on the new nodes; how can I start it?
In the meantime I've managed to remap all the data off the new nodes and finish all the scrubbing that had got stuck. Thanks.
admin
2,930 Posts
November 29, 2024, 7:13 pm
Thanks for the logs. Unfortunately PetaSAN.log is truncated, probably due to log file rotation. If you can find the first log covering when the node was added it will help a lot; try the other nodes and see whether the initial log file still exists.
You can start consul manually using the script
/opt/petasan/scripts/consul_client_start_up.py
It should have started automatically, so there may be another problem preventing it from starting; /var/log/syslog may have more info in that case.
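In practice that might look like this (assuming the script is directly executable, otherwise invoke it via python3; the grep is just one way of digging through syslog):
/opt/petasan/scripts/consul_client_start_up.py
# or, if it is not marked executable
python3 /opt/petasan/scripts/consul_client_start_up.py
# verify the node has joined and look for startup errors
consul members
grep -i consul /var/log/syslog | tail -n 50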
kpiti
23 Posts
December 1, 2024, 11:05 am
Hi,
I've included the rotated logs as well; consul started up without any issues, so the commands went through OK. The link is the same.