cluster sizing
rophee
8 Posts
February 28, 2024, 6:37 pm
We have at least 5TB of data. Our cluster is made up of 4 nodes with 8 x 898GB OSDs each (32 OSDs total). We created 2 x 5TB iSCSI disks and connected both to a Windows Server 2022 host, creating two drives. We then started migrating our 5TB of data, splitting it roughly 2.5TB per drive. Is our cluster undersized? And how does PetaSAN (Ceph) reclaim space where data/files were deleted? We kept our data under 5TB in total by deleting files, yet the space does not seem to be reused.
898GB x 32 = 28,736GB ≈ 28.7TB (total raw space)
3 x 5TB = 15TB (theoretical used space at 3x replication)
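For context on the numbers above, assuming the default 3x replicated pools (which the 3 x 5TB figure implies), usable capacity is roughly a third of the raw space:
32 x 898GB = 28,736GB ≈ 28.7TB raw
28.7TB / 3 replicas ≈ 9.6TB usable
2 x 5TB iSCSI disks = 10TB provisioned, slightly more than the usable capacity
So two fully written 5TB volumes cannot quite fit on this cluster as laid out, even before Ceph's nearfull/full safety ratios kick in.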
admin
2,930 Posts
February 28, 2024, 10:41 pm
Are you actually running out of space? Look at the storage wheel on the dashboard, the OSD % usage, and the storage chart or pool list on the dashboard. You can also use the ceph df command.
When you delete large files, the Windows filesystem (NTFS) does not actually erase the data blocks/sectors that were written; it just updates metadata to mark the previously written blocks as available for overwriting. Windows therefore reports the space as free, but PetaSAN works at the block layer and does not understand the client filesystem, so it still reports that data as used. It is possible to edit the disk and enable trim/reclaim, which makes the filesystem periodically send block commands to discard the sectors marked as free, but this is not generally recommended for Windows: it slows things down without a real advantage.
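For reference, both checks can also be done from any node's shell (the output layout varies by Ceph release):
ceph df            # cluster-wide and per-pool usage
ceph osd df tree   # per-OSD utilization, CRUSH weight and variance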
rophee
8 Posts
February 29, 2024, 1:09 am
Your comment has the same tone as what I've read from googling. Our cluster is completely out of space. Since we are still at a point where we can rework the cluster, what could we do differently?
root@ps-node-01:~# ceph -s
  cluster:
    id:     b544c7c8-9f3b-4572-b999-d863ed0f99dc
    health: HEALTH_ERR
            10 backfillfull osd(s)
            1 full osd(s)
            5 nearfull osd(s)
            4 pool(s) full
  services:
    mon: 3 daemons, quorum ps-node-03,ps-node-01,ps-node-02 (age 2w)
    mgr: ps-node-02(active, since 2w), standbys: ps-node-03, ps-node-01
    mds: 1/1 daemons up, 2 standby
    osd: 32 osds: 32 up (since 2w), 32 in (since 5w)
  data:
    volumes: 1/1 healthy
    pools:   4 pools, 1696 pgs
    objects: 1.96M objects, 7.5 TiB
    usage:   24 TiB used, 3.8 TiB / 28 TiB avail
    pgs:     1695 active+clean
             1 active+clean+scrubbing+deep
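A rough sanity check on this output, assuming the 3x replicated pools implied above:
7.5 TiB objects x 3 replicas ≈ 22.5 TiB
22.5 TiB + BlueStore metadata and uneven OSD fill ≈ 24 TiB used of 28 TiB raw
So the cluster is holding about 7.5 TiB of block data even though only ~5TB of files are live, because blocks freed by deletion were never discarded at the iSCSI layer.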
admin
2,930 Posts
February 29, 2024, 5:53 am
Best is to add new OSD drives. Start with the host that has the full OSD; you can see which one it is from the dashboard.
Less recommended is to slightly lower the CRUSH weight of the full OSDs, but this may end up filling other OSDs instead. Next would be to enable trim/reclaim by editing the iSCSI disk and issuing commands from Windows to invoke trim. Last would be to back up the data from one iSCSI disk, delete it, and recreate a smaller one.
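A minimal sketch of the less-recommended reweight option; osd.12 is a hypothetical id here, so identify the actual full OSD first:
ceph osd df                           # find the full OSD and its current CRUSH weight
ceph osd crush reweight osd.12 0.80   # slightly lower its weight (example value)
Lowering the weight moves placement groups off that OSD onto its peers, which is exactly why it can end up filling other OSDs instead.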
rophee
8 Posts
March 3, 2024, 2:13 pm
For completeness and posterity, and to thank PetaSAN for the continued help to people like us:
As recommended, and since we had an unused drive in each storage sled, we added a new OSD to the node that had the full OSD (identified from the dashboard). As the cluster started to backfill, we monitored usage and added a new OSD wherever usage was high, until we had added one new OSD on every node. It took a while, and we let the cluster heal itself. Only after the cluster was back to "HEALTH_OK" did we start modifying each iSCSI disk and tick 'Yes' under 'Enable Trim/Discard'. At first we thought of letting the cluster do its own cleanup, but it needed a nudge from our Windows server. We ran 'Optimize-Volume -DriveLetter <drive letter> -Defrag -Verbose' and the cluster started to gain free space. We ran the PowerShell command a couple of times on each iSCSI drive as the cluster slowly reclaimed space. Usage settled almost exactly at the theoretical used space, 15.2TB.
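A note for anyone reproducing this: the Defrag pass evidently triggered discards here, but on thin-provisioned volumes the documented Optimize-Volume switch for sending unmap/trim is ReTrim. A hypothetical example for drive E:
Optimize-Volume -DriveLetter E -ReTrim -Verbose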
admin
2,930 Posts
March 3, 2024, 7:19 pm
Thanks for the feedback, and glad things worked.
Just a recommendation: since you have added storage, there is no need for trim. As noted earlier, the filesystem will re-use the space freed by deleting files, so no space is wasted even without trim. Trim overhead can reduce performance if done regularly.
rophee
8 Posts
March 8, 2024, 1:26 am
So your recommendation is to turn off 'Enable Trim/Discard'?