blkdiscard to reclaim space - special requirements?
icecoke
10 Posts
November 15, 2019, 10:18 am
Hi there,
I'm completely new to ceph/petasan, so please be kind to me if I'm asking silly questions 🙂
We are currently in a testing stage of ceph/petasan, so all nodes/OSDs are virtualized. Our setup: we run FreeBSD VMs on XenServer (7.2) with a ZFS filesystem in the VM. The petasan storage (consisting of 4 xen-virtualized nodes, each with 2 OSDs on local SSDs) is connected as an iSCSI target, which seems to work well (performance etc.).
Our Problem:
After trying things out for a while, the storage filled up as we created and removed different VM devices on the iSCSI storage. Removing a device was never reflected as free space in the petasan dashboard.
So we tried to call blkdiscard on the iSCSI device from within the xenserver host, which returned:
BLKDISCARD ioctl failed: Operation not supported
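For reference, the call we tried was roughly the following (the device path is only an example, of course):
# run on the xenserver host against the iSCSI block device
blkdiscard -v /dev/sdX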
I know, from reading posts and blogs, that the space is still available for new devices/files but is 'just' not shown as free space in the petasan dashboard.
Our Questions:
- Is there a way to get blkdiscard or any similar call to the iSCSI device (ceph) working? What requirements (maybe physical ones, like special SSDs etc.) are needed?
- Is there (as an alternative) a way in petasan/ceph to see the 'unused' space rather than the 'free' space, since some of the 'used' space - according to the mentioned posts/blogs - is usable for new files on the storage?
Another point:
As a test case, we filled the storage so high that petasan (as expected) shows errors:
1 backfillfull osd(s)
4 nearfull osd(s)
1 pool(s) backfillfull
Degraded data redundancy (low space): 7 pgs backfill_toofull
So the next important questions:
- How can this be resolved? How can the pool be shown as full/nearfull etc. if NO file exists in the iSCSI mount?
- Shouldn't ceph or petasan take care of removing freed blocks when the clients remove files from the iSCSI mount? The way it currently works, it would run into an unusable state every time over-usage occurs, for whatever reason.
This time we might be able to resolve it by simply rebuilding the complete storage, but what if there are tons of data that we cannot move anywhere because it is a production system? These questions are therefore very, very vital for us before using it in a real production environment.
So any help would be really, really appreciated!! Many thanks in advance!
Jimmy
admin
2,930 Posts
November 15, 2019, 11:43 am
1) unmap/discard is supported: make sure emulate_tpu is set to 1 in:
/opt/petasan/config/tuning/current/lio_tunings
I am not sure about Xen: if you have 1 vm per datastore, you can always delete the datastore from PetaSAN; if Xen manages multiple vms per datastore, then it should call unmap/discard itself when deleting a vm. There is no other way for PetaSAN or any other SAN to know which blocks are no longer needed.
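For example, you could quickly check the current value with something like this (the exact file layout may differ between versions):
# verify the emulate_tpu setting in the PetaSAN LIO tunings
grep -n emulate_tpu /opt/petasan/config/tuning/current/lio_tunings
If you change it, the iSCSI disk will most likely need to be stopped/started again for the new setting to take effect.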
2) For the cluster full error:
best option is to add more physical disks / osds, so the data will be rebalanced and your current osds will not be full.
second best is to look at which osds are more full than others via
ceph osd df
and try to re-adjust crush weights and give disks with lower usage higher weights
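for example, something along these lines (osd id and weight are just placeholders):
# show per-osd utilization and current crush weights
ceph osd df
# give a less-used osd a slightly higher crush weight (id and value are examples)
ceph osd crush reweight osd.5 1.1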
third option is to increase the ratios at which ceph will stop, but use this with caution:
mon_osd_backfillfull_ratio default 0.900000
mon_osd_full_ratio default 0.950000
osd_failsafe_full_ratio default 0.970000
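on recent ceph versions these thresholds can usually also be raised at runtime, for example (the values below are only illustrative, and again, use with caution):
# temporarily raise the backfillfull / full thresholds
ceph osd set-backfillfull-ratio 0.92
ceph osd set-full-ratio 0.96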
3) No, if you delete files from a filesystem, any SAN will not detect this, as this is done in the filesystem allocation tables, which are only understood by the filesystem itself; this is why a delete takes almost no time: the written blocks are not removed. Some filesystems provide a command like fstrim which, when run by the user, will internally call discard on the freed blocks.
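For example, on a Linux client with an ext4 or xfs filesystem mounted from the iSCSI disk, something like the following should pass the discards down to the SAN (the mount point is only an example, and discard/unmap must be supported end to end):
# trim unused blocks on the mounted filesystem and report how much was discarded
fstrim -v /mnt/mydata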
Last edited on November 15, 2019, 11:47 am by admin
icecoke
10 Posts
November 15, 2019, 5:05 pm
Thanks a lot! That helps a lot with our next steps here!
icecoke
10 Posts
December 27, 2019, 1:55 pm
One additional point about TRIM/DISCARD:
I checked the setting (emulate_tpu) above and it is set to 1; nevertheless TRIM doesn't seem to work. We use bluestore in our test, and while reading about TRIM and bluestore I came across these settings:
bdev_enable_discard: true
bdev_async_discard: true
but did not find any of these in the configs. Can these be set? Where can I read their current values? Could these settings be the thing we need?
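I assume their current values could be read from a running OSD with something like this (the osd id is just an example, run on the node hosting that osd):
# query the live configuration of one osd via its admin socket
ceph daemon osd.0 config get bdev_enable_discard
ceph daemon osd.0 config get bdev_async_discard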