
blkdiscard to reclaim space - special requirements?

Hi there,

I'm completely new to ceph/petasan, so please be kind if I'm asking silly questions 🙂

 

We are currently in a testing stage of ceph/petasan, so all nodes/OSDs are virtualized. Our setup: we run FreeBSD VMs on XenServer (7.2) with a ZFS filesystem inside each VM. The petasan storage (consisting of 4 xen-virtualized nodes, each with 2 OSDs on local SSDs) is connected as an iSCSI target, which seems to work well (performance etc.).

Our Problem:

After experimenting for a while, the storage filled up as we created and removed different VM devices on the iSCSI storage. No removal of a device was ever reflected as freed space in the petasan dashboard.

So we tried to call blkdiscard on the iSCSI device from within the XenServer host, which returned:

BLKDISCARD ioctl failed: Operation not supported

I know from reading posts and blogs that the space is still available for new devices/files, but 'just' not shown as free space in the petasan dashboard.

Our Question:

  • Is there a way to get blkdiscard or a similar call against the iSCSI device (ceph) working? What requirements (maybe physical ones, like special SSDs) are needed?
  • Is there (as an alternative) a way in petasan/ceph to report the 'unused' space as 'free' space, since some of the 'used' space - according to the mentioned posts/blogs - is reusable for new files on the storage?

Another point:

As a test case, we filled the storage so far that petasan (as expected) shows errors:

1 backfillfull osd(s)
4 nearfull osd(s)
1 pool(s) backfillfull
Degraded data redundancy (low space): 7 pgs backfill_toofull

So next important question:

  • How do we resolve this? How can the pool be shown as full/nearfull etc. if NO file exists on the iSCSI mount?
  • Shouldn't ceph or petasan take care of removing freed blocks when the clients remove files from the iSCSI mount? As it stands, it would run into an unusable state every time an overfull condition occurs, for whatever reason.

This time, we might be able to resolve this by simply rebuilding the complete storage, but what if there are tons of data we cannot move anywhere because it is a production system? That makes these questions very, very vital for us before using it in a real production environment.

So any help would be really, really appreciated!! Many thanks in advance!

Jimmy

1) unmap/discard is supported: make sure emulate_tpu is set to 1 in:

/opt/petasan/config/tuning/current/lio_tunings
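A quick way to verify the current value on a node (assuming the tunings file is plain text that grep can search):

grep -i emulate_tpu /opt/petasan/config/tuning/current/lio_tunings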

I am not sure about Xen. If you have 1 VM per datastore, you can always delete the datastore from PetaSAN; if Xen does manage multiple VMs per datastore, then it should call unmap/discard itself when deleting a VM. There is no other way for PetaSAN or any other SAN to know which blocks are no longer needed.
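Once emulate_tpu is active (the initiator may need to re-login the iSCSI session or re-attach the disk to pick up the new provisioning bits), the earlier test should stop failing; for example, from the XenServer host (/dev/sdX is a placeholder):

blkdiscard /dev/sdX    # discards the WHOLE device - only run against an empty/disposable LUN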

2) For the cluster full error :

The best option is to add more physical disks/OSDs, so the data gets rebalanced and your current OSDs are no longer full.

The second best is to look at which OSDs are more full than others via

ceph osd df

and try to re-adjust the crush weights, giving disks with lower usage higher weights, as in the sketch below.
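For example (the OSD id and weight here are hypothetical; pick them from the ceph osd df output):

ceph osd crush reweight osd.3 0.85    # lower the crush weight of an over-full OSD so data migrates off it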

The third option is to increase the ratios at which ceph will stop, but use this with caution:

mon_osd_backfillfull_ratio   default 0.900000
mon_osd_full_ratio           default 0.950000
osd_failsafe_full_ratio      default 0.970000
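On Luminous and later these can be raised at runtime with dedicated commands (the values below are only examples; raising the ratios just buys time until you add capacity):

ceph osd set-backfillfull-ratio 0.92
ceph osd set-full-ratio 0.96

On older releases, set the mon_osd_* options in ceph.conf instead.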

3) No. If you delete files from a filesystem, the SAN will not detect this, as the deletion only touches the filesystem allocation tables, which are understood by the filesystem alone; this is why a delete takes almost no time: the written blocks are not removed. Some filesystems provide a command like fstrim which, when run by the user, will internally call discard on the no-longer-needed blocks.
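For example, on a Linux client this looks like (mount point is a placeholder):

fstrim -v /mnt/data    # trims unused blocks and reports how many bytes were discarded

Since your guests run ZFS on FreeBSD: OpenZFS 0.8 and later have their own trim support (zpool trim <pool> for a one-shot trim, zpool set autotrim=on <pool> for automatic discards), while older FreeBSD ZFS releases control TRIM via the vfs.zfs.trim.enabled sysctl.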

 

 

Thanks a lot! That helps us a great deal with our next steps here!

One additional point about TRIM/DISCARD:

I checked the setting (emulate_tpu) above and it is set to 1; nevertheless, TRIM doesn't seem to work. We use bluestore in our test, and regarding TRIM and bluestore I read about these settings:

bdev_enable_discard: true
bdev_async_discard: true

but did not find any of these in the configs. Can this be set? Where can I read the current value of these? Could this setting be the thing we need?
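One way to read the running value is via the OSD admin socket (osd.0 is a placeholder; run this on the node hosting that OSD, and note that bdev_async_discard only exists on newer Ceph releases):

ceph daemon osd.0 config get bdev_enable_discard    # read the value from the running daemon

To enable the options, they would go in the [osd] section of ceph.conf on the OSD nodes, followed by an OSD restart:

[osd]
bdev_enable_discard = true
bdev_async_discard = true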