
Unable to create file system with erasure coding

Hello...

I am trying to create a file system for CIFS and/or NFS export on PetaSAN 2.7.3 that uses a data pool with 3+2 erasure coding, but PetaSAN only shows replicated pools for selection in the "Add File System" dialog.

According to this

https://docs.ceph.com/en/latest/cephfs/createfs/

using erasure-coded pools for CephFS should be possible, as long as they have "overwrites" enabled. No idea what that is, but I tried to activate it with this command, as suggested in the Ceph documentation:

ceph osd pool set my_ec_pool allow_ec_overwrites true

But no change. PetaSAN still will not let me select the erasure-coded data pool when adding a file system...
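
For reference, the setting can be checked afterwards with:

ceph osd pool get my_ec_pool allow_ec_overwrites

which should report allow_ec_overwrites: true if the change took effect.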

Suggestions anyone?

As per the link you posted:

If erasure-coded pools are planned for the file system, it is usually better to use a replicated pool for the default data pool to improve small-object write and read performance for updating backtraces. Separately, another erasure-coded data pool can be added (see also Erasure code) that can be used on an entire hierarchy of directories and files (see also File layouts).

So after you create a filesystem with a replicated data pool, you can create a file layout in PetaSAN with an EC pool. Later, when you use CIFS/NFS to create shares, you specify that layout to store the share data.
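
What the layout step does under the hood corresponds roughly to adding the EC pool as an extra data pool and setting a directory layout via extended attributes on a mounted CephFS; a minimal sketch, assuming a pool named my_ec_pool and an example directory, would be:

ceph fs add_data_pool <fs_name> my_ec_pool
setfattr -n ceph.dir.layout.pool -v my_ec_pool /mnt/cephfs/ec-data

PetaSAN's Add Layout page wraps these steps, so you normally do not need to run them by hand.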

If only it were that easy... I could create a file system using a replicated pool. But when I click the "+ Add Layout" button, I only get a big red banner saying: "Alert! Cannot open the Add Layout page."

 

Actually, it should be that easy 🙂

Is the cluster health OK? Can you show the output of ceph status?

root@psan1-nd1:~# ceph status
  cluster:
    id:     4c9b0226-31dd-44fd-b869-86b58cccc078
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum psan1-nd3,psan1-nd1,psan1-nd2 (age 21h)
    mgr: psan1-nd3(active, since 21h), standbys: psan1-nd2, psan1-nd1
    mds: NFS:1 {0=psan1-nd1=up:active} 2 up:standby
    osd: 5 osds: 5 up (since 20h), 5 in (since 20h)

  task status:
    scrub status:
      mds.psan1-nd1: idle

  data:
    pools:   2 pools, 192 pgs
    objects: 22 objects, 18 KiB
    usage:   5.1 GiB used, 55 TiB / 55 TiB avail
    pgs:     192 active+clean

Health looks good. What is strange is that at this stage you should have 3 pools, not 2. As I understand it, you would create 2 replicated pools from the Pools page with their usage set to CephFS, then create a filesystem using these 2 pools: one for data, the other for metadata (many Ceph users name these pools cephfs_data and cephfs_metadata). After this you create a new EC pool, also with CephFS as usage, then add a new layout to the existing filesystem and specify the EC pool you just created.
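
For reference, the rough CLI equivalent of that sequence would be something like the following (pool names, PG counts and the EC profile name are only examples, and "NFS" is the filesystem name shown in your ceph status; PetaSAN does all of this for you from the Pools and File Systems pages):

ceph osd pool create cephfs_metadata 64 64 replicated
ceph osd pool create cephfs_data 64 64 replicated
ceph fs new NFS cephfs_metadata cephfs_data
ceph osd erasure-code-profile set ec-3-2 k=3 m=2
ceph osd pool create my_ec_pool 64 64 erasure ec-3-2
ceph osd pool set my_ec_pool allow_ec_overwrites true
ceph fs add_data_pool NFS my_ec_pool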

 

OK, sorry for making a beginner's mistake here. I did not know that it is not allowed to use the same pool for metadata and data. So after creating and using separate pools, I could indeed add layouts as intended, including one with erasure coding.

Unfortunately, the problems did not end there. First off, I was unable to connect to the NFS exports from Proxmox (obviously no reaction to the showmount command???). However, I could successfully connect using CIFS. So everything fine? Not quite... because performance was utterly mediocre. And I don't mean just a little slow - I mean slow to the point of uselessness. I mean... what's the point of using erasure coding anyway? Isn't it that someone has to store a huge amount of data (a bunch of terabytes) and wants to save some money on hard drives compared to replication? But I am sure that person would not want to wait for weeks until the data is copied to the cluster - unless they are using ultra-high-performance hardware like 100% SSDs and top-notch CPUs, possibly combined with huge amounts of RAM, which would result not in saving money, but in spending even more...

So the bottom line for me is: save yourself the headache and use replication. It's less complicated, and you get decent performance out of standard hardware. Yes, you have to buy a few more drives, but that's obviously worth it.

Hard to say, as it depends on what hardware you have, what performance you get, and what workload and client you use to test. We have many installations using EC and they work very well. When doing writes with a large block size you can saturate your network. Generally EC is good for backups and large-file copy tools, but it is poor for small-block-size applications like virtualisation and databases; in the latter case it would be approximately 2 times slower even with decent hardware.
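
If you want to see the block-size effect on your own cluster, one quick sanity check is rados bench against the EC pool, comparing a large and a small block size (pool name, run time and thread count are just examples):

rados bench -p my_ec_pool 30 write -b 4M -t 16
rados bench -p my_ec_pool 30 write -b 4K -t 16

The large-block run should get much closer to your network limit than the small-block one.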

Not sure why NFS is not working; we do have a bug when using a custom NFS gateway, otherwise things work really well. Check your mount command, and try running it from other clients, such as other PetaSAN nodes, to see whether it is client-related. I assume your cluster is healthy and your NFS Status page shows the service is up.
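
For example, from another PetaSAN node or a plain Linux client, something along these lines (gateway IP and share path are placeholders):

showmount -e <nfs-gateway-ip>
mount -t nfs -o vers=4.1 <nfs-gateway-ip>:/<share-path> /mnt/test

Also note that showmount relies on the NFSv3 MOUNT protocol, so a gateway serving NFSv4 only may not answer it even when mounts themselves work.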