
Info on PetaSAN administration and settings meaning

Good morning,
I would like to ask where I can find documentation to study in more detail
some topics I found in the PetaSAN Administration Guide (v. 1.5), to understand
what they are used for and when I might need to customize them, such as:

  • EC profiles;
  • Buckets;
  • Rules;
  • In which case would I need to create other pools ?

Thanks for any info. Ste.

EC profiles: EC stands for erasure coding. Instead of creating multiple copies/replicas, it adds coded blocks for redundancy. A 3+2 profile will chop an object into 3 (k) data blocks, then compute 2 (m) additional blocks (think of them like parity bits) for a total of 5 blocks; each block will be saved on a separate host. This setup tolerates any 2 failures, yet requires a storage overhead of 5/3 = 1.67x rather than the 3.0x of 3 replicas. It is, however, slower than replicated. This is an example of an EC 3+2 "profile"; you can also have other profiles like 4+2, and you can create profiles with specific code plugins, which could be faster on some CPUs.
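To make the overhead arithmetic above concrete, here is a minimal Python sketch. The function names are made up for illustration and are not part of PetaSAN or Ceph. It only counts how many pieces can be lost before data becomes unrecoverable and says nothing about performance.

```python
from fractions import Fraction


def ec_profile(k: int, m: int) -> dict:
    """Summarise an erasure-coded k+m profile (k data chunks, m coding chunks)."""
    return {
        "pieces_stored": k + m,           # one per host in the setup described above
        "tolerated_failures": m,          # any m pieces may be lost
        "overhead": Fraction(k + m, k),   # raw space used per unit of user data
    }


def replicated(size: int) -> dict:
    """Summarise a replicated pool that keeps `size` full copies."""
    return {
        "pieces_stored": size,
        "tolerated_failures": size - 1,   # at least one full copy must survive
        "overhead": Fraction(size, 1),
    }


for name, summary in [
    ("replicated x3", replicated(3)),
    ("EC 3+2", ec_profile(3, 2)),
    ("EC 4+2", ec_profile(4, 2)),
]:
    print(
        f"{name}: {summary['pieces_stored']} pieces, "
        f"survives {summary['tolerated_failures']} failures, "
        f"overhead {float(summary['overhead']):.2f}x"
    )
```

With these numbers, EC 3+2 and 3 replicas both survive 2 lost pieces, but EC 3+2 needs 5 hosts and 5/3 of the raw space, while 3 replicas need 3 hosts and 3 times the raw space.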

Buckets: A logical container. Instead of having a flat structure of disks contained in hosts, you can define a more complex arrangement in which hosts are placed in separate racks, rooms, or data centers.
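As a rough mental model only, the bucket hierarchy can be pictured as a tree. The sketch below uses a nested Python dict with invented rack, host, and OSD names; it is not how PetaSAN or Ceph stores the CRUSH map.

```python
# Hypothetical bucket tree: root "default" -> racks -> hosts -> OSD ids.
TREE = {
    "default": {
        "rack1": {"node1": [0, 1], "node2": [2, 3]},
        "rack2": {"node3": [4, 5], "node4": [6, 7]},
    }
}


def find_bucket(tree, name):
    """Depth-first search for the bucket called `name`; returns its contents or None."""
    for key, child in tree.items():
        if key == name:
            return child
        if isinstance(child, dict):
            found = find_bucket(child, name)
            if found is not None:
                return found
    return None


def osds_under(bucket):
    """Collect every OSD id below a bucket (a host is just a list of OSD ids)."""
    if isinstance(bucket, list):
        return list(bucket)
    result = []
    for child in bucket.values():
        result.extend(osds_under(child))
    return result


print(osds_under(find_bucket(TREE, "default")))  # every OSD in this toy cluster
print(osds_under(find_bucket(TREE, "rack2")))    # only the OSDs placed under rack2
```

A placement rule that starts from the bucket named "default" would, in this picture, consider everything under the root, while a rule that starts from "rack2" would only see the disks in that rack.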

Rules: Describe how the data for an object is to be placed/stored. You can write a rule to place each copy of the object in a different rack or room. You can also define a rule to place data on NVMe devices.

Other pools: you may need a fast pool for virtualization, a slow pool for backups, or an extra-redundant pool with 5 copies, for example.

Hi,

I am also new to this very interesting project.

However, I haven't seen or understood how you assign certain OSDs to a pool. You mentioned a pool for virtualisation (fast disks) and a slow pool for backups (slow, big disks).

Maybe there is a good workshop document somewhere that I haven't found yet.

Thanks

Best regards, Guy

If you have HDDs and SSDs:

Create 2 new CRUSH rules using the provided templates "by-host-hdd" and "by-host-ssd".

Create 2 new pools, one using the first new rule, the other using the second.

The above should be enough. If you want to get fancier, you can use the CRUSH buckets tree to graphically place hosts under different buckets and write a rule that references these buckets (that is why they are referred to as placement rules), then create pools using these fancy rules.
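The idea behind tying a pool to a class of disks through a rule can be sketched in Python as follows. This is a toy model under stated assumptions: the cluster layout, the `place` function, and the hash-based scoring are invented for illustration, and the real CRUSH algorithm and the contents of the PetaSAN templates are not reproduced here.

```python
from __future__ import annotations

import hashlib
from typing import NamedTuple


class OSD(NamedTuple):
    osd_id: int
    host: str
    device_class: str  # "hdd" or "ssd"


# Hypothetical cluster: 3 hosts, each with one HDD OSD and one SSD OSD.
CLUSTER = [
    OSD(0, "node1", "hdd"), OSD(1, "node1", "ssd"),
    OSD(2, "node2", "hdd"), OSD(3, "node2", "ssd"),
    OSD(4, "node3", "hdd"), OSD(5, "node3", "ssd"),
]


def score(object_name: str, osd: OSD) -> int:
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd.osd_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name: str, device_class: str, copies: int, osds=CLUSTER) -> list[OSD]:
    """Toy 'by host, by device class' rule: at most one OSD per host, one class only."""
    candidates = [o for o in osds if o.device_class == device_class]
    ranked = sorted(candidates, key=lambda o: score(object_name, o), reverse=True)
    chosen, used_hosts = [], set()
    for osd in ranked:
        if osd.host not in used_hosts:
            chosen.append(osd)
            used_hosts.add(osd.host)
        if len(chosen) == copies:
            return chosen
    raise ValueError("not enough hosts with this device class")


fast = place("vm-disk-0001", "ssd", copies=3)  # what a pool on the SSD rule would do
slow = place("backup-0001", "hdd", copies=3)   # what a pool on the HDD rule would do
print([(o.host, o.osd_id) for o in fast])
print([(o.host, o.osd_id) for o in slow])
```

The point of the sketch is that you do not add disks to a pool directly. The pool points at a rule, and the rule decides which disks are eligible.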

 

Thank you.

I suppose that the config line "step take default" in the rule "replicated_rule" means to use the bucket named "default"?

Yes, but for your case, just use the ready-made templates.

Quote from admin on March 26, 2020, 4:11 pm

EC profiles: ... This setup tolerates any 2 failures, yet requires a storage overhead of 5/3 = 1.67x rather than the 3.0x of 3 replicas. It is, however, slower than replicated. ...

Thank you for the explanation. I'm interested in the quoted sentence. Just to clarify my understanding: does replica <n> mean that I can lose <n> disks or <n> hosts (which for me means 10 x <n> disks) without suffering any data loss? I understood <n> hosts...

Would a 3+2 EC profile be something like RAID6, where two parity blocks are computed? This will allow up to 2 failures, but again, which kind of failure: disks or hosts? If this gives the same level of safety (hosts), it might be a good solution for a non-performance-critical cluster, to save a lot of space...

Thanks for the clarification. S.

For an EC 3+2 profile: each object/disk sector will be chopped into 3 chunks, then another 2 chunks are computed for a total of 5. Each chunk will be stored on a separate host. The data is not lost as long as any 3 of the 5 chunks are present. Which hosts/disks store the chunks of each object is a pseudo-random combination determined by the CRUSH algorithm.
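The "any 3 of the 5 chunks" condition can be checked by brute force. The sketch below is only a counting model with invented host names. It does not compute real coding chunks; it just assumes, as stated above, that each host holds exactly one chunk and that any 3 chunks are enough to rebuild the data.

```python
from itertools import combinations

K, M = 3, 2  # 3 data chunks + 2 coding chunks
HOSTS = ["host1", "host2", "host3", "host4", "host5"]  # hypothetical, one chunk each


def recoverable(failed_hosts, k=K, hosts=HOSTS) -> bool:
    """Data survives if at least k chunks (hosts) are still available."""
    surviving = [h for h in hosts if h not in failed_hosts]
    return len(surviving) >= k


for n_failed in range(len(HOSTS) + 1):
    outcomes = [recoverable(set(failed)) for failed in combinations(HOSTS, n_failed)]
    print(f"{n_failed} failed hosts: {sum(outcomes)}/{len(outcomes)} combinations recoverable")
```

Because each chunk lives on a separate host, losing a whole host (with all of its disks) costs one chunk per object at most, so in this model every combination of up to 2 failed hosts is survivable and no combination of 3 is.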

For more complex deployments, you can specify CRUSH rules that put each of your chunks in a different rack, room, etc., rather than just on any 5 different hosts.