Info on PetaSAN administration and settings meaning
Ste
125 Posts
March 26, 2020, 9:33 am
Good morning,
I would like to ask where I can find some documentation to study in more detail
some arguments I found in the PetaSAN Administration Guide (v. 1.5), to understand
what are their use and when I might need to customize them, such as:
- EC profiles;
- Buckets;
- Rules;
- In which case would I need to create other pools ?
Thanks for any info. Ste.
admin
2,930 Posts
March 26, 2020, 4:11 pm
EC profiles: EC stands for erasure coding; instead of creating multiple copies/replicas, it adds some code blocks for redundancy. A 3+2 profile will chop an object into 3 (k) data blocks, then compute 2 (m) other blocks (think of them like parity bits) for a total of 5 blocks; each block will be saved on a separate host. This setup tolerates any 2 failures, yet requires a storage overhead of only 5/3 = 1.66x rather than 3.0x. It is, however, slower than replication. This is an example of an EC 3+2 "profile"; you can also have other profiles like 4+2, and you can create profiles with specific code plugins which could be faster on some CPUs.
Buckets: A logical container. Instead of having a flat structure of disks contained in hosts, you can define more complex arrangements, with hosts in separate racks/rooms/datacenters.
Rules: Describe how the data for an object is to be placed/stored. You can write a rule to place copies of an object in different racks or rooms; you can also define a rule to place data on NVMe devices.
Pools: You may need a fast pool for virtualization, a slow pool for backups, an extra-redundant pool with 5 copies...
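As a quick sanity check on the numbers above: the storage overhead of an EC k+m profile is simply (k+m)/k, versus n for n replicas. A minimal sketch in plain Python (no Ceph involved):

```python
def ec_overhead(k: int, m: int) -> float:
    """Raw-to-usable storage ratio for an erasure-coded k+m profile."""
    return (k + m) / k

def replica_overhead(n: int) -> float:
    """Raw-to-usable storage ratio for n full replicas."""
    return float(n)

# EC 3+2: 5 blocks stored for every 3 blocks of data
print(round(ec_overhead(3, 2), 2))   # 1.67
# EC 4+2 is even more space-efficient
print(round(ec_overhead(4, 2), 2))   # 1.5
# 3 replicas cost a full 3.0x
print(replica_overhead(3))           # 3.0
```

Both EC 3+2 and 3 replicas survive any 2 host failures, so the choice is a trade of CPU/latency against raw capacity.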
guy71
3 Posts
April 5, 2020, 8:30 am
Hi.
I am also new to this very interesting project.
However, I haven't seen/understood how you assign certain OSDs to a pool, as you mentioned a pool for virtualisation (fast disks) and a slow pool for backup (big slow disks).
Maybe you have a good workshop document somewhere that I haven't found yet.
Thanks
Best regards, Guy
admin
2,930 Posts
April 5, 2020, 12:25 pm
If you have hdds/ssds:
create 2 new CRUSH rules using the provided templates "by-host-hdd" and "by-host-ssd"
create 2 new pools, one using the first new rule, the other using the second
The above should be enough. If you want to get fancier, you can use the CRUSH buckets tree to graphically place hosts under different buckets and write a rule that references these buckets (that is why they are referred to as placement rules), then create pools using these fancier rules.
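For reference, the same per-device-class split can be expressed at the Ceph command line (PetaSAN creates these for you from the UI; the pool names and PG counts below are placeholders, not recommendations):

```shell
# One replicated CRUSH rule per device class, with "host" as the
# failure domain, starting from the root bucket "default":
ceph osd crush rule create-replicated by-host-hdd default host hdd
ceph osd crush rule create-replicated by-host-ssd default host ssd

# A pool on each rule, so data lands only on that class of OSDs:
ceph osd pool create backup-pool 256 256 replicated by-host-hdd
ceph osd pool create vm-pool 128 128 replicated by-host-ssd
```

These commands require a running Ceph cluster (Luminous or later, where device classes were introduced), so they are shown here only as an ops sketch.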
guy71
3 Posts
April 8, 2020, 12:10 pm
Thank you.
I suppose that the config line "step take default" in the rule "replicated_rule" means to use the bucket named "default".
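For context, this is roughly what the default replicated rule looks like in a decompiled CRUSH map (the exact id and fields vary by cluster and Ceph version, so treat this as an illustrative fragment):

```
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default                       # start at the root bucket named "default"
    step chooseleaf firstn 0 type host      # pick one OSD from each of N distinct hosts
    step emit
}
```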
admin
2,930 Posts
April 8, 2020, 1:16 pm
Yes. But for your case, just use the ready-made templates.
Ste
125 Posts
April 8, 2020, 2:33 pm
Quote from admin on March 26, 2020, 4:11 pm
EC profiles : ... This setup allows up to any 2 failures, yet requires a storage overhead of 5/3 = 1.66x rather than 3.0x. It is however slower than replicated. ...
Thank you for the explanation, I'm interested in the quoted sentence. Just to clarify my understanding: replica <n> means that I can lose <n> disks or <n> hosts (which for me means 10 x <n> disks) without suffering any data loss? I understood <n> hosts...
A 3+2 EC profile would be something like RAID6, where two parity blocks are computed? This will allow up to 2 failures, but again which failures: disks or hosts? If it gives the same level of safety (hosts), this might be a good solution for a non-performance-critical cluster, to save a lot of space...
Thanks for the clarification. S.
admin
2,930 Posts
April 8, 2020, 5:42 pm
For an EC 3+2 profile: each object/disk sector will be chopped into 3 chunks, then another 2 chunks are computed, for a total of 5. Each chunk will be stored on a separate host. The data is not lost as long as any 3 of the 5 chunks are present. On which hosts/disks each object has its chunks stored is a pseudo-random combination determined by the CRUSH algorithm.
For more complex deployments, you can specify CRUSH rules to put your chunks each on different racks, rooms, etc., rather than on just any 5 different hosts.
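A toy illustration of the "any 3 of 5 chunks" property (plain Python; it only counts surviving chunks, it does not perform real erasure coding):

```python
K, M = 3, 2       # EC 3+2: 3 data chunks + 2 computed chunks
TOTAL = K + M     # 5 chunks, each placed on a separate host by CRUSH

def recoverable(failed_hosts: int) -> bool:
    """Data survives as long as at least K of the TOTAL chunks remain."""
    return TOTAL - failed_hosts >= K

for failed in range(TOTAL + 1):
    print(failed, recoverable(failed))
# 0, 1 or 2 failed hosts -> True  (any 2 failures tolerated)
# 3 or more failed hosts -> False (fewer than K chunks left)
```

Since the failure domain in the default rules is the host, a "failure" here means a whole host; losing 2 disks on the same host counts as one failed chunk for any given object.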
Last edited on April 8, 2020, 5:44 pm by admin · #8