PetaSAN 2.2 Released!
admin
2,930 Posts
November 10, 2018, 12:00 pm
We do have some tooltips in the UI on this, and we will also be upgrading the admin guide to cover EC.
Some basic explanation:
Each iSCSI disk is composed of 4 MB sectors / objects; a 4 GB disk has 1000 objects.
For a replicated pool, each object is stored as 2x/3x/... exact copies (the "size" attribute of the pool), and each copy is stored in a different "place". As long as 1 copy of the object survives, there is no data loss.
For an EC pool, each 4 MB object is divided into k "chunks", an additional m chunks are computed, and each of the k+m chunks is stored in a different "place". As long as a total of k chunks are present, there is no data loss. A k=4, m=2 profile can sustain any 2 lost chunks of an object without data loss.
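To put the space overhead in perspective: 3-way replication stores 3x the raw data (about 33% usable capacity), while a k=4, m=2 EC profile stores (4+2)/4 = 1.5x the raw data (about 67% usable) and still tolerates the loss of any 2 chunks.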
The different "places" used to store an object are controlled by the CRUSH placement rule. Obviously you do not want to store all replicas/chunks of an object on 1 disk or on 1 node. You can have a rule that places the replicas/chunks on different hosts or different racks, and on specific device types such as HDDs or SSDs.
In PetaSAN we have different rule templates for the most common use cases. If you select a rule that places each chunk on a separate host, then to use the k=4, m=2 profile you need at least 6 hosts.
If you want to try EC pools on a 3-node system (a rough CLI equivalent is sketched after these steps):
- First create an EC rule from the built-in templates: ec-by-host-hdd or ec-by-host-ssd.
- Add a new EC pool, choose the k=2 m=1 EC profile (which requires 3 hosts) and select the EC rule you just created.
- When adding an iSCSI disk that stores its data on EC, you need to specify a replicated pool to store the image metadata and an EC pool under "data pool" where the data is actually stored.
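For reference, a rough sketch of what these steps boil down to at the Ceph level (the profile, rule, pool and image names below are made-up examples, and in PetaSAN you would normally do all of this from the web UI rather than the shell):
ceph osd erasure-code-profile set ec-21-hdd k=2 m=1 crush-failure-domain=host crush-device-class=hdd   # EC profile: 2 data + 1 coding chunk, one chunk per host, HDD OSDs only
ceph osd crush rule create-erasure ec-by-host-hdd ec-21-hdd   # EC placement rule built from the profile
ceph osd pool create ec-data 256 256 erasure ec-21-hdd ec-by-host-hdd   # EC data pool (the PG count here is only an example)
ceph osd pool set ec-data allow_ec_overwrites true   # needed so RBD/iSCSI images can write to the EC pool (BlueStore OSDs)
rbd create --size 100G --data-pool ec-data rbd/disk-00001   # image metadata in the replicated 'rbd' pool, data in 'ec-data'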
Last edited on November 10, 2018, 12:03 pm by admin · #11
wailer
75 Posts
November 12, 2018, 12:31 pm
Quote from admin on November 9, 2018, 5:02 pm
You need k+m nodes/racks to run EC.
For 3 nodes you can use the k=2 m=1 profile for testing, but this is not recommended for real production. For testing you can set the min pool size to 2 (we set it to k+1 = 3) so your IO will still be active if 1 node fails. Again, this is not safe and not recommended, as you would be writing to a pool which now has no redundancy.
Hi Admin,
I was wondering about this warning about using 3 nodes for EC. We are about to deploy a cluster for cold storage, and erasure coding's small overhead compared to the replica model looks pretty tempting.
When using replicas, when one node fails, data gets replicated again to another working node; the only risk here is that you might run out of space if you don't have enough raw storage. Right?
When using EC and one node fails, aren't the data chunks reallocated to another node? How would you recover redundancy in this case?
Thanks!
admin
2,930 Posts
November 12, 2018, 1:15 pm
With k=2 m=1, you need 3 chunks per object. The placement rule will put them on separate hosts; it will not put 2 chunks on 1 host, as this would defeat the purpose. So with such a profile on 3 hosts, if 1 host fails, no recovery will take place. We also set the min size of the pool to k+1, which is the recommended value; this controls whether the pool will still serve IO or not (i.e. whether it is active). So in this case IO will stop, since the min size is 3 and you have 2 nodes. You can edit the min size to make it 2 so your pool is active again, but you will be writing to a pool that now has no redundancy (see the example command below).
Knowing this, you can go ahead and use this profile, but we recommend a minimum profile of k=3 m=2, or preferably k=4 m=2; the latter requires 6 nodes/racks.
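For reference, this is the underlying Ceph command for editing the pool's min size (a sketch with an example pool name; substitute your own EC pool):
ceph osd pool set ec-data min_size 2   # lowers min_size from k+1=3 to 2; IO resumes with 1 host down, but writes then have no redundancy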
wailer
75 Posts
November 12, 2018, 1:23 pm
Got it. So would it be correct to say that we would get redundancy back as soon as the failed node comes back up?
admin
2,930 Posts
msalem
87 Posts
November 14, 2018, 2:50 pm
Hey Admin,
I have 6 nodes and I need to set up EC with k=4, m=2. There are a few options here; could you explain what needs to be set?
https://ibb.co/kih9LL
Thanks
admin
2,930 Posts
November 14, 2018, 4:34 pm
The rule defines how you want the replicas/chunks to be placed. In your case you would first go to Configuration -> CRUSH -> Rules and create an EC rule; you can select a ready-made template such as ec-by-host-hdd or ec-by-host-ssd. For more elaborate setups you can choose to place the chunks in different racks (ec-by-rack-hdd) etc., and use the UI to define racks/rooms.
Regarding the number of PGs, there is a tooltip on this. Basically an OSD/disk can serve anywhere from 20 to 300 PGs, with 100 being ideal; much less or more will affect operations such as load and weight balancing. It may not be easy, but you should try to size the PG counts of your pools so that each OSD ends up serving a number close to 100. This count includes replicas/chunks:
( Pool PGs x Pool Size ) / number of OSDs serving the pool = approx. 100
For users who just use the default pool, PetaSAN can assign this value, but for more complex setups with many pools and different placement rules you have to do this yourself.
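For example (hypothetical numbers): with 6 nodes of 4 OSDs each, i.e. 24 OSDs all serving a single k=4, m=2 EC pool (pool size = k+m = 6), the target would be roughly 100 x 24 / 6 = 400 PGs, which you would round to a nearby power of two such as 256 or 512.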
Last edited on November 14, 2018, 4:35 pm by admin · #17
msalem
87 Posts
November 19, 2018, 8:23 am
Hello Admin,
After creating the cluster I am getting this message; is there any way to resolve it?
"1 pools have many more objects per pg than average"
admin
2,930 Posts
November 19, 2018, 1:09 pm
This is usually an indication that the pool(s) containing most of the data in the cluster have too few PGs, and/or that other pools that do not contain as much data have too many PGs.
The threshold can be raised to silence the health warning by adjusting the mon_pg_warn_max_object_skew config option on the monitors. A non-positive number disables this setting.
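For reference, a sketch of how the option can be adjusted (behavior varies between Ceph releases; on some versions this particular warning is generated by the mgr rather than the mon, so the setting may need to be applied there as well):
ceph tell mon.* injectargs '--mon_pg_warn_max_object_skew 0'   # runtime change; 0 (non-positive) disables the warning
To make it persistent, also add mon_pg_warn_max_object_skew = 0 to the [mon] (or [global]) section of ceph.conf on the monitor nodes.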
msalem
87 Posts
November 19, 2018, 1:55 pm
So far all we have is one pool that we will be carving iSCSI targets from.
So can this be ignored? Can you please send the steps to silence this alert?
Thanks