creating > 1 pool at time of config

hak
23 Posts
June 26, 2023, 4:05 pm
My goal is to test different disk classes in a 3-node test cluster:
- NVMe (alone in its own class of disk)
- SSD + spinning 10k SAS (for a second class of storage)
It seems I can only create one pool type at setup?
Question 1: what's the best way to set up the second type of pool?
Also, when setting up iSCSI:
- the auto-increment for my first set of targets seemed to use all of the IPs...
- when I wanted to add another iSCSI disk, it said I was out of IPs.
In the other SANs I work with, the same set of target IPs can represent more than one LUN or target volume.
Question 2: is PetaSAN not like this? What's the best way to add additional iSCSI 'disks' over time without having to add many target IPs on our source hosts?
(Environment: test vSphere hosts running 6.7 and 7.x connecting with iSCSI; each PetaSAN server has dual 10G to the front, not bonded but to diverse 10G storage switches, and dual 40G to the back, bonded across 2x 40G switches.)
Thanks!

admin
2,967 Posts
June 26, 2023, 8:39 pm
In PetaSAN 3.2 you can rely on 2 built-in device classes, hdd and ssd. These are auto-detected when you add an OSD disk and assigned to it by default; the ssd class is assigned to both SSD and NVMe disks (any non-rotational drive).
When you create a pool, you should choose a crush rule which references a specific class: hdd or ssd. There are several built-in templates for crush rules.
PetaSAN 3.3 will allow you to define custom device classes if you need more than those 2. In version 3.2 you would need to add these from the command line if needed.
For your case, you should be fine with the default classes: ssd for the NVMe OSDs, and for the second pool use the hdd class for the HDD OSDs. I understand you intend to use your SSD devices as journals for the HDD OSDs and not as OSDs.
If you run out of IPs for your iSCSI paths, you can just increase the IP range from the iSCSI settings page.
PetaSAN uses a 1 LUN per target configuration; this is a common setup. It is also the only configuration possible to support active/active iSCSI paths for the same disk across multiple server hosts.
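For reference, this is roughly what it looks like at the Ceph command line (a sketch only; the rule/pool names and PG count below are examples, and PetaSAN's built-in rule templates and pool page do the same thing from the UI):
# List the detected device classes and the OSDs in each
ceph osd crush class ls
ceph osd crush class ls-osd hdd
# Create a replicated rule restricted to one class (root=default, failure domain=host)
ceph osd crush rule create-replicated by-host-hdd default host hdd
# Create a pool that uses that class-specific rule
ceph osd pool create hdd-pool 128 128 replicated by-host-hdd
# (3.2 only, if you really need a separate nvme class) reassign an OSD's class by hand
ceph osd crush rm-device-class osd.0
ceph osd crush set-device-class nvme osd.0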

hak
23 Posts
June 27, 2023, 1:41 pm
Understood - thank you.
Yes, for the 'hybrid' pool it would be high-DWPD SSD (I have Optanes on hand) for the journal vs. HDD for data. Question: for the pool rule "by-host-hdd", it states:
# Placement rule per PG/object:
# For N replicas, choose N distinct hosts
# For each, choose a single HDD OSD disk
# Number of replicas N is defined in Pool creation
When I try to use this, it states: "Rule requires device types not available in your cluster."
So when I check my drives on each host (manage nodes > node list > physical disk list), I do see:
- the existing pool is a single 3.8TB NVMe drive per host, UP; not touching those.
And, still available (per host):
- a single 480GB NVMe/Optane (auto-marked as journal)
- 4x 600GB 10k SAS HDD
If I try to combine a 600GB SAS with the Optane, it seems I can only do 1x HDD at a time (if I declare the journal as external, tagging the Optane for that role); I cannot select all 4 HDDs to share the NVMe on this page...
Instead I went into the physical disk list and enabled each HDD as an OSD first. This then did allow me to add a new crush rule, but it seems these HDDs went into the default bucket tree and are intermixed with the 3.84TB NVMe tier, which I don't want...
How do I create a separate pool using the 4x HDD sharing the 480GB Optanes (per host) to build a hybrid pool properly?
Thank you,

admin
2,967 Posts
June 27, 2023, 4:39 pm
"Rule requires device types not available in your cluster."
You should first add at least 1 OSD of the type the crush rule will reference, in this case hdd. So: add the OSD, then the rule, then define the pool.
"If i try to combine a 600GB sas with the optane, it seems i can only do 1x hdd at a time (if i declare the 'journal - external, tagging the optane for that role) i cannot select all 4 HDD's to share the nvme on this page...."
Yes, in the Deployment Wizard app when creating the cluster you can specify many OSDs to add at once. After that, when the cluster is up and running, you need to add 1 OSD at a time, specifying the external journal; the system will automatically assign a free partition on the journal device to the OSD.
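Once the first HDD OSDs (with their external journals) are in, a quick way to sanity-check the class assignment from the shell (a sketch, assuming standard Ceph tooling on the nodes):
# The CLASS column should show hdd for the SAS disks and ssd for the NVMe OSDs
ceph osd tree
# Class-based rules select from per-class "shadow" trees, so hdd and ssd OSDs
# can live under the same default root without their pools mixing data
ceph osd crush tree --show-shadow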

hak
23 Posts
June 27, 2023, 4:43 pm
Ah, thank you.
So now that I (incorrectly) added all 4x HDD per host (without specifying an external journal), I'd like to remove these 12 HDDs (4 per host) as OSDs and then re-add them with 'external journal' selected.
How do I properly retire these HDDs?

admin
2,967 Posts
June 27, 2023, 8:29 pm
You can stop the OSD service manually:
systemctl stop ceph-osd@XX   (where XX is the OSD ID)
then delete the stopped OSD from the UI.
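For example, for one OSD (the ID 12 here is just a placeholder; check yours with ceph osd tree):
# On the host that owns the OSD
systemctl stop ceph-osd@12
# Confirm it reports as down before deleting it from the UI
ceph osd tree | grep osd.12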

hak
23 Posts
June 27, 2023, 9:03 pm
Thank you. Is there a rule of thumb for how long to wait between the CLI step and deleting in the GUI to avoid data loss? This is test-only, but a VM on the iSCSI 'disk' has all of the load-testing kit set up to run... I would rather not have to rebuild that VM...
I did 2 and paused...
"2 osds down
Degraded data redundancy: 204015/2328126 objects degraded (8.763%), 52 pgs degraded, 107 pgs undersized"

admin
2,967 Posts
June 28, 2023, 11:06 am
You can delete multiple OSDs on the same host at the same time without issue; the VMs will keep responding and recovery/rebalance happens in the background. If you delete OSDs on a number of hosts >= the replica count at the same time, you will lose data and obviously your pool will be stuck/lost forever. If you delete on 2 hosts and your replica size=3, min_size=2, you do not have data loss, but you only have 1 replica, which is less than min_size, so VM i/o will not be responsive until the recovery completes and the system has enough replicas to respond to i/o.
So the rule of thumb: do not delete many OSDs on many hosts at the same time. Do it host by host and wait for the cluster to recover/rebalance until the health is OK before moving to the next host.
If this is a test setup, instead of waiting for recovery of test data, just delete the pool so the system does not waste time recovering, or get stuck in case you delete all replicas.
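If you want to script the "wait for health OK" step between hosts, something like this works (a sketch; the pool name is an example and the sleep interval is arbitrary):
# Check the pool's replica settings so you know how many hosts you can touch
ceph osd pool get hdd-pool size
ceph osd pool get hdd-pool min_size
# After deleting the OSDs on one host, wait until recovery finishes
while ! ceph health | grep -q HEALTH_OK; do
    ceph -s | grep -E 'health|degraded'
    sleep 60
done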

hak
23 Posts
June 29, 2023, 4:20 pm
If I need to move this test cluster to a different rack... what's the best practice to shut things down gracefully? (No iSCSI disks are mounted by vSphere at this point.)

admin
2,967 Posts
June 30, 2023, 8:42 am
Just shut down the nodes, no problems at all; there is no risk of data loss, as all data changes are flushed and not buffered. It may be better to shut all nodes down within a small timeframe so you do not kick off recovery, although it is not a big deal either way.
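If you want to make sure recovery does not kick in while the nodes go down one after another, the usual Ceph trick (optional here, and a sketch rather than a PetaSAN-specific requirement) is to set the noout flag around the move:
# Before powering the nodes off: keep down OSDs from being marked out
ceph osd set noout
# ...shut down, move the rack, power the nodes back on...
# Once all OSDs are back up, restore normal behaviour
ceph osd unset noout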