Understanding how to add 3 SSD Nodes to a cluster of 3 HDD Nodes
peter
2 Posts
May 12, 2022, 9:11 am
Hello,
I'm not sure my understanding is correct, so I hope to get it verified in this thread.
At the moment I'm running 3 nodes with NVMe journals and spinning disks. To get better read and write performance, my idea is to add 3 SSD nodes to the cluster.
My assumption is that new data gets written to the SSDs, and old data that is rarely or never accessed gets moved to the HDDs.
Does it work like that? And do I have to modify the CRUSH map, or should it work like that by default?
Are there any better solutions to get better performance for regularly accessed data?
I use PetaSAN as an iSCSI target.
Thanks in advance
Peter
admin
2,933 Posts
May 12, 2022, 5:55 pm
If you use the default built-in rule, it will store data on all OSDs irrespective of their type. If you later want to use more rules for different device classes, you need to reclassify the existing CRUSH map first:
https://docs.ceph.com/docs/master/rados/operations/crush-map-edits/#crush-reclassify
This ensures the default rule is changed to use the device class of your current OSDs. After this you can create another rule based on the provided by-host-ssd template, which uses the SSD device class, and use this rule to create an SSD-only pool.
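For readers on the plain Ceph CLI, the second step (a rule bound to the SSD device class, then a pool that uses it) could look roughly like the sketch below. It only builds the command lines and does not run them. It assumes the reclassify step is already done; the pool name `ssd-pool`, the PG count of 128, the root `default` and the failure domain `host` are illustrative assumptions, not values from this thread. In PetaSAN the same result is normally reached through the rule template mentioned above.

```python
import shlex


def ssd_pool_commands(rule_name, pool_name, pg_num,
                      root="default", failure_domain="host",
                      device_class="ssd"):
    """Return Ceph CLI invocations as argv lists. Nothing is executed."""
    return [
        # Replicated CRUSH rule restricted to a single device class.
        ["ceph", "osd", "crush", "rule", "create-replicated",
         rule_name, root, failure_domain, device_class],
        # Replicated pool that places its data with that rule.
        ["ceph", "osd", "pool", "create", pool_name,
         str(pg_num), str(pg_num), "replicated", rule_name],
    ]


if __name__ == "__main__":
    # Hypothetical names and sizes, for illustration only.
    for argv in ssd_pool_commands("by-host-ssd", "ssd-pool", 128):
        print(shlex.join(argv))
```

Review the printed commands against your own CRUSH tree before running anything; the right PG count depends on the number of OSDs in the new pool.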
Last edited on May 12, 2022, 6:03 pm by admin · #2
eazyadm
25 Posts
July 8, 2022, 8:16 am
Hi admin,
I think we want to get the same result as peter, but if I understand your answer correctly, the result is not an iSCSI target with mixed OSDs where the SSDs are available as a higher tier.
The result is that I have another pool with only SSDs, on which I can put another iSCSI LUN.
Is that correct?
And if it is as described, how do I get the mixed mode on a single iSCSI LUN?
Thanks
admin
2,933 Posts
July 8, 2022, 1:24 pm
Having a cache tier within the same pool has been deprecated for a couple of years now:
You can either create a separate fast pool or use caching at the block device level (rather than at the Ceph pool level). We support dm-writecache, which we found much better than the dm-cache quoted in the link.
DennisV
5 Posts
July 14, 2022, 8:48 am
After typing up a section I read that you are already using NVMe for the journal.
Then it's a matter of adding the SSD as a cache disk.
One SSD per HDD would be best, and then you can use the entire SSD; otherwise divide the SSD's space between at most 3-4 HDDs.
As far as I know dm-writecache doesn't do much for reads, but writes... oh boy, that helped.
Mind you, you will need a fast network, 10Gb at minimum, to take advantage of this.
If you want to read up on what I've been trying/doing:
https://www.reddit.com/r/ceph/comments/vmieqe/advice_on_new_ceph_cluster_3_node_3xssd_3xhdd_for/
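The sizing rule of thumb in this post (a whole SSD per HDD if possible, otherwise one SSD shared by at most 3-4 HDDs) comes down to simple arithmetic. A minimal sketch, assuming an even split and a hypothetical 960 GB SSD; the function name and the capacity are made up for illustration:

```python
def cache_share_gb(ssd_capacity_gb, hdds_per_ssd, max_hdds_per_ssd=4):
    """Evenly split one SSD's capacity across the HDDs it caches for.

    max_hdds_per_ssd reflects the "max 3-4 HDD" rule of thumb from the
    post; it is a guideline, not a hard limit of dm-writecache.
    """
    if hdds_per_ssd < 1:
        raise ValueError("need at least one HDD per SSD")
    if hdds_per_ssd > max_hdds_per_ssd:
        raise ValueError("too many HDDs sharing one cache SSD")
    return ssd_capacity_gb / hdds_per_ssd


if __name__ == "__main__":
    print(cache_share_gb(960, 1))  # 960.0, the whole SSD for one HDD
    print(cache_share_gb(960, 4))  # 240.0 per HDD when four share it
```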
peter
2 Posts
July 15, 2022, 9:53 am
Hi Dennis and admin,
thanks for your answers.
Just to be sure:
If my understanding is correct, I can't use dedicated SSD nodes in a useful way, and I have to replace some spinning disks in my existing nodes with SSDs to use dm-writecache.
My network backend is 10GbE.
Thanks
DennisV
5 Posts
July 15, 2022, 2:14 pm
The caching method is local only, so the SSD and HDD need to be in the same node.
If your new nodes have only HDDs or only SSDs in them, you can't use this method.
You can create a new SSD pool and use it alongside the HDD pool and migrate data.
In theory you could create an SSD pool and use it as a caching pool for the HDD pool, but that is deprecated and will be far slower, as it also needs to traverse the network multiple times.
If you can't or don't want to create a new pool and you need to accelerate the existing HDD pool, you will need to place the SSDs in the same nodes as the HDDs.
You will need to take out one or more of the HDDs in the current nodes, replace them with SSDs, and reconfigure the OSDs to use the SSDs for caching.
Do this disk by disk and make sure your data is safe and in sync before taking out another disk, or you will lose data.
It's not recommended to add the cache to an OSD live (if that is even possible).
You will need to take down and remove the OSD from the pool and reconfigure it with the journal on NVMe and caching on the SSD.
After the OSD is added back to the pool you can rebalance the pool to even out OSD usage.
You can reuse the removed HDDs in the new nodes along with the SSDs to create "more of the same / similar nodes".
TLDR:
SSD+HDD in same node.
Reconfiguring an OSD requires removing and re-adding it.
Make sure sync is complete / Ceph is healthy before the next step.
Rebalance after completion.
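The "make sure Ceph is healthy before the next step" rule can be turned into a small gate that you run between disk swaps. The sketch below is one possible check, not a PetaSAN feature: it treats the cluster as settled only when health is HEALTH_OK and every placement group is active+clean. The JSON field names (`health.status`, `pgmap.num_pgs`, `pgmap.pgs_by_state`) are assumed from the output of `ceph status --format json` in recent Ceph releases, so verify them against your version.

```python
import json
import subprocess


def cluster_is_settled(status):
    """True only if health is HEALTH_OK and all PGs are active+clean."""
    health_ok = status.get("health", {}).get("status") == "HEALTH_OK"
    pgmap = status.get("pgmap", {})
    total = pgmap.get("num_pgs", 0)
    clean = sum(
        state.get("count", 0)
        for state in pgmap.get("pgs_by_state", [])
        if state.get("state_name") == "active+clean"
    )
    return health_ok and total > 0 and clean == total


def read_status():
    """Fetch live status; requires the ceph CLI and an admin keyring."""
    result = subprocess.run(
        ["ceph", "status", "--format", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Made-up sample of a cluster that is still recovering.
    recovering = {
        "health": {"status": "HEALTH_WARN"},
        "pgmap": {
            "num_pgs": 128,
            "pgs_by_state": [
                {"state_name": "active+clean", "count": 120},
                {"state_name": "active+undersized+degraded", "count": 8},
            ],
        },
    }
    print(cluster_is_settled(recovering))  # False, so do not pull a disk
```

Only remove the next HDD once `cluster_is_settled(read_status())` returns True; until then the data from the previous OSD may not be fully replicated yet.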