Understanding how to add 3 SSD Nodes to a cluster of 3 HDD Nodes
peter
2 Posts
May 12, 2022, 9:11 am
Hello,
I'm not sure if my understanding is correct, so I hope to get it verified in this thread.
At the moment I'm running 3 nodes with an NVMe journal and spinning disks. My idea is that, to get better read and write performance, I'll add 3 SSD nodes to the cluster.
So when new data is written, it goes to the SSDs, and old (rarely or never accessed) data gets moved to the HDDs.
Does it work like that? Do I have to modify the CRUSH map, or should it work like that by default?
Are there any better solutions to get better performance for regularly accessed data?
I use PetaSAN as an iSCSI target.
Thanks in advance
Peter
admin
2,930 Posts
May 12, 2022, 5:55 pm
If you use the default built-in rule, it will store data on all OSDs irrespective of their type. If you later want to use separate rules for different device classes, you need to reclassify the existing CRUSH map first:
https://docs.ceph.com/docs/master/rados/operations/crush-map-edits/#crush-reclassify
This ensures the default rule is changed to use only the current device class. After that you can create another rule based on the provided by-host-ssd template, which uses the SSD device class, and use this rule to create an SSD-only pool.
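For reference, the command-level version of this looks roughly like the following; it is only a sketch, the pool name ssd-pool and the pg counts are placeholders you need to adapt, and PetaSAN's UI may already cover parts of it:
ceph osd getcrushmap -o crushmap.orig
# rewrite the legacy buckets so the default rule maps to the hdd device class
crushtool -i crushmap.orig --reclassify --reclassify-root default hdd -o crushmap.new
# confirm the new map places data the same way before injecting it
crushtool -i crushmap.orig --compare crushmap.new
ceph osd setcrushmap -i crushmap.new
# rule restricted to the ssd device class, plus a pool that uses it
ceph osd crush rule create-replicated by-host-ssd default host ssd
ceph osd pool create ssd-pool 128 128 replicated by-host-ssd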
Last edited on May 12, 2022, 6:03 pm by admin · #2
eazyadm
25 Posts
July 8, 2022, 8:16 am
Hi admin 😉
I think we want to get the same result as peter, but if I understand your answer right, the result is not an iSCSI target with mixed OSDs where the SSDs act as a higher tier.
The result is that I have another pool, with only SSDs, on which I can put another iSCSI LUN.
Is that correct?
And if it is as described, how do I get the mixed mode on a single iSCSI LUN?
Thanks
admin
2,930 Posts
July 8, 2022, 1:24 pm
Having a cache tier within the same pool has been deprecated for a couple of years now.
You either create a separate fast pool, or you can use caching at the block disk level (rather than at the Ceph pool level). We support dm-writecache, which we found much better than the dm-cache quoted in the link.
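Purely to illustrate what dm-writecache does under the hood: a small fast LV is placed in front of a slow data LV. This is generic LVM, not the PetaSAN workflow, and the device and volume names are invented:
# /dev/sdb = HDD holding the OSD data, /dev/sdc = SSD used as write cache (placeholders)
vgcreate vg_osd0 /dev/sdb /dev/sdc
lvcreate -n osd0_data -l 100%PVS vg_osd0 /dev/sdb
lvcreate -n osd0_cache -l 100%PVS vg_osd0 /dev/sdc
# attach the SSD LV as a dm-writecache in front of the HDD LV
lvconvert --type writecache --cachevol osd0_cache vg_osd0/osd0_data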
DennisV
5 Posts
July 14, 2022, 8:48 am
After typing a whole section I read that you are already on NVMe for the journal.
Then it's a matter of adding an SSD as a cache disk.
If you have 1x SSD per 1x HDD that would be best, and you can use the entire SSD; otherwise divide the space over at most 3-4 HDDs (see the sketch below).
As far as I know dm-writecache doesn't help reads much, but writes... oh boy, that helped 🙂
Mind you, you will need a fast network, at least 10Gb, to take advantage of this.
If you want to read up on what I've been trying/doing:
https://www.reddit.com/r/ceph/comments/vmieqe/advice_on_new_ceph_cluster_3_node_3xssd_3xhdd_for/
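If one SSD has to serve several HDD OSDs, one straightforward way to do the split is fixed-size logical volumes on the SSD, one per HDD. The device name and sizes below are made up, adjust them to your SSD; leaving some space unallocated can also help SSD endurance:
vgcreate vg_cache /dev/sdg                 # /dev/sdg = the shared SSD (placeholder)
lvcreate -n cache_osd1 -L 220G vg_cache    # one cache LV per HDD OSD, max 3-4 per SSD
lvcreate -n cache_osd2 -L 220G vg_cache
lvcreate -n cache_osd3 -L 220G vg_cache
lvcreate -n cache_osd4 -L 220G vg_cache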
peter
2 Posts
July 15, 2022, 9:53 am
Hi Dennis and admin,
thanks for your answers.
Just to be sure:
If my understanding is correct, I can't use dedicated SSD nodes in a useful way, and have to replace some spinning disks in my existing nodes with SSDs to use dm-writecache.
My network backend is 10GbE.
Thanks
DennisV
5 Posts
July 15, 2022, 2:14 pm
The caching method is local only, so the SSD and HDD need to be in the same node.
If you have new nodes with only HDDs or only SSDs in them, you can't use this method.
You can create a new SSD pool, use it alongside the HDD pool, and migrate data.
In theory you could create an SSD pool and use it as a caching pool for the HDD pool, but that is deprecated and will be far slower, as the data also has to traverse the network multiple times.
If you can't or don't want to create a new pool and you need to accelerate the existing HDD pool, you will need to place the SSDs in the same nodes.
You will need to take out one or a few of the HDDs in the current nodes, replace them with SSDs, and reconfigure the OSDs to use the SSD for caching.
Do this disk by disk and make sure your data is safe/in sync before taking out another disk, or you will lose data.
It's not recommended to add the cache to the OSD live (if that is even possible).
You will need to take down and remove the OSD from the pool and recreate it with the journal on NVMe and caching on the SSD (see the command sketch after the TL;DR).
After the OSD is added back to the pool you can rebalance to even out the OSD usage.
You can reuse the HDDs in the new nodes along with the SSDs to create "more of the same / similar nodes".
TL;DR:
SSD+HDD in the same node.
Reconfiguring an OSD needs removal and re-adding.
Make sure the sync is done / Ceph is healthy before the next step.
Balance after completion.
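For the Ceph side of that per-disk cycle, the checks and removal look roughly like this; osd.12 is a placeholder id, and in PetaSAN you would normally drive the remove/re-add from the UI, so this is only to show what "healthy before the next step" means:
ceph osd out 12                            # stop mapping data to the OSD; Ceph backfills elsewhere
ceph -s                                    # repeat until all PGs are active+clean again
ceph osd safe-to-destroy osd.12            # verify nothing still depends on this OSD
ceph osd purge 12 --yes-i-really-mean-it   # drop it from the CRUSH and OSD maps
# swap the HDD for the SSD, recreate the OSD (journal on NVMe, write cache on the SSD),
# wait for HEALTH_OK, then move on to the next disk; at the end check the balancer:
ceph balancer status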