Design questions
alienn
37 Posts
September 19, 2017, 11:00 am
Hi,
I'm absolutely new to Ceph and found PetaSAN during my research. Reading the forum I found some interesting topics, and now I have some questions regarding a possible implementation.
I'm trying to build redundant, highly available storage for my VMware environment (at the moment for testing purposes only).
I have three systems with 4 disks each.
I would design the storage in the following way:
- each system has 4 OSDs
- each piece of data should have 2 replicas (i.e. the data should exist three times)
- a rule should place one copy of the data on each system; no two copies of the same data should end up on the same node
Following the above rules I would not need any form of software or hardware RAID, and I should get the most out of the available I/O, disk space and reliability.
For performance reasons I thought about placing SSDs in my ESX hosts for caching. Does this make sense?
Given the above configuration, is it possible to add a second cluster in a separate location that stores another copy (or copies) of the data for geo-redundancy? Is there any kind of "replication queue" that buffers data for the remote location in case of a saturated uplink, or does a saturated uplink slow down the whole cluster?
I hope someone can point me in the right direction and show me where I may have misunderstood some basic concepts...
Thanks in advance.
admin
2,930 Posts
September 19, 2017, 6:22 pm
If you need the data to exist in 3 places, you need to specify a replica count of 3, not 2: in Ceph the replica count is the total number of copies, including the original. Ceph will not place 2 copies on the same node; they will be distributed. In your case, since you have 3 nodes, each piece of data (to be more exact, each 4 MB stripe of each rbd/iSCSI disk) will be placed on every one of your servers. If you had more servers, each 4 MB stripe would be placed on any 3 servers, which can differ from stripe to stripe. The exact placement is determined by Ceph's CRUSH algorithm, which can be customized.
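To make this concrete, here is a minimal sketch of what that policy looks like in plain Ceph terms (the pool name "rbd", the rule name and the "default" root bucket are illustrative, and the exact CRUSH rule syntax varies slightly between Ceph releases; PetaSAN may already set up an equivalent rule for you):

    # total number of copies per pool, including the original
    ceph osd pool set rbd size 3
    # allow I/O to continue as long as 2 copies are available
    ceph osd pool set rbd min_size 2

    # replicated CRUSH rule: pick OSDs on distinct hosts, so no two copies share a node
    rule replicated_rule {
        id 0
        type replicated
        step take default
        step chooseleaf firstn 0 type host
        step emit
    }

With "chooseleaf firstn 0 type host", CRUSH picks one OSD from each of as many distinct hosts as the pool's size requires, which is exactly the "no two copies on the same node" behaviour described above.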
Hardware varies widely, so it is difficult to give simple advice. There are also different grades of SSD, ranging from consumer to enterprise, which vary in durability and performance; the same applies to your other hardware such as CPU and network. Generally 4 SSDs is a good number to start with, but the more exact way is to use PetaSAN's cluster benchmark page: it will tell you what your current configuration is capable of in terms of IOPS and MB/s, and also what the current bottlenecks are. If you see low CPU% and high disk %util, your CPU is not used to its full capacity and your disks are slowing you down; in this case you can add more disks and performance will increase almost linearly. If it is the opposite, you are not getting the full performance of your extra disks and should put them in another node. So it is a very handy way to tune your cluster.
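For reference, outside the PetaSAN benchmark page a similar bottleneck check can be approximated with standard Linux tools on a storage node while a workload is running (purely illustrative; the built-in charts show the same counters more conveniently):

    # CPU utilization: a low idle percentage means the CPU is the bottleneck
    top -bn1 | head -5

    # per-disk statistics: %util close to 100 on the OSD disks means the disks are the bottleneck
    iostat -x 1 5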
I cannot give you good advice on the ESX caching question.
For geo-replication: it is not currently supported in PetaSAN, but it definitely will be in the future. Ceph supports a couple of approaches, such as async incremental snapshots and async rbd mirroring, and in some cases, if you have very low latency between your data centers, you can customize your CRUSH rules so that replicas are placed in both locations synchronously. So we plan to support it, and we also have on our roadmap mirroring disks to cloud providers like Amazon/Azure down the road.
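As an illustration of the async incremental snapshot approach mentioned above, here is a rough sketch of how it can be done by hand with standard rbd commands (the pool/image names and the "backup-site" host are made up, the destination image must already exist on the remote cluster, and in practice this would be scripted and scheduled; it is not a PetaSAN feature today):

    # initial full copy to the remote cluster
    rbd snap create rbd/vm-disk@rep1
    rbd export-diff rbd/vm-disk@rep1 - | ssh backup-site rbd import-diff - rbd/vm-disk

    # later runs only ship the changes since the previous snapshot over the WAN
    rbd snap create rbd/vm-disk@rep2
    rbd export-diff --from-snap rep1 rbd/vm-disk@rep2 - | ssh backup-site rbd import-diff - rbd/vm-disk

Because the diff is taken against a snapshot, this kind of async replication happens outside the normal I/O path: a saturated uplink increases the replication lag rather than slowing down the primary cluster.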
Last edited on September 19, 2017, 6:34 pm by admin · #2