Cluster to cluster replication

I am moving our SAN to PetaSAN to address several issues with our storage system, and while I am at it I am implementing HA for our VM servers. Since we have two locations that are supposed to provide fail-over for each other, I am planning on placing another full cluster at the second location as well.

I know I can replicate a cluster using rsync (I got this to work using VMs), but that requires either running the commands manually or writing a cron script. Neither is real time.

Would lsyncd work? I haven't used it before, nor have I had any data that needed real-time replication at the block level.

Any ideas?

 

We do plan cluster to cluster replication down the road, but it is a couple of months away.

I don't know about lsyncd.

You can use rsync in a cron script to sync from an rbd image snapshot.

You can also use incremental snapshots:

https://ceph.com/geen-categorie/incremental-snapshots-with-rbd/
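As a rough sketch of the incremental approach in that article: take a new snapshot on the source image, export only the blocks changed since the previous snapshot with `rbd export-diff`, and pipe them into `rbd import-diff` on the remote cluster. The pool, image, snapshot, and host names below are hypothetical, and the helper only builds the shell pipeline rather than running it, so check it against your own setup before using it.

```python
import shlex


def incremental_sync_commands(pool, image, prev_snap, new_snap, remote_host):
    """Build the shell commands for one incremental RBD snapshot transfer.

    All argument values are placeholders supplied by the caller; nothing is
    executed here. Assumes prev_snap already exists on both clusters.
    """
    q = shlex.quote
    spec = f"{pool}/{image}"
    # 1. Freeze a point-in-time view of the source image.
    create = f"rbd snap create {q(spec + '@' + new_snap)}"
    # 2. Write only the changes between the two snapshots to stdout ...
    export = f"rbd export-diff --from-snap {q(prev_snap)} {q(spec + '@' + new_snap)} -"
    # 3. ... and apply them to the copy of the image on the remote cluster.
    remote = f"rbd import-diff - {q(spec)}"
    transfer = f"{export} | ssh {q(remote_host)} {q(remote)}"
    return [create, transfer]


if __name__ == "__main__":
    for cmd in incremental_sync_commands("rbd", "vm-disk-1", "snap1", "snap2", "backup-node"):
        print(cmd)
```

Only the delta between `snap1` and `snap2` crosses the link, which is what makes this cheaper than copying a full image each time.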

 

There are several issues with rsync.

Rsync on cron is still 60 seconds plus sync time behind the write. Lsyncd, from what I've read, is a near-real-time asynchronous sync scheduler.

I am not sure how rbd snapshots would work, as you have to create the snapshot, transfer the snapshot and then revert the second cluster to this snapshot. This is overcomplicated and time consuming, and it does not allow for the master/master replication that is required in cross-datacenter applications.

This is why I am asking the community for their ideas and experience with replication across datacenters. I am still learning how Ceph works and how the data is written to the OSDs. I know that the current PetaSAN version uses Ceph Jewel and that it uses a file system like XFS. Bluestore looks like it has a replication application already that we can tap into.

Why not try rbd mirror?

rbd mirroring is not yet supported in the Linux kernel.

rbd mirror also requires a low-latency, high-bandwidth inter-datacenter connection. I currently have a 100Mbps/FD link between data centers, but this is also used by other processes for high availability control and access. rbd mirror is also not a real-time sync system: you must create a mirror schedule and have the bandwidth to carry the mirror data. I am looking for near-real-time syncing of new writes. Rsync, when called, will sync only the changes, but with cron windows of 60 seconds it does not work. Sending the journals and metadata would work, but if the link gets congested then they too will fail or not get updated.

I have been toying with NFS replication, which does work, but it has a very high bandwidth requirement to be close to real time.

Still looking for the answer. Maybe my constraints are too stringent?

First, I know this is not a PetaSAN-approved configuration, but that is not what I am asking about, and the provided information is only to illustrate the process and constraints that I am facing.

Server cluster (per-server config): 4 x 1Gbps Ethernet bonded (management and client data over VLANs), 2 x 56Gbps Infiniband (IPoIB configured, SAN network interconnect). The Infiniband connection is set up as a very fast, very low latency IP network; average ping is <0.01ms. Each server runs XenServer and all VMs boot off SAN-based disk images.

Storage cluster (per-server config): 2 x 1Gbps Ethernet bonded (management only), 2 x 56Gbps Infiniband (IPoIB configured, iSCSI targets are on this network as well as all cluster connections).

Both data centers have an identical configuration and have a 100Mbps FD link between them. The link carries HA data, our management data, syslog replication (which will be moved to the SAN cluster once everything is able to replicate properly), internal VOIP and some small services. On average the link has 60% capacity available. We cannot get any more bandwidth between these centers at this time, but there are plans for more once the carrier upgrades (read: not going to happen any time soon, and we do not have alternative options).

The pipe between your data centers is the culprit. No matter what method you use to replicate, it is limited to 60Mbps, so it cannot carry more than about 7MB/s of writes in total across all your VMs, which probably makes it not an option. For now you would need to transport the snapshots to the other center using other means.

60Mbps is more than my combined write speed requirements. Speed of the network should only matter once a method has been settled on. Since I have 60Mbps Full Duplex available and a latency of 4ms average (peaks to 11ms and I have seen <1ms at times) I do not think it is unreasonable to be able to form a replication over this link. Sure the first time a cluster is brought online it is going to saturate the link, be slow and take a fairly long time to reach a suitable level of synchronization. But that is expected, and is a cost of ensuring that the data is safe and available in multiple locations.
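To put numbers on the link budget being argued here, a small back-of-the-envelope sketch. The 60Mbps figure comes from this thread; the 1000GB data set in the usage note is purely hypothetical, and the math ignores protocol overhead and the other traffic sharing the link.

```python
def mbps_to_mbytes_per_s(mbps):
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbps / 8.0


def initial_sync_hours(data_gb, link_mbps):
    """Hours needed to push data_gb gigabytes over the link at full speed.

    Uses decimal units (1 GB = 1000 MB) and ignores protocol overhead.
    """
    seconds = (data_gb * 1000.0) / mbps_to_mbytes_per_s(link_mbps)
    return seconds / 3600.0


def link_can_keep_up(write_mbytes_per_s, link_mbps):
    """True if a sustained write rate fits within the available link capacity."""
    return write_mbytes_per_s <= mbps_to_mbytes_per_s(link_mbps)


if __name__ == "__main__":
    print(mbps_to_mbytes_per_s(60))          # available capacity in MB/s
    print(initial_sync_hours(1000, 60))      # hypothetical 1000 GB first sync
    print(link_can_keep_up(7, 60))
```

60Mbps works out to 7.5MB/s, so a sustained 7MB/s of writes just fits, and a hypothetical 1000GB initial sync would take roughly 37 hours with the link otherwise idle.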

I am looking at methods other than cron-based sync to stay just behind the live write, so that in the event of a major failure the storage system is almost up to date. Right now an rbd mirror image takes 8mins to create, 2mins to transfer and just over 5 to migrate into the existing image on the other cluster. That is 15 mins; yes, it is only 10mins behind per image, but 10mins is a lot of data to lose. Rsync can run every minute with cron, but if an rsync run takes longer than 59 seconds, the next run is skipped because of how rsync behaves when a previous run is still going. This is better than 10mins, but it is still a lot of lost data.

After testing lsyncd, I found that though it does work for file systems, it cannot handle block devices. This leaves really only one method available: DRBD. I am in the process of getting this working on my PetaSAN VMs and do not expect progress to be fast, but I will update this thread as I make progress.
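One way around the fixed one-minute cron window is a loop that starts the next sync as soon as the previous one finishes, so a slow pass delays the next one instead of colliding with it. A minimal sketch, assuming a caller-supplied `sync_once` function (hypothetical, for example a wrapper around the rsync or snapshot transfer); the `clock` and `sleep` hooks are only there so the loop can be tested without waiting.

```python
import time


def run_back_to_back(sync_once, passes, min_interval=0.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Run sync_once() repeatedly, never overlapping two runs.

    sync_once is a hypothetical callable that performs one sync pass.
    Returns the duration of each pass in seconds.
    """
    durations = []
    for _ in range(passes):
        start = clock()
        sync_once()
        elapsed = clock() - start
        durations.append(elapsed)
        # Pause only if the pass finished faster than the minimum interval;
        # a slow pass is followed immediately by the next one.
        if elapsed < min_interval:
            sleep(min_interval - elapsed)
    return durations
```

This is still asynchronous: the remote side lags by at least the duration of one pass, so it narrows the window rather than closing it.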