Storage options available in Ceph
the.only.chaos.lucifer
31 Posts
January 12, 2024, 1:15 am
Of the storage options available, I don't quite understand which one will let me store website data at multiple locations (on different subnets).
- Say client A writes to the website, and the website stores the data on Ceph/PetaSAN at location A. Client B writes to location B.
- Client A will be able to see location B's data on the website once it is synced, and likewise client B will be able to see location A's data.
How is this configured in PetaSAN? Essentially it would be a storage repository that syncs data, correct (or replicates it asynchronously, since it's not necessarily instantaneous)? I'm not sure how websites work across different locations, so please bear with me. I've only ever put up a website at a single location with a single database and storage, which is easy, but HA is a different ball game for me.
admin
2,930 Posts
January 12, 2024, 4:35 am
You really have to think through and define your application requirements in more detail. For example: Why do you need your website stored in more than one location? Is it for disaster recovery, or for serving clients in different geographies with lower latency? How far apart are the sites? Can you use async replication between sites and accept some data inconsistency, or do you need consistent, synced data even at the expense of a performance hit in latency/IOPS? Does your server application allow active/active setups running in different sites? Will you be using a database of some sort? Is it SQL or NoSQL? Does the DB allow active/active? Does it tolerate async data, or must it be consistent? Is it an existing application that expects a POSIX file system, or a new application you will develop that can use S3?
the.only.chaos.lucifer
31 Posts
January 12, 2024, 6:22 pm
Really appreciate your detailed help 🙂
Why do you need your website to be stored in more than 1 location? Is it for disaster recovery? For serving clients in different geographies with lower latency? How far apart are the sites? Can you have async replication between sites and are you OK with data inconsistency? Or do you need consistent synced data even at the expense of a hit in performance latency/IOPS?
Data consistency is pretty important, as I plan to have a forum, e-commerce, and more sites up if possible. Local access to the data is a plus. So I believe the data needs to be consistent across locations: you don't want an item to be sold and still show as available on the site in another location.
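To make the inventory concern concrete, here is a toy Python model (not PetaSAN or Ceph code) of two sites that each keep a copy of the stock count and exchange changes asynchronously. The `Site` class and `sync()` function are hypothetical, and replication is assumed to happen only when `sync()` is called. Both sites sell the last unit before replication runs, so the item ends up oversold.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """One location with its own (possibly stale) copy of the stock counts."""
    name: str
    stock: dict = field(default_factory=dict)
    outbox: list = field(default_factory=list)  # local changes not yet sent to the peer

    def sell(self, item: str) -> bool:
        # The check only sees this site's copy, which may be out of date.
        if self.stock.get(item, 0) <= 0:
            return False
        self.stock[item] -= 1
        self.outbox.append((item, -1))
        return True


def sync(a: Site, b: Site) -> None:
    """Asynchronous replication step: exchange queued deltas between the sites."""
    for item, delta in a.outbox:
        b.stock[item] = b.stock.get(item, 0) + delta
    for item, delta in b.outbox:
        a.stock[item] = a.stock.get(item, 0) + delta
    a.outbox.clear()
    b.outbox.clear()


site_a = Site("A", {"widget": 1})
site_b = Site("B", {"widget": 1})

# Customers at both sites buy the last widget before replication has run.
sold_a = site_a.sell("widget")
sold_b = site_b.sell("widget")
sync(site_a, site_b)

print(sold_a, sold_b, site_a.stock["widget"], site_b.stock["widget"])  # True True -1 -1
```

In practice the stock count would live in the database (MariaDB here) rather than in file storage, so the same sync-versus-async question applies to the database layer too.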
Does your server application allow active/active setups running in different sites?
Sadly, I don't think the web server has HA capabilities, but I'm thinking of making the database and storage it accesses HA. Any changes it makes to the DB and storage are what make the site dynamic.
Let me know whether this is possible or impossible if I make the storage and database active/active.
From my understanding, the frontend of the site can be static; only the database and stored files need to change.
Will you be using a database of some sort? Is it a SQL DB or NoSQL? Does the DB allow active/active? Does the DB allow async data, or must it be consistent?
I plan to use an HA MariaDB database, which is supposed to be active/active.
Is it an existing application that expects a POSIX file system? Or a new application you will develop that can use S3?
I'll have to look into this more, as I'm not sure what you mean by POSIX/S3...
the.only.chaos.lucifer
31 Posts
January 12, 2024, 6:31 pm
I've currently set up my subnets so that one location can access the other through a private tunnel, but since they are different subnets, the PetaSAN nodes can only ping each other; they can't form a cluster because the backend complains.
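A quick way to see why nodes can ping each other yet still not share a subnet: a routed tunnel connects two different IP networks. The sketch below uses Python's standard `ipaddress` module; the addresses are made-up examples, not the poster's actual ranges.

```python
import ipaddress

# Hypothetical backend interfaces at the two locations, linked by a routed VPN.
site_a_backend = ipaddress.ip_interface("10.0.1.11/24")
site_b_backend = ipaddress.ip_interface("10.0.2.11/24")


def same_subnet(x: ipaddress.IPv4Interface, y: ipaddress.IPv4Interface) -> bool:
    """True only if both interfaces belong to the same IP network."""
    return x.network == y.network


print(site_a_backend.network, site_b_backend.network)  # 10.0.1.0/24 10.0.2.0/24
print(same_subnet(site_a_backend, site_b_backend))      # False

# An address from site A's range, e.g. on a stretched layer-2 segment at site B.
print(same_subnet(site_a_backend, ipaddress.ip_interface("10.0.1.12/24")))  # True
```

Routing between the two networks gives reachability (hence the successful ping), but the interfaces stay on different subnets; for the check above to pass, a node at site B would need an address from the same network, which in general means extending that network across sites at layer 2.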
I understand this probably won't work if there is too much data being transferred back and forth.
But if it is just basic database data and the occasional image file from the website, I believe the low volume of data should work, since I'm not transferring VM data or anything that big.
Thanks!!!
admin
2,930 Posts
January 12, 2024, 7:25 pm
If you need data consistency, you need sync updates rather than async, so you should create one cluster spanning the two locations.
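Why sync replication across sites costs latency regardless of how little data is written: in a single stretched cluster, a write is acknowledged only after every replica has committed it, including the copies at the remote site. The Python sketch below compares that with an async setup, where only local copies are waited on. The round-trip and commit times are made-up figures for illustration, and the model ignores everything except the slowest replica.

```python
# Illustrative figures only; measure your own link before relying on any numbers.
COMMIT_MS = 1.0                               # assumed time for a replica to persist a write
RTT_FROM_PRIMARY_MS = {"A": 0.2, "B": 20.0}   # assumed round trip from the primary at site A


def sync_ack_ms(replica_sites: list[str]) -> float:
    """Stretched cluster: the write is acknowledged only once every replica,
    local or remote, has committed, so the farthest site sets the latency."""
    return COMMIT_MS + max(RTT_FROM_PRIMARY_MS[s] for s in replica_sites)


def async_ack_ms(local_site: str = "A") -> float:
    """Separate clusters plus async replication: only the local copies are
    waited on; the remote cluster catches up later."""
    return COMMIT_MS + RTT_FROM_PRIMARY_MS[local_site]


replicas = ["A", "A", "B", "B"]   # two copies per site
print(sync_ack_ms(replicas))      # 21.0
print(async_ack_ms())             # 1.2
```

In the sync case every write pays the inter-site round trip, so a small but write-heavy database can still feel slow over a long-distance link; the async case is fast but leaves the remote copy behind until replication catches up.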
I don't think you answered why you need more than one site, but if it is for disaster recovery, then you need to set up a CRUSH rule to ensure one site can function on its own.
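A rough illustration of what such a CRUSH rule has to guarantee. This is not the real CRUSH algorithm: a simple hash ranking (rendezvous hashing) stands in for it, and the host names, `size=4` and `min_size=2` are assumptions. The placement mimics a rule with the steps `step choose firstn 2 type datacenter` followed by `step chooseleaf firstn 2 type host`, so each site holds two copies on different hosts.

```python
import hashlib

# Hypothetical hosts per site.
TOPOLOGY = {
    "site-a": ["a1", "a2", "a3"],
    "site-b": ["b1", "b2", "b3"],
}
SIZE, MIN_SIZE, PER_SITE = 4, 2, 2  # assumed pool settings: 2 sites x 2 copies


def _score(obj: str, host: str) -> str:
    # Deterministic per (object, host) pair, so placement is stable across calls.
    return hashlib.sha256(f"{obj}:{host}".encode()).hexdigest()


def place(obj: str) -> list[tuple[str, str]]:
    """Choose PER_SITE distinct hosts in each site (rendezvous-hash ranking)."""
    placement = []
    for site, hosts in sorted(TOPOLOGY.items()):
        ranked = sorted(hosts, key=lambda h: _score(obj, h))
        placement += [(site, h) for h in ranked[:PER_SITE]]
    assert len(placement) == SIZE  # sanity check against the assumed pool size
    return placement


def still_active_without(site: str, obj: str) -> bool:
    """Would obj keep at least MIN_SIZE copies if a whole site went offline?"""
    surviving = [p for p in place(obj) if p[0] != site]
    return len(surviving) >= MIN_SIZE


copies = place("rbd_data.example")
print(len(copies), sorted({s for s, _ in copies}))   # 4 ['site-a', 'site-b']
print(all(still_active_without(s, "rbd_data.example") for s in TOPOLOGY))  # True
```

By contrast, with `size=3` split two and one, losing the site that holds two copies leaves one copy, below `min_size=2`, and I/O to the affected placement groups stops. Placement is also only half of it: the Ceph monitors need a majority to keep quorum, which is why Ceph's stretch mode puts a tiebreaker monitor at a third location.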
You probably should be using CephFS.
A VPN should allow both sites to connect securely.
Good luck.
the.only.chaos.lucifer
31 Posts
January 12, 2024, 9:36 pm
Thanks for clarifying, it was really helpful. Oops, I guess I didn't quite answer the first question, but your answer is definitely what I'm after.
So it seems that setting up the CRUSH rule is key to making sure one site can function entirely on its own, and that when the other site comes back up, data starts syncing over so everything is consistent again. The plan is, as you said, to create a cluster spanning both locations.
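The catch-up behaviour described here can be sketched as a toy Python model: while one site is offline, writes land only on the surviving site's copies, and the names of objects the offline site missed are remembered and copied over when it returns. Ceph tracks this internally per placement group; the `StretchedPool` class below is hypothetical, and it assumes the surviving site keeps enough copies to continue accepting writes.

```python
class StretchedPool:
    """Toy two-site pool; each site's dict stands in for that site's replicas."""

    def __init__(self) -> None:
        self.copies = {"site-a": {}, "site-b": {}}        # site -> {object: data}
        self.up = {"site-a": True, "site-b": True}
        self.missed = {"site-a": set(), "site-b": set()}  # objects to recover per site

    def write(self, obj: str, data: str) -> None:
        for site, store in self.copies.items():
            if self.up[site]:
                store[obj] = data
            else:
                # Keep serving writes; just remember what the offline site missed.
                self.missed[site].add(obj)

    def site_returns(self, site: str) -> None:
        """Bring a site back and copy over everything it missed from its peer."""
        self.up[site] = True
        peer = "site-a" if site == "site-b" else "site-b"
        for obj in self.missed[site]:
            self.copies[site][obj] = self.copies[peer][obj]
        self.missed[site].clear()


pool = StretchedPool()
pool.write("img1.jpg", "v1")
pool.up["site-b"] = False           # site B goes offline
pool.write("img1.jpg", "v2")
pool.write("img2.jpg", "v1")
pool.site_returns("site-b")
print(pool.copies["site-b"])        # {'img1.jpg': 'v2', 'img2.jpg': 'v1'}
```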
During setup I noticed the PetaSAN cluster only wants a single subnet. I can have the frontend on different subnets and PetaSAN doesn't complain, but the backend needs to be on the same subnet. If so, how do I create a cluster spanning both locations?
Or is it a CRUSH rule I need to create that lets two clusters sit on separate subnets and still sync data (not async)? Or is there something I can do on the VPN side? I'm not sure which is the correct answer. As far as I know, the VPN private tunnel I have only links the subnets and lets clients communicate; it doesn't combine them into a single subnet for the backend to communicate over.
Is there somewhere I can read more about these setups? I went to the Ceph site for more detail, but as far as I can tell, the documentation mostly covers how to get Ceph set up and running.
the.only.chaos.lucifer
31 Posts
January 18, 2024, 8:01 am
@admin
I've spent the last few days reading through the Ceph docs and more online to better understand your response. From my understanding, for a site like a forum, async is sufficient, but for a site like e-commerce the data should be synced if you want inventory and other data to be reflected correctly across locations. Correct me if I'm wrong. 🙂 Also, if you have suggested reading, let me know where to begin, as I only started delving into Ceph about two months ago and am still clueless.
From what I can gather from the Ceph documentation, I think PetaSAN has some, if not all, of these in the GUI. Is that correct, or is it something I need to do from the command line?
- Asynchronous Data Replication (default setup?): You can configure asynchronous replication between different Ceph clusters. This involves periodically copying or syncing data from one cluster to another, typically using tools like RBD mirroring or object gateway synchronization (not sure where this is in the GUI, or is it all command-line driven?). However, it's important to note that this is not real-time bidirectional replication, but rather a one-way replication process from a primary to a secondary cluster. I think this is the default replication in the (Replication Guide).
- Synchronous Data Replication (command-line setup?): Not even sure where to begin with this setup. It makes sure the data arrives at all the OSDs that will store it before the write is considered complete. It takes longer, since all copies of the data need to arrive at their locations and be confirmed before the data is considered saved to the cluster. (I think this is the same as the geo-replication I want, described below.)
(???Guide???)
- Global Object Distribution (in GUI): With this approach, you can distribute objects across multiple sites by leveraging Ceph's CRUSH (Controlled Replication Under Scalable Hashing) algorithm. CRUSH determines the placement of objects across the storage cluster based on defined rules. By configuring CRUSH rules that map replicas to different sites (for example, datacenter buckets) within one cluster, you can ensure the data is distributed across locations, allowing simultaneous access. It should be under the following tab:
- CONFIGURATION > CRUSH > (BUCKETS TREE / RULES)
(Crush Map Guide)
- Geo-replication (command-line setup?): Geo-replication is a feature offered by some Ceph distributions or additional tools. It enables bidirectional replication and synchronization of data between Ceph clusters at different locations, typically in real time (well, nothing is truly instant) or near real time, to maintain consistency between the clusters. (Correct me if I'm wrong, but I don't think this is available in PetaSAN at this time? Only certain Ceph distributions have it, but maybe it's possible from the command line? The GUI doesn't have geo-replication, but it does have Replication, which is one-way?)
(???Guide???)
Last edited on January 18, 2024, 8:09 am by the.only.chaos.lucifer · #7
the.only.chaos.lucifer
31 Posts
January 18, 2024, 9:43 am
From what I understand, this is very similar to what I would like to set up (MinIO Active-Active Replication).
I believe Ceph RADOS Gateway is the answer, but I could be wrong.
- Ceph RADOS Gateway: Since I am using Ceph (PetaSAN) for primary storage, Ceph RADOS Gateway (RGW) can handle active-active replication natively, if I understand correctly. RGW supports multi-site configurations and data replication across geographically distributed clusters. I think I can configure multiple RGW instances at different sites and set up replication policies to achieve active-active replication? And this solution leverages Ceph's own capabilities without requiring any additional software? It's something I've always wanted to try and think is cool.
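To illustrate the active/active, asynchronous idea described above, here is a toy two-zone model in Python. Each zone accepts writes immediately, appends them to a local change log, and later pulls the peer's log from a saved position. The `Zone` class is hypothetical, and the conflict rule (the write with the later timestamp wins) is an assumption for the toy, not a description of how RGW multisite resolves conflicts.

```python
class Zone:
    """Toy zone: accepts writes locally and replicates with its peer later."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.objects = {}   # key -> (timestamp, data)
        self.log = []       # local change log: (timestamp, key, data)
        self.markers = {}   # peer name -> how far into the peer's log we have synced

    def put(self, key: str, data: str, ts: int) -> None:
        # Accepted right away; the remote zone is not contacted, so nothing blocks.
        self.objects[key] = (ts, data)
        self.log.append((ts, key, data))

    def pull_from(self, peer: "Zone") -> None:
        """Apply the peer's new log entries; on conflict, the later timestamp wins."""
        start = self.markers.get(peer.name, 0)
        for ts, key, data in peer.log[start:]:
            current = self.objects.get(key)
            if current is None or ts > current[0]:
                self.objects[key] = (ts, data)
        self.markers[peer.name] = len(peer.log)


a, b = Zone("a"), Zone("b")
a.put("logo.png", "from-a", ts=1)
b.put("banner.png", "from-b", ts=2)
print("banner.png" in a.objects)   # False: not replicated yet
a.pull_from(b)
b.pull_from(a)
print(a.objects == b.objects)      # True
```

Between pulls, each zone can serve reads that lack the other zone's latest writes; that window is the price of never blocking on the remote site.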
admin
2,930 Posts
January 18, 2024, 12:19 pm
Yes, Ceph RGW handles active/active replication between multiple sites, each with its own cluster installation. Replication is done asynchronously so that the connection to a geographically remote site does not block; this is also the same as the MinIO setup you mentioned. As discussed, if you need sync, you need to set up a single cluster across the different locations and use a CRUSH rule to control replica distribution among sites. PetaSAN allows you to do both.
I had asked earlier why you need a multi-site setup, and it is still not clear to me. Typically there will be some compelling reason to do so, and it will need to be thought through at various levels, not just the storage level.
Last edited on January 18, 2024, 12:22 pm by admin · #9
the.only.chaos.lucifer
31 Posts
January 18, 2024, 6:52 pm
Not sure if this is sufficient to answer your question of why the multi-site setup: "I plan to do a multi-site setup so I can maintain website data across different locations. At least from my understanding, it is for data availability and also to let users edit from the second site as well (bidirectional data transfer)."
Unless you mean the website needs to support HA (I thought it doesn't have to, but I could be wrong) and the database needs to support active/active HA (pretty sure it has to, as the storage doesn't deal with this). Lastly, the storage needs to maintain the images, files, etc., so maybe asynchronous is OK for that?
I could be going about this the wrong way, but I want to make sure that if a website is hosted and one location goes down, the second location can still serve the web server, MariaDB can still operate, and data can still be stored to the Ceph cluster. As the website can be mostly static, MariaDB will be active/active to maintain the database. But the images, files, etc. for the website can't be maintained except through an active/active structure, correct? Maybe asynchronous is an option too, but I'm not sure yet.
I am definitely open to an asynchronous setup as well; I just want to make sure I'm going with the right approach. How do I go about this? Is there a good read on the setup? I'm open to trying both setups, if only for testing.
Last edited on January 18, 2024, 7:17 pm by the.only.chaos.lucifer · #10
Pages: 1 2
Storage option available in Ceph
the.only.chaos.lucifer
31 Posts
Quote from the.only.chaos.lucifer on January 12, 2024, 1:15 amOut of the storage option available. I don't quite understand which option will let me store website data at multiple location (on different subnet).
- Let said client A write to website and website store data to ceph/petasan at location A. Client B write to location B.
-Client A will be able to see location B data on the website once it is synced... and same for Client B being able to see location A data.
How is this configured in PetaSAN? Pretty much it would be a storage repository that sync data correct (or async as it's not necessarily instantaneous)? I not sure how website work over different location so please bare with me. I only ever put up website at single location with single database and storage so it's easy but having HA is a different ball game for me.
Out of the storage option available. I don't quite understand which option will let me store website data at multiple location (on different subnet).
- Let said client A write to website and website store data to ceph/petasan at location A. Client B write to location B.
-Client A will be able to see location B data on the website once it is synced... and same for Client B being able to see location A data.
How is this configured in PetaSAN? Pretty much it would be a storage repository that sync data correct (or async as it's not necessarily instantaneous)? I not sure how website work over different location so please bare with me. I only ever put up website at single location with single database and storage so it's easy but having HA is a different ball game for me.
admin
2,930 Posts
Quote from admin on January 12, 2024, 4:35 amYou really have to think/define your application requirements in more detail, example: why do you need your website to be stored in more than 1 location ? is it for disaster recovery ? for serving clients in different geographies with less latencies ? how far are the sites apart ? can you have async replication between sites and are ok with data inconsistency ? or you need consistent synced data even at expense of hit in performance latency/iops ? does your server application allow active/active setups running in different sites ? will you be using a database or some sort ? is it a sql db or non sql ? does the db allow active/active ? db allow async data or must be consistent ? is it an existing application that expects a posix file system ? or new application you will develop and can use s3 ?
You really have to think/define your application requirements in more detail, example: why do you need your website to be stored in more than 1 location ? is it for disaster recovery ? for serving clients in different geographies with less latencies ? how far are the sites apart ? can you have async replication between sites and are ok with data inconsistency ? or you need consistent synced data even at expense of hit in performance latency/iops ? does your server application allow active/active setups running in different sites ? will you be using a database or some sort ? is it a sql db or non sql ? does the db allow active/active ? db allow async data or must be consistent ? is it an existing application that expects a posix file system ? or new application you will develop and can use s3 ?
the.only.chaos.lucifer
31 Posts
Quote from the.only.chaos.lucifer on January 12, 2024, 6:22 pmReally appreciate your detail help 🙂
why do you need your website to be stored in more than 1 location? is it for disaster recovery ? for serving clients in different geographies with less latencies ? how far are the sites apart ? can you have async replication between sites and are ok with data inconsistency ? or you need consistent synced data even at expense of hit in performance latency/iops?
Data consistency is pretty important as I plan to have a forum, ecommerce, and more sites up if possible. The local access to the data is a plus. So I believe data need to be consistent across location. You don't want item sold and still show as being up on the site in another location.
does your server application allow active/active setups running in different sites?
Sadly don't think the webserver has HA capabilities but thinking of making the database it access and storage it access be HA. So any change it make to DB and storage will be what make the site dynamic.
Let me know if this a is a impossible or possible thing if I make the storage and database Active/Active?
From my understanding the frontend of the site can be static only database and stored files needs to change.
will you be using a database or some sort ? is it a sql db or non sql ? does the db allow active/active? db allow async data or must be consistent?
Plan to use MariaDB HA database which is suppose to be Active/Active
is it an existing application that expects a posix file system ? or new application you will develop and can use s3?
Will have to look into this more by ur meaning of posix/s3...
Really appreciate your detail help 🙂
why do you need your website to be stored in more than 1 location? is it for disaster recovery ? for serving clients in different geographies with less latencies ? how far are the sites apart ? can you have async replication between sites and are ok with data inconsistency ? or you need consistent synced data even at expense of hit in performance latency/iops?
Data consistency is pretty important as I plan to have a forum, ecommerce, and more sites up if possible. The local access to the data is a plus. So I believe data need to be consistent across location. You don't want item sold and still show as being up on the site in another location.
does your server application allow active/active setups running in different sites?
Sadly don't think the webserver has HA capabilities but thinking of making the database it access and storage it access be HA. So any change it make to DB and storage will be what make the site dynamic.
Let me know if this a is a impossible or possible thing if I make the storage and database Active/Active?
From my understanding the frontend of the site can be static only database and stored files needs to change.
will you be using a database or some sort ? is it a sql db or non sql ? does the db allow active/active? db allow async data or must be consistent?
Plan to use MariaDB HA database which is suppose to be Active/Active
is it an existing application that expects a posix file system ? or new application you will develop and can use s3?
Will have to look into this more by ur meaning of posix/s3...
the.only.chaos.lucifer
31 Posts
Quote from the.only.chaos.lucifer on January 12, 2024, 6:31 pmCurrently setup my subnet(s) from 1 location to be able to access another location through a private tunnel but since they were different subnet PetaSAN can only ping each other but they can't be in a cluster as backend complain.
I understand this probably won't work if you do this as there is too much data being transfer back and forward.
But if it is just basic database data and image file from the website once in a while the low volume of data should work I believe. As I am not transferring VM data or something big like that.
Thanks!!!
Currently setup my subnet(s) from 1 location to be able to access another location through a private tunnel but since they were different subnet PetaSAN can only ping each other but they can't be in a cluster as backend complain.
I understand this probably won't work if you do this as there is too much data being transfer back and forward.
But if it is just basic database data and image file from the website once in a while the low volume of data should work I believe. As I am not transferring VM data or something big like that.
Thanks!!!
admin
2,930 Posts
Quote from admin on January 12, 2024, 7:25 pmif you need data consistence, you need sync updates rather than async, so you should create 1 cluster spanning 2 locations.
I don;t think you answered why do you need more than 1 site, but if it is for disaster recovery, the you need to setup a crush rule to insure 1 site can function on its own.
You probably should be using cephfs.
vpn should allow both sites to connect securely.
Good luck.
the.only.chaos.lucifer
31 Posts
Quote from the.only.chaos.lucifer on January 12, 2024, 9:36 pm
Thanks for clarifying, it was really helpful. Oops, I guess I didn't quite answer the first question, but your answer is definitely what I was looking for.
So it seems that setting up the CRUSH rule is the key to making sure one site can function on its own, and that when the other site comes back up, data syncs over until everything is consistent again. The plan is, as you said, to create one cluster spanning both locations.
During setup I noticed that the PetaSAN cluster only wants a single subnet. I can put the frontend on different subnets and PetaSAN doesn't complain, but the backend needs to be on the same subnet. If so, how do I create a cluster spanning both locations?
Or is it a CRUSH rule I need to create that lets two clusters sit on separate subnets and still sync data (not async)? Or is there something I can do on the VPN side? I'm not sure which is the correct answer. As far as I know, the VPN private tunnel I have only links the subnets so clients can communicate; it does not merge them into a single subnet for the backend to communicate on.
Is there somewhere I can read more about these setups? I went to the Ceph site for more detail, but as far as I can tell the documentation mostly covers how to install Ceph and get it up and running.
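The backend complaint comes down to whether the backend addresses share one network: a private tunnel normally routes between two different subnets rather than merging them into one. The sketch uses Python's `ipaddress` module with hypothetical addresses to show the difference; it does not reproduce PetaSAN's own validation logic.

```python
import ipaddress


def same_subnet(ip_a, ip_b, prefix):
    """True if both addresses fall in the same network of the given prefix length."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix}").network
    net_b = ipaddress.ip_interface(f"{ip_b}/{prefix}").network
    return net_a == net_b


def covered_by(ip, networks):
    """True if ip belongs to any listed network (a multi-subnet backend list)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(n) for n in networks)


if __name__ == "__main__":
    # Hypothetical backend IPs: one node per site, routed over the tunnel.
    print(same_subnet("10.0.1.11", "10.0.2.11", 24))                   # different /24s
    print(covered_by("10.0.2.11", ["10.0.1.0/24", "10.0.2.0/24"]))     # listed explicitly
```

Upstream Ceph allows `public_network` and `cluster_network` to list several comma-separated subnets for routed layouts, which is what the second function mirrors; whether the PetaSAN deployment accepts that should be checked in the PetaSAN documentation rather than assumed. The other option is to extend a single layer-2 backend subnet across the tunnel.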
the.only.chaos.lucifer
31 Posts
Quote from the.only.chaos.lucifer on January 18, 2024, 8:01 am
@admin
After spending the last few days reading through the Ceph docs and other material online to better understand your response, my understanding is that for a site like a forum, async is sufficient. But for a site like e-commerce, the data should be synced if you want inventory and other data to be reflected correctly across locations. Correct me if I am wrong. 🙂 Also, if you have suggested reading, let me know where to begin, as I only started delving into Ceph about two months ago and am still clueless.
From what I can gather from the Ceph documentation, PetaSAN has some, if not all, of these in the GUI, correct? Or is it something I need to do from the command line?
- Asynchronous Data Replication (default setup?): You can configure asynchronous replication between different Ceph clusters. This involves periodically copying or syncing data from one cluster to another, typically using tools like RBD mirroring or object gateway synchronization (not sure where this is in the GUI, or is it all command-line driven?). However, it's important to note that this is not real-time bidirectional replication, but rather a one-way replication process from a primary to a secondary cluster. I think this is the default replication in the (Replication Guide).
- Synchronous Data Replication (command-line setup?): Not even sure where to begin with this setup. It makes sure the data arrives at all the OSDs that will store it before the write is considered complete. It takes longer, as all copies of the data need to arrive at their locations and be confirmed before the data is considered saved to the cluster. (I think this is the same as the geo-replication I want, described below.)
(???Guide???)
- Global Object Distribution (in GUI): With this approach, you can distribute objects across multiple Ceph clusters by leveraging Ceph's CRUSH (Controlled Replication Under Scalable Hashing) algorithm. The CRUSH algorithm determines the placement of objects across the storage cluster based on defined rules. By configuring the CRUSH rules and mapping them to different clusters, you can ensure that data is distributed across multiple clusters, allowing for simultaneous access. This should be under the following tab:
- CONFIGURATION > CRUSH > (BUCKETS TREE / RULES)
(Crush Map Guide)
- Geo-replication (command-line setup?): Geo-replication is a feature offered by some Ceph distributions or additional tools. It enables bidirectional replication and synchronization of data between Ceph clusters at different locations. It typically involves real-time (well, nothing is truly instant) or near-real-time replication to maintain consistency between the clusters. (Correct me if I am wrong, but I don't think this is available in PetaSAN as of now? Only certain Ceph distributions have it, but maybe it is possible from the command line? The GUI doesn't have geo-replication, but it does have Replication, which is one-way?)
(???Guide???)
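The forum vs. e-commerce distinction hinges on when the client gets its acknowledgement. A toy two-site model: in synchronous mode a write returns only after both sites hold it; in asynchronous mode it returns after the local write and is shipped later, so anything still queued is missing at the other site if the primary site fails. `TwoSiteStore` is a hypothetical conceptual model, not Ceph code.

```python
from collections import deque


class TwoSiteStore:
    """Toy model: one primary site replicating to one remote site."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.remote = {}
        self.pending = deque()  # async writes not yet shipped to the remote site

    def write(self, key, value):
        self.primary[key] = value
        if self.synchronous:
            # Sync: the remote copy is written before the call returns (the ack).
            self.remote[key] = value
        else:
            # Async: ack immediately; the remote copy is updated later.
            self.pending.append((key, value))

    def ship(self, max_items=None):
        """Background replication step for async mode."""
        n = len(self.pending) if max_items is None else min(max_items, len(self.pending))
        for _ in range(n):
            key, value = self.pending.popleft()
            self.remote[key] = value

    def lost_if_primary_fails(self):
        """Acknowledged writes whose latest value the remote site does not have yet."""
        return {k: v for k, v in self.primary.items() if self.remote.get(k) != v}


if __name__ == "__main__":
    shop = TwoSiteStore(synchronous=False)
    shop.write("sku-1-stock", 5)
    shop.write("sku-1-stock", 4)  # an item was sold
    shop.ship(max_items=1)        # only the first update has reached the remote site
    print(shop.lost_if_primary_fails())
```

In the async case the remote site would still show the old stock count until the queue drains, which matters for inventory but rarely for forum posts.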
the.only.chaos.lucifer
31 Posts
Quote from the.only.chaos.lucifer on January 18, 2024, 9:43 am
From what I understand, this is very similar to what I would like to set up (MinIO Active-Active Replication).
I believe Ceph RADOS Gateway is the answer I'm looking for, but I could be wrong.
- Ceph RADOS Gateway: Since I am using Ceph (PetaSAN) for primary storage, Ceph RADOS Gateway (RGW) can handle active-active replication natively, if I understand correctly. RGW supports multi-site configurations and data replication across geographically distributed clusters. I think I can configure multiple RGW instances at different sites and set up replication policies to achieve active-active replication? I also think this solution uses Ceph's own capabilities and doesn't require any additional software? It's something I've always wanted to try and think is cool.
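Active/active in this sense means both sites accept writes and each one replays the other's change log asynchronously, so the sites converge after a delay rather than instantly. The sketch models that with per-site logs and a last-writer-wins rule on timestamps; the conflict rule is an assumption chosen for illustration, not a description of how RGW or MinIO resolve conflicts.

```python
class Site:
    """One site in a hypothetical two-site active/active object store."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # key -> (timestamp, site_name, data)
        self.log = []      # local changes not yet sent to the peer

    def put(self, key, data, ts):
        version = (ts, self.name, data)
        self._apply(key, version)
        self.log.append((key, version))

    def _apply(self, key, version):
        # Assumed last-writer-wins: the higher (timestamp, site_name) pair wins;
        # the site name only breaks exact timestamp ties.
        current = self.objects.get(key)
        if current is None or version[:2] > current[:2]:
            self.objects[key] = version

    def sync_to(self, peer):
        """Asynchronously ship this site's pending changes to the peer."""
        for key, version in self.log:
            peer._apply(key, version)
        self.log.clear()

    def get(self, key):
        version = self.objects.get(key)
        return None if version is None else version[2]


if __name__ == "__main__":
    a, b = Site("A"), Site("B")
    a.put("logo.png", b"v1", ts=1)
    print(b.get("logo.png"))  # None: site B has not received it yet
    a.sync_to(b)
    print(b.get("logo.png"))  # b'v1'
```

The gap between a `put` at one site and the next `sync_to` is the window in which a reader at the other site still sees old data.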
admin
2,930 Posts
Quote from admin on January 18, 2024, 12:19 pm
Yes, Ceph RGW handles active/active replication between multiple sites, each with its own cluster installation. Replication is done asynchronously so that the connection to a geographically remote site does not block; this is also the same as the MinIO setup you mentioned. As discussed, if you need sync, you need to set up a single cluster across the different locations and use a CRUSH rule to control replica distribution among sites. PetaSAN allows you to do both.
I asked earlier why you need a multi-site setup, and it is still not clear to me. Typically there will be some compelling reason to do so, and it will need to be thought through at various levels, not just at the storage level.
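One thing a single cluster stretched over two sites must account for, beyond the CRUSH rule, is monitor quorum: Ceph monitors need a strict majority to keep operating, so an even split across two sites cannot survive the loss of either site. Upstream Ceph addresses this with a tiebreaker monitor at a third location (its "stretch mode"). The arithmetic, with hypothetical monitor layouts:

```python
def has_quorum(alive, total):
    """Paxos-style majority: strictly more than half of all monitors must be up."""
    return alive > total // 2


def survives_loss_of(layout, failed_site):
    """layout maps a site name to the number of monitors placed there."""
    total = sum(layout.values())
    alive = total - layout.get(failed_site, 0)
    return has_quorum(alive, total)


if __name__ == "__main__":
    layouts = {
        "2+2": {"site-a": 2, "site-b": 2},
        "3+2": {"site-a": 3, "site-b": 2},
        "2+2+tiebreaker": {"site-a": 2, "site-b": 2, "tiebreaker": 1},
    }
    for label, layout in layouts.items():
        print(label, {s: survives_loss_of(layout, s) for s in ("site-a", "site-b")})
```

Only the layout with a monitor outside both sites keeps quorum whichever site fails; putting more monitors at one site just decides in advance which site can survive alone.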
the.only.chaos.lucifer
31 Posts
Quote from the.only.chaos.lucifer on January 18, 2024, 6:52 pm
Not sure if this is sufficient to answer your question about why the multi-site setup: "I plan to do a multi-site setup so I can maintain website data across different locations. From my understanding, it is for data availability and also to let users edit from the second site as well (bidirectional data transfer)."
Or do you mean that the website needs to support HA (I thought it doesn't have to, but I could be wrong) and the database needs to support active/active HA (pretty sure it has to, as storage doesn't deal with this)? Lastly, the storage needs to maintain the images, files, etc., so maybe asynchronous is OK there?
I could be going about this the wrong way, but I want to make sure that if one location goes down, the second location will still serve the website, and MariaDB can still operate and store its data on the Ceph cluster. Since the website can be mostly static, MariaDB will run active/active to maintain the database. But the images, files, etc. for the website can only be maintained through an active/active structure, correct? Maybe asynchronous is an option too, but I'm not sure yet.
I am definitely open to an asynchronous setup as well; I just want to make sure I am going with the right approach. How do I go about this? Is there a good read on the setup? I am open to trying both setups, even if only for testing.
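At the application level, "serve from the second location when the first goes down" usually means something in front of the sites (DNS, a load balancer, or a proxy) health-checks each site and sends users to a healthy one. A minimal sketch of that decision; the site names, the three health flags, and the preference order are hypothetical, and a real setup would feed it from actual HTTP, database, and storage checks.

```python
from dataclasses import dataclass


@dataclass
class SiteHealth:
    name: str
    web_ok: bool      # web server answers
    db_ok: bool       # local MariaDB node accepts writes
    storage_ok: bool  # site can read/write its Ceph-backed files


def pick_site(sites, preferred):
    """Return the preferred site if fully healthy, else the first healthy one, else None."""
    healthy = [s for s in sites if s.web_ok and s.db_ok and s.storage_ok]
    for s in healthy:
        if s.name == preferred:
            return s.name
    return healthy[0].name if healthy else None


if __name__ == "__main__":
    site_a = SiteHealth("site-a", web_ok=False, db_ok=True, storage_ok=True)
    site_b = SiteHealth("site-b", web_ok=True, db_ok=True, storage_ok=True)
    print(pick_site([site_a, site_b], preferred="site-a"))
```

Failover only helps if the web tier, the database, and the file storage are all usable at the surviving site, which is why storage is just one layer of the design.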