
Increase Number of Replicas limit

Please increase the limit of "Number of Replicas" to 4.

 

Background:

We are planning to expand our PetaSAN cluster to at least 6 nodes: 3 in each of two server rooms, plus one additional monitoring node in a third room. In order to prevent an outage of the storage service in case of a power loss in one room, the Ceph settings will be:

osd_pool_default_size = 4
osd_pool_default_min_size = 2

and an adjusted CRUSH rule.
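
For clarity, these defaults would go into the [global] section of /etc/ceph/ceph.conf:

[global]
osd_pool_default_size = 4
osd_pool_default_min_size = 2

Note that the defaults only apply to pools created after the change; existing pools keep their current values, which can be checked per pool ("rbd" here is only an example pool name):

ceph osd pool get rbd size
ceph osd pool get rbd min_size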

Just out of curiosity: what happens if I change the settings by hand and leave the cluster setting "Number of Replicas" at "3"?

 

Regards,

Dennis

 

 

I found the place in the source:

vi /opt/petasan/services/web/templates/admin/configuration/cluster_settings.html

<!--Replicas-->
<div class="row">
    <div class="col-md-4">
        <div class="form-group">
            <label id="lblReplicas"><i class=""></i> Number of Replicas</label>
            <select class="form-control" name="replica_no" id="replica_no">
                <option value="2" {% if form.replica_no=="2" %} selected="selected" {% endif %}>
                    2
                </option>
                <option value="3" {% if form.replica_no=="3" %} selected="selected" {% endif %}>
                    3
                </option>
                <option value="4" {% if form.replica_no=="4" %} selected="selected" {% endif %}>
                    4
                </option>
            </select>
        </div>
    </div>
</div>

Could you please commit it?

Regards, Dennis

Hi

You can definitely change the replica count by hand using Ceph commands. PetaSAN does not store this (or similar) value outside of Ceph.
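
For example, something along these lines should do it (a sketch only; replace "rbd" with your actual pool name):

ceph osd pool set rbd size 4
ceph osd pool set rbd min_size 2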

I will take your changes and request that they be included in the next release.

Note that you do not have to create a fourth replica to achieve what you want. It is possible to use 3 or even 2 replicas and define a custom CRUSH map which controls how these replicas are placed, for example placing 1 replica in each room.
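
As a rough sketch (assuming a "room" bucket type is defined in your map), a rule that spreads one replica per room could look like:

step take default
step chooseleaf firstn 0 type room
step emit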

We do have plans further down the road to support CRUSH map editing, i.e. a visual editor to define your racks/rooms so that Ceph will place the replicas in a more intelligent/safer way.

 

Hi,

thanks for your response. I have read in many places that it's dangerous to use min_size=1 (because there might be no other copy to compare against).

Let's assume the second room has no power: if I split the 3 copies, one copy would be in one room and the other two in a different room. If the second room is the one with the two copies, there will be only one copy left. With min_size=2 the system will be read-only; in other words, filesystems will freeze. With min_size=1 filesystems will keep working, but any additional problem (bit flip, failed disk, ...) will lead to total data loss or at least to data corruption.

My solution would be to invest more money and split the 4 replicas into chunks of 2, so that every room would have two copies. With min_size=2 we would only run into a problem if one room is offline and an additional problem happens, but we would not lose data in that case.

Am I right or wrong?

Regards,

Dennis

Yes, your solution looks good.

You will still need to customize the CRUSH map so you do not have some PGs with 3 replicas stored on 3 hosts of the same room. Add a "room" bucket under the default root bucket and above the host bucket, and add a rule which looks like:

min_size 2
max_size 4
step take default
step choose firstn 2 type room
step chooseleaf firstn 2 type host
step emit

or, if you intend to have more than 3 rooms:

min_size 2
max_size 4
step take default
step choose firstn 0 type room
step chooseleaf firstn 2 type host
step emit
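
For completeness, a rough sketch of how to put this in place by hand (the rule, room and host names below are placeholders only, and newer Ceph releases use "id"/"crush_rule" instead of "ruleset"/"crush_ruleset"). Embedded in a full rule definition, the body above would look roughly like:

rule replicated_rooms {
        ruleset 1
        type replicated
        min_size 2
        max_size 4
        step take default
        step choose firstn 2 type room
        step chooseleaf firstn 2 type host
        step emit
}

The room buckets can be created and the hosts moved under them from the command line:

ceph osd crush add-bucket room1 room
ceph osd crush add-bucket room2 room
ceph osd crush move room1 root=default
ceph osd crush move room2 root=default
ceph osd crush move node1 room=room1

The rule itself can be added by decompiling, editing and re-injecting the CRUSH map, then pointing the pool at it:

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
(edit crushmap.txt and add the rule)
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new
ceph osd pool set <pool> crush_ruleset <rule number>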

Good luck