
Migrate from Proxmox CEPH to PetaSAN!

Hi there!

We are about to migrate our 6-server Proxmox Ceph cluster to PetaSAN!

Each server is an IBM System x3100 M4 with 16 GB of RAM.

The HDDs are a mix of SATA drives at 5400 RPM and 7200 RPM.

Additionally, there is one SSD acting as the journal device.

We use 2 facilities, with 3 servers at each one.

Between these facilities we have an optical link that provides only 1 Gbps! =( I know! This is awful!...

We experience a lot of slowdown when one of the servers crashes and Ceph needs to resync something.

I already tried lowering the weight of the low-speed HDDs, but it had no effect!
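What I ran was roughly the following, with the OSD id and weight only as placeholders, not the exact values I used:

ceph osd crush reweight osd.3 0.5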

Now the question is: can we achieve more speed with PetaSAN?
Thanks

Hi,

Since both use Ceph, I doubt there will be significant differences. To get decent performance you need good hardware; the low speed during recovery is an indication that the cluster is under-powered. There are some config parameters that limit the recovery load, which may help in your case, but that is not a real solution.

Good luck..

Quote from admin on December 6, 2018, 8:05 pm

What is your suggestion? Which parameters could be changed to reduce the recovery workload?

Maybe osd_recovery_max_active?

Just to show you, here is the output of ceph -s:

ceph -s
  cluster:
    id:     e67534b4-0a66-48db-ad6f-aa0868e962d8
    health: HEALTH_WARN
            348369/2467839 objects misplaced (14.116%)
            Degraded data redundancy: 396/2467839 objects degraded (0.016%), 85 pgs degraded

  services:
    mon: 5 daemons, quorum pve-ceph01,pve-ceph02,pve-ceph03,pve-ceph04,pve-ceph05
    mgr: pve-ceph05(active), standbys: pve-ceph01, pve-ceph02, pve-ceph03, pve-ceph04
    osd: 21 osds: 21 up, 21 in; 180 remapped pgs

  data:
    pools:   1 pools, 512 pgs
    objects: 822.61k objects, 3.02TiB
    usage:   9.26TiB used, 53.5TiB / 62.8TiB avail
    pgs:     396/2467839 objects degraded (0.016%)
             348369/2467839 objects misplaced (14.116%)
             283 active+clean
             142 active+remapped+backfill_wait
             48  active+recovery_wait+degraded
             37  active+recovery_wait+degraded+remapped
             1   active+recovery_wait
             1   active+remapped+backfilling

  io:
    client:   10.6KiB/s rd, 5.61KiB/s wr, 0op/s rd, 1op/s wr
    recovery: 330KiB/s, 0objects/s

I would appreciate it if you could help!

Thanks

I tried:

ceph tell osd.* injectargs '--osd-max-backfills 1'
ceph tell osd.* injectargs '--osd-max-recovery-threads 1'
ceph tell osd.* injectargs '--osd-recovery-op-priority 1'
ceph tell osd.* injectargs '--osd-client-op-priority 63'
ceph tell osd.* injectargs '--osd-recovery-max-active 1'
ceph osd set nodeep-scrub

But it seems to have no effect. Any suggestions?
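Is there a way to confirm that the injected values actually took effect? I guess something like dumping the running config on one of the OSD nodes:

ceph daemon osd.0 config show | grep -E 'backfill|recovery'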

Try these settings in the conf file:

osd_max_backfills = 1
osd_recovery_sleep = 1
osd_recovery_max_active = 1
osd_recovery_priority = 1
osd_recovery_op_priority = 1
osd_client_op_priority = 63
osd_scrub_during_recovery = false

Also use the CFQ I/O scheduler for the HDDs.
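For example, the lines above would go under the [osd] section of the cluster conf file (usually /etc/ceph/ceph.conf), and the OSDs need a restart to pick them up. The scheduler can be switched per disk; the device name below is only an example, and your kernel must still offer CFQ:

echo cfq > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/scheduler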

 

Yep... I am trying these too:
ceph tell osd.* injectargs '--osd-max-scrubs 1'
ceph tell osd.* injectargs '--osd-scrub-max-interval 4838400'
ceph tell osd.* injectargs '--osd-scrub-min-interval 2419200'
ceph tell osd.* injectargs '--osd-deep-scrub-interval 2419200'
ceph tell osd.* injectargs '--osd-scrub-interval-randomize-ratio 1.0'
ceph tell osd.* injectargs '--osd-disk-thread-ioprio-class idle'
ceph tell osd.* injectargs '--osd-disk-thread-ioprio-priority 0'
ceph tell osd.* injectargs '--osd-scrub-chunk-max 1'
ceph tell osd.* injectargs '--osd-scrub-chunk-min 1'
ceph tell osd.* injectargs '--osd-deep-scrub-stride 1048576'
ceph tell osd.* injectargs '--osd-scrub-load-threshold 5.0'
ceph tell osd.* injectargs '--osd-scrub-sleep 0.1'