Admin Tasks

Hello,

First of all, I want to say thank you for the great and fast support here. This is amazing!
I'm in the middle of testing PetaSAN as our new environment.

Now I have a few questions about what I would call "Daily Admin Tasks".

1. Exchange Journal drive

I thought I read somewhere that it is possible to do this while all OSDs are still up, but now I can't find it. The steps are more or less:

  • activate Cluster Maintenance
  • stop OSD service
  • delete OSD and Journal
  • exchange Journal drive
  • add OSD
  • disable Cluster Maintenance

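To make that concrete, this is roughly the manual equivalent I have in mind, using plain Ceph and systemd commands wrapped in Python just for readability. The OSD id and the use of the noout flag as "cluster maintenance" are my own assumptions for illustration, not how PetaSAN actually implements its maintenance mode, and the re-add step is left as a comment because the exact OSD-creation command differs between Ceph releases:

```python
#!/usr/bin/env python3
"""Rough sketch of the journal-swap steps listed above.

Uses generic Ceph/systemd commands only; the OSD id (3) and the use of
the 'noout' flag are assumptions for illustration, not PetaSAN's
implementation of cluster maintenance.
"""
import subprocess

OSD_ID = 3  # hypothetical OSD whose journal drive is being replaced

def run(cmd):
    """Echo and run a shell command, stopping on the first failure."""
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

# 1. "activate Cluster Maintenance": keep Ceph from rebalancing
#    while the OSD is intentionally down.
run("ceph osd set noout")

# 2. stop the OSD service
run(f"systemctl stop ceph-osd@{OSD_ID}")

# 3. delete the OSD (and with it the reference to the old journal)
run(f"ceph osd out {OSD_ID}")
run(f"ceph osd crush remove osd.{OSD_ID}")
run(f"ceph auth del osd.{OSD_ID}")
run(f"ceph osd rm {OSD_ID}")

# 4. physically exchange the journal drive, then
# 5. re-add the OSD with the new journal (in PetaSAN this would be done
#    from the UI; the exact CLI depends on the Ceph release).

# 6. "disable Cluster Maintenance"
run("ceph osd unset noout")
```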
Is there a different way of doing this? In a 3-node cluster it could be a bit risky.
Another question: PetaSAN doesn't have any monitoring (SMART etc.) that checks the journal drive, so that you can replace the drive before it fails, does it?

2. Change OSD

Sometimes there is a need to replace a running OSD that is, for example, performing badly or beginning to fail.
Is there a way to identify this drive easily?

I guess I could install some controller tools to make the drive blink. Is there a best practice for that? (Label all drives with the serial number or the OSD number?)
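For what it's worth, this is roughly how I would map an OSD number to a physical drive by hand today. It relies on the standard tools ceph osd metadata, smartctl, and optionally ledctl from the ledmon package; the metadata field names vary between Ceph releases, so treat it as a sketch only:

```python
#!/usr/bin/env python3
"""Sketch: map an OSD id to its device and serial so the bay can be
labelled or blinked. Uses 'ceph osd metadata', 'smartctl' and
(optionally) 'ledctl'; field names differ between Ceph releases."""
import json
import subprocess
import sys

osd_id = sys.argv[1] if len(sys.argv) > 1 else "3"  # hypothetical OSD id

# Ask the cluster which host/device this OSD lives on.
meta = json.loads(subprocess.check_output(
    ["ceph", "osd", "metadata", osd_id]).decode())
host = meta.get("hostname", "?")
# Field name varies by release ('backend_filestore_dev_node', 'devices', ...).
dev = (meta.get("backend_filestore_dev_node")
       or meta.get("devices", "")).split(",")[0]
print(f"osd.{osd_id} is on host {host}, device {dev or '?'}")

# Run this part on the OSD host itself: read the serial number so it
# can be matched against the bay label.
if dev:
    info = subprocess.check_output(["smartctl", "-i", f"/dev/{dev}"]).decode()
    for line in info.splitlines():
        if "Serial Number" in line:
            print(line.strip())

# Optionally blink the bay LED (ledmon package, backplane must support it):
#   ledctl locate=/dev/sdX        # turn the locate LED on
#   ledctl locate_off=/dev/sdX    # turn it off again
```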

 

These are the tasks that came to mind right now. Maybe this would be worth a section in the Admin Guide?

Thanks for your help.

We have an operations guide and a performance tuning guide in the works.

For OSDs: PetaSAN will not allow you to delete a running/up OSD from the UI. If the OSD is down, or you use the CLI to stop it, then you can delete it. Ceph does not have the concept of changing/swapping an OSD; it is better to think of it as shrinking the cluster (when you delete an OSD) or expanding the cluster (when you add an OSD).
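Roughly, stopping one OSD from the CLI looks like this (assuming systemd-managed OSDs; the OSD id is just an example). Once the cluster sees it as down, the delete becomes possible from the UI:

```python
#!/usr/bin/env python3
"""Sketch: stop one OSD from the CLI so it can then be deleted from the
UI. Assumes systemd-managed OSDs; osd id 5 is only an example."""
import subprocess

OSD_ID = 5  # hypothetical OSD to take down

# Stop the daemon on the node that hosts this OSD.
subprocess.run(f"systemctl stop ceph-osd@{OSD_ID}", shell=True, check=True)

# Confirm the cluster now reports it as down before deleting it.
subprocess.run("ceph osd tree", shell=True, check=True)
```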

Change Journal: I think you are referring to methods for replacing a running journal while keeping its OSDs; this involves making sure all writes in the journal are first flushed to the OSDs before the replacement. Many Ceph users prefer to simply delete such OSDs and re-add them with the new journal, letting Ceph backfill them. This is probably safer than dealing with flushing, but you must be using 3x replication and make sure you start from an active+clean cluster.
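As a sketch of the pre-flight check before deleting and re-adding OSDs: only proceed if ceph health reports HEALTH_OK, i.e. all PGs are active+clean. The in-place (flush) alternative for Filestore OSDs is noted as comments; the symlink step in particular is environment-specific:

```python
#!/usr/bin/env python3
"""Sketch: pre-flight check before deleting and re-adding OSDs.
Proceed only if 'ceph health' reports HEALTH_OK (all PGs active+clean)."""
import subprocess
import sys

health = subprocess.check_output(["ceph", "health"]).decode().strip()
print(health)
if not health.startswith("HEALTH_OK"):
    sys.exit("Cluster is not active+clean - do not remove OSDs now.")

# From here: delete the OSDs that share the old journal, replace the
# journal drive, re-add the OSDs and let Ceph backfill them.

# The in-place (flush) alternative for Filestore OSDs looks roughly like:
#   systemctl stop ceph-osd@<id>
#   ceph-osd -i <id> --flush-journal     # drain the journal to the OSD
#   ... replace the journal device, repoint the journal symlink ...
#   ceph-osd -i <id> --mkjournal         # create the new journal
#   systemctl start ceph-osd@<id>
# The flushing step is exactly what makes this method more delicate.
```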

The new SMART support in the UI, plus emailing of alarms, should pick up all disks, including journals.
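Just to illustrate the kind of check involved (this is not the PetaSAN implementation, and the device list and mail addresses are placeholders): the overall SMART health self-assessment from smartctl -H is enough to flag a drive before it fails completely.

```python
#!/usr/bin/env python3
"""Sketch: the kind of SMART check that catches a failing journal or
OSD drive early. Illustration only, not PetaSAN's implementation;
device list and mail addresses are placeholders."""
import smtplib
import subprocess
from email.message import EmailMessage

DEVICES = ["/dev/sda", "/dev/sdb"]   # placeholder device list
ALERT_TO = "admin@example.com"       # placeholder address

failing = []
for dev in DEVICES:
    # 'smartctl -H' prints the overall health self-assessment result.
    out = subprocess.run(["smartctl", "-H", dev],
                         capture_output=True, text=True).stdout
    if "PASSED" not in out:
        failing.append((dev, out))

if failing:
    msg = EmailMessage()
    msg["Subject"] = "SMART alert: " + ", ".join(d for d, _ in failing)
    msg["From"] = "petasan-monitor@example.com"  # placeholder sender
    msg["To"] = ALERT_TO
    msg.set_content("\n\n".join(out for _, out in failing))
    with smtplib.SMTP("localhost") as smtp:  # assumes a local MTA
        smtp.send_message(msg)
```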