OSD which no longer exist are shown in Dashboard

After replacing a node and removing its OSDs manually with Ceph commands, these OSDs are still shown in the PetaSAN Dashboard.

How can I remove them?

Regards,

Dennis

The best approach would have been to remove them via the web UI from the node disk list rather than with manual commands. I recommend you still try to delete them from the web UI if you can; it may or may not work, depending on which manual commands you have already run. If it does not work via the UI, you should still be able to complete the manual deletion from the command line; just make sure you run all the Ceph commands required for deleting an OSD. You can see how we do it in the UI by looking at the following 2 functions found in

/usr/lib/python2.7/dist-packages/PetaSAN/core/ceph/ceph_osd.py:

def delete_osd_from_crush_map(osd_id):

def delete_osd(osd_id,disk_name):

We call the first one first, then the second.

You can also find the Ceph documentation at

http://docs.ceph.com/docs/master/rados/operations/add-or-rm-osds/
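As a rough illustration of "all the required Ceph commands", here is a minimal sketch that lists the manual removal sequence described in the Ceph documentation linked above. It is not the content of ceph_osd.py; the function name `osd_removal_commands` is hypothetical, and it only builds the command lines without running anything. It also assumes the OSD daemon is already stopped (or its host is gone, as in this thread).

```python
def osd_removal_commands(osd_id):
    """Return the Ceph CLI calls for manually removing one OSD, in order.

    Hypothetical helper for illustration only: it builds argument lists
    and does not execute them.
    """
    if not isinstance(osd_id, int) or osd_id < 0:
        raise ValueError("osd_id must be a non-negative integer")
    name = "osd.%d" % osd_id
    return [
        ["ceph", "osd", "out", str(osd_id)],        # mark the OSD out of the cluster
        ["ceph", "osd", "crush", "remove", name],   # remove it from the CRUSH map
        ["ceph", "auth", "del", name],              # delete its authentication key
        ["ceph", "osd", "rm", str(osd_id)],         # remove it from the OSD map
    ]
```

Printing `" ".join(cmd)` for each entry of `osd_removal_commands(12)` gives a checklist you can compare against what you already ran by hand.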

Yes, I know what is in there, but how do I remove the OSDs from, e.g., the 'OSD commit latency' chart? Even in the 'last hour' tab the OSDs are present, although they no longer exist in 'ceph cluster osd status', for example.

Regards,

Dennis

OK, I thought you were seeing the old OSDs in the disk list and in the up/down OSD display in the dashboard; those are based on the current OSDs in the cluster.

For the historic charts, the data is stored in database files as historic records. It is a time-based database called Whisper that stores metrics at different time scales. These files are not deleted when a metric is no longer available, since the metric may come back and the history may be worth keeping. It is possible, however, that the metrics will stop being charted on their own after sufficient time. If after a day you still see the deleted OSDs with empty values and need to force their removal, you need to stop the stats services, delete or move the database files associated with the OSD, then restart the stats services:

First we need to know which of the management nodes (the first 3 nodes) is running the stats services.

You can either do it from your browser, by inspecting the HTML source of the historic chart and looking at the IP of the machine inside the iframe tag, or you can run the following command on all nodes:

systemctl status carbon-cache

The service should be running on one of them.
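If you prefer to script that check, the following is a minimal sketch that wraps `systemctl is-active`. The helper name `is_service_active` and the injectable `run` parameter are assumptions added for illustration; you would still have to run it on each management node yourself.

```python
import subprocess


def is_service_active(unit, run=subprocess.run):
    """Return True if `systemctl is-active <unit>` prints 'active'.

    `run` defaults to subprocess.run and can be swapped out for testing.
    """
    result = run(
        ["systemctl", "is-active", unit],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout as text
    )
    return result.stdout.strip() == "active"
```

Calling `is_service_active("carbon-cache")` on a node should return True only on the machine that currently hosts the stats services.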

 

To delete the database file for the OSD, do the following on the active stats machine:

Stop the stats services:

systemctl stop carbon-cache

systemctl stop apache2

systemctl stop collectd

systemctl stop grafana-server

Back up the whisper file for the OSD somewhere else, then delete it:

/opt/petasan/config/shared/graphite/whisper/PetaSAN/storage-node/ceph-CLUSTER_NAME/osd-NUMBER

 

Restart the stats services:

systemctl start carbon-cache

systemctl start apache2

systemctl start collectd

systemctl start grafana-server
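The backup-then-delete step in the middle can be done in one go by moving the OSD's whisper data to a backup location. Below is a minimal sketch of that step only; it assumes the stats services are already stopped as described above, and that CLUSTER_NAME and NUMBER in the path are filled in literally (for example `ceph-demo/osd-7`). The helper names `osd_whisper_path` and `archive_osd_metrics` and the `backup_dir` argument are hypothetical.

```python
import os
import shutil

# Path prefix taken from the post above; everything after it is built per OSD.
WHISPER_ROOT = "/opt/petasan/config/shared/graphite/whisper/PetaSAN/storage-node"


def osd_whisper_path(cluster_name, osd_id, root=WHISPER_ROOT):
    """Build <root>/ceph-CLUSTER_NAME/osd-NUMBER for one OSD."""
    return os.path.join(root, "ceph-%s" % cluster_name, "osd-%d" % osd_id)


def archive_osd_metrics(cluster_name, osd_id, backup_dir, root=WHISPER_ROOT):
    """Move one OSD's whisper data into backup_dir and return the new path.

    Refuses to overwrite an existing backup so nothing is lost silently.
    """
    src = osd_whisper_path(cluster_name, osd_id, root)
    if not os.path.exists(src):
        raise FileNotFoundError(src)
    os.makedirs(backup_dir, exist_ok=True)
    dst = os.path.join(backup_dir, os.path.basename(src))
    if os.path.exists(dst):
        raise FileExistsError(dst)
    return shutil.move(src, dst)
```

Moving instead of deleting means that if the wrong OSD number was picked, the data can simply be moved back before the services are started again.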

 

On a related note, instead of deleting OSDs: in Ceph you can remove OSD disks from running or failed nodes and place them in other storage nodes, and things will work as before. PetaSAN will dynamically pick up the OSD's new host location. You should, however, do so as early as possible, since Ceph starts its recovery 5 minutes after detecting the disk failure, and it does not make sense to insert the old disks after Ceph has finished re-creating replicas of all the data.

Works like a charm!

Regards,

Dennis