
iSCSI target stops responding after huge power surge

Background:

Our datacenter got hit with a major power transient caused by snow/ice on a transformer. As a result, our PetaSAN cluster lost power supplies, and over the following 24 hours a few OSDs had to be replaced because they were flapping in and out of the array. This isn't usually a problem, and the cluster is almost back to clean.

The problem:

The iSCSI targets for the previously existing disks come up, but after a few seconds they stop responding. Creating a new disk works as expected without issues. The paths are created and are pingable, and the target is scannable for a few seconds, then it just drops off. No response. No authentication is used on these disks.

The Major issue:

We took the backup cluster offline for a full rebuild and upgrade to the new 2.3.1 version (a three-version jump). The backup is gone; we cannot get the old OSDs to form anything other than a new/clean cluster.

The upside:

We can see that the iSCSI disks still have their data on the main cluster. We just can't keep the target responsive.

 

Looking for suggestions as we need that data.

Our datacenter got hit with a major power transient caused by snow/ice on a transformer. As a result, our PetaSAN cluster lost power supplies, and over the following 24 hours a few OSDs had to be replaced because they were flapping in and out of the array. This isn't usually a problem, and the cluster is almost back to clean.

What is the output of ceph status now? Is it active/clean, or does it have errors?

Were the disks replaced while the cluster was in a clean state, or did it have errors? How many disks were replaced, and on how many hosts?
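
If it helps, the overall state can be gathered with the standard Ceph commands below (add --cluster with your cluster name if it is not the default "ceph"):

# overall cluster state and any warnings/errors
ceph status
# per-PG and per-OSD detail for anything unhealthy
ceph health detail
# confirm which OSDs are up/in on which hosts
ceph osd tree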

 The iSCSI targets for the previously existing disks come up, but after a few seconds they stop responding.

If there is a problem at the Ceph layer where it cannot find stored data, the iSCSI layer for those disks will not function. It is important to see whether there are any errors in Ceph and try to fix them. If there are PG errors, do a pg query to get error detail.
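
As a rough sketch, you can list the problem PGs and then query one of them for detail; the PG id 1.2f below is just a placeholder:

# list PGs stuck in a bad state
ceph pg dump_stuck inactive
ceph pg dump_stuck unclean
# query one PG for detailed state and recovery history (placeholder id)
ceph pg 1.2f query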

We took the backup cluster offline for a full rebuild and upgrade to the new 2.3.1 version (a three-version jump). The backup is gone; we cannot get the old OSDs to form anything other than a new/clean cluster.

Is this the same cluster as the earlier one or a different one?

By full rebuild, do you mean you upgraded the full cluster?

By the backup is gone, do you mean that after the upgrade the cluster cannot reach active/clean status? If you had done it in 3 steps, did the cluster reach active/clean between the steps? What is the error now?

We can see that the iSCSI disks still have their data on the main cluster. We just can't keep the target responsive.

How do you know this? Is Ceph not reporting errors?

Generally I would not worry about iSCSI being responsive as much as about the Ceph layer not showing errors; if the Ceph layer is clean, iSCSI will be responsive again.

Ceph had errors that could not be fixed until the damaged OSDs were replaced. We replaced one OSD on each of two nodes of a four-node cluster; these were changed at different times to allow the cluster to rebuild before changing the other. Ceph is reporting errors in the log, but these are related to the rebuild process.

 

We have two separate clusters: one (the backup) is an rsync of the other (the main). The backup cluster was taken offline to perform hardware and software upgrades, which had almost been completed when this happened. All of the data on the backup cluster's OSDs has been wiped as we bring the nodes up as part of a clean install, which we opted for due to the number of revisions we would have had to step through. Hence, our backup is gone.

 

The Ceph layer gets to 3 inactive PGs and 13 degraded and has been stuck there for 60 hours so far. In this state we can create and write to a new iSCSI disk with no problems. It is only the disks that existed prior to this failure that are giving trouble.

 

I know the data is mostly intact, as we can pull a directory from the iSCSI mount on a Linux box. We get just enough time to do this, then we get nothing more until we restart the disk in the web console. If Ceph were the problem, the iSCSI target would not be available right from the start.

 

This cluster is still running 2.0.0. As it is our production cluster, it doesn't get updated often. We have a testing cluster that we test new versions of PetaSAN on, with no real data, just a bunch of files generated with dd and the random number generator. We were planning to rebuild the backup cluster with new hardware and software, then rotate the clusters after copying the data to the backup, and then repeat the process with the original cluster after a week or so.

Is there a way to locally mount the cluster disks, bypassing the iSCSI layer? This might allow me to scp the data to another storage system and then just scrap this cluster.

The problem is with the 3 inactive PGs. Unless you have some OSDs that are not up, those PGs most probably stored some or all of their replicas on the deleted OSDs. The RBD images exposed via iSCSI have their data/disk sectors mapped across all PGs, so any I/O to the sectors corresponding to the 3 inactive PGs causes a stall. Try to do a ceph pg query and see if you can get any more info on these PGs; you can also try lowering the pool min_size to 1, or to m if using EC pools.
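
For example, assuming a replicated pool named rbd (substitute your pool name and the real PG ids), the query and the temporary min_size change would look roughly like this; min_size should be raised back once recovery completes:

# inspect one of the inactive PGs (placeholder id)
ceph pg 1.2f query
# check the current pool setting
ceph osd pool get rbd min_size
# temporarily allow io with a single surviving replica
ceph osd pool set rbd min_size 1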

You can bypass the iSCSI layer and map the image directly to a block device via

rbd map image-xxxx --cluster xxx

then mount it

If the iSCSI disks are started, then go to a node serving a path of the image; the image should already be mapped to a block device there, which you can find via

rbd showmapped --cluster xxx
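
Putting the pieces together, a minimal sketch of copying data off an image this way could look like the following; the pool name rbd, image name image-00001, mount point and destination host are placeholders (xxx is your cluster name), and this assumes the image holds a filesystem your Linux box can mount:

# map the image to a local block device (prints e.g. /dev/rbd0)
rbd map rbd/image-00001 --cluster xxx
# mount it read-only so nothing changes while copying
mkdir -p /mnt/recovery
mount -o ro /dev/rbd0 /mnt/recovery
# copy the data to another storage system
rsync -av /mnt/recovery/ backup-host:/srv/recovery/
# clean up
umount /mnt/recovery
rbd unmap /dev/rbd0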

UPDATE:

The 3 PGs in question show no objects, and two have no journal or logs. I am not aware of any PG that doesn't get used, but these have nothing in them. The two OSDs that were fully replaced did not (according to the pg query) have these PGs on them, so I am not sure what is going on with these PGs.
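
For anyone checking the same thing, the OSDs a PG maps to can be confirmed with something like the following (placeholder PG id):

# show which OSDs the PG is mapped to (up and acting sets)
ceph pg map 1.2f
# cross-check against the detailed query output
ceph pg 1.2f query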

We found several OSDs whose port was not working properly, but after a quick move to another port those OSDs are now back up and working properly.

We still cannot keep an iSCSI target responsive, but at least the cluster is back to normal (well, as close as I can get).

I will update this post once I can get the cluster to let me copy the data off.

UPDATE: IT'S WORKING!

It's been two months of hell, but I was finally able to convince Ceph and RADOS to clone out and restore the failed RBD image. This is not exactly intuitive, is very time consuming, and is cluster and image specific, so I am not posting the process I took. I will help anyone who is facing the same or a similar issue with any pointers I have gleaned, but as you can imagine, nothing except a working full backup can truly save your data. We ended up losing close to 40 OSDs over this (I have lost count of the actual number, and we have sent some of the replacement OSDs in for RMA processing as dead shortly after install), each in some sort of failure mode that cascaded as we replaced failed OSDs and allowed the cluster to rebuild.

What I can tell you is that there was a lot of re-running the same commands with slight modifications to get the missing PGs to rebuild from the existing parts. Once that was accomplished, a snapshot image was extracted and then restored over the top of the damaged image. Once that was done, RADOS started working and access to the image was restored. That prompted a whole new issue, as this image was a VG for a XenServer LVM-based storage repository, but that is another issue and story.
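
For illustration only, the snapshot extract-and-restore step in the general case follows the rbd snapshot/export/import pattern sketched below. This is not the exact procedure used here (which was cluster and image specific), the pool, image and snapshot names are placeholders, and importing to a new image name is shown as the safer variant of restoring over the damaged one:

# snapshot the damaged image once its PGs are readable again
rbd snap create rbd/image-00001@rescue --cluster xxx
# export the snapshot to a file on a machine with enough space
rbd export rbd/image-00001@rescue /backup/image-00001.raw --cluster xxx
# import it back as a fresh, healthy image
rbd import /backup/image-00001.raw rbd/image-00001-restored --cluster xxx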

Side note: this has forced me to learn far more about how PetaSAN and its components actually work, and how to force specific changes to cause specific actions. I have to thank the admin staff for the pointers; they confirmed what I already knew and focused my attention on the root of the issue. And a big thanks to the dev team who make this a simple click-and-setup system. I tried building a testing node from scratch (something I could destroy and test things on before I hit the main cluster), and though it worked, it never behaved as nicely as the PetaSAN system.

 

I will not be monitoring this topic anymore. If you wish to ask me anything, please PM me.

Thanks for the update.

Note that you can always buy support from us.

I could, but then I wouldn't learn anything 😉

If I had paid for support, I do not believe the issue would have been solved any quicker, since I had to swap out failed drives and wait for rebuilds that caused more drives to start failing. It's just a slow, methodical process that must be followed.

Education always costs; whether it's money, time or pride, it always costs, and it usually comes with a dose of pain to ensure that the lesson is firmly learned.

In my case, my pride and my wallet got hit, since the company will have a very hard time getting rid of me :p