Difficulty recovering one of three servers after power outage


Great, let me know when you find out. We only wipe the disks we actually use: the system disk plus any disks you select to add as OSDs or journals; all other disks are left untouched.

Let me know what you find. If it was indeed booting from the wrong disk, either fix it to boot from the correct one, or, if that disk is damaged, we would re-install node 2 on the working disk and deploy it with the "Replace Management Node" option.
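
If you want to double-check from the console, standard Linux commands like the ones below will show which disk the node actually booted from (nothing PetaSAN-specific about them):

findmnt /                            # device currently mounted as the root filesystem
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT   # all disks/partitions and where they are mounted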

Regarding the earlier post on adding OSDs from the UI: that was for the two "good" nodes; it will not work on the problem node. It is up to you whether to add OSDs on those nodes now, but I would prefer to wait until you fix the issue with node 2. Once you add OSDs, your pools will become active, and then you will be able to add iSCSI disks.
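
As a quick check once the OSDs are in, the standard Ceph status commands below should show the PGs becoming active (adjust the cluster name if yours differs):

ceph pg stat --cluster=Cameron-SAN-01             # PG summary; should move toward active+clean
ceph osd pool ls detail --cluster=Cameron-SAN-01  # per-pool settings, including the CRUSH rule each pool uses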

I reinstalled the application and took the opportunity to replace the two original drives with a 1 TB and a 500 GB volume. I gave the host the same host name and selected the "Replace Management Node" option.

I get a better response to a status command than before:

 

root@Peta-San-02:~# ceph status --cluster=Cameron-SAN-01
  cluster:
    id:     20448d28-7acc-4289-856f-a91b6ac26c6e
    health: HEALTH_WARN
            Reduced data availability: 306 pgs inactive

  services:
    mon: 3 daemons, quorum Peta-SAN-01,Peta-San-02,Peta-San-03
    mgr: Peta-SAN-01(active), standbys: Peta-San-02
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   2 pools, 306 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             306 unknown

root@Peta-San-02:~#

Next, I created a pool, which went into a state of "checking" and then "inactive".

I tried to add an iSCSI disk, but I was refused with the alert:

Auto ip assignment is disabled due to some pools being inactive.

I seem to be back where I was before.

This morning I issued a reload command to the other two servers in the configuration, but the pool I configured is still inactive and no OSDs are showing in response to a status command. Is there a CLI command I can issue to manually create and start the needed OSD, or OSDs, to make the SAN work?

Did you check if node 2 was booting from a previous PetaSAN boot disk?

To activate a pool, you need to add OSDs as per my previous replies.

Is there a rule that could be created to alleviate the inactive-pools issue? I have been looking at the Ceph documentation pages, but a lot of the commands there do not work on the PetaSAN servers in place now. The Ceph pages do recommend commands related to the inactive placement groups (PGs) I see in the status output:

root@Peta-SAN-01:/# ceph health --cluster=Cameron-SAN-01
HEALTH_WARN Reduced data availability: 306 pgs inactive
root@Peta-SAN-01:/#

Is the Configuration -> CRUSH -> Rules section the place to make the needed adjustments for this problem?
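
For what it is worth, the Ceph pages also list some read-only commands for inspecting the stuck PGs and the current CRUSH rules, though I am not sure how they interact with PetaSAN:

ceph health detail --cluster=Cameron-SAN-01            # shows which PGs are inactive and why
ceph pg dump_stuck inactive --cluster=Cameron-SAN-01   # dumps the PGs stuck in an inactive state
ceph osd crush rule ls --cluster=Cameron-SAN-01        # names of the existing CRUSH rules
ceph osd crush rule dump --cluster=Cameron-SAN-01      # full definition of each rule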

Did you check if node 2 was booting from a previous PetaSAN boot disk?

To activate a pool, you need to add OSDs as per my previous replies.

On node 2 I just outright replaced the disks and reinitialized.

Please advise the steps to add the OSDs.

After some groping around the Ceph pages, I managed to add the OSDs. I created three of them, but they are down. I cannot locate the correct command to start the OSDs. What is that command, please?

See my previous replies on how to add OSDs.

I have added my OSDs, thank you. I created three of them.

The problem is starting them.

root@Peta-San-03:~# ceph osd status --cluster=Cameron-SAN-01
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| id | host |  used | avail | wr ops | wr data | rd ops | rd data |     state      |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| 0  |      |    0  |    0  |    0   |     0   |    0   |     0   | autoout,exists |
| 1  |      |    0  |    0  |    0   |     0   |    0   |     0   | autoout,exists |
| 2  |      |    0  |    0  |    0   |     0   |    0   |     0   | autoout,exists |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
root@Peta-San-03:~#

 

The original issue with the consul members no longer seems to be a problem:

 

root@Peta-San-03:~# consul members
Node         Address             Status  Type    Build  Protocol  DC
Peta-SAN-01  10.250.252.11:8301  alive   server  0.7.3  2         petasan
Peta-San-02  10.250.252.12:8301  alive   server  0.7.3  2         petasan
Peta-San-03  10.250.252.13:8301  alive   server  0.7.3  2         petasan
root@Peta-San-03:~#

 

I cannot locate the command to start the OSDs.
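
For reference, the "autoout,exists" state in the output above appears to be what stock Ceph reports when OSDs exist in the cluster map but are not up and have been automatically marked out. Outside of PetaSAN's own tooling, the usual stock-Ceph approach would be along these lines (only a sketch, assuming systemd-managed OSD daemons; PetaSAN may start its OSDs through its own services):

ceph osd in 0 --cluster=Cameron-SAN-01      # mark OSD 0 back in; repeat for OSDs 1 and 2
systemctl start ceph-osd@0                  # start the OSD daemon for id 0 on the node that hosts it
ceph osd status --cluster=Cameron-SAN-01    # re-check; running OSDs should show "exists,up"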
