Difficulty recovering one of three servers after power outage
admin
2,930 Posts
October 25, 2018, 8:26 pm
Great, let me know when you find out. We wipe clean only the disks we use: the system disk plus any disks you select to add as OSDs or journals; all other disks are left untouched.
Let me know what you find. If the node was indeed booting from the wrong disk, either fix it to boot from the correct one or, if that disk is damaged, re-install node 2 on the working disk and deploy it with the "Replace Management Node" option.
Regarding the earlier post on adding OSDs from the UI: that applies to the 2 "good" nodes and will not work on the problem node. You can add OSDs on those nodes now, but I would prefer to wait until you fix the issue with node 2. Once you add OSDs, your pools will become active, and you will then be able to add iSCSI disks.
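For reference, a quick way to confirm which disk a node actually booted from is to inspect the device backing the root filesystem with standard Linux tools (nothing here is PetaSAN-specific):

# show the device the root filesystem is mounted from
findmnt -n -o SOURCE /

# list all disks and partitions with sizes and mountpoints,
# to spot which physical disk holds the running system
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT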
Last edited on October 25, 2018, 8:26 pm by admin · #21
southcoast
50 Posts
October 26, 2018, 2:41 am
I reinstalled the application and took the opportunity to replace the two original drives with a 1 TB and a 500 GB volume. I gave the host the same host name and used the "Replace Management Node" option.
I get a better response to a status command than before:
root@Peta-San-02:~# ceph status --cluster=Cameron-SAN-01
  cluster:
    id:     20448d28-7acc-4289-856f-a91b6ac26c6e
    health: HEALTH_WARN
            Reduced data availability: 306 pgs inactive

  services:
    mon: 3 daemons, quorum Peta-SAN-01,Peta-San-02,Peta-San-03
    mgr: Peta-SAN-01(active), standbys: Peta-San-02
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   2 pools, 306 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             306 unknown
root@Peta-San-02:~#
Next, I created a pool which went into a state of checking, then inactive.
I tried to add an iSCSI disk, but it was refused with the alert:
Auto ip assignment is disabled due to some pools being inactive.
I seem to be back where I was before.
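For completeness, the pool and health state that the UI reports can also be inspected from the CLI; a minimal sketch, assuming the same cluster name as in the output above:

# overall health with per-check detail
ceph health detail --cluster=Cameron-SAN-01

# list pools with their replication size, pg_num and flags
ceph osd pool ls detail --cluster=Cameron-SAN-01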
southcoast
50 Posts
October 26, 2018, 5:04 pm
This morning I issued a reload command to the other 2 servers in the configuration, but the pool I configured is still inactive and no OSDs show up in the status output. Is there a CLI command I can issue to manually create and start the needed OSD(s) to make the SAN work?
admin
2,930 Posts
October 26, 2018, 7:53 pm
Did you check whether node 2 was booting from a previous PetaSAN boot disk?
To activate a pool, you need to add OSDs as per my previous replies.
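For reference, once OSDs have been added (via the UI on the healthy nodes), their count and up/in state can be verified from the CLI; a minimal sketch, again assuming the non-default cluster name used elsewhere in this thread:

# summary of OSD count and how many are up/in
ceph osd stat --cluster=Cameron-SAN-01

# per-host view of OSDs and their status
ceph osd tree --cluster=Cameron-SAN-01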
southcoast
50 Posts
October 26, 2018, 7:55 pm
Is there a rule that could be created to alleviate the inactive-pools issue? I have been looking at the Ceph documentation pages, but many of the commands there do not work on the PetaSAN servers in place now. The Ceph pages recommend commands related to the inactive placement groups (PGs) I see in the status output:
root@Peta-SAN-01:/# ceph health --cluster=Cameron-SAN-01
HEALTH_WARN Reduced data availability: 306 pgs inactive
root@Peta-SAN-01:/#
Is the Configuration -> CRUSH -> Rules section the place to make the needed adjustments for this problem?
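As an aside, the inactive placement groups themselves can be listed with a standard Ceph command; a minimal sketch, assuming the same --cluster flag is required as in the outputs above. With no OSDs up, every PG is expected to report as inactive/unknown, so this mainly confirms the symptom rather than fixing it:

# list PGs that are stuck in an inactive state
ceph pg dump_stuck inactive --cluster=Cameron-SAN-01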
admin
2,930 Posts
October 26, 2018, 8:15 pm
Did you check whether node 2 was booting from a previous PetaSAN boot disk?
To activate a pool, you need to add OSDs as per my previous replies.
southcoast
50 Posts
October 27, 2018, 2:02 am
On node 2 I just outright replaced the disks and reinitialized.
Please advise the steps to add the OSDs.
southcoast
50 Posts
October 27, 2018, 5:38 pm
After some groping around the Ceph pages, I managed to add the OSDs. I created 3 of them, but they are down. I cannot locate the correct command to start the OSDs. What is that command, please?
admin
2,930 Posts
October 28, 2018, 3:07 pm
See my previous replies on how to add OSDs.
southcoast
50 Posts
October 28, 2018, 5:58 pm
I have added my OSDs, thank you. I created three of them.
The problem is starting them.
root@Peta-San-03:~# ceph osd status --cluster=Cameron-SAN-01
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| 0 | | 0 | 0 | 0 | 0 | 0 | 0 | autoout,exists |
| 1 | | 0 | 0 | 0 | 0 | 0 | 0 | autoout,exists |
| 2 | | 0 | 0 | 0 | 0 | 0 | 0 | autoout,exists |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
root@Peta-San-03:~#
The original issue with consul members no longer appears to be a problem:
root@Peta-San-03:~# consul members
Node Address Status Type Build Protocol DC
Peta-SAN-01 10.250.252.11:8301 alive server 0.7.3 2 petasan
Peta-San-02 10.250.252.12:8301 alive server 0.7.3 2 petasan
Peta-San-03 10.250.252.13:8301 alive server 0.7.3 2 petasan
root@Peta-San-03:~#
I cannot locate the command to start the OSDs.
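For reference only, the commands below are the generic Ceph/systemd way to start an OSD daemon and clear the "out" flag. This is a sketch under the assumption that the OSDs run as systemd units named ceph-osd@<id>, which may not match how PetaSAN itself manages OSDs, so the supported path remains adding OSDs through the PetaSAN UI as advised earlier in the thread:

# on the node hosting the OSD: start the daemon (repeat per OSD id;
# unit naming assumes a standard ceph-osd@<id> systemd layout)
systemctl start ceph-osd@0

# verify the daemon is running
systemctl status ceph-osd@0

# mark the OSD back "in" so it can receive data
ceph osd in 0 --cluster=Cameron-SAN-01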