
Difficulty recovering one of three servers after power outage

Hi,

Can you reply to my previous post, please?

I thought I had:

root@Peta-San-03:~# ceph osd status --cluster=Cameron-SAN-01
+----+------+------+-------+--------+---------+--------+---------+-------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+------+------+-------+--------+---------+--------+---------+-------+
+----+------+------+-------+--------+---------+--------+---------+-------+
root@Peta-San-03:~#


For the record, I am running version 2.1.0 of the software, and the documentation only covers v2.0. Is there documentation available for v2.1.0? The documentation on hand does not cover creating pools and does not list the available CLI commands. I am not sure which CLI commands to use for these status requests, so I have to look at other forum posts for sample output from other users needing assistance.

Please advise

Thank-you

What I meant in my previous post is that node 2 should have a Consul server directory, not a client directory. I asked whether this server was node 4 or greater in a previous install, or whether the disk it is booting from came from such a previous install. It is important to know this because even if we did get the Consul service to start, we will face other issues if this server is booting from the wrong disk. I cannot think of another reason why it would have Consul configured this way.

As for the other two "good" nodes: I understand this is a new installation and you have not added any OSDs. If so, you will not be able to add iSCSI disks on the default pool, and the pool will not become active; the same is true for any pool you add. You need OSDs added to provide storage for your pools.
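If it helps, a quick way to confirm this from one of the management nodes is a couple of standard Ceph commands (a sketch only, using the same --cluster name as in your output above):

# with no OSDs added, the CRUSH tree shows only an empty default root
ceph osd tree --cluster=Cameron-SAN-01

# pools and their placement groups stay inactive/unknown until OSDs exist
ceph osd pool ls detail --cluster=Cameron-SAN-01
ceph pg stat --cluster=Cameron-SAN-01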

You could go ahead and add OSDs on these two nodes to activate storage, and then figure out what caused node 2 to boot with the wrong configuration. If this is a fresh install, I might just re-install from scratch.

You are correct that the docs are for 2.0; the 2.1 guide is awaiting review (we also have a new operations guide), but I will try to get it out by next week.

This is a new setup on all three Dell PowerEdge 1950 servers. On all three Dells this is a new installation, and any legacy software was wiped during the PetaSAN installation. Each server is fitted with a pair of 146 GB drives. I thought everything needed for basic operation was created by the original installation, since each server signaled successful completion and I was directed to access the server on port 5000 at the end of the configuration steps. What are the necessary steps to add the OSDs on each server, or is this done on one device and then propagated to the other two, as I observed with the pool configuration steps?

I will be onsite later this afternoon, so as a last resort I could simply run the installation again then.

Please advise


Thank-you


To follow up, using a command from a thread in another forum, I executed the following. Although my #2 node shows up in the dashboard, it still appears to be unavailable:

root@Peta-San-03:~# ceph status --cluster=Cameron-SAN-01
  cluster:
    id:     20448d28-7acc-4289-856f-a91b6ac26c6e
    health: HEALTH_WARN
            Reduced data availability: 306 pgs inactive
            1/3 mons down, quorum Peta-SAN-01,Peta-San-03

  services:
    mon: 3 daemons, quorum Peta-SAN-01,Peta-San-03, out of quorum: Peta-San-02
    mgr: Peta-SAN-01(active)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   2 pools, 306 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             306 unknown

root@Peta-San-03:~#

From the Node List -> Physical Disk List page you can add disks as OSDs and journals. If all disks are the same, add them all as OSDs; if you have a few faster devices, make the faster devices journals and add the slower ones as OSDs. This is explained in the admin guide.
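Once the OSDs have been added from the UI, you can confirm that Ceph sees them with standard commands (a sketch, assuming the same cluster name as in your output):

# each added disk should now appear here with its host and up/in state
ceph osd status --cluster=Cameron-SAN-01

# per-OSD capacity and usage once the OSDs are up and in
ceph osd df --cluster=Cameron-SAN-01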

The wrong Consul configuration on node 2 is an issue: I can only think this is due to an old install. Yes, the new install does wipe out an older installation... however, could it be that after the outage the boot disk failed and node 2 is now booting from an older disk? Did the machine have disks from previous installs that were not the ones formatted during the new install? The fact that it was working before the outage means it was configured with a server directory; there is nothing else I can think of that would make it show a client directory now instead. Would it be possible for you to check this?
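One way to check the Consul role from the command line is Consul's own membership output (a sketch using standard Consul commands, assuming the consul binary is on the node's PATH):

# lists all cluster members; the Type column shows "server" or "client" for each node
consul members

# run on node 2 itself: the agent section reports "server = true" or "server = false"
consul info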

Note that you can also just re-install node 2 and do a "Replace Management Node".

I can only guess what happened during the power outage. The fact of the matter is that construction was going on in an adjacent building for a two-week period during evening hours, and outages in the evenings could have been frequent enough to corrupt the configurations. I can only guess.

If on my 2nd node I go to the node list and select the "Physical Disk List" icon, I am shown a single entry but no actions are offered. If I do the same on the 1st and 3rd servers, I am only presented with the spinning "wait" icon and no disk list ever appears.

The dashboard will not let me add an iSCSI disk: there is a drop-down prompting for a pool, and since the pool I created never goes active, it never appears as an available pool.

I executed a status command on all three servers and found the disk status on all three PowerEdge servers to be the same, as follows:

root@Peta-San-03:~# ceph-disk list
/dev/sda :
/dev/sda1 other, ext4, mounted on /boot
/dev/sda2 other, ext4, mounted on /
/dev/sda3 other, ext4, mounted on /var/lib/ceph
/dev/sda4 other, ext4, mounted on /opt/petasan/config
/dev/sr0 other, unknown
root@Peta-San-03:~#
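As a side check, it may be worth confirming that the second 146 GB drive in each server is visible to the OS at all, since the listing above only shows /dev/sda and the optical drive (a sketch using standard Linux tools):

# list every block device with its size, type and mount points
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT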

Still, I am not clear on:

  1. Did node 2 have disks from previous installations?
  2. Do you have a way to know whether node 2 is currently booting from the correct disk and not from an OS disk of a previous install?
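For question 2, one way to check is to compare what the running system is actually booted from against the disks in the machine (a sketch with standard Linux tools; device names on your servers may differ):

# show which device the root and boot filesystems are mounted from
findmnt /
findmnt /boot

# stable serial/WWN-based names, to tell the two 146 GB drives apart
ls -l /dev/disk/by-id/

# filesystem UUIDs and labels on the partitions of the disk it booted from
blkid /dev/sda*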

Please help me so I can help you better 🙂


Hello,

1) Node 2 has the same pair of disks now as when I was given the server for the PetaSAN installation.

2) There was previously a Linux installation, but I had thought PetaSAN would have wiped that clean with the fresh install. I will verify this evening when I am onsite.

Thank-you
