Difficulty recovering one of three servers after power outage
admin
2,930 Posts
October 25, 2018, 8:26 pm
Great, let me know when you find out. We wipe clean only the disks we use: the system disk plus any disks you select to add as OSDs or journals; all other disks are left untouched.
Let me know what you find. If the node was indeed booting from the wrong disk, either fix it to boot from the correct one or, if that disk is damaged, re-install node 2 on the working disk and deploy it with the "Replace Management Node" option.
Regarding the earlier post on adding OSDs from the UI: that applies to the 2 "good" nodes and will not work on the problem node. You can add OSDs on those nodes now, but I would prefer to wait until you fix the issue with node 2. Once you add OSDs, your pools will become active, and you will then be able to add iSCSI disks.
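For reference, a quick way to confirm which disk a node actually booted from is to inspect the device backing the root filesystem with standard Linux tools (nothing here is PetaSAN-specific):

# show the device the root filesystem is mounted from
findmnt -n -o SOURCE /

# list all disks and partitions with sizes and mountpoints,
# to spot which physical disk holds the running system
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT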
Last edited on October 25, 2018, 8:26 pm by admin · #21
southcoast
50 Posts
October 26, 2018, 2:41 am
I reinstalled the application and took the opportunity to replace the two original drives with a 1 TB and a 500 GB volume. I gave the host the same host name and used the "Replace Management Node" option.
I get a better response to a status command than before:
root@Peta-San-02:~# ceph status --cluster=Cameron-SAN-01
  cluster:
    id:     20448d28-7acc-4289-856f-a91b6ac26c6e
    health: HEALTH_WARN
            Reduced data availability: 306 pgs inactive

  services:
    mon: 3 daemons, quorum Peta-SAN-01,Peta-San-02,Peta-San-03
    mgr: Peta-SAN-01(active), standbys: Peta-San-02
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   2 pools, 306 pgs
    objects: 0 objects, 0 bytes
    usage:   0 kB used, 0 kB / 0 kB avail
    pgs:     100.000% pgs unknown
             306 unknown
root@Peta-San-02:~#
Next, I created a pool which went into a state of checking, then inactive.
I tried to add an iSCSI disk, but it was refused with the alert:
Auto ip assignment is disabled due to some pools being inactive.
I seem to be back where I was before.
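For completeness, the pool and health state that the UI reports can also be inspected from the CLI; a minimal sketch, assuming the same cluster name as in the output above:

# overall health with per-check detail
ceph health detail --cluster=Cameron-SAN-01

# list pools with their replication size, pg_num and flags
ceph osd pool ls detail --cluster=Cameron-SAN-01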
southcoast
50 Posts
October 26, 2018, 5:04 pm
This morning I issued a reload command to the other 2 servers in the configuration, but the pool I configured is still inactive and no OSDs show up in the status output. Is there a CLI command I can issue to manually create and start the needed OSD(s) to make the SAN work?
admin
2,930 Posts
October 26, 2018, 7:53 pm
Did you check whether node 2 was booting from a previous PetaSAN boot disk?
To activate a pool, you need to add OSDs as per my previous replies.
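For reference, once OSDs have been added (via the UI on the healthy nodes), their count and up/in state can be verified from the CLI; a minimal sketch, again assuming the non-default cluster name used elsewhere in this thread:

# summary of OSD count and how many are up/in
ceph osd stat --cluster=Cameron-SAN-01

# per-host view of OSDs and their status
ceph osd tree --cluster=Cameron-SAN-01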
southcoast
50 Posts
October 26, 2018, 7:55 pm
Is there a rule that could be created to alleviate the inactive-pools issue? I have been looking at the Ceph documentation pages, but many of the commands there do not work on the PetaSAN servers in place now. The Ceph pages recommend commands related to the inactive placement groups (PGs) I see in the status output:
root@Peta-SAN-01:/# ceph health --cluster=Cameron-SAN-01
HEALTH_WARN Reduced data availability: 306 pgs inactive
root@Peta-SAN-01:/#
Is the Configuration -> CRUSH -> Rules section the place to make the needed adjustments for this problem?
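As an aside, the inactive placement groups themselves can be listed with a standard Ceph command; a minimal sketch, assuming the same --cluster flag is required as in the outputs above. With no OSDs up, every PG is expected to report as inactive/unknown, so this mainly confirms the symptom rather than fixing it:

# list PGs that are stuck in an inactive state
ceph pg dump_stuck inactive --cluster=Cameron-SAN-01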
admin
2,930 Posts
October 26, 2018, 8:15 pm
Did you check whether node 2 was booting from a previous PetaSAN boot disk?
To activate a pool, you need to add OSDs as per my previous replies.
southcoast
50 Posts
October 27, 2018, 2:02 am
On node 2 I just outright replaced the disks and reinitialized.
Please advise the steps to add the OSDs.
southcoast
50 Posts
October 27, 2018, 5:38 pm
After some groping around the Ceph pages, I managed to add the OSDs. I created 3 of them, but they are down. I cannot locate the correct command to start the OSDs. What is that command, please?
admin
2,930 Posts
October 28, 2018, 3:07 pm
See my previous replies on how to add OSDs.
southcoast
50 Posts
October 28, 2018, 5:58 pm
I have added my OSDs, thank you. I created three of them.
The problem is starting them.
root@Peta-San-03:~# ceph osd status --cluster=Cameron-SAN-01
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
| 0 | | 0 | 0 | 0 | 0 | 0 | 0 | autoout,exists |
| 1 | | 0 | 0 | 0 | 0 | 0 | 0 | autoout,exists |
| 2 | | 0 | 0 | 0 | 0 | 0 | 0 | autoout,exists |
+----+------+-------+-------+--------+---------+--------+---------+----------------+
root@Peta-San-03:~#
The original issue with consul members no longer appears to be a problem:
root@Peta-San-03:~# consul members
Node Address Status Type Build Protocol DC
Peta-SAN-01 10.250.252.11:8301 alive server 0.7.3 2 petasan
Peta-San-02 10.250.252.12:8301 alive server 0.7.3 2 petasan
Peta-San-03 10.250.252.13:8301 alive server 0.7.3 2 petasan
root@Peta-San-03:~#
I cannot locate the command to start the OSDs.
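For reference only, the commands below are the generic Ceph/systemd way to start an OSD daemon and clear the "out" flag. This is a sketch under the assumption that the OSDs run as systemd units named ceph-osd@<id>, which may not match how PetaSAN itself manages OSDs, so the supported path remains adding OSDs through the PetaSAN UI as advised earlier in the thread:

# on the node hosting the OSD: start the daemon (repeat per OSD id;
# unit naming assumes a standard ceph-osd@<id> systemd layout)
systemctl start ceph-osd@0

# verify the daemon is running
systemctl status ceph-osd@0

# mark the OSD back "in" so it can receive data
ceph osd in 0 --cluster=Cameron-SAN-01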