Problem replacing first node after disk failure
erazmus
40 Posts
September 9, 2017, 10:32 pm
In my test environment, I had a disk failure on my first node. I've replaced the drive and installed PetaSAN on it, using the same IP address that this machine had originally.
When I try to join it to the cluster, it says "Alert: Error joining node to cluster" when I give the IP address of either of the remaining monitor nodes. Is there something else I should be doing? This node was one of the three monitor nodes.
admin
2,930 Posts
September 10, 2017, 5:46 am
I understand you had a failure on your system disk. Since this is a management node, you need to choose "Replace Management Node" in the first step of the deployment wizard. You also need to give it the same hostname and IP address when installing via the installer.
Nodes beyond the first 3 can be deleted and added at will: they can be removed from the node list and re-added to the cluster by choosing "Join existing Cluster". The first 3 nodes, however, need to be "replaced" as soon as possible if they fail. They are the management nodes that contain the brain of the cluster (Ceph monitors, Consul servers, PetaSAN management), and the cluster can tolerate the failure of 1 management node but not 2. A replacement node needs to have the same hostname/IP, since the Ceph monitors cannot have those changed.
Note that some users prefer to install the system disk on RAID1.
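For anyone who wants to confirm monitor health from a surviving management node before (or after) doing the replacement, here is a minimal sketch of a quorum check, assuming the standard ceph CLI and an admin keyring are available on that node. This script is not part of PetaSAN; it just illustrates the "tolerate 1 failure but not 2" point above.

#!/usr/bin/env python3
# Illustrative quorum check (hypothetical helper, not part of PetaSAN).
# Assumes the 'ceph' CLI and an admin keyring are present on this node.
import json
import subprocess

def monitor_quorum():
    """Return (monitors currently in quorum, all monitors in the monmap)."""
    out = subprocess.check_output(
        ["ceph", "quorum_status", "--format", "json"], timeout=30
    )
    status = json.loads(out)
    in_quorum = status["quorum_names"]
    all_mons = [m["name"] for m in status["monmap"]["mons"]]
    return in_quorum, all_mons

if __name__ == "__main__":
    in_quorum, all_mons = monitor_quorum()
    print("monitors in quorum:", in_quorum, "of", all_mons)
    # With 3 management nodes, 2 monitors in quorum is the minimum that
    # keeps the cluster running; losing a second one loses quorum.
    if len(in_quorum) < 2:
        raise SystemExit("monitor quorum lost - replace the failed node immediately")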
Last edited on September 10, 2017, 5:49 am by admin
erazmus
40 Posts
September 10, 2017, 2:53 pm
Great support as always. Somehow I missed the "Replace Management Node" option. When I tried it, it worked flawlessly.