
Problem replacing first node after disk failure

In my test environment, I had a disk failure on my first node. I've replaced the drive and installed PetaSAN on it, using the same IP address this machine had originally.

When I try to join it to the cluster, I get "Alert: Error joining node to cluster" whichever of the remaining monitor nodes' IP addresses I give. Is there something else I should be doing? This node was one of the three monitor nodes.

I understand you had a failure on your system disk. Since this is a management node, you need to choose "Replace Management Node" in the first step of the deployment wizard. You also need to give it the same hostname and IP address when installing via the installer.

Nodes beyond the first 3 can be deleted and added at will: they can be deleted from the node list and added back to the cluster by choosing "Join Existing Cluster". The first 3 nodes, however, need to be "replaced" as soon as possible if they fail. They are the management nodes that contain the brain of the cluster (Ceph monitors, Consul servers, PetaSAN management); the cluster can tolerate the failure of 1 management node but not 2. Replacement nodes need to keep the same hostname/IP, since the Ceph monitors cannot have those changed.
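As a sanity check after the replacement, you can confirm the replaced node is back in the monitor quorum by inspecting the output of `ceph quorum_status --format json` on any surviving management node. A minimal sketch of that check, using a hypothetical trimmed sample of the JSON (real output contains more keys, and the hostnames/addresses here are made up):

```python
import json

# Hypothetical, trimmed sample of `ceph quorum_status --format json` output.
# In practice you would capture this from the ceph CLI on a management node.
sample = '''
{
  "quorum_names": ["node1", "node2", "node3"],
  "monmap": {
    "mons": [
      {"name": "node1", "addr": "10.0.0.1:6789/0"},
      {"name": "node2", "addr": "10.0.0.2:6789/0"},
      {"name": "node3", "addr": "10.0.0.3:6789/0"}
    ]
  }
}
'''

def missing_monitors(quorum_status_json: str) -> list:
    """Return monitor names known to the monmap but currently out of quorum."""
    status = json.loads(quorum_status_json)
    in_quorum = set(status["quorum_names"])
    known = [mon["name"] for mon in status["monmap"]["mons"]]
    return [name for name in known if name not in in_quorum]

# An empty list means all three management-node monitors are in quorum.
print(missing_monitors(sample))
```

If the replaced node's name shows up in the missing list after the wizard completes, the monitor has not rejoined yet and the replacement should be investigated before any further node failure.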

Note that some users prefer to install the system disk on RAID1.

Great support, as always. Somehow I missed the "Replace Management Node" option. When I tried it, it worked flawlessly.