Moving management interface to a bond after cluster is built.
DividedByPi
32 Posts
January 18, 2021, 3:23 pm
Hey there,
I have been trying some new things and have come across some odd activity, and I'm just curious if anyone has had this issue. So I have a fully built cluster that did not use bonding, it had a single interface for back-end and a single interface for management. I decided I wanted to create a bond that would use both. In order to not have any weird syntax errors, I just created a VM and installed Petasan on it and used the exact configuration I wanted for my already built cluster so I could use the cluster_info.json file.
So, I made the required edits to my cluster_info.json and the file looks like so:
{
"backend_1_base_ip": "10.10.1.1",
"backend_1_eth_name": "bond0",
"backend_1_mask": "255.255.255.0",
"backend_1_vlan_id": "",
"backend_2_base_ip": "",
"backend_2_eth_name": "",
"backend_2_mask": "",
"backend_2_vlan_id": "",
"bonds": [
{
"interfaces": "eth0,eth1",
"is_jumbo_frames": false,
"mode": "active-backup",
"name": "bond0",
"primary_interface": "eth0"
}
],
"default_pool": "both",
"default_pool_pgs": "256",
"default_pool_replicas": "2",
"eth_count": 4,
"jf_mtu_size": "",
"jumbo_frames": [],
"management_eth_name": "bond0",
"management_nodes": [
{
"backend_1_ip": "10.10.1.1",
"backend_2_ip": "",
"is_backup": false,
"is_cifs": true,
"is_iscsi": true,
"is_management": true,
"is_nfs": true,
"is_storage": true,
"management_ip": "192.168.202.101",
"name": "petasan01"
},
{
"backend_1_ip": "10.10.1.2",
"backend_2_ip": "",
"is_backup": false,
"is_cifs": false,
"is_iscsi": true,
"is_management": true,
"is_nfs": false,
"is_storage": true,
"management_ip": "192.168.202.102",
"name": "petasan02"
},
{
"backend_1_ip": "10.10.1.3",
"backend_2_ip": "",
"is_backup": false,
"is_cifs": true,
"is_iscsi": true,
"is_management": true,
"is_nfs": true,
"is_storage": true,
"management_ip": "192.168.202.103",
"name": "petasan03"
}
],
"name": "petademo",
"storage_engine": "bluestore"
}
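(Side note: before rebooting, it is worth validating the edited JSON and making sure every node has an identical copy. A rough sketch only, assuming the file lives at /opt/petasan/config/cluster_info.json and using the node hostnames from the config above:)
# Validate the JSON syntax of the edited file
python3 -m json.tool /opt/petasan/config/cluster_info.json
# Push the same file to the other nodes
scp /opt/petasan/config/cluster_info.json root@petasan02:/opt/petasan/config/
scp /opt/petasan/config/cluster_info.json root@petasan03:/opt/petasan/config/
# Confirm all three copies are identical
md5sum /opt/petasan/config/cluster_info.json
ssh root@petasan02 md5sum /opt/petasan/config/cluster_info.json
ssh root@petasan03 md5sum /opt/petasan/config/cluster_info.json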
Next, I rebooted all of the nodes. The cluster came back up healthy and the bond was created; however, there are now some very odd quirks with the cluster. The first one is that if I go into the WebUI, go to Nodes List, and click on the Settings button, it just sits forever until it finally times out with "Bad gateway".
Node 1 is also acting strange: I can't access the web UI from the management IP of node 1, and the Nodes list in the web UI says node 1 is down. However, Ceph shows node 1 as up and working properly - all OSDs are up and I am able to run Ceph commands from node 1.
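For anyone wanting to rule out the bond itself, it can be checked directly on each node. A quick sketch, using the interface names from the config above:
# Show the bond's mode, active slave, and slave link states
cat /proc/net/bonding/bond0
# Confirm the management and backend IPs ended up on the bond
ip addr show bond0
# Confirm eth0/eth1 are enslaved and up
ip link show eth0
ip link show eth1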
Any input would be great!
Thanks
DividedByPi
32 Posts
January 18, 2021, 3:24 pm
Also - Here is my ORIGINAL cluster_info.json that all nodes used prior to me making the changes I showed above:
{
"backend_1_base_ip": "10.10.1.1",
"backend_1_eth_name": "eth1",
"backend_1_mask": "255.255.255.0",
"backend_1_vlan_id": "",
"backend_2_base_ip": "",
"backend_2_eth_name": "",
"backend_2_mask": "",
"backend_2_vlan_id": "",
"bonds": [],
"default_pool": "both",
"default_pool_pgs": "256",
"default_pool_replicas": "2",
"eth_count": 4,
"jf_mtu_size": "",
"jumbo_frames": [],
"management_eth_name": "eth0",
"management_nodes": [
{
"backend_1_ip": "10.10.1.1",
"backend_2_ip": "",
"is_backup": false,
"is_cifs": true,
"is_iscsi": true,
"is_management": true,
"is_nfs": true,
"is_storage": true,
"management_ip": "192.168.202.101",
"name": "petasan01"
},
{
"backend_1_ip": "10.10.1.2",
"backend_2_ip": "",
"is_backup": false,
"is_cifs": false,
"is_iscsi": true,
"is_management": true,
"is_nfs": false,
"is_storage": true,
"management_ip": "192.168.202.102",
"name": "petasan02"
},
{
"backend_1_ip": "10.10.1.3",
"backend_2_ip": "",
"is_backup": false,
"is_cifs": true,
"is_iscsi": true,
"is_management": true,
"is_nfs": true,
"is_storage": true,
"management_ip": "192.168.202.103",
"name": "petasan03"
}
],
"name": "petademo",
"storage_engine": "bluestore"
}
admin
2,930 Posts
January 18, 2021, 3:48 pm
If you ssh to the nodes via the backend network, can the nodes ping each other over the management IPs?
DividedByPi
32 Posts
January 18, 2021, 3:57 pm
Yes I can.
I can also SSH into node 1 from the management IP as well.
Petasan.log has some connection refused errors - such as this:
18/01/2021 11:55:48 INFO Checking backend latencies :
18/01/2021 11:55:48 INFO Network latency for backend 10.10.1.1 =
18/01/2021 11:55:48 INFO Network latency for backend 10.10.1.2 =
18/01/2021 11:55:48 INFO Network latency for backend 10.10.1.3 = 24.4 us
18/01/2021 11:55:50 WARNING Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2a200b06d8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Disks/
18/01/2021 11:55:52 WARNING Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2a200b0a58>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Disks/
18/01/2021 11:55:56 WARNING Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2a200b0c18>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Disks/
18/01/2021 11:56:04 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f2a200b0dd8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Disks/
18/01/2021 11:56:08 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f76bf4074a8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Services/ClusterLeader?index=3914040&wait=20s
18/01/2021 11:56:10 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff0d5ebda58>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config/Files?index=150&recurse=1
This is only happening on node 1. All other nodes are quiet.
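The refused connections are all against 127.0.0.1:8500, which is Consul's default HTTP API port, so a first check on node 1 would be whether the local Consul agent is actually running and listening. A quick sketch:
# Is anything listening on Consul's HTTP API port?
ss -ltnp | grep 8500
# Is a consul agent process running at all?
ps aux | grep '[c]onsul'
# Does this node see the rest of the Consul cluster?
consul members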
DividedByPi
32 Posts
January 18, 2021, 4:41 pm
So I have tracked all of the odd behavior down to node 1.
If I shut down node 1, I can go into the Settings page of the remaining nodes from within "Nodes list" and see all the information as normal.
The problem seems to be down to the refused connections that are happening when Node 1 is up.
DividedByPi
32 Posts
January 18, 2021, 5:27 pm
Sorry just going to dump some more errors from Petasan.log that I am seeing.
It appears as though for some reason Node 1 is having issues running/getting information from some of the scripts that Petasan runs when a node starts up. It is very strange however, since the other 2 nodes are fine.
Here is the dump:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/PetaSAN/backend/file_sync_manager.py", line 81, in sync
index, data = base.watch(self.root_path, current_index)
File "/usr/lib/python3/dist-packages/PetaSAN/core/consul/base.py", line 77, in watch
index, data = cons.kv.get(key, index=current_index, recurse=True)
File "/usr/lib/python3/dist-packages/consul/base.py", line 554, in get
params=params)
File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f
return Retrying(*dargs, **dkw).call(f, *args, **kw)
File "/usr/lib/python3/dist-packages/retrying.py", line 212, in call
raise attempt.get()
File "/usr/lib/python3/dist-packages/retrying.py", line 247, in get
six.reraise(self.value[0], self.value[1], self.value[2])
File "/usr/lib/python3/dist-packages/six.py", line 693, in reraise
raise value
File "/usr/lib/python3/dist-packages/retrying.py", line 200, in call
attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
File "/usr/lib/python3/dist-packages/PetaSAN/core/consul/ps_consul.py", line 85, in get
raise e
File "/usr/lib/python3/dist-packages/PetaSAN/core/consul/ps_consul.py", line 70, in get
res = self.response(self.session.get(uri, verify=self.verify))
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in get
return self.request('GET', url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 520, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 630, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 508, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8500): Max retries exceeded with url: /v1/kv/PetaSAN/Config/Files?index=150&recurse=1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f364b4a7ac8>: Failed to establish a new connection: [Errno 111] Connection refused',))
18/01/2021 13:26:17 WARNING Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f364b59c080>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config/Files?index=150&recurse=1
18/01/2021 13:26:19 WARNING Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f364b59c0b8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config/Files?index=150&recurse=1
18/01/2021 13:26:23 WARNING Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f364b59cf98>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config/Files?index=150&recurse=1
18/01/2021 13:26:31 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f364b567eb8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config/Files?index=150&recurse=1
18/01/2021 13:26:33 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fdb0b435c18>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/session/list
18/01/2021 13:26:39 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f33fb9569b0>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Services/ClusterLeader?index=3914040&wait=20s
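A generic way to see whether any startup units failed on node 1 after the reboot (a sketch only, not specific to PetaSAN's own service names):
# List any systemd units that failed during this boot
systemctl --failed
# Show boot-time errors from the journal for more context
journalctl -b -p err --no-pager | tail -n 50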
admin
2,930 Posts
January 18, 2021, 5:43 pm
can you run
consul members
on all 3 nodes
do you see any ping latency on backend network from/to node 1 with the other nodes ?
DividedByPi
32 Posts
January 18, 2021, 5:55 pm
Hey !
I know I am just double-posting away like crazy here, but I have been able to track down the issue and fix it.
It appears that when switching the management and back-end networks from single interfaces to a bond, node 1 had an issue and was not able to join the consul cluster. The other nodes joined successfully, which is what threw me off.
I was able to fix it by manually re-joining the consul cluster with the following command:
consul agent -config-dir /opt/petasan/config/etc/consul.d/server -bind node1backendIP -retry-join node2backendIP -retry-join node3backendIP
In case anyone else ever has this issue!
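To confirm the rejoin actually took, something like the following on any node should show all three servers alive and a raft leader elected (a quick sketch):
# All three nodes should report status "alive"
consul members
# The raft peer list should contain all three servers, one marked as leader
consul operator raft list-peers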
DividedByPi
32 Posts
January 18, 2021, 5:57 pm
Quote from admin on January 18, 2021, 5:43 pm
can you run
consul members
on all 3 nodes
do you see any ping latency on backend network from/to node 1 with the other nodes ?
Haha, that would have helped me! Fortunately I ended up figuring that out, and it was indeed the issue. Thanks for the help!
DividedByPi
32 Posts
January 19, 2021, 2:00 pm
It looks as though the reason you have to run the manual bind command is that Consul tries to pick its bind address automatically, and now that there is more than one IP on the interface it ends up trying to use the management IP rather than the backend IP.
Running the bind command manually is great to get it up and running again, but it's not a long-term solution, as it does not stick on reboot.
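One way that might make the bind persist across reboots (a sketch only, not something from the PetaSAN docs) is to pin the address in a config file inside the same directory the agent already reads, since Consul merges every JSON file found in its -config-dir:
# On node 1: pin Consul to the backend IP so it stops picking the
# management address on the bond (the file name bind.json is arbitrary)
cat > /opt/petasan/config/etc/consul.d/server/bind.json <<'EOF'
{
  "bind_addr": "10.10.1.1"
}
EOF
# Restart the Consul agent (however PetaSAN launches it), then verify:
consul members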