Node interfaces do not match cluster interface settings

Hello,

I have three servers (R720), each with two dual-port Intel X520-DA2 10Gb SFP+ cards. When configuring the first node, I bonded the ports in pairs, for a total of two bonds. The problem is that adding a second node to the cluster fails with the error "Node interfaces do not match cluster interface settings". All servers have the same number of interfaces. Looking at node 1, which has the bonded interfaces: are the bonds themselves counted as interfaces, so that node 1 appears to have more interfaces than the others? What is causing this error? Also, while being loaded, both servers failed to recognize all the ports (eth7 was missing) and had to be reloaded before the port came back online.

Node 1

cat cluster_info.json
{
"backend_1_base_ip": "10.20.4.0",
"backend_1_eth_name": "bond0",
"backend_1_mask": "255.255.255.0",
"backend_1_vlan_id": "104",
"backend_2_base_ip": "10.20.5.0",
"backend_2_eth_name": "bond1",
"backend_2_mask": "255.255.255.0",
"backend_2_vlan_id": "105",
"bonds": [
{
"interfaces": "eth4,eth6",
"is_jumbo_frames": false,
"mode": "balance-rr",
"name": "bond0",
"primary_interface": ""
},
{
"interfaces": "eth5,eth7",
"is_jumbo_frames": false,
"mode": "balance-rr",
"name": "bond1",
"primary_interface": ""
}
],
"eth_count": 9,
"iscsi_1_eth_name": "bond0",
"iscsi_2_eth_name": "bond1",
"jumbo_frames": [],
"management_eth_name": "bond0",
"management_nodes": [
{
"backend_1_ip": "10.20.4.55",
"backend_2_ip": "10.20.5.55",
"is_backup": false,
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "172.16.14.55",
"name": "PS-Node1"
}
],
"name": "vmware-storage",
"storage_engine": "bluestore"
}

Node 2

cat cluster_info.json
{
"backend_1_base_ip": "10.20.4.0",
"backend_1_eth_name": "bond0",
"backend_1_mask": "255.255.255.0",
"backend_1_vlan_id": "104",
"backend_2_base_ip": "10.20.5.0",
"backend_2_eth_name": "bond1",
"backend_2_mask": "255.255.255.0",
"backend_2_vlan_id": "105",
"bonds": [
{
"interfaces": "eth4,eth6",
"is_jumbo_frames": false,
"mode": "balance-rr",
"name": "bond0",
"primary_interface": ""
},
{
"interfaces": "eth5,eth7",
"is_jumbo_frames": false,
"mode": "balance-rr",
"name": "bond1",
"primary_interface": ""
}
],
"eth_count": 9,
"iscsi_1_eth_name": "bond0",
"iscsi_2_eth_name": "bond1",
"jumbo_frames": [],
"management_eth_name": "bond0",
"management_nodes": [
{
"backend_1_ip": "10.20.4.55",
"backend_2_ip": "10.20.5.55",
"is_backup": false,
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "172.16.14.55",
"name": "PS-Node1"
}
],
"name": "vmware-storage",
"storage_engine": "bluestore"
}

I have an update. There seem to be issues with the Intel X520 cards: the logs show a problem with the SFP when it is plugged in.

Update: for anyone else hitting this issue, Intel says the card does not accept certain SFP modules. To make it accept other brands, an option has to be set for the ixgbe driver, both in modprobe and in GRUB. In /etc/modprobe.d, create a file called "ixgbe.conf" containing "options ixgbe allow_unsupported_sfp=1".

In /etc/default/grub:
GRUB_CMDLINE_LINUX="ixgbe.allow_unsupported_sfp=1"

Then update GRUB with "update-grub" and reboot.
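
Before rebooting, it can be worth confirming that the option line really made it into the modprobe file. Below is a minimal Python sketch, assuming the file is /etc/modprobe.d/ixgbe.conf as described above. The helper name has_module_option is made up for illustration, and it only understands plain "options <module> key=value" lines.

```python
def has_module_option(conf_text, module, option, value):
    """Return True if conf_text has an 'options <module> ... option=value' line."""
    wanted = f"{option}={value}"
    for raw in conf_text.splitlines():
        line = raw.strip()
        # modprobe.d ignores blank lines and lines starting with '#'
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "options" and parts[1] == module:
            if wanted in parts[2:]:
                return True
    return False


# Stand-in for the contents of /etc/modprobe.d/ixgbe.conf
sample = "options ixgbe allow_unsupported_sfp=1\n"
print(has_module_option(sample, "ixgbe", "allow_unsupported_sfp", "1"))
```

On a real node, read the file with open("/etc/modprobe.d/ixgbe.conf").read() and pass the text in.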

Try to determine whether this is a hardware failure on a specific node or a common issue across all nodes. If it is common, the dmesg logs may help you search for the problem. A firmware update could also be needed.

I got some other cards and have no more issues with interfaces disappearing, but I am still unable to get the nodes talking. I am just using two bonded networks (bond0 for iSCSI 1 and management, bond1 for iSCSI 2). All NICs are lit up. I am still getting the interface error. Both nodes were reloaded. The logs below are from PetaSAN.log.

It seems node 2 is unable to find node_info.json, while node 1 has it. Both nodes have the same contents in "cluster_info.json".

The files were updated, so the nodes are talking to each other.
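
The statement that both nodes hold the same cluster_info.json can be checked mechanically rather than by eye. This is a rough Python sketch; diff_configs is a hypothetical helper, and the two inline dictionaries stand in for the files loaded from /opt/petasan/config/ on each node (that directory is taken from the node_info.json path in the logs, so treat it as an assumption for cluster_info.json).

```python
import json


def diff_configs(a, b):
    """Return {key: (value_in_a, value_in_b)} for each top-level key that differs."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Small stand-ins for the two files; on real nodes use json.load(open(path))
node1 = json.loads('{"eth_count": 9, "management_eth_name": "bond0"}')
node2 = json.loads('{"eth_count": 9, "management_eth_name": "bond0"}')

differences = diff_configs(node1, node2)
print(differences if differences else "no differences")
```

An empty result only shows that the copied cluster settings agree. It says nothing about whether the interfaces actually present on each node match those settings.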

Node 1

root@PS-Node1:/opt/petasan/log# cat PetaSAN.log
13/08/2019 09:38:29 INFO Start settings IPs
13/08/2019 09:45:54 ERROR Config file error. The petaSAN os maybe just installed.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 53, in get_node_status
node_name = config.get_node_info().name
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/cluster/configuration.py", line 99, in get_node_info
with open(config.get_node_info_file_path(), 'r') as f:
IOError: [Errno 2] No such file or directory: '/opt/petasan/config/node_info.json'
13/08/2019 09:47:27 ERROR Config file error. The petaSAN os maybe just installed.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 53, in get_node_status
node_name = config.get_node_info().name
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/cluster/configuration.py", line 99, in get_node_info
with open(config.get_node_info_file_path(), 'r') as f:
IOError: [Errno 2] No such file or directory: '/opt/petasan/config/node_info.json'
13/08/2019 09:47:45 INFO Created keys for cluster vmware-storage
13/08/2019 09:47:45 INFO Created cluster file and set cluster name to vmware-storage
13/08/2019 09:47:45 INFO password set successfully.
13/08/2019 09:49:18 INFO Updated cluster interface successfully.
13/08/2019 09:50:30 INFO 104
13/08/2019 09:50:30 INFO 105
13/08/2019 09:50:30 INFO Updated cluster network successfully.
13/08/2019 09:50:44 INFO Current tuning configurations saved.
13/08/2019 09:51:12 ERROR Config file error. The petaSAN os maybe just installed.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 53, in get_node_status
node_name = config.get_node_info().name
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/cluster/configuration.py", line 99, in get_node_info
with open(config.get_node_info_file_path(), 'r') as f:
IOError: [Errno 2] No such file or directory: '/opt/petasan/config/node_info.json'
13/08/2019 09:51:28 INFO Created keys for cluster vmware-storage
13/08/2019 09:51:28 INFO Created cluster file and set cluster name to vmware-storage
13/08/2019 09:51:28 INFO password set successfully.
13/08/2019 09:52:04 INFO Updated cluster interface successfully.
13/08/2019 09:54:05 INFO 104
13/08/2019 09:54:05 INFO 105
13/08/2019 09:54:05 INFO Updated cluster network successfully.
13/08/2019 09:54:15 INFO Current tuning configurations saved.
13/08/2019 09:54:36 INFO Set node info completed successfully.
13/08/2019 09:54:36 ERROR getting cluster uuid from configuration failed
13/08/2019 09:54:38 ERROR getting cluster uuid from configuration failed
13/08/2019 09:55:02 ERROR 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
13/08/2019 09:55:02 INFO Set node role completed successfully.
13/08/2019 09:55:04 INFO Node 1 added, cluster requires 2 other nodes to build.
13/08/2019 09:55:04 INFO Run post deploy script.
13/08/2019 13:27:28 ERROR Cluster is not completed, PetasSAN will check node join status.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 61, in get_node_status
raise Exception("Cluster is not completed, PetasSAN will check node join status.")
Exception: Cluster is not completed, PetasSAN will check node join status.
13/08/2019 13:57:15 INFO Start settings IPs
13/08/2019 15:27:37 ERROR Cluster is not completed, PetasSAN will check node join status.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 61, in get_node_status
raise Exception("Cluster is not completed, PetasSAN will check node join status.")
Exception: Cluster is not completed, PetasSAN will check node join status.

Node 2

cat PetaSAN.log
13/08/2019 16:02:21 INFO Start settings IPs
13/08/2019 16:05:47 ERROR Config file error. The petaSAN os maybe just installed.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 53, in get_node_status
node_name = config.get_node_info().name
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/cluster/configuration.py", line 99, in get_node_info
with open(config.get_node_info_file_path(), 'r') as f:
IOError: [Errno 2] No such file or directory: '/opt/petasan/config/node_info.json'
13/08/2019 16:06:23 INFO Starting node join
13/08/2019 16:06:23 INFO Successfully copied public keys.
13/08/2019 16:06:23 INFO Successfully copied private keys.
13/08/2019 16:06:23 INFO password set successfully.
13/08/2019 16:06:24 INFO Start copying cluster info file.
13/08/2019 16:06:24 INFO Successfully copied cluster info file.
13/08/2019 16:06:24 INFO Joined cluster vmware-storage

Thanks for the info on the Intel card fix.

Did you actually keep using the cards, or did you replace them with other NICs? This is not clear to me.

What version of PetaSAN are you using? If 2.3.1, can you check from the node console blue menu that the interface names are correct, and edit them if needed? In earlier versions this menu had a bug with bonding: it relied on the current MAC address instead of the original hardware MAC address, and the current one can be altered when an interface is bonded. If you use an earlier version, you need to set up /etc/udev/rules.d/70-persistent-net.rules manually. It is quite possible that after replacing the NICs the interface names have changed, unless 70-persistent-net.rules is set up correctly.

Is this a cluster you are just setting up? If so, I would recommend you re-install; it is probably much quicker. You can also re-deploy the node from page 1 if it had not completed deployment before.
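
For the manual 70-persistent-net.rules route mentioned above, the idea is to pin each interface name to a MAC address. The Python sketch below generates generic udev rules of that kind. The exact rule format PetaSAN itself writes is not shown in this thread, so the format here is an assumption, and persistent_net_rules is a made-up helper. The MAC addresses are example values taken from the detect-interfaces.sh output posted later in this thread.

```python
def persistent_net_rules(name_to_mac):
    """Build udev rule lines that pin interface names to MAC addresses."""
    lines = []
    for name, mac in sorted(name_to_mac.items()):
        # udev compares against the lower-case 'address' sysfs attribute
        lines.append(
            'SUBSYSTEM=="net", ACTION=="add", '
            f'ATTR{{address}}=="{mac.lower()}", NAME="{name}"'
        )
    return "\n".join(lines) + "\n"


# Use the original hardware MACs, not the ones reported while bonded
print(persistent_net_rules({
    "eth4": "00:12:c0:02:d6:33",
    "eth11": "00:12:c0:02:d6:4a",
}), end="")
```

As noted in the reply above, a bonded interface can report an altered MAC address, so the addresses should be collected before bonding is configured.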

The NICs were replaced with Hotlava X710 4-port 10Gb cards, and the nodes were reloaded with the new 2.3.1 version. Question: could the missing file be created by hand?

I created a file called "node_info.json" and the error went away. I also added host entries for all nodes to the /etc/hosts file. Now there is a new error:

Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 61, in get_node_status
raise Exception("Cluster is not completed, PetasSAN will check node join status.")
Exception: Cluster is not completed, PetasSAN will check node join status.

 Alert!

Node interfaces do not match cluster interface settings.

Hi,

It is not too clear to me, but I assume you are installing the cluster from scratch using 2.3.1, not manually editing any JSON config files, and just using the web Deployment Wizard to build the cluster, but are getting the mentioned error when joining node 2 or 3.

If the above is correct, then make sure your nodes have the same number of NICs and that, during installation, you set up the management interface to use the same interface (i.e. ethX) on all nodes you try to join. To double check that PetaSAN is able to detect your interfaces correctly on all nodes, run the script:

/opt/petasan/scripts/detect-interfaces.sh
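
Once the script has been run on each node, the outputs can be compared with a few lines of Python. This is only a sketch: it assumes the script prints one line per device in the form device=...,mac=...,pci=...,model=..., as in the output posted later in this thread, and parse_interfaces and compare_nodes are hypothetical helper names. It does not reproduce whatever check PetaSAN runs internally.

```python
def parse_interfaces(output):
    """Parse detect-interfaces.sh style lines into {device: {mac, pci, model}}."""
    devices = {}
    for line in output.splitlines():
        line = line.strip()
        if not line.startswith("device="):
            continue
        # Split on the first three commas only, in case the model text has commas
        parts = [p for p in line.split(",", 3) if "=" in p]
        fields = dict(p.split("=", 1) for p in parts)
        devices[fields.pop("device")] = fields
    return devices


def compare_nodes(a, b):
    """Summarise device counts and names that exist on only one node."""
    return {
        "count_a": len(a),
        "count_b": len(b),
        "only_in_a": sorted(set(a) - set(b)),
        "only_in_b": sorted(set(b) - set(a)),
    }


# Two-line stand-ins for each node's script output
node1 = parse_interfaces(
    "device=eth0,mac=90:b1:1c:09:eb:a3,pci=01:00.0,model=Broadcom NetXtreme\n"
    "device=eth4,mac=00:12:c0:02:d6:33,pci=05:00.0,model=Intel X710\n"
)
node2 = parse_interfaces(
    "device=eth0,mac=f0:1f:af:d0:eb:f6,pci=01:00.0,model=Broadcom NetXtreme\n"
    "device=eth4,mac=00:12:c0:02:d6:27,pci=05:00.0,model=Intel X710\n"
)
print(compare_nodes(node1, node2))
```

The device counts can also be set against the "eth_count" value in cluster_info.json, although how PetaSAN derives that number is not stated in this thread.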

OK, I reloaded everything from scratch, this time using single interfaces with no bonding: eth4 for management/iSCSI 1 and eth11 for iSCSI 2. Fresh log file, same results.

Node1

cat PetaSAN.log
14/09/2019 10:06:24 INFO Start settings IPs
14/08/2019 10:09:13 ERROR Config file error. The petaSAN os maybe just installed.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 53, in get_node_status
node_name = config.get_node_info().name
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/cluster/configuration.py", line 99, in get_node_info
with open(config.get_node_info_file_path(), 'r') as f:
IOError: [Errno 2] No such file or directory: '/opt/petasan/config/node_info.json'
14/08/2019 10:09:52 INFO Created keys for cluster vmware
14/08/2019 10:09:52 INFO Created cluster file and set cluster name to vmware
14/08/2019 10:09:52 INFO password set successfully.
14/08/2019 10:11:33 INFO 104
14/08/2019 10:11:33 INFO 105
14/08/2019 10:11:33 INFO Updated cluster network successfully.
14/08/2019 10:11:50 INFO Current tuning configurations saved.
14/08/2019 10:12:15 INFO Set node info completed successfully.
14/08/2019 10:12:15 ERROR getting cluster uuid from configuration failed
14/08/2019 10:12:16 ERROR getting cluster uuid from configuration failed
14/08/2019 10:12:36 ERROR 400 Bad Request: The browser (or proxy) sent a request that this server could not understand.
14/08/2019 10:12:36 INFO Set node role completed successfully.
14/08/2019 10:12:39 INFO Node 1 added, cluster requires 2 other nodes to build.
14/08/2019 10:12:39 INFO Run post deploy script.

cat cluster_info.json
{
"backend_1_base_ip": "10.20.4.0",
"backend_1_eth_name": "eth4",
"backend_1_mask": "255.255.255.0",
"backend_1_vlan_id": "104",
"backend_2_base_ip": "10.20.5.0",
"backend_2_eth_name": "eth11",
"backend_2_mask": "255.255.255.0",
"backend_2_vlan_id": "105",
"bonds": [],
"eth_count": 13,
"iscsi_1_eth_name": "eth4",
"iscsi_2_eth_name": "eth11",
"jumbo_frames": [],
"management_eth_name": "eth4",
"management_nodes": [
{
"backend_1_ip": "10.20.4.55",
"backend_2_ip": "10.20.5.55",
"is_backup": false,
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "172.16.14.55",
"name": "PS-Node1"
}
],
"name": "vmware",
"storage_engine": "bluestore"
}

./detect-interfaces.sh
device=eth0,mac=90:b1:1c:09:eb:a3,pci=01:00.0,model=Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
device=eth1,mac=90:b1:1c:09:eb:a4,pci=01:00.1,model=Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
device=eth10,mac=00:12:c0:02:d6:49,pci=42:00.2,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth11,mac=00:12:c0:02:d6:4a,pci=42:00.3,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth2,mac=90:b1:1c:09:eb:a5,pci=02:00.0,model=Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
device=eth3,mac=90:b1:1c:09:eb:a6,pci=02:00.1,model=Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
device=eth4,mac=00:12:c0:02:d6:33,pci=05:00.0,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth5,mac=00:12:c0:02:d6:34,pci=05:00.1,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth6,mac=00:12:c0:02:d6:35,pci=05:00.2,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth7,mac=00:12:c0:02:d6:36,pci=05:00.3,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth8,mac=00:12:c0:02:d6:47,pci=42:00.0,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth9,mac=00:12:c0:02:d6:48,pci=42:00.1,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)

netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 172.16.14.1 0.0.0.0 UG 0 0 0 eth4.103
10.20.4.0 0.0.0.0 255.255.255.0 U 0 0 0 eth4.104
10.20.5.0 0.0.0.0 255.255.255.0 U 0 0 0 eth11.105
172.16.14.0 0.0.0.0 255.255.255.192 U 0 0 0 eth4.103

Node2

cat PetaSAN.log
14/08/2019 10:25:54 INFO Start settings IPs
14/08/2019 10:27:02 ERROR Config file error. The petaSAN os maybe just installed.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/cluster/deploy.py", line 53, in get_node_status
node_name = config.get_node_info().name
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/cluster/configuration.py", line 99, in get_node_info
with open(config.get_node_info_file_path(), 'r') as f:
IOError: [Errno 2] No such file or directory: '/opt/petasan/config/node_info.json'
14/08/2019 10:27:33 INFO Starting node join
14/08/2019 10:27:33 INFO Successfully copied public keys.
14/08/2019 10:27:34 INFO Successfully copied private keys.
14/08/2019 10:27:34 INFO password set successfully.
14/08/2019 10:27:34 INFO Start copying cluster info file.
14/08/2019 10:27:34 INFO Successfully copied cluster info file.
14/08/2019 10:27:34 INFO Joined cluster vmware

cat cluster_info.json
{
"backend_1_base_ip": "10.20.4.0",
"backend_1_eth_name": "eth4",
"backend_1_mask": "255.255.255.0",
"backend_1_vlan_id": "104",
"backend_2_base_ip": "10.20.5.0",
"backend_2_eth_name": "eth11",
"backend_2_mask": "255.255.255.0",
"backend_2_vlan_id": "105",
"bonds": [],
"eth_count": 13,
"iscsi_1_eth_name": "eth4",
"iscsi_2_eth_name": "eth11",
"jumbo_frames": [],
"management_eth_name": "eth4",
"management_nodes": [
{
"backend_1_ip": "10.20.4.55",
"backend_2_ip": "10.20.5.55",
"is_backup": false,
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "172.16.14.55",
"name": "PS-Node1"
}
],
"name": "vmware",
"storage_engine": "bluestore"
}

/opt/petasan/scripts/detect-interfaces.sh
device=eth0,mac=f0:1f:af:d0:eb:f6,pci=01:00.0,model=Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
device=eth1,mac=f0:1f:af:d0:eb:f7,pci=01:00.1,model=Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
device=eth10,mac=00:12:c0:02:d5:c5,pci=42:00.2,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth11,mac=00:12:c0:02:d5:c6,pci=42:00.3,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth2,mac=f0:1f:af:d0:eb:f8,pci=02:00.0,model=Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
device=eth3,mac=f0:1f:af:d0:eb:f9,pci=02:00.1,model=Broadcom Inc. and subsidiaries NetXtreme BCM5720 Gigabit Ethernet PCIe
device=eth4,mac=00:12:c0:02:d6:27,pci=05:00.0,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth5,mac=00:12:c0:02:d6:28,pci=05:00.1,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth6,mac=00:12:c0:02:d6:29,pci=05:00.2,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth7,mac=00:12:c0:02:d6:2a,pci=05:00.3,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth8,mac=00:12:c0:02:d5:c3,pci=42:00.0,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)
device=eth9,mac=00:12:c0:02:d5:c4,pci=42:00.1,model=Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 02)

netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 172.16.14.1 0.0.0.0 UG 0 0 0 eth4.103
172.16.14.0 0.0.0.0 255.255.255.192 U 0 0 0 eth4.103
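
One visible difference between the two routing tables is that node 1 has routes on eth4.104 and eth11.105 while node 2 only has the management VLAN. The Python sketch below makes that comparison explicit. It assumes backend interfaces are named "<eth_name>.<vlan_id>", which matches the node 1 output above, and missing_backend_routes is a made-up helper. The missing routes may simply mean the join never got far enough to configure the backend networks, so this is an observation, not a diagnosis.

```python
def route_ifaces(netstat_output):
    """Collect the Iface column from 'netstat -rn' data rows."""
    ifaces = set()
    for line in netstat_output.splitlines():
        cols = line.split()
        # Data rows have 8 columns and start with a numeric destination
        if len(cols) == 8 and cols[0][0].isdigit():
            ifaces.add(cols[7])
    return ifaces


def missing_backend_routes(cluster_info, netstat_output):
    """List expected backend interfaces that have no route."""
    present = route_ifaces(netstat_output)
    missing = []
    for n in ("1", "2"):
        name = cluster_info[f"backend_{n}_eth_name"]
        vlan = cluster_info.get(f"backend_{n}_vlan_id", "")
        iface = f"{name}.{vlan}" if vlan else name
        if iface not in present:
            missing.append(iface)
    return missing


# Values copied from the cluster_info.json and node 2 routing table above
cluster = {
    "backend_1_eth_name": "eth4", "backend_1_vlan_id": "104",
    "backend_2_eth_name": "eth11", "backend_2_vlan_id": "105",
}
node2_routes = """Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 172.16.14.1 0.0.0.0 UG 0 0 0 eth4.103
172.16.14.0 0.0.0.0 255.255.255.192 U 0 0 0 eth4.103
"""
print(missing_backend_routes(cluster, node2_routes))
```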

I understand you are getting the error

 Alert!  Node interfaces do not match cluster interface settings.

during node 2 deployment. Is this correct?

Also, just for my own info, can you give me more detail on the earlier error you had (missing node_info.json): were you installing using the UI, or editing the JSON yourself? It is not clear to me why you had to edit the JSON files; is this a result of the interfaces you had to replace?
