Network issue after online upgrade

Hello.

I have a network issue after an online upgrade from 2.3.1 to 2.5.1.

I performed the following steps on each node:

wget http://archive.petasan.org/repo/2.3.1-enable-updates.tar.gz
tar xzf 2.3.1-enable-updates.tar.gz
cd 2.3.1-enable-updates
./enable-updates.sh

And then the following steps on each node:

apt update
export DEBIAN_FRONTEND=noninteractive
apt upgrade
apt install petasan

The upgrade finished successfully, and the cluster was in HEALTH_OK status.
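For reference, I checked the cluster state with the standard Ceph commands, e.g.:

ceph status
ceph health detail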

Then I rebooted one of the nodes.

As a result, the node is inaccessible after the reboot: I can't SSH to it or ping it, and its management and both backend networks are unreachable from the other nodes.

I can't even log in to a bash console on this node to troubleshoot the problem after the reboot, because there is no such option in the blue screen menu.

I booted the node in recovery mode to check the network configuration in the files "cluster_info.json" and "node_info.json", and it looks correct (IP addresses, bonds, jumbo frames and so on).
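For reference, I was checking the files in their usual location (assuming the standard PetaSAN layout under /opt/petasan/config):

cat /opt/petasan/config/cluster_info.json
cat /opt/petasan/config/node_info.json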

Please tell me what steps I can take in this situation. I am afraid to reboot the other nodes.


Boot the node in normal mode; you should be able to log in directly on the node via ctrl+alt+f1 or f2.

Look at what IPs are set via:
ip addr

Look for any errors in:
dmesg
cat /opt/petasan/log/PetaSAN.log
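If the log is long, standard filtering narrows it down, e.g.:

dmesg --level=err,warn
grep -E "ERROR|WARNING" /opt/petasan/log/PetaSAN.log | tail -n 50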

Does the cluster_info.json on the node look OK?
Can you post the cluster_info.json? (You can scp it from any running node.)

If you have a simple setup, you can try to bring the IPs up yourself, as in:
ip addr add 10.0.1.11/24 dev eth0
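For a bonded backend you would first have to create the bond by hand. A rough sketch with placeholder names and addresses (adjust to your cluster_info.json):

ip link add bond0 type bond mode 802.3ad    # create the bond device
ip link set eth2 down                       # a slave must be down before enslaving
ip link set eth2 master bond0
ip link set eth4 down
ip link set eth4 master bond0
ip link set bond0 mtu 9000                  # only if jumbo frames are configured
ip link set bond0 up
ip link set eth2 up
ip link set eth4 up
ip addr add 10.0.2.11/24 dev bond0

Note that anything set this way is lost on the next reboot; it is only to get the node reachable for troubleshooting.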

Thank you for the response!

dmesg does not contain any error entries.

/opt/petasan/log/PetaSAN.log contains error entries like:

31/03/2020 20:49:24 INFO     Start settings IPs
31/03/2020 20:49:28 ERROR    Error setting bond jumbo frames.
31/03/2020 20:49:34 WARNING  Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff69069c390>: Failed to establis
31/03/2020 20:51:41 ERROR    HTTPConnectionPool(host='127.0.0.1', port=8500): Max retries exceeded with url: /v1/kv/PetaSAN/Config/Files?recurse=1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff69063e6d
raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8500): Max retries exceeded with url: /v1/kv/PetaSAN/Config/Files?recurse=1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff69063e6d0>: Failed to
31/03/2020 20:51:41 WARNING  Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f875462b050>: Failed to establis

The most interesting entry is: Error setting bond jumbo frames.

The "ip addr" command output:

root@sds-osd-302-04:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether a4:bf:01:09:5d:08 brd ff:ff:ff:ff:ff:ff
inet 10.1.9.126/23 brd 10.1.9.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a6bf:1ff:fe09:5d08/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether a4:bf:01:09:5d:09 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0_pr state UP group default qlen 1000
link/ether 90:e2:ba:c8:cc:ec brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
link/ether 90:e2:ba:c8:cc:ed brd ff:ff:ff:ff:ff:ff
6: eth4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0_pr state UP group default qlen 1000
link/ether 90:e2:ba:c8:cc:ec brd ff:ff:ff:ff:ff:ff
7: eth5: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
link/ether 90:e2:ba:c8:cc:f1 brd ff:ff:ff:ff:ff:ff
8: bond0_pr: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
link/ether 90:e2:ba:c8:cc:ec brd ff:ff:ff:ff:ff:ff
inet6 fe80::92e2:baff:fec8:ccec/64 scope link
valid_lft forever preferred_lft forever

There is no "bond1_cl" interface, and there is no IP address on the "bond0_pr" interface.
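For what it is worth, the state of the bond that did come up can be inspected through the bonding driver's proc interface:

cat /proc/net/bonding/bond0_pr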

The IP address on the eth0 interface only appears after I run the following command: ip addr add 10.1.9.126/23 dev eth0 && ifdown eth0 && ifup eth0
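Presumably the backend IPs could be added the same way once the bond devices exist, something like (addresses taken from the node_info.json below):

ip addr add 192.168.98.5/24 dev bond0_pr
ip addr add 192.168.32.5/24 dev bond1_cl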

node_info.json:

{
    "backend_1_ip": "192.168.98.5",
    "backend_2_ip": "192.168.32.5",
    "is_backup": false,
    "is_iscsi": false,
    "is_management": false,
    "is_storage": true,
    "management_ip": "10.1.9.126",
    "name": "sds-osd-302-04"
}

cluster_info.json:

{
    "backend_1_base_ip": "192.168.98.2",
    "backend_1_eth_name": "bond0_pr",
    "backend_1_mask": "255.255.255.0",
    "backend_1_vlan_id": "",
    "backend_2_base_ip": "192.168.32.2",
    "backend_2_eth_name": "bond1_cl",
    "backend_2_mask": "255.255.255.0",
    "backend_2_vlan_id": "",
    "bonds": [
        {
            "interfaces": "eth2,eth4",
            "is_jumbo_frames": true,
            "mode": "802.3ad",
            "name": "bond0_pr",
            "primary_interface": ""
        },
        {
            "interfaces": "eth3,eth5",
            "is_jumbo_frames": true,
            "mode": "802.3ad",
            "name": "bond1_cl",
            "primary_interface": ""
        }
    ],
    "eth_count": 6,
    "iscsi_1_eth_name": "bond0_pr",
    "iscsi_2_eth_name": "bond0_pr",
    "jumbo_frames": [
        "eth2",
        "eth3",
        "eth4",
        "eth5"
    ],
    "management_eth_name": "eth0",
    "management_nodes": [
        {
            "backend_1_ip": "192.168.98.2",
            "backend_2_ip": "192.168.32.2",
            "is_backup": false,
            "is_iscsi": false,
            "is_management": true,
            "is_storage": true,
            "management_ip": "10.1.9.120",
            "name": "sds-osd-302-01"
        },
        {
            "backend_1_ip": "192.168.98.3",
            "backend_2_ip": "192.168.32.3",
            "is_backup": false,
            "is_iscsi": false,
            "is_management": true,
            "is_storage": true,
            "management_ip": "10.1.9.122",
            "name": "sds-osd-302-02"
        },
        {
            "backend_1_ip": "192.168.98.4",
            "backend_2_ip": "192.168.32.4",
            "is_backup": false,
            "is_iscsi": false,
            "is_management": true,
            "is_storage": true,
            "management_ip": "10.1.9.124",
            "name": "sds-osd-302-03"
        }
    ],
    "name": "ceph2-cod"
}

Any ideas?

Thanks for sending the info. I can confirm this is a bug; we are testing a fix and will post it, it should be ready today.

First apply it on the node with the issue, then restart.

https://drive.google.com/open?id=1MAVIzFLOAovcLb_JH9ooXAxwd-yjavoy

patch -p1 -d / < upgrade_backend2_bond_mtu.patch
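To preview what the patch would change without applying it, GNU patch supports a dry run:

patch -p1 -d / --dry-run < upgrade_backend2_bond_mtu.patch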

If OK, apply it to the other nodes without restarting.

Hello!

The problem was resolved.

Thanks a lot!

Great! Thanks for the feedback.