Network issue after online upgrade
atselitan
21 Posts
March 30, 2020, 6:34 pm
Hello.
I have a network issue after an online upgrade from 2.3.1 to 2.5.1.
I performed the following steps on each node:
wget http://archive.petasan.org/repo/2.3.1-enable-updates.tar.gz
tar xzf 2.3.1-enable-updates.tar.gz
cd 2.3.1-enable-updates
./enable-updates.sh
And then the following steps on each node:
apt update
export DEBIAN_FRONTEND=noninteractive
apt upgrade
apt install petasan
The upgrade finished successfully, and the cluster was in HEALTH_OK status.
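(For reference, the upgrade and cluster state can be double-checked on each node with standard Ceph and dpkg commands:
ceph -s                          # overall cluster health
ceph versions                    # Ceph version reported by each running daemon
dpkg -s petasan | grep Version   # installed PetaSAN package version
)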
Then I rebooted one of the nodes.
As a result, the node is inaccessible after the reboot: I can't ssh to it or ping it. The management and both backend networks are unreachable from the other nodes.
I can't even log in to a bash console on this node to troubleshoot the problem after the reboot, because there is no such option in the blue screen menu.
I booted the node in recovery mode to check the network configuration in the files "cluster_info.json" and "node_info.json", and it looks correct (IP addresses, bond, jumbo frames, and so on).
Please tell me what steps I can take in this situation. I'm afraid to reboot the other nodes.
admin
2,930 Posts
March 30, 2020, 7:12 pm
Boot the node in normal mode; you should be able to log in directly on the node via Ctrl+Alt+F1 or F2.
Look at what IPs are set via:
ip addr
Look for any errors in:
dmesg
cat /opt/petasan/log/PetaSAN.log
Does the cluster_info.json on the node look OK?
Can you post the cluster_info.json? (You can scp it from any running node.)
If you have a simple setup, you can try to bring up the IPs yourself, like:
ip addr add 10.0.1.11/24 dev eth0
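(For the scp step above, a minimal example; the management IP is a placeholder, and /opt/petasan/config/cluster_info.json is assumed to be the config path on a default install:
scp root@<running-node-mgmt-ip>:/opt/petasan/config/cluster_info.json /tmp/
)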
atselitan
21 Posts
March 31, 2020, 4:25 pm
Thank you for the response!
dmesg does not contain any error entries.
/opt/petasan/log/PetaSAN.log contains error entries like:
31/03/2020 20:49:24 INFO Start settings IPs
31/03/2020 20:49:28 ERROR Error setting bond jumbo frames.
31/03/2020 20:49:34 WARNING Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff69069c390>: Failed to establis
31/03/2020 20:51:41 ERROR HTTPConnectionPool(host='127.0.0.1', port=8500): Max retries exceeded with url: /v1/kv/PetaSAN/Config/Files?recurse=1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff69063e6d
raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8500): Max retries exceeded with url: /v1/kv/PetaSAN/Config/Files?recurse=1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ff69063e6d0>: Failed to
31/03/2020 20:51:41 WARNING Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f875462b050>: Failed to establis
The most interesting entry is: Error setting bond jumbo frames.
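(Once the backend links are up, jumbo-frame connectivity can be verified end to end with a do-not-fragment ping sized for a 9000-byte MTU, i.e. 8972 bytes of ICMP payload plus 28 bytes of headers, for example against a backend 1 peer:
ping -M do -s 8972 192.168.98.2
)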
"ip addr" command result:
root@sds-osd-302-04:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether a4:bf:01:09:5d:08 brd ff:ff:ff:ff:ff:ff
inet 10.1.9.126/23 brd 10.1.9.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::a6bf:1ff:fe09:5d08/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether a4:bf:01:09:5d:09 brd ff:ff:ff:ff:ff:ff
4: eth2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0_pr state UP group default qlen 1000
link/ether 90:e2:ba:c8:cc:ec brd ff:ff:ff:ff:ff:ff
5: eth3: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
link/ether 90:e2:ba:c8:cc:ed brd ff:ff:ff:ff:ff:ff
6: eth4: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master bond0_pr state UP group default qlen 1000
link/ether 90:e2:ba:c8:cc:ec brd ff:ff:ff:ff:ff:ff
7: eth5: <BROADCAST,MULTICAST> mtu 9000 qdisc noop state DOWN group default qlen 1000
link/ether 90:e2:ba:c8:cc:f1 brd ff:ff:ff:ff:ff:ff
8: bond0_pr: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
link/ether 90:e2:ba:c8:cc:ec brd ff:ff:ff:ff:ff:ff
inet6 fe80::92e2:baff:fec8:ccec/64 scope link
valid_lft forever preferred_lft forever
There is no "bond1_cl" interface, and there is no IP address on the "bond0_pr" interface.
The IP address on the eth0 interface appeared after running the following command:
ip addr add 10.1.9.126/23 dev eth0 && ifdown eth0 && ifup eth0
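(In the same spirit, the missing bond1_cl could in principle be recreated by hand with standard iproute2 commands, using the interfaces and addresses from the node_info.json and cluster_info.json below. This is a sketch only: it is not persistent across reboots and assumes the bonding driver is loaded:
ip link add bond1_cl type bond mode 802.3ad    # create the missing bond
ip link set eth3 down && ip link set eth5 down # slaves must be down before enslaving
ip link set eth3 master bond1_cl
ip link set eth5 master bond1_cl
ip link set bond1_cl mtu 9000                  # jumbo frames, per cluster_info.json
ip link set eth3 up && ip link set eth5 up
ip link set bond1_cl up
ip addr add 192.168.32.5/24 dev bond1_cl       # backend 2 address for this node
ip addr add 192.168.98.5/24 dev bond0_pr       # backend 1 address, missing on bond0_pr
)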
node_info.json:
{
"backend_1_ip": "192.168.98.5",
"backend_2_ip": "192.168.32.5",
"is_backup": false,
"is_iscsi": false,
"is_management": false,
"is_storage": true,
"management_ip": "10.1.9.126",
"name": "sds-osd-302-04"
}
cluster_info.json:
{
"backend_1_base_ip": "192.168.98.2",
"backend_1_eth_name": "bond0_pr",
"backend_1_mask": "255.255.255.0",
"backend_1_vlan_id": "",
"backend_2_base_ip": "192.168.32.2",
"backend_2_eth_name": "bond1_cl",
"backend_2_mask": "255.255.255.0",
"backend_2_vlan_id": "",
"bonds": [
{
"interfaces": "eth2,eth4",
"is_jumbo_frames": true,
"mode": "802.3ad",
"name": "bond0_pr",
"primary_interface": ""
},
{
"interfaces": "eth3,eth5",
"is_jumbo_frames": true,
"mode": "802.3ad",
"name": "bond1_cl",
"primary_interface": ""
}
],
"eth_count": 6,
"iscsi_1_eth_name": "bond0_pr",
"iscsi_2_eth_name": "bond0_pr",
"jumbo_frames": [
"eth2",
"eth3",
"eth4",
"eth5"
],
"management_eth_name": "eth0",
"management_nodes": [
{
"backend_1_ip": "192.168.98.2",
"backend_2_ip": "192.168.32.2",
"is_backup": false,
"is_iscsi": false,
"is_management": true,
"is_storage": true,
"management_ip": "10.1.9.120",
"name": "sds-osd-302-01"
},
{
"backend_1_ip": "192.168.98.3",
"backend_2_ip": "192.168.32.3",
"is_backup": false,
"is_iscsi": false,
"is_management": true,
"is_storage": true,
"management_ip": "10.1.9.122",
"name": "sds-osd-302-02"
},
{
"backend_1_ip": "192.168.98.4",
"backend_2_ip": "192.168.32.4",
"is_backup": false,
"is_iscsi": false,
"is_management": true,
"is_storage": true,
"management_ip": "10.1.9.124",
"name": "sds-osd-302-03"
}
],
"name": "ceph2-cod"
}
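(As a quick cross-check, again assuming the /opt/petasan/config/ path, this node's copy of the config can be diffed against a healthy management node, e.g. sds-osd-302-01 at 10.1.9.120 from the config above:
diff <(ssh root@10.1.9.120 cat /opt/petasan/config/cluster_info.json) /opt/petasan/config/cluster_info.json
)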
Any ideas?
admin
2,930 Posts
April 1, 2020, 2:40 pm
Thanks for sending the info. I can confirm this is a bug; we are testing a fix and will post it, should be today.
admin
2,930 Posts
April 1, 2020, 6:26 pm
First apply it on the node with the issue, then restart.
https://drive.google.com/open?id=1MAVIzFLOAovcLb_JH9ooXAxwd-yjavoy
patch -p1 -d / < upgrade_backend2_bond_mtu.patch
If OK, apply it to the other nodes without restarting.
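(To preview what the patch would change before applying it, GNU patch supports a dry run:
patch --dry-run -p1 -d / < upgrade_backend2_bond_mtu.patch
)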
atselitan
21 Posts
April 2, 2020, 4:30 am
Hello!
The problem was resolved.
Thanks a lot!
admin
2,930 Posts
April 2, 2020, 10:47 am
Great! Thanks for the feedback.