
Dependency of a Package while PetaSAN is upgrading from 2.6.1 to 2.8.1


Thanks for the verification. We have changed the online documentation for 2.x upgrades to the following steps:

wget http://archive.petasan.org/repo/updatev2.sh
chmod +x updatev2.sh
./updatev2.sh

The downloadable script includes the previous steps and adds a CA certificate update.
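
For reference, the CA certificate update is roughly the standard Ubuntu refresh shown below; the script itself may do this slightly differently:

apt-get update
apt-get install -y ca-certificates   # pull the current CA bundle package
update-ca-certificates               # rebuild /etc/ssl/certs so the HTTPS repo can be verified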

Hey Admin,

With the updated upgrade documentation, the upgrade of PetaSAN from 2.6.1 to 2.8.1 did work. However, on the 3-node cluster in our environment, the 3rd node shut down while the upgrade was being performed on it.
I know it is not supposed to go down. I performed the upgrade twice, and the result was the same.

Thanks in advance.
Below is the PetaSAN log info from node 3:

03/03/2023 19:32:53 INFO adding NODE_JOINED flag to /opt/petasan/config/flags/flags.json
03/03/2023 19:32:54 INFO Searching for any unlinked caches to be reused.
03/03/2023 19:32:54 INFO OSD IDs list for this node are : [0]
03/03/2023 19:33:48 WARNING Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbeb9eb1470>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config/Files?recurse=1
03/03/2023 19:33:48 INFO GlusterFS mount attempt
03/03/2023 19:33:49 WARNING Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c7b12e0b8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Services/ClusterLeader
03/03/2023 19:33:49 WARNING Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd03ea603c8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config
03/03/2023 19:33:50 WARNING Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbeb9eb1748>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config/Files?recurse=1
03/03/2023 19:33:51 WARNING Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c7b12e3c8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Services/ClusterLeader
03/03/2023 19:33:51 WARNING Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd03ea606a0>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config
03/03/2023 19:33:53 INFO checking uptime
03/03/2023 19:33:53 INFO checking uptime of node: Node1 ip:10.150.0.11
03/03/2023 19:33:54 INFO checking uptime of node: Node2 ip:10.150.0.12
03/03/2023 19:33:54 INFO checking uptime of node: node3 ip:10.150.0.13
03/03/2023 19:33:54 INFO Service is starting.
03/03/2023 19:33:54 WARNING Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f09f8dad550>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/session/list
03/03/2023 19:33:54 WARNING Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbeb9eb18d0>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config/Files?recurse=1
03/03/2023 19:33:55 WARNING Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c7b12e550>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Services/ClusterLeader
03/03/2023 19:33:55 WARNING Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd03ea60860>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config
03/03/2023 19:33:56 WARNING Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f09f8dad860>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/session/list
03/03/2023 19:34:00 WARNING Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f09f8dad9e8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/session/list
03/03/2023 19:34:02 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fbeb9eb1a58>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config/Files?recurse=1
03/03/2023 19:34:03 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8c7b12e710>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Services/ClusterLeader
03/03/2023 19:34:03 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd03ea60a20>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/kv/PetaSAN/Config
03/03/2023 19:34:08 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f09f8dadba8>: Failed to establish a new connection: [Errno 111] Connection refused',)': /v1/session/list
03/03/2023 19:42:15 INFO Start settings IPs
03/03/2023 19:42:21 INFO GlusterFS mount attempt
03/03/2023 19:42:31 INFO str_start_command: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consul agent -raft-protocol 2 -config-dir /opt/petasan/config/etc/consul.d/server -bind 10.150.2.6 -retry-join 10.150.2.4 -retry-join 10.150.2.5
03/03/2023 19:42:35 INFO Starting cluster file sync service
03/03/2023 19:42:35 INFO Starting iSCSI Service
03/03/2023 19:42:35 INFO Starting Cluster Management application
03/03/2023 19:42:35 INFO Starting Node Stats Service
03/03/2023 19:42:35 INFO Starting activating PetaSAN lvs
03/03/2023 19:42:35 INFO LeaderElectionBase dropping old sessions
03/03/2023 19:42:35 INFO checking uptime
03/03/2023 19:42:35 INFO checking uptime of node: Node1 ip:10.150.0.11
03/03/2023 19:42:35 INFO checking uptime of node: Node2 ip:10.150.0.12
03/03/2023 19:42:36 INFO checking uptime of node: node3 ip:10.150.0.13
03/03/2023 19:42:36 INFO checking uptime of node: node3 is starting
03/03/2023 19:42:36 INFO Service is starting.
03/03/2023 19:42:36 INFO iSCSI Service dropping old sessions
03/03/2023 19:42:39 INFO Starting activating PetaSAN lvs
03/03/2023 19:42:41 INFO Starting OSDs
03/03/2023 19:42:41 INFO Starting sync replication node service
03/03/2023 19:42:41 INFO Starting petasan tuning service
03/03/2023 19:42:42 INFO Starting Config Upload service
03/03/2023 19:42:42 INFO Starting petasan-qperf server
03/03/2023 19:42:42 INFO Starting petasan-update-node-info
03/03/2023 19:42:42 INFO sync_replication_node starting
03/03/2023 19:42:42 INFO update_node_info starting
03/03/2023 19:42:42 INFO syncing cron ok
03/03/2023 19:42:43 INFO syncing replication users ok
03/03/2023 19:42:43 INFO sync_replication_node completed
03/03/2023 19:42:43 INFO node info has been updated successfully.
03/03/2023 19:42:43 INFO update_node_info completed
03/03/2023 19:42:45 INFO LeaderElectionBase successfully dropped old sessions
03/03/2023 19:42:51 INFO iSCSI Service successfully dropped old sessions
03/03/2023 19:42:51 INFO Cleaning unused configurations.
03/03/2023 19:42:51 INFO Cleaning all mapped disks
03/03/2023 19:42:51 INFO Cleaning unused rbd images.
03/03/2023 19:42:51 INFO Cleaning unused ips.

The new documentation is just the old upgrade process; it only adds the CA certificate update so SSL works. So nothing major changed from before. Can you clarify some more info:

Do you mean the node shut down by itself?

What do you mean by you did the upgrade twice and the result was the same? Have you tried on a different cluster setup?

When you reboot the node after this shutdown, does it shut down again, or does it work after the reboot?

Is the node serving any iSCSI paths?

If you switch fencing off, does it work?

Is the system under heavy load that could prevent heartbeat messages from being sent/received?
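
For the last two points, a rough way to check on the node itself would be something like the following (assuming the targetcli and consul binaries are available on the node, which may differ on your install):

uptime                  # load averages; sustained high load can delay heartbeats
targetcli ls /iscsi     # list the LIO/iSCSI targets and portals this node is serving
consul members          # confirm the node is still joined to the Consul cluster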

To give you clarity on the questions you asked:
We have a 3-node cluster of HP DL380 Gen8 servers as a test setup, and a 4-node cluster of DL380 Gen10 servers for production. Before upgrading the production setup, I was trying the upgrade out on my test setup.

Do you mean the node shut down by itself?
Yes, the node shut down by itself. I was running the scripts node by node in a particular order: Node 1, 2, 3. The cluster showed the warning message "mons are allowing insecure global_id reclaim". In another post it was mentioned that this is not a serious issue, so I went on to upgrade the next node.
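
For reference, from what I read that warning is normally cleared only after every mon, OSD and client has been upgraded, using the standard Ceph commands along these lines (not a PetaSAN-specific step, so please correct me if the update script handles it differently):

ceph health detail                                               # confirm the warning is AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED
ceph config set mon auth_allow_insecure_global_id_reclaim false  # only once all daemons and clients run the patched version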

What do you mean by you did the upgrade twice and the result was the same? Have you tried on a different cluster setup?
I repeated the upgrade test from 2.6.1 to 2.8.1, reinstalling 2.6.1 and redoing the upgrade multiple times to make sure there were no variations in the outcome.

When you reboot the node after this shutdown, does it shut down again, or does it work after the reboot?
After rebooting the 3rd node it does work fine, HA seems to be available, and all the running VMs were available with no issue.

Is the node serving any iSCSI paths?
I did not really check this part; I will check it during the next upgrade attempt. Since I had created only 1 disk with 2 paths, I did not really verify that.

If you switch fencing off, does it work?
I will try that in my next upgrade test; so far I was trying with fencing ON.

Is the system under heavy load that could prevent heartbeat messages from being sent/received?
I don't think there was any load; there is only 1 disk and 1 VM running.

I also observed PGs getting displaced while performing the upgrade on the 2nd node. Since the data size is small and there are few PGs, the recovery was fast and I did not really see any impact. Is this something that happens while the version upgrade happens?
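
I was wondering whether setting the usual Ceph maintenance flags around each node's upgrade would avoid that PG movement, something like the standard commands below (I am not sure whether the update script already does this):

ceph osd set noout          # keep restarting OSDs from being marked out and triggering recovery
ceph osd set norebalance    # optionally also pause rebalancing for the maintenance window
# ... upgrade / restart the node, then:
ceph osd unset norebalance
ceph osd unset noout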

Any difference in the hardware configuration of node 3 compared to the other nodes?

Do you have enough resources, like RAM?

Can you try it in another environment setup?

Any difference in the hardware configuration of node 3 compared to the other nodes?
Like I said, there is no difference in the hardware. These are all HP Gen8 servers with 12-core CPUs and 256 GB of memory, which is more than sufficient, I guess.

Can you try it in another environment setup?
We do not have any other servers for this, so I will probably have to try on these same ones.
Allow me some time for this.
I will try again with fencing OFF.

*Note
I had previously performed the upgrade once from 2.7.1 to 2.8.1, with which I had no issues and was able to upgrade smoothly.

Dear Admin,
We performed multiple tests on the upgrade.
These are the observations: while performing the upgrade from 2.6.1 to 2.8.1, usually on the last node, a small timeout is observed in the logs and the node shuts down. This is the situation when fencing was ON.

After turning off the fencing, even if there was a timeout, the Node did not shut down. Things were running fine.

Should it be a best practice to turn off fencing at the time of a PetaSAN upgrade, since the mon, iSCSI and other services restart and the timeout is happening? Or is there a way to overcome this problem in general?

My cluster status shows healthy. Can we leave fencing off after the upgrade is done?

You can switch fencing off, but this is not recommended. Fencing is a well-known clustering concept: it kills a node if it is not responding to cluster heartbeats. If such a node is half dead (not likely but possible), it may keep accessing storage while some other cluster node is now serving its paths.

Why you see a timeout on the third node that makes it lose connection to the cluster is not known. We have been using fencing since the beginning of PetaSAN and it has not caused issues during upgrades. Why is it happening only on the third node? Note that when we upgrade we do not restart the Consul clustering software, nor do we change or reset the LIO/iSCSI kernel configuration, so the node should remain connected to the cluster. Yes, we do restart the Ceph OSDs, but this should pause iSCSI i/o for approximately 20 seconds on all nodes, not just on the nodes being upgraded; aside from this, the node should stay responsive and respond to cluster heartbeats via Consul.
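
If it helps during the next attempt, you can watch the Consul membership from one of the other nodes while node 3 is being upgraded, roughly as follows (standard Consul commands, assuming you run them on a management node of the same cluster):

consul members                   # node 3 should stay listed as alive throughout the upgrade
consul monitor -log-level warn   # stream warnings such as serf heartbeat failures in real time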

Also, instead of switching fencing off, you may move the iSCSI paths away from node 3 before upgrading it; it will not be fenced if it is not serving any iSCSI paths. Still, that does not explain why you see this.
