Forums

Home / Forums

You need to log in to create posts and topics. Login · Register

Booting up a node causes the iSCSI to fail

We have a test cluster of three nodes. We've setup a single iSCSI disk, with three paths to it.

When we shut down a node, the iSCSI keeps running (with about a 20 second pause).  However when we boot the node back up the iSCSI paths all go down.  We also note that the node is very slow to come back on line - it takes about 20 minutes after booting before it is ready for service.  Rebooting any of the three nodes gives the same result.

This is what we saw when we booted up one node and the iscsi went down on all paths.

15/05/2020 13:50:24 INFO     Cleaned disk path 00001/2.
15/05/2020 13:50:24 INFO     Cleaned disk path 00001/1.
15/05/2020 13:50:24 INFO     PetaSAN cleaned local paths not locked by this node in consul.
15/05/2020 13:50:24 INFO     LIO deleted backstore image image-00001
15/05/2020 13:50:24 INFO     LIO deleted Target iqn.2016-05.com.petasan:00001
15/05/2020 13:50:25 INFO     Image image-00001 unmapped successfully.
15/05/2020 13:50:25 INFO     PetaSAN cleaned iqns.
Any thoughts would be welcome.

Do you see any errors on the node console monitor when it is booting up, or where it is stuck for 20 min ? Any errors in syslog ?

 

Here are the log entries resulting from rebooting one node:

18/05/2020 11:19:46 INFO Start settings IPs
18/05/2020 11:19:54 INFO GlusterFS mount attempt
18/05/2020 11:20:04 INFO str_start_command: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consul agent -raft-protocol 2 -config-dir /opt/petasan/config/etc/consul.d/server -bind 10.2.13.32 -retry-join 10.2.13.30 -retry-join 10.2.13.31
18/05/2020 11:20:07 INFO Starting cluster file sync service
18/05/2020 11:20:07 INFO Starting iSCSI Service
18/05/2020 11:20:07 INFO Starting Cluster Management application
18/05/2020 11:20:07 INFO Starting Node Stats Service
18/05/2020 11:20:07 INFO Starting activating PetaSAN lvs
18/05/2020 11:20:07 INFO Running command: /sbin/vgs --noheadings --readonly --units=g --separator=";" -o vg_name,pv_count,lv_count,snap_count,vg_attr,vg_size,vg_free,vg_free_count
18/05/2020 11:20:07 INFO LeaderElectionBase dropping old sessions
18/05/2020 11:20:07 INFO stdout ceph-41498c09-d030-4b44-9f52-dae325abd600";"1";"1";"0";"wz--n-";"9314.00g";"0g";"0
18/05/2020 11:20:07 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.0";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:20:07 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.1";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:20:07 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.2";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:20:07 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.3";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:20:07 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.4";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:20:07 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.5";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:20:07 INFO Activating ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.0
18/05/2020 11:20:14 INFO Cluster is just starting, system will delete all active disk resources
18/05/2020 11:20:31 INFO GlusterFS mount attempt
18/05/2020 11:20:41 INFO LeaderElectionBase successfully dropped old sessions
18/05/2020 11:20:41 INFO Service is starting.
18/05/2020 11:22:39 INFO Setting writeback jobs number.
18/05/2020 11:22:39 INFO Activating ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.1
18/05/2020 11:25:28 INFO Setting writeback jobs number.
18/05/2020 11:25:28 INFO Activating ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.2
18/05/2020 11:27:59 INFO Setting writeback jobs number.
18/05/2020 11:27:59 INFO Activating ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.3
18/05/2020 11:29:02 INFO Found pool:rbd for disk:00001 via consul
18/05/2020 11:29:02 INFO Could not lock path 00001/1 with session 645cfda3-126b-5014-ea6d-873736df2afa.
18/05/2020 11:29:06 INFO Found pool:rbd for disk:00001 via consul
18/05/2020 11:29:06 INFO Could not lock path 00001/3 with session 645cfda3-126b-5014-ea6d-873736df2afa.
18/05/2020 11:29:11 INFO Found pool:rbd for disk:00001 via consul
18/05/2020 11:29:14 INFO Image image-00001 mapped successfully.
18/05/2020 11:29:17 INFO LIO add_target() disk wwn is 5f5876c100001
18/05/2020 11:29:17 INFO Path 00001/2 acquired successfully
18/05/2020 11:30:32 INFO Setting writeback jobs number.
18/05/2020 11:30:32 INFO Activating ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.4
18/05/2020 11:33:06 INFO Setting writeback jobs number.
18/05/2020 11:33:06 INFO Activating ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.5
18/05/2020 11:35:46 INFO Setting writeback jobs number.
18/05/2020 11:35:46 INFO Starting activating PetaSAN lvs
18/05/2020 11:35:46 INFO Running command: /sbin/vgs --noheadings --readonly --units=g --separator=";" -o vg_name,pv_count,lv_count,snap_count,vg_attr,vg_size,vg_free,vg_free_count
18/05/2020 11:35:46 INFO stdout ceph-41498c09-d030-4b44-9f52-dae325abd600";"1";"1";"0";"wz--n-";"9314.00g";"0g";"0
18/05/2020 11:35:46 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.0";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:35:46 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.1";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:35:46 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.2";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:35:46 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.3";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:35:46 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.4";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:35:46 INFO stdout ps-5f5876c1-6abd-4f6d-b31b-9227baf64927-wc-osd.5";"2";"1";"0";"wz--n-";"9910.16g";"0g";"0
18/05/2020 11:35:57 INFO Starting OSDs
18/05/2020 11:35:57 INFO Starting petasan tuning service
18/05/2020 11:35:57 INFO Starting Config Upload service
18/05/2020 11:48:55 ERROR
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/file_sync_manager.py", line 75, in sync
index, data = base.watch(self.root_path, current_index)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/base.py", line 49, in watch
return cons.kv.get(key, index=current_index, recurse=True)
File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 554, in get
params=params)
File "/usr/lib/python2.7/dist-packages/retrying.py", line 49, in wrapped_f
return Retrying(*dargs, **dkw).call(f, *args, **kw)
File "/usr/lib/python2.7/dist-packages/retrying.py", line 212, in call
raise attempt.get()
File "/usr/lib/python2.7/dist-packages/retrying.py", line 247, in get
six.reraise(self.value[0], self.value[1], self.value[2])
File "/usr/lib/python2.7/dist-packages/retrying.py", line 200, in call
attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/ps_consul.py", line 71, in get
raise RetryConsulException()
RetryConsulException
18/05/2020 11:49:28 INFO Found pool:rbd for disk:00001 via consul
18/05/2020 11:49:28 INFO Could not lock path 00001/1 with session 645cfda3-126b-5014-ea6d-873736df2afa.
18/05/2020 11:49:45 ERROR Node Stats exception.
18/05/2020 11:49:45 ERROR Error running echo command :echo "PetaSAN.NodeStats.ceph02.cpu_all.percent_util 0.64 `date +%s`" "
PetaSAN.NodeStats.ceph02.memory.percent_util 72.94 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.percent_util.bond0 0.01 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.throughput.bond0_received 135127.04 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.throughput.bond0_transmitted 15902.72 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.percent_util.bond1 0.07 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.throughput.bond1_received 767488.0 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.throughput.bond1_transmitted 822661.12 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.percent_util.eth0 0.0 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.throughput.eth0_received 0.0 `date +%s`" | nc -q0 10.2.10.31 2003
Traceback (most recent call last):
File "/opt/petasan/scripts/node_stats.py", line 159, in <module>
get_stats()
File "/opt/petasan/scripts/node_stats.py", line 64, in get_stats
graphite_sender.send(leader_ip)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/common/graphite_sender.py", line 59, in send
raise Exception("Error running echo command :" + cmd)
Exception: Error running echo command :echo "PetaSAN.NodeStats.ceph02.cpu_all.percent_util 0.64 `date +%s`" "
PetaSAN.NodeStats.ceph02.memory.percent_util 72.94 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.percent_util.bond0 0.01 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.throughput.bond0_received 135127.04 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.throughput.bond0_transmitted 15902.72 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.percent_util.bond1 0.07 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.throughput.bond1_received 767488.0 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.throughput.bond1_transmitted 822661.12 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.percent_util.eth0 0.0 `date +%s`" "
PetaSAN.NodeStats.ceph02.ifaces.throughput.eth0_received 0.0 `date +%s`" | nc -q0 10.2.10.31 2003
18/05/2020 11:51:53 INFO Cleaned disk path 00001/2.
18/05/2020 11:51:53 INFO PetaSAN cleaned local paths not locked by this node in consul.
18/05/2020 11:51:53 INFO LIO deleted backstore image image-00001
18/05/2020 11:51:53 INFO LIO deleted Target iqn.2016-05.com.petasan:00001
18/05/2020 11:51:53 INFO Image image-00001 unmapped successfully.
18/05/2020 11:51:53 INFO PetaSAN cleaned iqns.
18/05/2020 11:52:27 INFO Found pool:rbd for disk:00001 via consul
18/05/2020 11:52:43 INFO Image image-00001 mapped successfully.
18/05/2020 11:52:46 INFO LIO add_target() disk wwn is 5f5876c100001
18/05/2020 11:52:46 INFO Path 00001/1 acquired successfully