
Path Assignment failed

I'm having a few problems with the iSCSI targets on VMware (the path activity keeps switching - paths go down and come back up).
So my first idea was to change the path assignment in PetaSAN.

The existing assignment also looked a bit odd, but OK:

Node 1
- hbps-iscsi-ssd-001 (eth5)
- hbps-esx-ssd-001 (eth4)

Node 2
- NOTHING

Node 3
- hbps-iscsi-ssd-001 (eth4)
- hbps-esx-ssd-001 (eth5)

After reassigning hbps-iscsi-ssd-001 (eth5) from Node 1 to Node 2, I get a FAILED status.
-> A reboot of Node 2 didn't help.
-> An assignment from Node 1 to Node 3 or vice versa was OK.
Here is the log from both nodes:

Node 1:

19/06/2019 19:24:54 INFO User starts manual assignments.
19/06/2019 19:24:54 INFO User selected path 00002 hbps-iscsi-ssd-001 HBPS01.
19/06/2019 19:24:54 INFO User selected manual option in assignment.
19/06/2019 19:24:57 INFO User start manual reassignment paths for selected paths.
19/06/2019 19:24:57 INFO Set new assignment.
19/06/2019 19:24:57 INFO Delete old assignments.
19/06/2019 19:24:57 INFO Lock assignment root.
19/06/2019 19:24:57 INFO {}
19/06/2019 19:24:57 INFO Lock path 172.20.10.125 by session 20ebe702-c561-b8af-2756-d275d225ceca
19/06/2019 19:24:57 INFO New assignment for 172.20.10.125 ,disk 00002, from node HBPS01 and to node HBPS02 with status 1
19/06/2019 19:24:57 INFO System started manual assignments.
19/06/2019 19:24:58 INFO Reassignment paths script invoked to run process action.
19/06/2019 19:24:58 INFO Start process reassignments paths.
19/06/2019 19:24:58 INFO process path 172.20.10.125 and its status is 1
19/06/2019 19:24:58 INFO Move action,try clean disk hbps-iscsi-ssd-001 path 00002 remotely on node HBPS01.
19/06/2019 19:24:58 INFO get object from MangePathAssignment.
19/06/2019 19:24:58 INFO Reassignment paths script invoked to run clean action.
19/06/2019 19:24:58 INFO Updating path 172.20.10.125 status to 0
19/06/2019 19:24:58 INFO call get assignments stats function.
19/06/2019 19:24:58 INFO Path 172.20.10.125 status updated to 0
19/06/2019 19:24:58 INFO Found pool:SSD for disk:00002 via consul
19/06/2019 19:24:58 INFO Move action,cleaned disk 00002 path 1.
19/06/2019 19:24:58 INFO Move action,clean newtwork config for disk 00002 path 1.
19/06/2019 19:24:58 INFO {}
19/06/2019 19:24:58 INFO Move action,release disk 00002 path 2.
19/06/2019 19:24:58 INFO python /opt/petasan/scripts/admin/reassignment_paths.py path_host -ip 172.20.10.125 -disk_id 00002
19/06/2019 19:24:58 INFO Move action passed
19/06/2019 19:25:04 INFO Process completed for path 172.20.10.125 with status 3.
19/06/2019 19:25:08 INFO get object from MangePathAssignment.
19/06/2019 19:25:08 INFO call get assignments stats function.
19/06/2019 19:25:14 INFO Process completed.

Node 2:

19/06/2019 19:25:00 INFO Found pool:SSD for disk:00002 via consul
19/06/2019 19:25:00 INFO Image image-00002 mapped successfully.
19/06/2019 19:25:03 INFO LIO add_target() disk wwn is 00002
19/06/2019 19:25:03 ERROR LIO error could not create target for disk 00002.
19/06/2019 19:25:03 ERROR Could not create Target in configFS.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/PetaSAN/core/lio/api.py", line 65, in add_target
    target = Target(fabric, disk_meta.iqn)
  File "/usr/lib/python2.7/dist-packages/rtslib/target.py", line 1214, in __init__
    self._create_in_cfs_ine(mode)
  File "/usr/lib/python2.7/dist-packages/rtslib/node.py", line 77, in _create_in_cfs_ine
    % self.__class__.__name__)
RTSLibError: Could not create Target in configFS.
19/06/2019 19:25:03 INFO Acquired forced path 00002/2
19/06/2019 19:25:03 INFO Updating path 172.20.10.125 status to 3
19/06/2019 19:25:03 INFO Path 172.20.10.125 status updated to 3
19/06/2019 19:25:03 ERROR Error could not acquire path 00002/2
19/06/2019 19:25:03 INFO Unlock path 00002/2
19/06/2019 19:25:03 INFO PetaSAN unlocked any consul locks not configured in this node.
19/06/2019 19:25:04 INFO LIO deleted backstore image image-00002
19/06/2019 19:25:04 INFO PetaSAN Cleaned rbd backstores.
19/06/2019 19:25:04 INFO Image image-00002 unmapped successfully.
19/06/2019 19:25:04 INFO Found pool:SSD for disk:00002 via consul
19/06/2019 19:25:04 INFO Image image-00002 mapped successfully.
19/06/2019 19:25:07 INFO LIO add_target() disk wwn is 00002
19/06/2019 19:25:07 ERROR LIO error could not create target for disk 00002.
19/06/2019 19:25:07 ERROR Could not create Target in configFS.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/PetaSAN/core/lio/api.py", line 65, in add_target
    target = Target(fabric, disk_meta.iqn)
  File "/usr/lib/python2.7/dist-packages/rtslib/target.py", line 1214, in __init__
    self._create_in_cfs_ine(mode)
  File "/usr/lib/python2.7/dist-packages/rtslib/node.py", line 77, in _create_in_cfs_ine
    % self.__class__.__name__)
RTSLibError: Could not create Target in configFS.
19/06/2019 19:25:07 ERROR Error could not acquire path 00002/2
19/06/2019 19:25:07 INFO Unlock path 00002/2
19/06/2019 19:25:07 INFO PetaSAN unlocked any consul locks not configured in this node.
19/06/2019 19:25:08 INFO LIO deleted backstore image image-00002
19/06/2019 19:25:08 INFO PetaSAN Cleaned rbd backstores.
19/06/2019 19:25:08 INFO Image image-00002 unmapped successfully.
19/06/2019 19:25:08 INFO Found pool:SSD for disk:00002 via consul
19/06/2019 19:25:08 INFO Image image-00002 mapped successfully.
19/06/2019 19:25:11 INFO LIO add_target() disk wwn is 00002
19/06/2019 19:25:11 ERROR LIO error could not create target for disk 00002.
19/06/2019 19:25:11 ERROR Could not create Target in configFS.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/PetaSAN/core/lio/api.py", line 65, in add_target
    target = Target(fabric, disk_meta.iqn)
  File "/usr/lib/python2.7/dist-packages/rtslib/target.py", line 1214, in __init__
    self._create_in_cfs_ine(mode)
  File "/usr/lib/python2.7/dist-packages/rtslib/node.py", line 77, in _create_in_cfs_ine
    % self.__class__.__name__)
RTSLibError: Could not create Target in configFS.
19/06/2019 19:25:11 ERROR Error could not acquire path 00002/2
19/06/2019 19:25:11 INFO Unlock path 00002/2
19/06/2019 19:25:11 INFO PetaSAN unlocked any consul locks not configured in this node.
19/06/2019 19:25:12 INFO LIO deleted backstore image image-00002
19/06/2019 19:25:12 INFO PetaSAN Cleaned rbd backstores.
19/06/2019 19:25:13 INFO Image image-00002 unmapped successfully.
19/06/2019 19:25:13 INFO Found pool:SSD for disk:00002 via consul
19/06/2019 19:25:13 INFO Image image-00002 mapped successfully.
19/06/2019 19:25:16 INFO LIO add_target() disk wwn is 00002
19/06/2019 19:25:16 ERROR LIO error could not create target for disk 00002.
19/06/2019 19:25:16 ERROR Could not create Target in configFS.
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/PetaSAN/core/lio/api.py", line 65, in add_target
    target = Target(fabric, disk_meta.iqn)
  File "/usr/lib/python2.7/dist-packages/rtslib/target.py", line 1214, in __init__
    self._create_in_cfs_ine(mode)
  File "/usr/lib/python2.7/dist-packages/rtslib/node.py", line 77, in _create_in_cfs_ine
    % self.__class__.__name__)
RTSLibError: Could not create Target in configFS.
19/06/2019 19:25:16 ERROR Error could not acquire path 00002/2
19/06/2019 19:25:16 INFO Unlock path 00002/2
19/06/2019 19:25:16 INFO PetaSAN unlocked any consul locks not configured in this node.
19/06/2019 19:25:17 INFO LIO deleted backstore image image-00002
19/06/2019 19:25:17 INFO PetaSAN Cleaned rbd backstores.
19/06/2019 19:25:17 INFO Image image-00002 unmapped successfully.

... repeating until I moved the path back to Node 1.
The same problem occurs if I try to move a path from Node 3 to Node 2.

What could be causing this?

Thanks.

Trexman

Hi

What do you mean by "(path activity switches - goes down and up)"? Is this all the time or only during path assignment?

Are nodes 1 and 3 OK and only node 2 bad? Do paths work correctly on nodes 1 and 3?

We had a path assignment bug in v2.2 when using VLANs - does this apply? What version are you using?

The "RTSLibError: Could not create Target in configFS" is strange, has any software updates were done to node 2 ?

Hi,
for me, the main focus is the iSCSI path assignment error.

 

What do you mean by "(path activity switches - goes down and up)"? Is this all the time or only during path assignment?

I'm getting vCenter alerts like:
Lost path redundancy to storage device naa.60014050000700000000000000000000. Path vmhba64:C1:T3:L0 is down. Affected datastores: hbps-esx-ssd-001.

So it looks to me like the paths are going down and coming back up, since I get those messages 2-3 times a day.
But I'm pursuing that separately with one of the hosts itself.

 

Are nodes 1 and 3 OK and only node 2 bad? Do paths work correctly on nodes 1 and 3?

What do you mean by "Do paths work correctly"? Apart from the vCenter messages, the iSCSI connection to VMware is working.
And I can assign paths between Node 1 and Node 3 without any problems.

 

We had a path assignment bug in v2.2 when using VLANs - does this apply? What version are you using?

We are using 2.2.0, but without VLAN tagging.

 

The "RTSLibError: Could not create Target in configFS" is strange, has any software updates were done to node 2 ?

As far as I know, the OS and software are still at the same versions as right after installation. I compared a few program versions between Node 1 and Node 2 and they are identical.
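If anyone wants to repeat that check, one simple way is to dump the installed package versions on each node and diff the resulting files, for example (just a rough sketch; the output path is arbitrary):

#!/usr/bin/env python
# Sketch: write "package version" pairs for all installed packages to a file
# named after the host, so the dumps from two nodes can be diffed directly.
import socket
import subprocess

out = subprocess.check_output(
    ["dpkg-query", "-W", "-f", "${Package} ${Version}\\n"],
    universal_newlines=True)
path = "/tmp/packages-%s.txt" % socket.gethostname()
with open(path, "w") as f:
    f.write(out)
print("wrote %s" % path)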

 

So, to sum up: should I update to PetaSAN 2.3.0, or should this problem be fixed first?

I would have to start the update with Node 2. Can I run the cluster temporarily with mixed versions? Meaning: update Node 2, try to move the iSCSI paths to Node 2, and if that is OK, update Node 1, and so on.
Right?

For the path assignment, my recommendation is to upgrade. The RTSLibError is not something we have seen; my guess is it could have been caused by a package update from Ubuntu, since we use a custom SUSE version to support rbd backstores. You should have no problem upgrading the second node and moving paths to it from the 2.2 nodes. Working temporarily with mixed versions should be OK, but since it is not something we (or Ceph) test for long periods, try to upgrade all nodes when you can.
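If you want to dig into the RTSLibError itself before upgrading, a quick first check on node 2 is whether configfs is mounted and what already exists under the iSCSI fabric directory. Something along these lines (just a diagnostic sketch, not a PetaSAN tool; the paths are the standard kernel configfs locations):

#!/usr/bin/env python
# Diagnostic sketch: verify configfs is mounted and list any iSCSI targets
# already present in it on the node that throws the RTSLibError.
import os

CONFIGFS_ISCSI = "/sys/kernel/config/target/iscsi"

def configfs_mounted():
    # configfs is normally mounted on /sys/kernel/config
    with open("/proc/mounts") as f:
        return any(line.split()[1] == "/sys/kernel/config" for line in f)

if not configfs_mounted():
    print("configfs is not mounted - target creation will always fail")
elif not os.path.isdir(CONFIGFS_ISCSI):
    print("no iSCSI fabric directory - target modules may not be loaded")
else:
    iqns = [d for d in os.listdir(CONFIGFS_ISCSI) if d.startswith("iqn.")]
    print("existing iSCSI targets in configfs: %s" % (iqns or "none"))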

The paths going down is something you should look into as well. The most common causes are incorrectly set-up paths, network/hardware issues, system overload, or Ceph errors. For the last two, check your resource load charts (CPU/network/disks) to see that they are not stuck at 100%, and look at the Ceph PG Status chart to see whether the cluster was active/clean or recovering from errors.
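If you prefer to pull the same information from the shell, a rough sketch like this prints the health status and PG state counts (assuming a Luminous-or-later JSON layout; key names may differ slightly between Ceph releases):

#!/usr/bin/env python
# Rough sketch: print cluster health and PG state counts from "ceph status"
# to see whether the cluster was active+clean or recovering at the time.
import json
import subprocess

status = json.loads(subprocess.check_output(
    ["ceph", "status", "--format", "json"]))

print("health: %s" % status.get("health", {}).get("status"))
for state in status.get("pgmap", {}).get("pgs_by_state", []):
    print("%s: %s" % (state.get("state_name"), state.get("count")))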

So far the upgrade solved the path assignment failure.
Now I'm analyzing the VMware path problem.

 

Thanks for your help!