ESX-Server ISCSI problems?
admin
2,930 Posts
August 8, 2017, 1:29 pm
The nodes' shutdown is most probably due to fencing: a node kills another node if the latter loses connection to the cluster while it still has iSCSI disk resources that have not yet been redistributed. The best thing is to wait a few minutes before restarting a node that was down or was being upgraded, to make sure all its paths have been redistributed.
This should fix the issue. If not, let me know and I can help you disable the fencing action.
The script to move paths off active nodes is being tested now; I will post it here once done.
Last edited on August 8, 2017, 1:30 pm · #31
admin
2,930 Posts
August 8, 2017, 3:38 pm
We made the following move_path.py script to move an active path off a node:
https://drive.google.com/file/d/0B7VNYCjYBY2yOXRXUEpGajVlalU/view?usp=sharing
It is best to place it in /opt/petasan/scripts and make it executable:
chmod +x move_path.py
Run syntax:
./move_path.py -id DISK_ID -ip IP_ADDRESS
Example:
./move_path.py -id 00001 -ip 10.0.2.100
It needs to run from the node currently serving the path. You need to specify the full id string of the disk, e.g. 00001 and not 1.
It will trigger a path move; in some cases the path may end up on the same node, in which case retry. In the future we intend to support this in the UI, where you can specify a target node and allow dynamic moving based on load, so for now this is very crude.
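Until the UI supports choosing a target node, the retry can be wrapped in a small script. This is only a sketch: it assumes (not verified) that move_path.py exits nonzero when the move fails outright; for the "path landed on the same node" case you would still check the path assignment afterwards and rerun by hand. The MOVE_CMD variable and retry_move function are illustrative names, not part of PetaSAN.

```shell
#!/bin/sh
# Sketch: retry move_path.py a few times. Assumes the script exits nonzero
# on failure (an assumption); verifying which node actually serves the path
# afterwards is still up to you.

# MOVE_CMD can be overridden (e.g. for testing); defaults to the real script.
: "${MOVE_CMD:=/opt/petasan/scripts/move_path.py}"

retry_move() {
    disk_id=$1; ip=$2; max=${3:-3}
    i=1
    while [ "$i" -le "$max" ]; do
        if $MOVE_CMD -id "$disk_id" -ip "$ip"; then
            echo "move attempt $i: command succeeded"
            return 0
        fi
        echo "move attempt $i failed, retrying..." >&2
        i=$((i + 1))
    done
    echo "giving up after $max attempts" >&2
    return 1
}
```

Usage, from the node currently serving the path: retry_move 00001 10.0.2.100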
Last edited on August 8, 2017, 3:57 pm · #32
therm
121 Posts
August 9, 2017, 6:45 am
Works like a charm!
root@ceph-node-mru-1:~# ip a |grep 192.168 |grep -v eth6 |grep -v eth7|wc -l
7
root@ceph-node-mru-2:~# ip a |grep 192.168 |grep -v eth6 |grep -v eth7|wc -l
7
root@ceph-node-mru-3:~# ip a |grep 192.168 |grep -v eth6 |grep -v eth7|wc -l
6
Last night everything was fine (paths were on node1 and node2). Hopefully this will stabilize the cluster; I will report here how things go.
Thanks again!
Last edited on August 9, 2017, 6:47 am · #33
therm
121 Posts
August 11, 2017, 5:09 am
The system seems to be stable now. Besides distributing paths across all nodes, I now run scrubs during the daytime, but throttled:
# reduce background load due to scrub
osd_max_scrubs = 1
osd_scrub_during_recovery = false
osd_scrub_priority = 1
osd_scrub_sleep = 2
osd_scrub_chunk_min = 1
osd_scrub_chunk_max = 5
osd_deep_scrub_stride = 1048576
osd_scrub_load_threshold = 5
osd_scrub_begin_hour = 6
osd_scrub_end_hour = 22
This leads to about 30 IOPS of background reads, which is not disruptive.
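For anyone who wants to try these values without restarting the OSDs, something like the following should apply them at runtime via Ceph's standard injectargs mechanism. This is a sketch: whether each option takes effect via injection depends on your Ceph version, and injected values are lost on OSD restart, so keep them in ceph.conf as well.

```shell
# Apply the throttled-scrub settings above to all running OSDs at runtime.
# Injected values do not persist across OSD restarts; keep ceph.conf in sync.
ceph tell osd.* injectargs '--osd_max_scrubs 1 --osd_scrub_sleep 2'
ceph tell osd.* injectargs '--osd_scrub_chunk_min 1 --osd_scrub_chunk_max 5'
ceph tell osd.* injectargs '--osd_scrub_begin_hour 6 --osd_scrub_end_hour 22'
```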
Another problem was that after a reboot, interfaces eth4 and eth2 swapped names. While waiting for the next release, I fixed it in the meantime with:
vi /etc/udev/rules.d/70-persistent-net.rules
...
SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="ac:16:2d:ac:2b:88", NAME="eth4"
...
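A rule like the one above is needed per interface. As a convenience, the rules for all ethN interfaces can be generated from their current MAC addresses in sysfs; the mk_rule helper below is a hypothetical name, and the output should be reviewed before writing it to /etc/udev/rules.d/70-persistent-net.rules.

```shell
#!/bin/sh
# Sketch: generate persistent-net udev rules for all ethN interfaces from
# their MAC addresses in sysfs, pinning each name across reboots.

mk_rule() {
    # Emit one udev rule binding the interface with MAC $1 to name $2.
    printf 'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="%s", NAME="%s"\n' "$1" "$2"
}

for dev in /sys/class/net/eth*; do
    [ -e "$dev" ] || continue          # no ethN interfaces present
    name=$(basename "$dev")
    mac=$(cat "$dev/address")
    mk_rule "$mac" "$name"
done
```

Redirect the output to the rules file only after checking it matches the naming you want.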
Switching paths is now totally easy and painless!
Last edited on August 11, 2017, 5:17 am · #34
admin
2,930 Posts
August 11, 2017, 9:57 am
Happy things are working well now 🙂 and thanks very much for sharing your scrub params.
Regarding the NIC name change: this is strange. The only thing I can think of is that in v1.3.1 we included newer firmware for some kernel drivers. As you noted, in v1.4 we will include a menu to name/rename NICs. It will handle this case, but it was really designed to support scenarios such as users changing hardware, and also running PetaSAN hyper-converged under ESX, where adding a new NIC can make VMware change the order of existing ones.