Bug in move_paths
therm
121 Posts
September 18, 2017, 6:52 am
Sometimes all IPs of the corresponding NIC suddenly go down when I use the move_path script:
root@ceph-node-mru-2:~# ip a |grep 192.168 |grep -v eth6 |grep -v eth7
inet 192.168.3.29/24 scope global eth4
inet 192.168.3.26/24 scope global secondary eth4
inet 192.168.3.22/24 scope global secondary eth4
inet 192.168.3.24/24 scope global secondary eth4
inet 192.168.4.24/24 scope global eth5
inet 192.168.4.27/24 scope global secondary eth5
inet 192.168.4.28/24 scope global secondary eth5
inet 192.168.4.21/24 scope global secondary eth5
inet 192.168.4.25/24 scope global secondary eth5
inet 192.168.4.20/24 scope global secondary eth5
root@ceph-node-mru-2:~# ./tools/move_path.py -id 00003 -ip 192.168.4.24
Done
root@ceph-node-mru-2:~# ip a |grep 192.168 |grep -v eth6 |grep -v eth7
inet 192.168.3.29/24 scope global eth4
inet 192.168.3.26/24 scope global secondary eth4
inet 192.168.3.22/24 scope global secondary eth4
inet 192.168.3.24/24 scope global secondary eth4
therm
121 Posts
September 18, 2017, 6:56 am
Might that be because it is the main IP for that NIC?
therm
121 Posts
September 18, 2017, 10:04 am
At the moment one LUN is inaccessible. In ESX one path seems to be down, but I cannot move the IP (because it is the main IP), and a reboot is not possible because recovery is in progress.
Do you have an idea why ESX is damn slow when only one path out of four is down? And why does ESX not reconnect to the path? Do you need any further information?
Regards,
Dennis
Last edited on September 18, 2017, 11:00 am by therm · #3
admin
2,930 Posts
September 18, 2017, 11:07 am
By "main IP", do you mean that in ESX this is/was the active I/O path, while the rest of the paths are active failover?
In ESX, is the one path that is down the "main" IP?
Why is recovery happening? Were any nodes/OSDs down?
Last edited on September 18, 2017, 11:09 am by admin · #4
therm
121 Posts
September 18, 2017, 11:17 am
By main IP I mean, for example:
inet 192.168.4.24/24 scope global eth5
inet 192.168.4.27/24 scope global secondary eth5
inet 192.168.4.28/24 scope global secondary eth5
inet 192.168.4.21/24 scope global secondary eth5
inet 192.168.4.25/24 scope global secondary eth5
inet 192.168.4.20/24 scope global secondary eth5
Because 192.168.4.24 is not a secondary IP. If I try to move that IP, all the other secondary IPs go away but do not come back up on the other nodes.
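In `ip` output, the address without the `secondary` flag is the primary address for its subnet on that interface, and by default Linux removes the secondaries together with it unless promote_secondaries is enabled, which matches what happened here. As a minimal sketch (a hypothetical helper, not part of PetaSAN or move_path.py), the `ip a | grep 192.168` lines quoted above can be parsed to warn before a primary address is moved. It assumes the last token of each `inet` line is the interface label.

```python
from collections import defaultdict


def parse_inet_lines(ip_output):
    """Group IPv4 addresses by interface label, in the order listed.

    Expects lines like "inet 192.168.4.27/24 scope global secondary eth5",
    as in the `ip a | grep 192.168` output quoted in this thread. The last
    token is taken as the interface label (an assumption).
    """
    by_dev = defaultdict(list)
    for line in ip_output.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or tokens[0] != "inet":
            continue  # skip inet6, link/ether, valid_lft lines, etc.
        ip = tokens[1].split("/")[0]
        by_dev[tokens[-1]].append((ip, "secondary" in tokens))
    return by_dev


def primary_ips(ip_output):
    """Return {interface: [addresses without the 'secondary' flag]}."""
    primaries = {}
    for dev, addrs in parse_inet_lines(ip_output).items():
        firsts = [ip for ip, secondary in addrs if not secondary]
        if firsts:
            primaries[dev] = firsts
    return primaries


def is_primary(ip_output, ip):
    """True if `ip` is a primary (non-secondary) address on some interface."""
    return any(ip in ips for ips in primary_ips(ip_output).values())


if __name__ == "__main__":
    sample = """\
inet 192.168.4.24/24 scope global eth5
inet 192.168.4.27/24 scope global secondary eth5
inet 192.168.4.28/24 scope global secondary eth5"""
    target = "192.168.4.24"
    if is_primary(sample, target):
        print(f"WARNING: {target} is the primary address on its NIC")
```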
This morning there was another freeze of one ESX host. It seems to me that this happens when the PetaSAN servers are overloaded. In this case the ESX host is marked as not responding and shows timeout messages on one PetaSAN LUN, but does not reconnect. The IPs are reachable. I tried to move paths, but got results like those in my first post above. I restarted this server and added a disk, so a recovery/backfilling process is running. The path that is not reconnecting is not on this server; it is on server1.
No OSDs were down. Just messages like the following in dmesg:
[Fri Sep 15 00:01:00 2017] Process accounting resumed
[Fri Sep 15 11:49:35 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76321186
[Fri Sep 15 11:49:35 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76321186
[Fri Sep 15 14:02:52 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76353302
[Fri Sep 15 14:02:52 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 76353302
[Fri Sep 15 14:13:01 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76355154
[Fri Sep 15 14:13:01 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76355154
[Fri Sep 15 14:17:05 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76355839
[Fri Sep 15 14:17:05 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76355839
[Fri Sep 15 14:28:15 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76373676
[Fri Sep 15 14:28:15 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76373676
[Fri Sep 15 14:33:19 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76389604
[Fri Sep 15 14:33:19 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76389604
[Fri Sep 15 14:43:28 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76409064
[Fri Sep 15 14:43:28 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76409064
[Fri Sep 15 14:47:31 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76409989
[Fri Sep 15 14:47:31 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76409989
[Fri Sep 15 14:48:32 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76410200
[Fri Sep 15 14:48:32 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76410200
[Fri Sep 15 14:52:36 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76410941
[Fri Sep 15 14:52:36 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76410941
[Fri Sep 15 15:25:12 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76465507
[Fri Sep 15 15:25:12 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76465507
[Fri Sep 15 15:34:20 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76467224
[Fri Sep 15 15:34:20 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76467224
[Fri Sep 15 15:46:30 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76470739
[Fri Sep 15 15:46:30 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76470739
[Fri Sep 15 15:56:39 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76472483
[Fri Sep 15 15:56:39 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76472483
[Fri Sep 15 16:03:45 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76474447
[Fri Sep 15 16:03:45 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 76474447
[Fri Sep 15 16:11:52 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76477603
[Fri Sep 15 16:11:52 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76477603
[Fri Sep 15 16:12:52 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76478040
[Fri Sep 15 16:12:52 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76478040
[Fri Sep 15 16:13:53 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76478425
[Fri Sep 15 16:13:53 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 76478425
[Fri Sep 15 16:44:20 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76554367
[Fri Sep 15 16:44:20 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76554367
[Fri Sep 15 16:53:28 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76555949
[Fri Sep 15 16:53:28 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76555949
[Fri Sep 15 17:56:21 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76652952
[Fri Sep 15 17:56:21 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 76652952
[Fri Sep 15 18:10:37 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76700300
[Fri Sep 15 18:10:37 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76700300
[Fri Sep 15 18:13:21 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76707468
[Fri Sep 15 18:13:23 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76707468
[Fri Sep 15 18:16:21 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76717224
[Fri Sep 15 18:16:22 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76717224
[Fri Sep 15 18:24:07 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76752051
[Fri Sep 15 18:24:08 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76752051
[Fri Sep 15 18:24:23 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76752710
[Fri Sep 15 18:24:23 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76752710
[Fri Sep 15 18:26:25 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76759904
[Fri Sep 15 18:26:25 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76759904
[Fri Sep 15 18:28:25 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76766387
[Fri Sep 15 18:28:26 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76766387
[Fri Sep 15 18:29:25 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76768649
[Fri Sep 15 18:29:25 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76768649
[Fri Sep 15 18:38:25 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76791937
[Fri Sep 15 18:38:26 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76791937
[Fri Sep 15 18:48:34 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76816265
[Fri Sep 15 18:48:34 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76816265
[Fri Sep 15 19:05:48 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 76837793
[Fri Sep 15 20:58:47 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76941442
[Fri Sep 15 20:58:47 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 76941442
[Fri Sep 15 23:11:19 2017] ABORT_TASK: Found referenced iSCSI task_tag: 76981682
[Fri Sep 15 23:11:19 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76981682
[Fri Sep 15 23:56:15 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77005026
[Fri Sep 15 23:56:15 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 77005026
[Sat Sep 16 00:01:03 2017] Process accounting resumed
[Sat Sep 16 02:11:35 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77058120
[Sat Sep 16 02:11:35 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 77058120
[Sat Sep 16 02:33:35 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77062716
[Sat Sep 16 02:33:35 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 77062716
[Sat Sep 16 04:01:39 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77197934
[Sat Sep 16 04:01:39 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 77197934
[Sat Sep 16 04:46:31 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77204741
[Sat Sep 16 04:46:31 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 77204741
[Sat Sep 16 05:06:08 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77207592
[Sat Sep 16 05:06:08 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 77207592
[Sat Sep 16 05:06:08 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77207595
[Sat Sep 16 05:06:08 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 77207595
[Sat Sep 16 05:06:35 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 77210428
[Sat Sep 16 05:06:35 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77210430
[Sat Sep 16 05:06:35 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 77210430
[Sat Sep 16 05:06:35 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77210431
[Sat Sep 16 05:06:35 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 77210431
[Sat Sep 16 06:36:30 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77412284
[Sat Sep 16 06:36:30 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 77412284
[Sat Sep 16 10:03:33 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77703827
[Sat Sep 16 10:03:33 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 77703827
[Sat Sep 16 10:36:39 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77736601
[Sat Sep 16 10:36:39 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 77736601
[Sat Sep 16 11:06:33 2017] ABORT_TASK: Found referenced iSCSI task_tag: 77741337
[Sat Sep 16 11:06:33 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 77741337
[Sat Sep 16 20:15:57 2017] COMPARE_AND_WRITE: miscompare at offset 0
[Sat Sep 16 21:53:10 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78202365
[Sat Sep 16 21:53:10 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 78202365
[Sat Sep 16 21:56:25 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78205358
[Sat Sep 16 21:56:25 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 78205358
[Sun Sep 17 00:01:05 2017] traps: atop[50325] trap divide error ip:4073c2 sp:7ffeb48906a0 error:0 in atop[400000+26000]traps:
[Sun Sep 17 01:16:38 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78301465
[Sun Sep 17 01:16:38 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 78301465
[Sun Sep 17 01:16:38 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78301468
[Sun Sep 17 01:16:38 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 78301468
[Sun Sep 17 03:40:50 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78353280
[Sun Sep 17 03:40:50 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 78353280
[Sun Sep 17 03:47:08 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78353864
[Sun Sep 17 03:47:08 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 78353864
[Sun Sep 17 04:45:32 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78375794
[Sun Sep 17 04:45:32 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 78375794
[Sun Sep 17 05:23:35 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78437742
[Sun Sep 17 05:23:35 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 78437742
[Sun Sep 17 07:25:26 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78602946
[Sun Sep 17 07:25:26 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 78602946
[Sun Sep 17 08:04:27 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 78646736
[Sun Sep 17 08:04:27 2017] ABORT_TASK: Sending TMR_TASK_DOES_NOT_EXIST for ref_tag: 78646737
[Sun Sep 17 09:14:15 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78707964
[Sun Sep 17 09:14:15 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 78707964
[Sun Sep 17 09:16:26 2017] COMPARE_AND_WRITE: miscompare at offset 0
[Sun Sep 17 11:20:00 2017] ABORT_TASK: Found referenced iSCSI task_tag: 78826906
[Sun Sep 17 11:20:00 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 78826906
[Sun Sep 17 13:21:48 2017] ABORT_TASK: Found referenced iSCSI task_tag: 79061065
[Sun Sep 17 13:21:48 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 79061065
[Sun Sep 17 13:46:36 2017] ABORT_TASK: Found referenced iSCSI task_tag: 79065611
[Sun Sep 17 13:46:36 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 79065611
[Mon Sep 18 00:01:07 2017] Process accounting resumed
[Mon Sep 18 04:30:42 2017] COMPARE_AND_WRITE: miscompare at offset 0
[Mon Sep 18 04:46:33 2017] COMPARE_AND_WRITE: miscompare at offset 0
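Each ABORT_TASK entry is LIO logging a task-management request from the initiator (presumably ESX here) to abort a command it considered overdue; TMR_FUNCTION_COMPLETE means the task was found and aborted, and TMR_TASK_DOES_NOT_EXIST that it was no longer found (typically because it had already completed). To check whether the aborts cluster around the overload and freeze periods, a small hypothetical script can tally them per day from human-readable dmesg output like the excerpt above (as printed by `dmesg -T`). This is a sketch, not a PetaSAN tool.

```python
import re
import sys
from collections import Counter

# Matches human-readable dmesg lines (as printed by `dmesg -T`) such as:
# [Fri Sep 15 11:49:35 2017] ABORT_TASK: Sending TMR_FUNCTION_COMPLETE for ref_tag: 76321186
ABORT_RE = re.compile(
    r"^\[\w{3} (?P<mon>\w{3}) +(?P<day>\d{1,2}) [\d:]+ \d{4}\] "
    r"ABORT_TASK: Sending (?P<resp>TMR_\w+)"
)


def summarize_aborts(dmesg_text):
    """Count ABORT_TASK responses per (day, response code).

    Only the "Sending ..." lines are counted, so each abort is counted once
    even when its "Found referenced iSCSI task_tag" line is missing.
    """
    counts = Counter()
    for line in dmesg_text.splitlines():
        m = ABORT_RE.match(line.strip())
        if m:
            # Zero-pad the day so that sorting keys gives chronological order.
            day = f"{m['mon']} {int(m['day']):02d}"
            counts[(day, m["resp"])] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 summarize_aborts.py dmesg.txt
    with open(sys.argv[1]) as f:
        for (day, resp), n in sorted(summarize_aborts(f.read()).items()):
            print(f"{day}  {resp:<26} {n}")
```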
Last edited on September 18, 2017, 11:19 am by therm · #5
admin
2,930 Posts
September 18, 2017, 11:17 am
OK, I got you: the main IP is the first IP assigned on the NIC! We are looking into this.
We are able to reproduce it; I will send you a fix shortly.
Last edited on September 18, 2017, 11:25 am by admin · #6
admin
2,930 Posts
September 18, 2017, 11:44 am
To prevent removal of the primary IP from deleting the other IPs:
echo 1 > /proc/sys/net/ipv4/conf/all/promote_secondaries
and add this to /etc/sysctl.conf so the setting survives a reboot:
net.ipv4.conf.all.promote_secondaries=1
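To confirm both halves of that change are in place, a minimal sketch (hypothetical, not part of PetaSAN) can read the runtime value from /proc and look for the persistent line in /etc/sysctl.conf. It only checks those two locations; files under /etc/sysctl.d are not read, and per-interface settings are ignored.

```python
from pathlib import Path

KEY = "net.ipv4.conf.all.promote_secondaries"
PROC_PATH = Path("/proc/sys/net/ipv4/conf/all/promote_secondaries")


def runtime_enabled(proc_path=PROC_PATH):
    """True if the running kernel reports promote_secondaries = 1."""
    try:
        return proc_path.read_text().strip() == "1"
    except OSError:
        return False  # not Linux, or /proc not readable


def persisted_in(sysctl_conf_text, key=KEY):
    """True if the last non-comment assignment of `key` sets it to 1."""
    value = None
    for raw in sysctl_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # sysctl.conf treats '#' and ';' lines as comments
        name, sep, val = line.partition("=")
        if sep and name.strip() == key:
            value = val.strip()
    return value == "1"


if __name__ == "__main__":
    conf = Path("/etc/sysctl.conf")
    try:
        persisted = persisted_in(conf.read_text())
    except OSError:
        persisted = False
    print(f"runtime: {runtime_enabled()}  persisted: {persisted}")
```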
At this point, the PetaSAN node running the iSCSI/LIO service has the IP configured in LIO but not on its NICs.
To find the active paths in LIO:
targetcli ls | grep 192.168
For paths listed as "enabled", make sure the IPs are configured on the NIC; if not, add the IP to the NIC:
ip address add 192.168.4.28/24 dev ethX
Make sure to choose the correct ethX NIC based on subnet 1 or 2.
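The check above can be sketched as a small hypothetical helper that prints the `ip address add` commands still needed. It assumes you have already collected the set of path IPs shown as enabled in `targetcli ls` (it does not parse targetcli output), and it assumes a subnet-to-NIC mapping taken from the output earlier in this thread, where the 192.168.3.0/24 addresses sit on eth4 and the 192.168.4.0/24 addresses on eth5. Adjust that mapping per node.

```python
import ipaddress

# Assumption: mapping taken from the `ip a` output in this thread; adjust per node.
SUBNET_TO_NIC = {
    "192.168.3.0/24": "eth4",
    "192.168.4.0/24": "eth5",
}


def configured_ips(ip_output):
    """Collect IPv4 addresses from 'inet A.B.C.D/NN ...' lines."""
    ips = set()
    for line in ip_output.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "inet":
            ips.add(tokens[1].split("/")[0])
    return ips


def missing_ip_commands(enabled_ips, ip_output, subnet_to_nic=SUBNET_TO_NIC):
    """Return `ip address add` commands for LIO-enabled path IPs that are
    not configured on any local NIC, picking the NIC by subnet."""
    present = configured_ips(ip_output)
    commands = []
    for ip in sorted(enabled_ips, key=ipaddress.ip_address):
        if ip in present:
            continue
        addr = ipaddress.ip_address(ip)
        for subnet, nic in subnet_to_nic.items():
            net = ipaddress.ip_network(subnet)
            if addr in net:
                commands.append(f"ip address add {ip}/{net.prefixlen} dev {nic}")
                break
        else:
            raise ValueError(f"{ip} is not in any known iSCSI subnet")
    return commands
```

Feed it the output of `ip -4 addr show` from the node that has the paths enabled. It only prints commands rather than running them, so they can be reviewed before anything touches the NICs.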
Last edited on September 18, 2017, 11:46 am by admin · #7
therm
121 Posts
September 18, 2017, 11:58 am
After setting this (echo ...) and using move_path, the IP is on both servers!
root@ceph-node-mru-1:~# ip a |grep 3.21
inet 192.168.3.21/24 scope global eth4
root@ceph-node-mru-1:~# ip addr del 192.168.3.21/24 dev eth4
root@ceph-node-mru-1:~# ip a |grep 3.21
root@ceph-node-mru-1:~# targetcli ls | grep 192.168.3.21
| | | o- 192.168.3.21:3260 ................................................................................. [OK, iser disabled]
root@ceph-node-mru-2:~# ip a |grep 3.21
inet 192.168.3.21/24 scope global secondary eth4
root@ceph-node-mru-2:~# targetcli ls | grep 192.168.3.21
| | | o- 192.168.3.21:3260 ................................................................................. [OK, iser disabled]
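The output above shows 192.168.3.21 configured on the NICs of both ceph-node-mru-1 and ceph-node-mru-2 after move_path. To spot this across nodes, a hypothetical helper can compare the `ip a` output gathered from each node (how it is collected, e.g. over SSH, is left out of this sketch) and report every address that appears on more than one. Any address it reports can then be compared with which node has that path enabled in LIO.

```python
from collections import defaultdict


def ips_on_multiple_nodes(outputs_by_node):
    """Map each IPv4 address found on more than one node to those nodes.

    outputs_by_node: {node_name: text of `ip a` (or a grep of it)}.
    Returns {ip: sorted list of node names}.
    """
    seen = defaultdict(set)
    for node, text in outputs_by_node.items():
        for line in text.splitlines():
            tokens = line.split()
            if len(tokens) >= 2 and tokens[0] == "inet":
                seen[tokens[1].split("/")[0]].add(node)
    return {ip: sorted(nodes) for ip, nodes in seen.items() if len(nodes) > 1}
```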
admin
2,930 Posts
September 18, 2017, 12:03 pm
This is OK; you need to check for "enabled" as per my previous post.
FYI, the paths are listed in LIO but disabled for several reasons: to support iSCSI discovery (so that when you discover using one path, it knows about the other paths), and to make it faster to enable them when switching.
Only one server should have this "enabled" in LIO; the rest should be disabled.
Last edited on September 18, 2017, 12:07 pm by admin
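The enabled/disabled state described above is kept by the kernel's LIO target in configfs. The sketch below is a hypothetical helper (not part of PetaSAN) that assumes the standard layout `/sys/kernel/config/target/iscsi/<iqn>/tpgt_<n>/`, where the `enable` file holds `1` or `0` and each portal is a directory named `<ip>:<port>` under `np/`. It lists which portal IPs a node actually serves (enabled) versus only advertises (disabled).

```python
from pathlib import Path

# Assumed default location of the LIO iSCSI fabric in configfs.
DEFAULT_ROOT = "/sys/kernel/config/target/iscsi"


def portal_states(root=DEFAULT_ROOT):
    """Return (enabled, disabled) sets of portal IPs under an LIO configfs tree.

    Assumes <root>/<iqn>/tpgt_<n>/enable contains "1" or "0", and that
    <root>/<iqn>/tpgt_<n>/np/ holds one directory per portal named "<ip>:<port>".
    """
    enabled, disabled = set(), set()
    base = Path(root)
    if not base.is_dir():
        return enabled, disabled  # configfs not mounted or no iSCSI targets
    for tpg in sorted(base.glob("*/tpgt_*")):
        enable_file = tpg / "enable"
        np_dir = tpg / "np"
        if not enable_file.is_file() or not np_dir.is_dir():
            continue
        target = enabled if enable_file.read_text().strip() == "1" else disabled
        for portal in np_dir.iterdir():
            # "192.168.3.21:3260" -> "192.168.3.21" (split on the last colon only)
            target.add(portal.name.rsplit(":", 1)[0])
    return enabled, disabled


if __name__ == "__main__":
    on, off = portal_states()
    print("enabled on this node :", sorted(on))
    print("disabled on this node:", sorted(off))
```

Run on each node (reading configfs may require root); per the post above, a given portal IP should appear as enabled on exactly one node.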
therm
121 Posts
September 18, 2017, 12:17 pm
OK, that refers to LIO, yes? But the IPs on the interfaces should not be on both servers, should they?
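The question comes down to comparing two lists on each node: the portal IPs LIO has enabled there, and the IPs actually configured on its NICs. Below is a minimal sketch of that comparison; it is not a PetaSAN tool, the function names are hypothetical, and it assumes the one-line format printed by `ip -o -4 addr show` (`<index>: <ifname> inet <addr>/<prefix> ...`). It flags enabled portals whose IP is missing locally (the case the admin says to fix by adding the IP to the NIC) and disabled portals whose IP is still assigned locally (the situation observed above), the latter only as something to review.

```python
def local_ipv4(ip_o_output):
    """Map each IPv4 address to its interface, parsed from `ip -o -4 addr show` text."""
    addrs = {}
    for line in ip_o_output.splitlines():
        parts = line.split()
        # Expected shape: "<index>: <ifname> inet <addr>/<prefix> ..."; later fields are ignored.
        if len(parts) >= 4 and parts[2] == "inet":
            addrs[parts[3].split("/")[0]] = parts[1]
    return addrs


def check_paths(enabled_ips, disabled_ips, ip_o_output):
    """Return (missing, lingering) sorted lists of IPs for one node.

    missing:   portal enabled in LIO, but the IP is not on any local NIC.
    lingering: portal disabled in LIO, but the IP is still on a local NIC.
    """
    local = local_ipv4(ip_o_output)
    missing = sorted(ip for ip in enabled_ips if ip not in local)
    lingering = sorted(ip for ip in disabled_ips if ip in local)
    return missing, lingering


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    sample = (
        "4: eth4    inet 192.168.3.29/24 scope global eth4\n"
        "4: eth4    inet 192.168.3.21/24 scope global secondary eth4\n"
    )
    missing, lingering = check_paths({"192.168.4.24"}, {"192.168.3.21"}, sample)
    print("enabled but not on a NIC :", missing)
    print("disabled but still on NIC:", lingering)
```

On a real node, the text could come from `subprocess.run(["ip", "-o", "-4", "addr", "show"], capture_output=True, text=True).stdout`, with the two portal sets gathered from LIO.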