iSCSI drives stop when a PetaSAN node is taken offline and then brought back online
JG
26 Posts
July 29, 2020, 3:49 pm
Hi,
We are having the following problem with our cluster. The iSCSI drives work without problems even when a node is taken offline (we shut it down or manually restart it), but when the node is powered on and comes back online, all the iSCSI disks stop. We have tried disabling the Fencing option in the Maintenance section, but the problem still persists.
Below is the output of the PetaSAN logs from the three nodes of the cluster.
NODE1
28/07/2020 08:58:44 INFO PetaSAN cleaned iqns.
28/07/2020 08:58:44 INFO Image image-00004 unmapped successfully.
28/07/2020 08:58:44 INFO LIO deleted Target iqn.2016-05.com.petasan:00004
28/07/2020 08:58:44 INFO LIO deleted backstore image image-00004
28/07/2020 08:58:43 INFO Image image-00005 unmapped successfully.
28/07/2020 08:58:43 INFO LIO deleted Target iqn.2016-05.com.petasan:00005
28/07/2020 08:58:43 INFO LIO deleted backstore image image-00005
28/07/2020 08:58:43 INFO PetaSAN cleaned local paths not locked by this node in consul.
28/07/2020 08:58:43 INFO Cleaned disk path 00004/1.
28/07/2020 08:58:43 INFO Cleaned disk path 00005/2.
28/07/2020 08:58:43 INFO Cleaned disk path 00005/3.
28/07/2020 08:49:19 INFO Path 00005/2 acquired successfully
28/07/2020 08:49:14 INFO The path 00005/2 was locked by ceph-node2.
28/07/2020 08:49:14 INFO Found pool:rbd for disk:00005 via consul
28/07/2020 06:25:13 INFO GlusterFS mount attempt
NODE2
28/07/2020 08:59:26 INFO CIFS check_health ctdb not active, restarting.
28/07/2020 08:59:26 INFO CIFSService key change action
28/07/2020 08:58:51 WARNING CIFS init degraded Gluster FS : ceph-node2 down
28/07/2020 08:58:51 WARNING CIFS init degraded Gluster FS : ceph-node3 down
28/07/2020 08:58:51 WARNING CIFS init degraded Gluster FS : ceph-node1 down
28/07/2020 08:58:45 INFO LeaderElectionBase successfully dropped old sessions
PetaSAN.NodeStats.ceph-node2.ifaces.throughput.eth0_received 20.48 `date +%s`" | nc -q0 192.168.0.212 2003
PetaSAN.NodeStats.ceph-node2.ifaces.percent_util.eth0 0.0 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.throughput.bond1_transmitted 0.0 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.throughput.bond1_received 40.96 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.percent_util.bond1 0.0 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.throughput.bond0_transmitted 186777.6 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.throughput.bond0_received 101611.52 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.percent_util.bond0 0.01 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.memory.percent_util 2.18 `date +%s`" "
Exception: Error running echo command :echo "PetaSAN.NodeStats.ceph-node2.cpu_all.percent_util 6.39 `date +%s`" "
raise Exception("Error running echo command :" + cmd)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/common/graphite_sender.py", line 59, in send
graphite_sender.send(leader_ip)
File "/opt/petasan/scripts/node_stats.py", line 64, in get_stats
get_stats()
File "/opt/petasan/scripts/node_stats.py", line 159, in <module>
Traceback (most recent call last):
PetaSAN.NodeStats.ceph-node2.ifaces.throughput.eth0_received 20.48 `date +%s`" | nc -q0 192.168.0.212 2003
PetaSAN.NodeStats.ceph-node2.ifaces.percent_util.eth0 0.0 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.throughput.bond1_transmitted 0.0 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.throughput.bond1_received 40.96 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.percent_util.bond1 0.0 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.throughput.bond0_transmitted 186777.6 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.throughput.bond0_received 101611.52 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.ifaces.percent_util.bond0 0.01 `date +%s`" "
PetaSAN.NodeStats.ceph-node2.memory.percent_util 2.18 `date +%s`" "
28/07/2020 08:58:43 ERROR Error running echo command :echo "PetaSAN.NodeStats.ceph-node2.cpu_all.percent_util 6.39 `date +%s`" "
28/07/2020 08:58:43 ERROR Node Stats exception.
28/07/2020 08:58:40 INFO Service is starting.
28/07/2020 08:58:40 INFO Cluster is just starting, system will delete all active disk resources
28/07/2020 08:58:39 INFO sync_replication_node completed
28/07/2020 08:58:39 INFO syncing replication users ok
28/07/2020 08:58:39 INFO syncing cron ok
28/07/2020 08:58:37 INFO CIFSService init action
28/07/2020 08:58:37 INFO sync_replication_node starting
28/07/2020 08:58:37 INFO Starting Config Upload service
28/07/2020 08:58:37 INFO Starting CIFS Service
28/07/2020 08:58:37 INFO Starting petasan tuning service
28/07/2020 08:58:36 INFO Starting sync replication node service
28/07/2020 08:58:36 INFO Starting OSDs
28/07/2020 08:58:36 INFO stderr /dev/sdd: open failed: No medium found
28/07/2020 08:58:36 INFO stderr /dev/sdd: open failed: No medium found
28/07/2020 08:58:36 INFO stdout ceph-33eb9917-3356-4a8d-961b-560d08cb8c82";"1";"1";"0";"wz--n-";"1788.00g";"0g";"0
28/07/2020 08:58:36 INFO stdout ceph-25661c57-e624-40b6-ba30-d23a5e6fc4d2";"1";"1";"0";"wz--n-";"1788.00g";"0g";"0
28/07/2020 08:58:36 INFO Running command: /sbin/vgs --noheadings --readonly --units=g --separator=";" -o vg_name,pv_count,lv_count,snap_count,vg_attr,vg_size,vg_free,vg_free_count
28/07/2020 08:58:36 INFO Starting activating PetaSAN lvs
28/07/2020 08:58:36 INFO stderr /dev/sdd: open failed: No medium found
28/07/2020 08:58:36 INFO stderr /dev/sdd: open failed: No medium found
28/07/2020 08:58:36 INFO stdout ceph-33eb9917-3356-4a8d-961b-560d08cb8c82";"1";"1";"0";"wz--n-";"1788.00g";"0g";"0
28/07/2020 08:58:36 INFO stdout ceph-25661c57-e624-40b6-ba30-d23a5e6fc4d2";"1";"1";"0";"wz--n-";"1788.00g";"0g";"0
28/07/2020 08:58:35 INFO LeaderElectionBase dropping old sessions
28/07/2020 08:58:35 INFO Running command: /sbin/vgs --noheadings --readonly --units=g --separator=";" -o vg_name,pv_count,lv_count,snap_count,vg_attr,vg_size,vg_free,vg_free_count
28/07/2020 08:58:35 INFO Starting activating PetaSAN lvs
28/07/2020 08:58:35 INFO Starting Node Stats Service
28/07/2020 08:58:35 INFO Starting Cluster Management application
28/07/2020 08:58:35 INFO Starting iSCSI Service
28/07/2020 08:58:35 INFO Starting cluster file sync service
28/07/2020 08:58:33 INFO str_start_command: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> consul agent -raft-protocol 2 -config-dir /opt/petasan/config/etc/consul.d/server -bind 10.11.12.2 -retry-join 10.11.12.1 -retry-join 10.11.12.3
28/07/2020 08:58:22 INFO GlusterFS mount attempt
28/07/2020 08:58:16 INFO Start settings IPs
28/07/2020 06:25:18 INFO GlusterFS mount attempt
NODE3
28/07/2020 08:58:44 INFO PetaSAN cleaned iqns.
28/07/2020 08:58:44 INFO Image image-00004 unmapped successfully.
28/07/2020 08:58:44 INFO LIO deleted Target iqn.2016-05.com.petasan:00004
28/07/2020 08:58:44 INFO LIO deleted backstore image image-00004
28/07/2020 08:58:43 INFO Image image-00005 unmapped successfully.
28/07/2020 08:58:43 INFO LIO deleted Target iqn.2016-05.com.petasan:00005
28/07/2020 08:58:43 INFO LIO deleted backstore image image-00005
28/07/2020 08:58:43 INFO PetaSAN cleaned local paths not locked by this node in consul.
28/07/2020 08:58:43 INFO Cleaned disk path 00004/1.
28/07/2020 08:58:43 INFO Cleaned disk path 00005/2.
28/07/2020 08:58:43 INFO Cleaned disk path 00005/3.
28/07/2020 08:49:19 INFO Path 00005/2 acquired successfully
28/07/2020 08:49:14 INFO The path 00005/2 was locked by ceph-node2.
28/07/2020 08:49:14 INFO Found pool:rbd for disk:00005 via consul
28/07/2020 06:25:13 INFO GlusterFS mount attempt
Thanks in advance for your time!
Last edited on July 29, 2020, 3:58 pm by JG · #1
admin
2,930 Posts
July 29, 2020, 4:20 pm
Can you upgrade to 2.6? We have fixed some bugs relating to this.
JG
26 Posts
July 29, 2020, 4:37 pm
Thanks for your answer.
I just upgraded to 2.6 and it works!! Great job!!
I only noticed that once the offline node comes back online, it comes up with no assigned paths; however, if I force auto path assignment with the Path Assignment tool, the paths are assigned fine.
Thanks again!
admin
2,930 Posts
July 29, 2020, 4:48 pm
Yes, PetaSAN does not have a concept of node ownership of resources. If a node fails, its resources are assigned to other nodes; when it comes back, it does not take them back unless, as you stated, you use the path assignment tool. In the future we plan to assign resources dynamically between nodes based on the load stats we gather.
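The log entries above ("Path 00005/2 acquired successfully", "The path 00005/2 was locked by ceph-node2") reflect this model: path ownership is coordinated through locks in Consul rather than being tied to a particular node. Below is a minimal sketch of the idea, assuming the python-consul client and a purely hypothetical key layout (this is not PetaSAN's actual code):

# Hypothetical illustration of lock-based path ownership, not PetaSAN's implementation.
# Assumes a local Consul agent and the python-consul package.
import consul

def try_acquire_path(disk_id, path_index, node_name, host="127.0.0.1"):
    c = consul.Consul(host=host)
    # A session ties the lock to this node; if the node goes down, the TTL session
    # expires and another node can acquire the key.
    session_id = c.session.create(name=node_name, ttl=30, behavior="delete")
    key = "iscsi/paths/%s/%s" % (disk_id, path_index)  # hypothetical key layout
    acquired = c.kv.put(key, node_name, acquire=session_id)
    if acquired:
        print("Path %s/%s acquired by %s" % (disk_id, path_index, node_name))
    else:
        print("Path %s/%s already locked by another node" % (disk_id, path_index))
    return acquired

# Example: after a reboot the node holds no locks; paths stay with whichever nodes
# acquired them during failover until something (e.g. the Path Assignment tool)
# triggers a reassignment.
# try_acquire_path("00005", 2, "ceph-node2")

Under this model nothing automatically hands a path back to its original node when it rejoins, which matches the behaviour described above.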
JG
26 Posts
July 29, 2020, 4:55 pm
Thanks for the clarification.
This topic can be considered closed.
Thank you.