v3.0.1 - No graph statistics from 4 and 5 nodes
aslyotov
3 Posts
February 19, 2022, 3:44 am
Hi!
I installed a PetaSAN 3.0.1 cluster with 5 nodes: the first three nodes have MON+OSD roles, and the last two have only the OSD role.
If I choose "Dashboard" -> "View Chart:" -> select -> "Node statistics" and then pick any of my first three nodes in the "Node:" drop-down list, everything is OK and I see graphs.
But if I pick either of my last two nodes (4 or 5) in the "Node:" drop-down list, I see empty graphs. No errors. No messages about "no data". Just empty graphs.
I logged in to nodes 4 and 5 over SSH and checked the gluster peers, but none are listed:
# gluster peer status
Number of Peers: 0
The instructions at https://www.petasan.org/forums/?view=thread&id=181&part=2#postid-871 didn't help me. 🙁
Could you please tell me how to fix this problem?
Self fixed: I rebooted all 5 nodes and the graphs started working. Rebooting everything is not a good solution, I think, but right now it is the only one I have found. 🙁
Last edited on February 19, 2022, 4:16 am by aslyotov · #1
admin
2,918 Posts
February 20, 2022, 11:00 am
Thanks a lot for your feedback. We did find a bug, and the fix will be included in the next bug-fix release, which is due in a couple of days. It is related to Ubuntu 20.04 now shipping a different, BSD-based version of netcat that behaves slightly differently. The node stats service uses netcat to communicate with the stats server, and the BSD version does not behave the same way in case of failover.
It is correct that gluster peer status works only on the first 3 nodes: they run the server component of Gluster, which stores the shared data for stats. Nodes 4 and above are clients only.
admin
2,918 Posts
February 23, 2022, 2:48 pm
Issue has been fixed in Release 3.0.2.
aslyotov
3 Posts
February 25, 2022, 2:48 am
I upgraded my PetaSAN cluster to 3.0.2 (and rebooted each node after the upgrade), but it didn't help.
Some graphs, like "Disk utilization", "Disk IOPS" and "Disk throughput", randomly stop showing data on a random node.
I haven't found any pattern in this malfunction yet, but it can happen on any node at any time.
Guys, PetaSAN is a really great and very useful product, but its monitoring features are poor and unstable. Maybe you could think about rebuilding this feature entirely?
I do not think it is a very complex task. For example, you could use a Prometheus + Node Exporter (+ any other exporters) + Grafana solution to collect and show any metrics you need.
admin
2,918 Posts
February 25, 2022, 3:40 pm
Your issue is not related to the fix we made in 3.0.2. It could have many causes: network connectivity, hardware load, not enough RAM...
When you see the issue, do all stats from a specific node stop, or do some stats show up but not others on that node? Do they start working again after a while, or once they stop do they stay stopped?
On a node that is not working:
What is the status of
systemctl status petasan-node-stats
Do you see any errors in /opt/petasan/log/PetaSAN.log?
Try to manually write a fake 50% CPU value from this node and see whether we get errors and whether it shows up on the chart.
First get the stats server IP from
/opt/petasan/scripts/util/get_cluster_leader.py
then send the command via netcat. The syntax is
echo "PetaSAN.NodeStats.NODE_NAME.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 STATS_SERVER_IP 2003
Example:
echo "PetaSAN.NodeStats.ps-node-01.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 10.0.1.13 2003
The current stats module has been working for a long time and has been stable. We use the Grafana/Graphite/Carbon stack, which is quite robust; in addition, it supports high availability with 3x data redundancy provided via a Gluster shared filesystem, external to Ceph. We need to understand your issue better.
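For anyone repeating the manual test above, the netcat one-liner can be wrapped in two small shell functions. This is only a sketch: the function names graphite_line and send_test_point are made up for illustration and are not part of PetaSAN. The metric path, the nc flags and port 2003 are taken from the commands in this post, and the line format ("path value timestamp") is Graphite's plaintext protocol.

```shell
# Build one line in Graphite's plaintext format: "<metric path> <value> <unix timestamp>".
# The metric path layout is copied from the example in the post above.
# graphite_line is a hypothetical helper name, not a PetaSAN command.
graphite_line() {
  local node="$1"
  local value="$2"
  local ts="${3:-$(date +%s)}"   # default to the current time when no timestamp is given
  printf 'PetaSAN.NodeStats.%s.cpu_all.percent_util %s %s\n' "$node" "$value" "$ts"
}

# Send a single fake CPU point to the stats server on TCP port 2003,
# using the same nc flags as the command in the post.
# send_test_point is a hypothetical helper name as well.
send_test_point() {
  local node="$1"
  local server="$2"
  local value="${3:-50}"         # default to the fake 50% value suggested above
  graphite_line "$node" "$value" | nc -v -q0 "$server" 2003
}

# Example call, with the node name and IP from the post:
#   send_test_point ps-node-01 10.0.1.13
```

If nc reports a successful connection but the point never shows up on the chart, that would suggest the problem is on the stats server side rather than on the sending node.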
Last edited on February 25, 2022, 3:45 pm by admin · #5
admin
2,918 Posts
March 3, 2022, 9:18 am
Any chance of testing the above?
aslyotov
3 Posts
March 3, 2022, 1:31 pm
Sorry, I have now decided to install native Ceph Pacific v16.2.7 on my 5 nodes.
PetaSAN is a great product, but its unstable statistics graphs are spoiling everything :(