Forums - PetaSAN

v3.0.1 - No graph statistics from 4 and 5 nodes

aslyotov
3 Posts

February 19, 2022, 3:44 am
Hi!

I installed a PetaSAN 3.0.1 cluster with 5 nodes: the first three nodes have the MON+OSD roles, while the last two nodes have only the OSD role.

If I choose "Dashboard" -> "View Chart:" -> select -> "Node statistics", and then select any of my first three nodes in the "Node:" drop-down list, everything is OK and I see graphs.

But if I select either of my last two nodes (4 or 5) in the "Node:" drop-down list, I see empty graphs. No errors. No messages about "no data". Just empty graphs.

I logged in to the SSH console on nodes 4 and 5 and checked the gluster peers, but none are listed:

# gluster peer status
Number of Peers: 0

The instructions at https://www.petasan.org/forums/?view=thread&id=181&part=2#postid-871 did not help me. 🙁

Could you please tell me how to fix this problem?

Self-fixed: I rebooted all 5 nodes and the graphs started working. I don't think rebooting everything is a good solution, but right now it is the only one I have found. 🙁

Last edited on February 19, 2022, 4:16 am by aslyotov · #1

admin
2,959 Posts

February 20, 2022, 11:00 am
Thanks a lot for your feedback. We did find a bug, and the fix will be included in the next bug-fix release, which is due in a couple of days. It is related to Ubuntu 20.04 now using a BSD-based version of netcat that behaves slightly differently. The node stats service uses netcat to communicate with the stats server, and the new version does not behave the same way in case of failover.
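
To see which netcat variant a node is actually using, something like the following sketch may help. It is not a PetaSAN tool: the function names `nc_variant` and `current_nc_variant` are made up for illustration, and it assumes the Debian/Ubuntu convention of installing the binaries as `nc.openbsd` and `nc.traditional` behind an alternatives symlink.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of PetaSAN.

# Classify a netcat binary by its file name (Debian/Ubuntu naming assumed).
# $1 = resolved path of the nc binary
nc_variant() {
    case "$(basename "$1")" in
        nc.openbsd)     echo "openbsd" ;;
        nc.traditional) echo "traditional" ;;
        *)              echo "unknown" ;;
    esac
}

# Follow the symlink chain behind `nc` and report the variant in use.
current_nc_variant() {
    nc_path=$(command -v nc) || { echo "not installed"; return 1; }
    nc_variant "$(readlink -f "$nc_path")"
}
```

After sourcing the file, running `current_nc_variant` on each node prints one word, which makes it easy to compare nodes that behave differently.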

It is correct that gluster peer status works only on the first 3 nodes: they run the server component of Gluster, which stores the shared data for stats. Nodes 4 and above are clients only.

#2

admin
2,959 Posts

February 23, 2022, 2:48 pm
Issue has been fixed in Release 3.0.2.

#3

aslyotov
3 Posts

February 25, 2022, 2:48 am
I upgraded my PetaSAN cluster to 3.0.2 (and rebooted each node after the upgrade), but it didn't help.

Some graphs, such as "Disk utilization", "Disk IOPS" and "Disk throughput", randomly stop showing data on random nodes.

I haven't found any pattern in this malfunction yet, but it can happen on any node at any time.

Guys, PetaSAN is a really great and very useful product, but its monitoring features are poor and unstable. Maybe you could think about rebuilding this feature entirely?

I do not think it is a very complex task. For example, you could use a Prometheus + Node Exporter (+ any other exporters) + Grafana solution to collect and show any metrics you need.

#4

admin
2,959 Posts

February 25, 2022, 3:40 pm
Your issue is not related to the fix we did in 3.0.2. It could be one of many issues: network connectivity, hardware load, not enough RAM...

When you see the issue, do all stats from a specific node stop, or do some stats show up but not others on that node? Do they start working again after a while, or do they stay stopped once they stop?

On a node that is not working:

What is the status of
systemctl status petasan-node-stats

Do you see any errors in /opt/petasan/log/PetaSAN.log?

Try to manually write a fake 50% CPU value from this node, and see whether we get errors and whether it shows up on the chart.

First get the stats server IP from
/opt/petasan/scripts/util/get_cluster_leader.py

Then send the value via netcat. The syntax is:
echo "PetaSAN.NodeStats.NODE_NAME.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 STATS_SERVER_IP 2003
Example:
echo "PetaSAN.NodeStats.ps-node-01.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 10.0.1.13 2003
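
The manual checks above can be wrapped in a few small helpers so they are easy to repeat on each affected node. This is only a sketch: the function names `graphite_line`, `send_test_metric` and `recent_log_errors` are made up for illustration, it assumes the same metric path, port 2003 and `nc -q0` behaviour as the commands above, and it assumes that error entries in the log contain the word "error".

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of PetaSAN.

# Build one "path value timestamp" line in the form used by the commands above.
# $1 = node name, $2 = value, $3 = Unix timestamp (defaults to now)
graphite_line() {
    printf 'PetaSAN.NodeStats.%s.cpu_all.percent_util %s %s\n' \
        "$1" "$2" "${3:-$(date +%s)}"
}

# Send a fake CPU value to the stats server.
# $1 = node name, $2 = stats server IP, $3 = value (defaults to 50)
send_test_metric() {
    graphite_line "$1" "${3:-50}" | nc -v -q0 "$2" 2003
}

# Show the most recent log lines that mention "error" (case-insensitive).
# $1 = log file, $2 = number of lines to show (defaults to 20)
recent_log_errors() {
    grep -i 'error' "$1" | tail -n "${2:-20}"
}
```

After sourcing the file, a run on the node from the example would be `send_test_metric ps-node-01 10.0.1.13`, followed by `recent_log_errors /opt/petasan/log/PetaSAN.log`. A non-zero exit status from `nc` suggests that the node cannot reach the stats server.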

The current stats module has been working for a long time and has been stable. We use a Grafana/Graphite/Carbon stack, which is quite robust. In addition, it supports high availability with 3x data redundancy, provided via a Gluster shared filesystem that is external to Ceph. We need to understand your issue better.

Last edited on February 25, 2022, 3:45 pm by admin · #5

admin
2,959 Posts

March 3, 2022, 9:18 am
Any chance of testing the above?

#6

aslyotov
3 Posts

March 3, 2022, 1:31 pm
Sorry, I have now decided to install native Ceph Pacific v16.2.7 on my 5 nodes.

PetaSAN is a great product, but its unstable statistics graphs are spoiling everything :(

#7
