Errors in logs - related to stats?

Hi there,

For a number of months now (perhaps since the 3.1 update, maybe before) we've been seeing the following errors repeating in the logs of all the nodes:

23/02/2023 20:46:17 ERROR Error running echo command :echo "PetaSAN.NodeStats.gl-san-02a.cpu_all.percent_util 13.61 `date +%s`" "
23/02/2023 20:46:17 ERROR Node Stats exception.
PetaSAN.NodeStats.gl-san-02a.ifaces.throughput.bond0-20_received 50564352.0 `date +%s`" | nc -v -q0 192.168.37.138 2003
PetaSAN.NodeStats.gl-san-02a.ifaces.percent_util.bond0-20 1.01 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.throughput.bond0-1_transmitted 240035.84 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.throughput.bond0-1_received 438292.48 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.percent_util.bond0-1 0.01 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.throughput.bond0_transmitted 42225715.2 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.throughput.bond0_received 51456921.6 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.percent_util.bond0 1.03 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.memory.percent_util 23.98 `date +%s`" "
Exception: Error running echo command :echo "PetaSAN.NodeStats.gl-san-02a.cpu_all.percent_util 14.81 `date +%s`" "
raise Exception("Error running echo command :" + cmd)
File "/usr/lib/python3/dist-packages/PetaSAN/core/common/graphite_sender.py", line 59, in send
graphite_sender.send(leader_ip)
File "/opt/petasan/scripts/node_stats.py", line 66, in get_stats
get_stats()
File "/opt/petasan/scripts/node_stats.py", line 168, in <module>
Traceback (most recent call last):
PetaSAN.NodeStats.gl-san-02a.ifaces.throughput.bond0-20_received 50564352.0 `date +%s`" | nc -v -q0 192.168.37.138 2003
PetaSAN.NodeStats.gl-san-02a.ifaces.percent_util.bond0-20 1.01 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.throughput.bond0-1_transmitted 240035.84 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.throughput.bond0-1_received 438292.48 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.percent_util.bond0-1 0.01 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.throughput.bond0_transmitted 42225715.2 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.throughput.bond0_received 51456921.6 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.ifaces.percent_util.bond0 1.03 `date +%s`" "
PetaSAN.NodeStats.gl-san-02a.memory.percent_util 23.98 `date +%s`" "

I think this causes the stats pages for the nodes to show as blank, and it also makes it somewhat difficult to spot other issues in the logs.

Have you any suggestions as to what may be causing it, and how to fix it?

One additional question: we're often seeing deep scrubs not quite complete in time, so from time to time we get warnings such as "2 pgs not deep-scrubbed in time", which puts the cluster in a warning state. Is there a way to reduce the frequency of the scrubs so that they have plenty of time to complete? We've tried increasing the scrub speed, but this causes performance issues: the LVM caches start to fill up because the HDDs are under heavy read load and cannot handle enough writes.

Thanks!

It could be network issues or load issues. Does the error happen all the time? Is the system stressed? Run atop and look at the load on CPU/disk/net.
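One quick way to rule out the network side is to send a single test metric to the stats server by hand. The log above shows the node piping lines to nc on port 2003, which is the Graphite plaintext port, where each line has the form "path value timestamp". Below is a minimal Python sketch, assuming that port is reachable from the node. The helper names and the metric name PetaSAN.test.ping are made up for illustration and are not part of PetaSAN.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line in Graphite's plaintext format: path, value, Unix timestamp."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, line, port=2003, timeout=5.0):
    """Open a TCP connection to the stats server and send one metric line.

    Raises OSError (refused, timed out, unreachable) if the port
    cannot be reached, which points at a network or service problem.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, send_metric("192.168.37.138", format_metric("PetaSAN.test.ping", 1)) uses the stats server IP from the log above. If it raises OSError, the node cannot reach port 2003 on that host. If it returns without error, the connection itself is fine and the problem is more likely on the stats service side.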

Hi, it happens all the time, constantly, and it happens on all 5 nodes.

The system isn't stressed. For example right now

Network: eth0 (iscsiA) 60Mbs / 10G, eth1 (iscsiB) 57Mbs / 10G, bond0 170Mbs / 40G

CPU (72cores): avg 15-18%

Memory: 256GB total, free 2.8G, cache 2.0G, buff 105G

Disk: only the NVMes show as busy (but with low latency and not much throughput; I believe they always read this way in this kernel)

Thanks

1) Get the stats server IP from
/opt/petasan/scripts/util/get_cluster_leader.py
2) On the node which is the stats server, try

/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-setup.sh
/opt/petasan/scripts/stats-start.sh

3) If this does not fix the issue, on all nodes do a
systemctl restart petasan-node-stats
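The sequence above could be wrapped in a small script so it is run the same way each time. This is a rough Python sketch that uses only the script paths and the service name quoted in this thread. The function name and the dry_run flag are made up for illustration, and this is not a PetaSAN tool.

```python
import subprocess

# Commands for the node that is the stats server (step 2), in order.
STATS_SERVER_STEPS = [
    ["/opt/petasan/scripts/stats-stop.sh"],
    ["/opt/petasan/scripts/stats-setup.sh"],
    ["/opt/petasan/scripts/stats-start.sh"],
]

# Command for every node (step 3), only needed if step 2 did not help.
NODE_STEPS = [
    ["systemctl", "restart", "petasan-node-stats"],
]


def run_steps(steps, dry_run=True):
    """Run each command in order, stopping at the first failure.

    With dry_run=True (the default) nothing is executed; the commands
    are only returned as strings so they can be reviewed first.
    """
    planned = [" ".join(cmd) for cmd in steps]
    if not dry_run:
        for cmd in steps:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(cmd, check=True)
    return planned
```

Calling run_steps(STATS_SERVER_STEPS) only lists the commands. Passing dry_run=False on the stats server node actually runs them, and the same applies to NODE_STEPS on each node.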

That's fixed it! Thanks!

Is there a way to adjust the layout or size of the stats screen? When mousing over a graph, the popup that shows the numbers is off the edge of the iframe.

 

What browser are you using? Can you show a screenshot?

Hi,

I'm using Google Chrome (Version 110.0.5481.178 (Official Build) (64-bit))

This is full screen on a 1920 x 1080 display.