Graphs: Grafana 404 errors, solved


I've gone through the stats-stop, stats-setup and stats-start scripts.

But for some reason my apache2 other_vhosts_access.log shows a bunch of 404 errors:

 

Node1    10.10.0.151:80 127.0.0.1 - - [02/Feb/2021:04:01:14 -0500] "POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"

Node2    10.10.0.152:80 127.0.0.1 - - [02/Feb/2021:09:39:49 -0500] "POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"

node 3    10.10.0.153:80 127.0.0.1 - - [02/Feb/2021:03:52:32 -0500] "POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"
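To confirm that every 404 is Grafana's POST /render and not something else, it can help to tally the log by status and path. Below is a minimal Python sketch, assuming the lines follow Apache's stock vhost_combined format (which, as far as I know, is what Debian/Ubuntu's other_vhosts_access.log uses by default). The function name count_statuses is hypothetical, not part of PetaSAN.

```python
import re
from collections import Counter

# Apache "vhost_combined" layout (assumed):
# vhost:port client ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<vhost>\S+):(?P<port>\d+) (?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def count_statuses(lines):
    """Count (status, method, path, user agent) tuples; unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)  # search, so a leading label like "Node1" is tolerated
        if m:
            counts[(int(m["status"]), m["method"], m["path"], m["agent"])] += 1
    return counts
```

For example, assuming the default log location: count_statuses(open("/var/log/apache2/other_vhosts_access.log")).most_common(5). If the only 404 entry is (404, "POST", "/render", "Grafana/6.2.5"), the problem is isolated to the render endpoint behind port 80.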

 

I can access each Grafana control panel, but when I test the DS1 datasource setup I get HTTP Error Not Found.

I'm thinking Grafana might be OK, but Graphite is failing somewhere.

 

Well, something's missing: there is no open port for localhost:8080 on any of the nodes.

tcp 0 0 localhost:8600 0.0.0.0:* LISTEN 1182/consul
tcp 0 0 localhost:rfe 0.0.0.0:* LISTEN 1323/python3
tcp 0 0 localhost:8500 0.0.0.0:* LISTEN 1182/consul
udp 0 0 localhost:ntp 0.0.0.0:* 1172/ntpd
udp 0 0 localhost:8600 0.0.0.0:* 1182/consul
udp6 0 0 localhost:ntp [::]:* 1172/ntpd
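netstat shows what is bound, but a direct connection attempt is a quicker yes/no. This is a small Python sketch that tries a TCP connect to a host and port; the ports worth probing in this thread are 8080 (the DS1 datasource URL), 2003 (the port the nc command sends metrics to) and 3000 (Grafana). The helper name port_open is my own.

```python
import socket


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and handles both IPv4 and IPv6 addresses
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, unreachable, address family unsupported
        return False
```

For example, run on each node: for p in (8080, 2003, 3000): print(p, port_open("127.0.0.1", p), port_open("::1", p)). Checking both 127.0.0.1 and ::1 separates an IPv4 versus IPv6 binding problem from a service that is simply not running.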

OK, finally something that might make sense.

/opt/petasan/log/PetaSAN.log

While reviewing the PetaSAN logs I found a couple of errors; this one first. I see several of these, and they appear to run into trouble around cpu_all.percent_util:

PetaSAN.NodeStats.peta2.ifaces.throughput.eth1-100_received 173475.84 `date +%s`" | nc -q0 10.10.0.152 2003
Traceback (most recent call last):
  File "/opt/petasan/scripts/node_stats.py", line 168, in <module>
    get_stats()
  File "/opt/petasan/scripts/node_stats.py", line 66, in get_stats
    graphite_sender.send(leader_ip)
  File "/usr/lib/python3/dist-packages/PetaSAN/core/common/graphite_sender.py", line 59, in send
    raise Exception("Error running echo command :" + cmd)
Exception: Error running echo command :echo "PetaSAN.NodeStats.peta2.cpu_all.percent_util 1.17 `date +%s`" "
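The failing command is essentially echo "<metric> <value> <timestamp>" | nc -q0 <leader_ip> 2003, which is Graphite's plaintext protocol, and the traceback only says the echo command failed, hiding why. Re-sending one metric from Python surfaces the real socket error (for example "Connection refused"). This is a minimal sketch, not PetaSAN's graphite_sender code; format_metric and send_metric are hypothetical names.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one Graphite plaintext-protocol line: '<path> <value> <unix timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())  # same role as `date +%s` in the original command
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, path, value, port=2003, timeout=5.0):
    """Send a single metric over TCP; lets the underlying OSError propagate on failure."""
    line = format_metric(path, value)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, from peta2: send_metric("10.10.0.152", "PetaSAN.test.manual_check", 1) (the metric path here is made up, and a successful call does write one real data point). A ConnectionRefusedError would suggest nothing is listening on port 2003 on the stats leader.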

 

Another one, in /opt/petasan/log/stats.log:

[2021-02-02 08:15:33] write_graphite plugin: send to localhost:2003 (tcp) failed with status 4 (Interrupted system call)
[2021-02-02 08:16:26] plugin_load: plugin "write_graphite" successfully loaded.
[2021-02-02 08:16:26] plugin_load: plugin "python" successfully loaded.
[2021-02-02 08:16:26] python plugin: Found a configuration for the "ceph_latency_plugin" plugin, but the plugin isn't loaded or didn't register a configuration callback.
[2021-02-02 08:16:26] plugin_load: plugin "write_graphite" successfully loaded.
[2021-02-02 08:16:26] plugin_load: plugin "python" successfully loaded.
[2021-02-02 08:16:27] python plugin: Found a configuration for the "ceph_latency_plugin" plugin, but the plugin isn't loaded or didn't register a configuration callback.
[2021-02-02 08:16:27] Systemd detected, trying to signal readyness.
[2021-02-02 08:16:27] Initialization complete, entering read-loop.
[2021-02-02 09:27:55] Exiting normally.

 

 

Can you check that the shared file system is mounted:
mount | grep shared
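The same check can be scripted against /proc/mounts. This is a small Python sketch that, unlike grep, only looks at the mount-point column; it does not assume any particular path for the PetaSAN shared file system, it just looks for "shared" the way the grep does. find_mounts is a hypothetical helper.

```python
def find_mounts(mounts_text, needle="shared"):
    """Return (device, mount_point, fstype) for /proc/mounts entries whose mount point contains `needle`."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()  # device, mount point, fs type, options, dump, pass
        if len(fields) >= 3 and needle in fields[1]:
            hits.append((fields[0], fields[1], fields[2]))
    return hits
```

For example: find_mounts(open("/proc/mounts").read()). An empty list on a management node means the shared file system is not mounted there.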

Grafana can be started on one node only.
Find the stats leader:

/opt/petasan/scripts/util/get_cluster_leader.py

Restart the services:
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-start.sh

On the other management nodes, make sure you stop the stats services, otherwise it could cause issues:
/opt/petasan/scripts/stats-stop.sh

Thanks, I'll post the output when I get back in tomorrow. While I very much appreciate your comments and assistance, I have tried this several times without much luck.

The 10.10.0.x addresses it is using are also the management IPs, so now I'm thinking it's hitting the management web UI. I noticed I have two HTTPS ports on the server that is the cluster leader, one using nginx and the other Apache.

The other two nodes only have nginx running for HTTPS.

Not sure if I have some sort of network issue. The management nodes are 10.10.0.151, 10.10.0.152 and 10.10.0.153, all with netmask 255.255.254.0 on eth0, VLAN 1.

The back end is 172.16.0.151, 172.16.0.152 and 172.16.0.153, all with netmask 255.255.255.0 on eth1, VLAN 100.

I have eth2 and eth3 as well, but they are not configured or connected.

These are real hardware, Supermicro motherboards, no VMs.

Well, I'm still deploying nodes, and still no joy with the graphs.

Output of netstat -aelptu | grep http on the cluster leader:

tcp 0 0 0.0.0.0:https 0.0.0.0:* LISTEN root 42306 1418/nginx: master
tcp 0 0 0.0.0.0:http 0.0.0.0:* LISTEN root 42307 1418/nginx: master
tcp 0 0 peta1:https it01hp.xxxxxx.com:59353 ESTABLISHED www-data 218608 1420/nginx: worker
tcp6 0 0 [::]:http-alt [::]:* LISTEN root 119893 18256/apache2

I guess Apache does not have a standard IPv4 config? So there is no 8080 (http-alt) on tcp, only on tcp6.
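Two caveats when reading this netstat output: without -n, netstat prints service names (http-alt) instead of port numbers, and on Linux with the default net.ipv6.bindv6only=0, a listener on tcp6 [::]:8080 normally also accepts IPv4 connections, so a tcp6-only line does not by itself prove 127.0.0.1:8080 is unreachable. This Python sketch picks the LISTEN sockets for one port out of netstat -npl style text; listeners_on_port is a hypothetical helper.

```python
def listeners_on_port(netstat_text, port):
    """Return (proto, local_address) for TCP LISTEN sockets in netstat output bound to `port`."""
    results = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect: proto recv-q send-q local-address foreign-address state ...
        if len(fields) < 6 or not fields[0].startswith("tcp") or "LISTEN" not in fields:
            continue
        local = fields[3]
        # The port is whatever follows the last colon, e.g. ':::8080' or '0.0.0.0:8080'
        if local.rsplit(":", 1)[-1] == str(port):
            results.append((fields[0], local))
    return results
```

For example, feeding it the output of netstat -npl on peta1 and peta2 would show tcp6 :::8080 on one and tcp 0.0.0.0:8080 on the other; either way, a direct connect test to 127.0.0.1:8080 is the more conclusive check.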

Sorry to be such a PITA; I'm pretty happy so far with the way the rest of the stuff is falling into place.

 

 

On the stats leader identified with

/opt/petasan/scripts/util/get_cluster_leader.py

what is the output of:

systemctl status apache2
netstat -npl | grep apache

root@peta1:/etc/apache2/conf-enabled# systemctl status apache2
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; disabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: active (running) since Wed 2021-02-03 15:36:08 EST; 3h 50min ago
Process: 72461 ExecStop=/usr/sbin/apachectl stop (code=exited, status=0/SUCCESS)
Process: 72469 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
Main PID: 72474 (apache2)
Tasks: 55 (limit: 4915)
CGroup: /system.slice/apache2.service
├─72474 /usr/sbin/apache2 -k start
├─72476 /usr/sbin/apache2 -k start
└─72477 /usr/sbin/apache2 -k start

Feb 03 15:36:08 peta1 systemd[1]: Stopped The Apache HTTP Server.
Feb 03 15:36:08 peta1 systemd[1]: Starting The Apache HTTP Server...
Feb 03 15:36:08 peta1 apachectl[72469]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.10.0.151. Set the 'ServerName' directive globally to suppress this message
Feb 03 15:36:08 peta1 systemd[1]: Started The Apache HTTP Server.

 

root@peta1:/etc/apache2/conf-enabled# netstat -npl | grep apache
tcp6 0 0 :::8080 :::* LISTEN 72474/apache2

OK, well, FYI:

I disabled IPv6 on the nodes in sysctl.conf, since I was not using it anyway:

net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
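Settings in /etc/sysctl.conf only take effect at boot or after sysctl -p, so it is worth comparing what is configured with what the running kernel reports. This Python sketch parses the two keys out of sysctl.conf text and reads the live values from /proc/sys; the helper names are hypothetical, and the dot-to-slash path mapping is naive (it would break on interface names containing dots, such as a VLAN interface).

```python
from pathlib import Path

IPV6_KEYS = ("net.ipv6.conf.all.disable_ipv6", "net.ipv6.conf.default.disable_ipv6")


def parse_sysctl_conf(text):
    """Parse 'key = value' lines from sysctl.conf text, skipping blanks and '#' or ';' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def live_value(key, root="/proc/sys"):
    """Read the running kernel's value for a dotted sysctl key, or None if it is unavailable."""
    try:
        return Path(root, *key.split(".")).read_text().strip()
    except OSError:
        return None
```

For example: conf = parse_sysctl_conf(open("/etc/sysctl.conf").read()), then for each key in IPV6_KEYS compare conf.get(key) with live_value(key). Both should read "1" if the change is active.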

Now, on the cluster leader:

root@peta2:/etc# systemctl status apache2
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; disabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: active (running) since Thu 2021-02-04 03:48:05 EST; 4h 0min ago
Process: 224099 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
Main PID: 224103 (apache2)
Tasks: 55 (limit: 4915)
CGroup: /system.slice/apache2.service
├─224103 /usr/sbin/apache2 -k start
├─224104 /usr/sbin/apache2 -k start
└─224105 /usr/sbin/apache2 -k start

Feb 04 03:48:05 peta2 systemd[1]: Starting The Apache HTTP Server...
Feb 04 03:48:05 peta2 apachectl[224099]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.10.0.152. Set the 'ServerName' directive globally to suppress this message
Feb 04 03:48:05 peta2 systemd[1]: Started The Apache HTTP Server.

 

root@peta2:/etc# netstat -npl | grep apache
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 224103/apache2

I uploaded the logs; I know this has to help some:

node1 https://pastebin.com/raw/FtTHkVQY

node2 https://pastebin.com/raw/Gu5YK6bh

node3 https://pastebin.com/raw/itu01AAa

node4 https://pastebin.com/raw/r8mPEqKf

Area of concern (and maybe nothing is even getting updated): each log contains errors from a Python module:

Traceback (most recent call last):
  File "/opt/petasan/scripts/node_stats.py", line 168, in <module>
    get_stats()
  File "/opt/petasan/scripts/node_stats.py", line 66, in get_stats
    graphite_sender.send(leader_ip)
  File "/usr/lib/python3/dist-packages/PetaSAN/core/common/graphite_sender.py", line 59, in send
    raise Exception("Error running echo command :" + cmd)
Exception: Error running echo command :echo "PetaSAN.NodeStats.peta4.cpu_all.percent_util 65.73  `date +%s`" "

 

Moving on to adding additional nodes.

Just added node #5:

Node5  https://pastebin.com/raw/DrincJhH

 

Well, about the Grafana datasource:

On the cluster leader's Grafana control panel, the datasource (DS1) http://localhost:8080 always gives an error when I try to save and test.
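Querying the Graphite render API directly on the leader, bypassing Grafana, shows whether the 404 comes from the Graphite side. This Python sketch assumes the same base URL as DS1 (http://localhost:8080); the target pattern is only a guess based on the metric names in the logs, and render_url/fetch_render are hypothetical names. The render API parameters used (target, from, format=json) are standard Graphite.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def render_url(base, target, since="-1h"):
    """Build a Graphite render API URL that returns JSON for one target."""
    query = urllib.parse.urlencode({"target": target, "from": since, "format": "json"})
    return f"{base.rstrip('/')}/render?{query}"


def fetch_render(base, target, timeout=5.0):
    """GET the render URL; return (HTTP status, parsed JSON) or (error status, None) on an HTTP error."""
    url = render_url(base, target)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        return err.code, None  # connection-level failures (URLError) still propagate
```

For example: fetch_render("http://localhost:8080", "PetaSAN.NodeStats.*.cpu_all.percent_util"). A 404 here, matching the Apache log, would suggest Apache on 8080 is not routing /render to graphite-web; a 200 with an empty list would instead suggest no metrics are arriving.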

Using the cluster leader IP plus port 3000 gives:

If you're seeing this Grafana has failed to load its application files

etc.

I guess this makes sense, as the logs contain a lot of 404 errors.

 
