graphs grafana 404 errors, solved

davlaw
35 Posts

February 2, 2021, 2:51 pm

Gone through the stat-stop, stat-setup and stat-start scripts.

But for some reason my apache2 log, other_vhosts_access.log, shows a bunch of 404 errors:

Node1 10.10.0.151:80 127.0.0.1 - - [02/Feb/2021:04:01:14 -0500] "POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"

Node2 10.10.0.152:80 127.0.0.1 - - [02/Feb/2021:09:39:49 -0500] "POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"

Node3 10.10.0.153:80 127.0.0.1 - - [02/Feb/2021:03:52:32 -0500] "POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"

I can access each Grafana control panel, but when I test the DS1 setup I get HTTP Error Not Found

Thinking Grafana might be OK, but Graphite is failing somewhere.
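A minimal sketch for tallying those 404s, assuming the log uses Apache's vhost_combined layout as the lines above suggest. The helper name `count_404s` and the second sample line are made up for illustration; the first sample line is copied from the log excerpt.

```python
import re
from collections import Counter

# Assumed layout: vhost:port client ident user [time] "request" status bytes ...
LINE_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)

def count_404s(lines):
    """Count 404 responses per (vhost, method, path)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("status") == "404":
            counts[(m.group("vhost"), m.group("method"), m.group("path"))] += 1
    return counts

sample = [
    # Copied from the log excerpt in this post.
    '10.10.0.151:80 127.0.0.1 - - [02/Feb/2021:04:01:14 -0500] '
    '"POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"',
    # Hypothetical successful request, only here to show it is ignored.
    '10.10.0.151:80 127.0.0.1 - - [02/Feb/2021:04:01:24 -0500] '
    '"GET / HTTP/1.1" 200 512 "-" "curl/7.58.0"',
]
result = count_404s(sample)
print(result)
```

To run it against the real file, pass an open file handle for other_vhosts_access.log instead of `sample`. If every 404 is a POST to /render, that points at the Graphite side rather than at Grafana itself.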

davlaw
35 Posts

February 2, 2021, 3:26 pm

Well, something's missing: there is no open port for localhost:8080 on any of the nodes.

tcp 0 0 localhost:8600 0.0.0.0:* LISTEN 1182/consul
tcp 0 0 localhost:rfe 0.0.0.0:* LISTEN 1323/python3
tcp 0 0 localhost:8500 0.0.0.0:* LISTEN 1182/consul
udp 0 0 localhost:ntp 0.0.0.0:* 1172/ntpd
udp 0 0 localhost:8600 0.0.0.0:* 1182/consul
udp6 0 0 localhost:ntp [::]:* 1172/ntpd
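The same check can be scripted instead of reading netstat output by eye. This is only a sketch: `port_is_open` is a hypothetical helper, and the ports listed are the ones mentioned in this thread (8080 for the Graphite web side, 2003 for the metrics receiver).

```python
import socket

def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False

# Ports taken from this thread; adjust for your own setup.
for port in (8080, 2003):
    print(port, port_is_open("localhost", port))
```

Note that "localhost" may resolve to both 127.0.0.1 and ::1; `socket.create_connection` tries each resolved address in turn, so a listener on either one counts as open.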

davlaw
35 Posts

February 2, 2021, 4:39 pm

Ok, finally at something that might make sense

/opt/petasan/log/PetaSAN.log

While reviewing the PetaSAN logs I found a couple of errors; this one first. I see several of these, and they appear to get into trouble around cpu_all.percent_util.

PetaSAN.NodeStats.peta2.ifaces.throughput.eth1-100_received 173475.84 `date +%s`" | nc -q0 10.10.0.152 2003
Traceback (most recent call last):
  File "/opt/petasan/scripts/node_stats.py", line 168, in <module>
    get_stats()
  File "/opt/petasan/scripts/node_stats.py", line 66, in get_stats
    graphite_sender.send(leader_ip)
  File "/usr/lib/python3/dist-packages/PetaSAN/core/common/graphite_sender.py", line 59, in send
    raise Exception("Error running echo command :" + cmd)
Exception: Error running echo command :echo "PetaSAN.NodeStats.peta2.cpu_all.percent_util 1.17 `date +%s`" "
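The failing command pipes a line of the form "metric value timestamp" to port 2003 with nc, which matches Graphite's plaintext protocol. A rough way to test that path by hand, without the echo and nc pipeline, is sketched below. This is not PetaSAN's graphite_sender code; `format_metric` and `send_metric` are hypothetical names, and the timestamp in the example is arbitrary.

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: 'path value timestamp' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"

def send_metric(host, port, path, value, timeout=3.0):
    """Send one metric over TCP; raises OSError if the connection fails."""
    line = format_metric(path, value)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
    return line

# Metric name taken from the traceback; the timestamp is an arbitrary example.
print(format_metric("PetaSAN.NodeStats.peta2.cpu_all.percent_util", 1.17, 1612256400), end="")
```

Calling `send_metric("10.10.0.152", 2003, ...)` from a node would raise an OSError if nothing is listening on the leader's port 2003, which would be consistent with the exception in the log.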

Another one, in /opt/petasan/log/stats.log

[2021-02-02 08:15:33] write_graphite plugin: send to localhost:2003 (tcp) failed with status 4 (Interrupted system call)
[2021-02-02 08:16:26] plugin_load: plugin "write_graphite" successfully loaded.
[2021-02-02 08:16:26] plugin_load: plugin "python" successfully loaded.
[2021-02-02 08:16:26] python plugin: Found a configuration for the "ceph_latency_plugin" plugin, but the plugin isn't loaded or didn't register a configuration callback.
[2021-02-02 08:16:26] plugin_load: plugin "write_graphite" successfully loaded.
[2021-02-02 08:16:26] plugin_load: plugin "python" successfully loaded.
[2021-02-02 08:16:27] python plugin: Found a configuration for the "ceph_latency_plugin" plugin, but the plugin isn't loaded or didn't register a configuration callback.
[2021-02-02 08:16:27] Systemd detected, trying to signal readyness.
[2021-02-02 08:16:27] Initialization complete, entering read-loop.
[2021-02-02 09:27:55] Exiting normally.

admin
2,969 Posts

February 2, 2021, 7:22 pm

Can you check that the shared file system is mounted:
mount | grep shared

Grafana can only be started on one node.
Find the stats leader:

/opt/petasan/scripts/util/get_cluster_leader.py

Restart the services:
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-start.sh

On the other management nodes, make sure the stats services are stopped, otherwise they could cause issues:
/opt/petasan/scripts/stats-stop.sh
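For checking several nodes, the `mount | grep shared` step can be scripted. A minimal sketch, assuming a Unix-like system; `shared_mounts` is a hypothetical helper and the sample mount lines are invented, not taken from a PetaSAN node.

```python
import subprocess

def shared_mounts(mount_output):
    """Return the lines of `mount` output that mention 'shared'."""
    return [ln for ln in mount_output.splitlines() if "shared" in ln]

def current_mount_output():
    """Run `mount` and return its stdout."""
    done = subprocess.run(["mount"], capture_output=True, text=True, check=True)
    return done.stdout

# Invented sample output, for illustration only.
sample = (
    "/dev/sda1 on / type ext4 (rw,relatime)\n"
    "example:/vol on /mnt/shared type nfs (rw,relatime)\n"
)
print(shared_mounts(sample))
```

On a real node, `shared_mounts(current_mount_output())` returning an empty list would mean the shared file system is not mounted.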

davlaw
35 Posts

February 2, 2021, 10:17 pm

Thanks, I'll post the output when I get back in tomorrow. While I very much appreciate your comments and assistance, I have tried this several times without much luck.

The 10.10.0.x addresses it is using are also the management IPs. I'm now thinking it's hitting the management web UI; I noticed I have 2 https ports, one using nginx and the other apache, on the server that is the cluster leader.

Other 2 nodes only have nginx running for https

Not sure if I have some sort of network issue, management nodes are 10.10.0.151, 10.10.0.152, 10.10.0.153 all with netmask 255.255.254.0 on eth0 vlan 1

Back end is 172.16.0.151, 172.16.0.152, 172.16.0.153 all netmasked with 255.255.255.0 on eth1 vlan 100

I have eth2 and eth3 as well, but they are not configured or connected.

These are real hardware, supermicro MB, no VMs

davlaw
35 Posts

February 3, 2021, 8:29 pm

Well, still deploying nodes, and still no joy with the graphs.

netstat -aelptu | grep http on the cluster leader

tcp 0 0 0.0.0.0:https 0.0.0.0:* LISTEN root 42306 1418/nginx: master
tcp 0 0 0.0.0.0:http 0.0.0.0:* LISTEN root 42307 1418/nginx: master
tcp 0 0 peta1:https it01hp.xxxxxx.com:59353 ESTABLISHED www-data 218608 1420/nginx: worker
tcp6 0 0 [::]:http-alt [::]:* LISTEN root 119893 18256/apache2

Guess apache does not have a standard ipv4 config? So no 8080 on tcp (http-alt)

Sorry to be such a PITA, pretty much happy so far with the way the rest of the stuff is falling into place.

admin
2,969 Posts

February 3, 2021, 8:45 pm

On the stats leader identified with

/opt/petasan/scripts/util/get_cluster_leader.py

what is the output of

systemctl status apache2
netstat -npl | grep apache

davlaw
35 Posts

February 4, 2021, 12:29 am

root@peta1:/etc/apache2/conf-enabled# systemctl status apache2
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; disabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: active (running) since Wed 2021-02-03 15:36:08 EST; 3h 50min ago
Process: 72461 ExecStop=/usr/sbin/apachectl stop (code=exited, status=0/SUCCESS)
Process: 72469 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
Main PID: 72474 (apache2)
Tasks: 55 (limit: 4915)
CGroup: /system.slice/apache2.service
├─72474 /usr/sbin/apache2 -k start
├─72476 /usr/sbin/apache2 -k start
└─72477 /usr/sbin/apache2 -k start

Feb 03 15:36:08 peta1 systemd[1]: Stopped The Apache HTTP Server.
Feb 03 15:36:08 peta1 systemd[1]: Starting The Apache HTTP Server...
Feb 03 15:36:08 peta1 apachectl[72469]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.10.0.151. Set the 'ServerName' directive globally to suppress this message
Feb 03 15:36:08 peta1 systemd[1]: Started The Apache HTTP Server.

root@peta1:/etc/apache2/conf-enabled# netstat -npl | grep apache
tcp6 0 0 :::8080 :::* LISTEN 72474/apache2

davlaw
35 Posts

February 4, 2021, 12:55 pm

OK, well, FYI:

I disabled IPv6 on the nodes in sysctl.conf, since I was not using it anyway:

net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
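Settings in sysctl.conf only take effect once they are loaded (for example with `sysctl -p` or at boot), so it can be worth confirming the running value. A small sketch that reads the flag from /proc; `ipv6_disabled` is a hypothetical helper name.

```python
from pathlib import Path

def ipv6_disabled(path="/proc/sys/net/ipv6/conf/all/disable_ipv6"):
    """Return True if the flag reads 1, False if it reads 0, None if the file is missing."""
    p = Path(path)
    if not p.exists():
        # No such file, e.g. not Linux or no IPv6 support in the kernel.
        return None
    return p.read_text().strip() == "1"

print(ipv6_disabled())
```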

Now on the cluster leader

root@peta2:/etc# systemctl status apache2
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; disabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: active (running) since Thu 2021-02-04 03:48:05 EST; 4h 0min ago
Process: 224099 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
Main PID: 224103 (apache2)
Tasks: 55 (limit: 4915)
CGroup: /system.slice/apache2.service
├─224103 /usr/sbin/apache2 -k start
├─224104 /usr/sbin/apache2 -k start
└─224105 /usr/sbin/apache2 -k start

Feb 04 03:48:05 peta2 systemd[1]: Starting The Apache HTTP Server...
Feb 04 03:48:05 peta2 apachectl[224099]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.10.0.152. Set the 'ServerName' directive globally to suppress this message
Feb 04 03:48:05 peta2 systemd[1]: Started The Apache HTTP Server.

root@peta2:/etc# netstat -npl | grep apache
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN 224103/apache2

uploaded logs, I know this has to help some

node1 https://pastebin.com/raw/FtTHkVQY

node2 https://pastebin.com/raw/Gu5YK6bh

node3 https://pastebin.com/raw/itu01AAa

node4 https://pastebin.com/raw/r8mPEqKf

An area of concern (and maybe nothing is even getting updated): each log contains errors like this Python traceback

Traceback (most recent call last):
  File "/opt/petasan/scripts/node_stats.py", line 168, in <module>
    get_stats()
  File "/opt/petasan/scripts/node_stats.py", line 66, in get_stats
    graphite_sender.send(leader_ip)
  File "/usr/lib/python3/dist-packages/PetaSAN/core/common/graphite_sender.py", line 59, in send
    raise Exception("Error running echo command :" + cmd)
Exception: Error running echo command :echo "PetaSAN.NodeStats.peta4.cpu_all.percent_util 65.73  `date +%s`" "

Moving on adding additional nodes

Just added #5

Node5 https://pastebin.com/raw/DrincJhH

davlaw
35 Posts

February 4, 2021, 3:41 pm

Well, the Grafana datasource:

In the Grafana control panel on the cluster leader IP, the datasource (DS1), http://localhost:8080, always gives an error when I try to save and test.

Using the cluster leader IP on port 3000 gives

If you're seeing this Grafana has failed to load its application files

etc etc

Guess this makes sense, as the logs contain a lot of 404 errors.
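One way to take Grafana out of the picture is to query the datasource URL directly. The sketch below builds a Graphite render API request (`target`, `from` and `format` are standard render parameters) against the DS1 address from this post. `render_url` and `probe` are hypothetical helpers, and the metric name is borrowed from the earlier traceback.

```python
import urllib.error
import urllib.parse
import urllib.request

def render_url(base, target, window="-5min"):
    """Build a Graphite render API URL that asks for JSON output."""
    query = urllib.parse.urlencode({"target": target, "from": window, "format": "json"})
    return f"{base.rstrip('/')}/render?{query}"

def probe(url, timeout=5.0):
    """Return the HTTP status code, or None if the connection itself failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        return None

url = render_url("http://localhost:8080", "PetaSAN.NodeStats.peta2.cpu_all.percent_util")
print(url)
```

Running `probe(url)` on the cluster leader separates the cases: None means nothing is answering on 8080, 404 means a web server is answering but not serving /render, and 200 means the Graphite side is reachable and the problem is more likely in the Grafana datasource settings.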