graphs grafana 404 errors, solved
davlaw
35 Posts
February 2, 2021, 2:51 pm
I've gone through the stat-stop, stat-setup and stat-start scripts, but for some reason my apache2 log other_vhosts_access.log shows a bunch of 404 errors:
Node1 10.10.0.151:80 127.0.0.1 - - [02/Feb/2021:04:01:14 -0500] "POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"
Node2 10.10.0.152:80 127.0.0.1 - - [02/Feb/2021:09:39:49 -0500] "POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"
Node3 10.10.0.153:80 127.0.0.1 - - [02/Feb/2021:03:52:32 -0500] "POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"
I can access each Grafana control panel, but when I test the DS1 setup I get "HTTP Error Not Found".
I'm thinking Grafana might be OK, but Graphite is failing somewhere.
Last edited on February 23, 2021, 1:28 pm by davlaw · #1
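For anyone chasing the same symptom, here is a minimal Python sketch that tallies 404 responses per virtual host and path from log lines shaped like the ones quoted in this post. It assumes Apache's vhost_combined layout and that the "Node1"/"Node2" labels were added by the poster and are not part of the log file; the function name `count_status` is made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (Apache vhost_combined), matching the lines quoted in the post:
# vhost:port client - - [time] "METHOD path PROTO" status size "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<vhost>\S+) (?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_status(lines, status="404"):
    """Count requests with the given status, keyed by (vhost, path)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group("status") == status:
            counts[(match.group("vhost"), match.group("path"))] += 1
    return counts


# Two of the lines from the post, without the poster's node labels.
SAMPLE = [
    '10.10.0.151:80 127.0.0.1 - - [02/Feb/2021:04:01:14 -0500] '
    '"POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"',
    '10.10.0.152:80 127.0.0.1 - - [02/Feb/2021:09:39:49 -0500] '
    '"POST /render HTTP/1.1" 404 434 "-" "Grafana/6.2.5"',
]

for (vhost, path), n in sorted(count_status(SAMPLE).items()):
    print(vhost, path, n)
```

Grouping by virtual host makes it easy to see which vhost and port actually answered the Grafana requests.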
davlaw
35 Posts
February 2, 2021, 3:26 pm
Well, something's missing: there is no open port for localhost:8080 on any of the nodes.
tcp 0 0 localhost:8600 0.0.0.0:* LISTEN 1182/consul
tcp 0 0 localhost:rfe 0.0.0.0:* LISTEN 1323/python3
tcp 0 0 localhost:8500 0.0.0.0:* LISTEN 1182/consul
udp 0 0 localhost:ntp 0.0.0.0:* 1172/ntpd
udp 0 0 localhost:8600 0.0.0.0:* 1182/consul
udp6 0 0 localhost:ntp [::]:* 1172/ntpd
Last edited on February 2, 2021, 3:28 pm by davlaw · #2
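A listing that only shows sockets bound to localhost will not include a listener bound to all interfaces (shown as 0.0.0.0 or [::]), so a direct connection test can be a useful cross-check. The following is a minimal sketch using only the Python standard library; `is_listening` is a hypothetical helper name, and port 8080 is taken from this thread.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

Calling `is_listening("127.0.0.1", 8080)` on a node then reports whether anything accepts IPv4 connections on that port, regardless of which address the service is bound to.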
davlaw
35 Posts
February 2, 2021, 4:39 pm
OK, finally getting at something that might make sense.
While reviewing /opt/petasan/log/PetaSAN.log I found a couple of errors. This one first: I see several of these, and they appear to run into trouble around cpu_all.percent_util.
PetaSAN.NodeStats.peta2.ifaces.throughput.eth1-100_received 173475.84 `date +%s`" | nc -q0 10.10.0.152 2003
Traceback (most recent call last):
File "/opt/petasan/scripts/node_stats.py", line 168, in <module>
get_stats()
File "/opt/petasan/scripts/node_stats.py", line 66, in get_stats
graphite_sender.send(leader_ip)
File "/usr/lib/python3/dist-packages/PetaSAN/core/common/graphite_sender.py", line 59, in send
raise Exception("Error running echo command :" + cmd)
Exception: Error running echo command :echo "PetaSAN.NodeStats.peta2.cpu_all.percent_util 1.17 `date +%s`" "
Another one, in /opt/petasan/log/stats.log:
[2021-02-02 08:15:33] write_graphite plugin: send to localhost:2003 (tcp) failed with status 4 (Interrupted system call)
[2021-02-02 08:16:26] plugin_load: plugin "write_graphite" successfully loaded.
[2021-02-02 08:16:26] plugin_load: plugin "python" successfully loaded.
[2021-02-02 08:16:26] python plugin: Found a configuration for the "ceph_latency_plugin" plugin, but the plugin isn't loaded or didn't register a configuration callback.
[2021-02-02 08:16:26] plugin_load: plugin "write_graphite" successfully loaded.
[2021-02-02 08:16:26] plugin_load: plugin "python" successfully loaded.
[2021-02-02 08:16:27] python plugin: Found a configuration for the "ceph_latency_plugin" plugin, but the plugin isn't loaded or didn't register a configuration callback.
[2021-02-02 08:16:27] Systemd detected, trying to signal readyness.
[2021-02-02 08:16:27] Initialization complete, entering read-loop.
[2021-02-02 09:27:55] Exiting normally.
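The failing command in the traceback pipes an `echo` of "path value timestamp" into `nc` on port 2003, which is the shape of Graphite's plaintext protocol. To test the receiver independently of the PetaSAN scripts, a minimal sketch like the one below could be used. It is not how graphite_sender.py is implemented; the helper names and the metric name `test.manual.probe` are made up for illustration.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext-protocol line: path, value, timestamp, newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, port, line, timeout=5.0):
    """Send one plaintext line to a carbon receiver; raises OSError on failure."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))


# Only formats a line; nothing is sent until send_metric() is called.
print(format_metric("test.manual.probe", 1.17, 1612270000), end="")
```

If `send_metric("10.10.0.152", 2003, format_metric("test.manual.probe", 1))` raises a connection error, the problem is on the receiving side rather than in the stats scripts.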
admin
2,930 Posts
February 2, 2021, 7:22 pm
Can you check that the shared file system is mounted:
mount | grep shared
Grafana can only be started on one node.
Find the stats leader:
/opt/petasan/scripts/util/get_cluster_leader.py
Restart the services:
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-start.sh
On the other management nodes, make sure the services are stopped, otherwise it could cause issues:
/opt/petasan/scripts/stats-stop.sh
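The mount check can also be scripted. This is a minimal sketch that does roughly what `mount | grep shared` does, but on text in the /proc/mounts format; the sample content and the mount point name are hypothetical and do not describe what PetaSAN actually mounts.

```python
def mounts_matching(mounts_text, needle="shared"):
    """Return (device, mount_point, fs_type) for entries that mention needle."""
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and needle in line:
            hits.append(tuple(fields[:3]))
    return hits


# Hypothetical /proc/mounts content, for illustration only.
SAMPLE = (
    "proc /proc proc rw,nosuid 0 0\n"
    "10.0.0.1:/vol /mnt/shared-example nfs rw 0 0\n"
)
print(mounts_matching(SAMPLE))
```

On a real node the text would come from reading /proc/mounts; an empty result means no mount entry mentions "shared".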
davlaw
35 Posts
February 2, 2021, 10:17 pm
Thanks, I'll post the output when I get back in tomorrow. I very much appreciate your comments and assistance, but I have tried this several times without much luck.
The 10.10.0.x addresses it is using are also the management IPs, so I'm now thinking it is hitting the management web UI. I noticed I have two HTTPS ports, one using nginx and the other Apache, on the server that is the cluster leader.
The other two nodes only have nginx running for HTTPS.
I'm not sure whether I have some sort of network issue. The management nodes are 10.10.0.151, 10.10.0.152 and 10.10.0.153, all with netmask 255.255.254.0 on eth0, VLAN 1.
The back end is 172.16.0.151, 172.16.0.152 and 172.16.0.153, all with netmask 255.255.255.0 on eth1, VLAN 100.
I have eth2 and eth3 as well, but they are not configured or connected.
This is real hardware, Supermicro motherboards, no VMs.
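To rule out an addressing mistake, the two subnets can be checked with Python's `ipaddress` module. This is a minimal sketch using the addresses and netmasks from this post; the helper name `common_network` is made up for illustration.

```python
import ipaddress

MGMT = ["10.10.0.151", "10.10.0.152", "10.10.0.153"]
BACKEND = ["172.16.0.151", "172.16.0.152", "172.16.0.153"]


def common_network(addresses, netmask):
    """Return the single network all addresses fall in, or None if they differ."""
    networks = {ipaddress.ip_interface(f"{a}/{netmask}").network for a in addresses}
    return networks.pop() if len(networks) == 1 else None


mgmt_net = common_network(MGMT, "255.255.254.0")
backend_net = common_network(BACKEND, "255.255.255.0")
print(mgmt_net, backend_net, mgmt_net.overlaps(backend_net))
```

This prints `10.10.0.0/23 172.16.0.0/24 False`: each group of nodes shares one network and the two networks do not overlap, so the addressing itself looks consistent.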
davlaw
35 Posts
February 3, 2021, 8:29 pm
Well, I'm still deploying nodes and still no joy with the graphs.
Output of netstat -aelptu | grep http on the cluster leader:
tcp 0 0 0.0.0.0:https 0.0.0.0:* LISTEN root 42306 1418/nginx: master
tcp 0 0 0.0.0.0:http 0.0.0.0:* LISTEN root 42307 1418/nginx: master
tcp 0 0 peta1:https it01hp.xxxxxx.com:59353 ESTABLISHED www-data 218608 1420/nginx: worker
tcp6 0 0 [::]:http-alt [::]:* LISTEN root 119893 18256/apache2
I guess Apache does not have a standard IPv4 config? So there is no 8080 (http-alt) on tcp, only on tcp6.
Sorry to be such a PITA. I'm pretty happy so far with the way the rest of the stuff is falling into place.
Last edited on February 3, 2021, 8:30 pm by davlaw · #6
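Worth noting: on Linux a wildcard tcp6 listener usually accepts IPv4 connections as well, unless net.ipv6.bindv6only is set to 1, so a tcp6-only row for 8080 does not by itself prove that IPv4 clients are locked out. To compare listeners across nodes, here is a minimal sketch that picks the LISTEN rows for one port out of numeric netstat output (the `netstat -npl` form). The helper name is made up, and the first sample row is an illustrative numeric rendering, not output copied from a node.

```python
def listeners(netstat_text, port):
    """Return (proto, local_address, program) for LISTEN sockets on port."""
    found = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Numeric netstat rows: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        local = fields[3]
        if local.rsplit(":", 1)[-1] == str(port):
            found.append((fields[0], local, fields[6]))
    return found


# Illustrative rows in the numeric format, for demonstration only.
SAMPLE = (
    "tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 1418/nginx: master\n"
    "tcp6 0 0 :::8080 :::* LISTEN 72474/apache2\n"
)
for proto, local, program in listeners(SAMPLE, 8080):
    print(proto, local, program)
```

The protocol column in the result shows at a glance whether the port is bound on tcp, tcp6, or both.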
admin
2,930 Posts
February 3, 2021, 8:45 pm
On the stats leader identified with
/opt/petasan/scripts/util/get_cluster_leader.py
what is the output of
systemctl status apache2
netstat -npl | grep apache
davlaw
35 Posts
February 4, 2021, 12:29 am
root@peta1:/etc/apache2/conf-enabled# systemctl status apache2
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; disabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: active (running) since Wed 2021-02-03 15:36:08 EST; 3h 50min ago
Process: 72461 ExecStop=/usr/sbin/apachectl stop (code=exited, status=0/SUCCESS)
Process: 72469 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
Main PID: 72474 (apache2)
Tasks: 55 (limit: 4915)
CGroup: /system.slice/apache2.service
├─72474 /usr/sbin/apache2 -k start
├─72476 /usr/sbin/apache2 -k start
└─72477 /usr/sbin/apache2 -k start
Feb 03 15:36:08 peta1 systemd[1]: Stopped The Apache HTTP Server.
Feb 03 15:36:08 peta1 systemd[1]: Starting The Apache HTTP Server...
Feb 03 15:36:08 peta1 apachectl[72469]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.10.0.151. Set the 'ServerName' directive globally to suppress this message
Feb 03 15:36:08 peta1 systemd[1]: Started The Apache HTTP Server.
root@peta1:/etc/apache2/conf-enabled# netstat -npl | grep apache
tcp6 0 0 :::8080 :::* LISTEN 72474/apache2
davlaw
35 Posts
February 4, 2021, 12:55 pm
OK, well, FYI:
I disabled IPv6 on the nodes in sysctl.conf, since I was not using it anyway:
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
Now on the cluster leader:
root@peta2:/etc# systemctl status apache2
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; disabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/apache2.service.d
└─apache2-systemd.conf
Active: active (running) since Thu 2021-02-04 03:48:05 EST; 4h 0min ago
Process: 224099 ExecStart=/usr/sbin/apachectl start (code=exited, status=0/SUCCESS)
Main PID: 224103 (apache2)
Tasks: 55 (limit: 4915)
CGroup: /system.slice/apache2.service
├─224103 /usr/sbin/apache2 -k start
├─224104 /usr/sbin/apache2 -k start
└─224105 /usr/sbin/apache2 -k start
Feb 04 03:48:05 peta2 systemd[1]: Starting The Apache HTTP Server...
Feb 04 03:48:05 peta2 apachectl[224099]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 10.10.0.152. Set the 'ServerName' directive globally to suppress this message
Feb 04 03:48:05 peta2 systemd[1]: Started The Apache HTTP Server.
root@peta2:/etc# netstat -npl | grep apache
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTE 224103/apache2
I uploaded the logs; I know this has to help some:
node1 https://pastebin.com/raw/FtTHkVQY
node2 https://pastebin.com/raw/Gu5YK6bh
node3 https://pastebin.com/raw/itu01AAa
node4 https://pastebin.com/raw/r8mPEqKf
One area of concern (and maybe nothing is even getting updated): each log contains errors like this Python traceback:
Traceback (most recent call last):
File "/opt/petasan/scripts/node_stats.py", line 168, in <module>
get_stats()
File "/opt/petasan/scripts/node_stats.py", line 66, in get_stats
graphite_sender.send(leader_ip)
File "/usr/lib/python3/dist-packages/PetaSAN/core/common/graphite_sender.py", line 59, in send
raise Exception("Error running echo command :" + cmd)
Exception: Error running echo command :echo "PetaSAN.NodeStats.peta4.cpu_all.percent_util 65.73 `date +%s`" "
Moving on to adding additional nodes.
Just added #5:
node5 https://pastebin.com/raw/DrincJhH
Last edited on February 4, 2021, 1:44 pm by davlaw · #9
davlaw
35 Posts
February 4, 2021, 3:41 pm
About the Grafana datasource:
In the Grafana control panel on the cluster leader IP, the datasource (DS1) http://localhost:8080 always gives an error when I try Save & Test.
Using the cluster leader IP on port 3000 gives
"If you're seeing this Grafana has failed to load its application files"
etc.
I guess this makes sense, as the logs contain a lot of 404 errors.
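The datasource test boils down to a request against Graphite's /render endpoint, which is the path that returns 404 in the Apache log. The endpoint can be exercised without Grafana; the following is a minimal sketch assuming graphite-web's render API with `target`, `from` and `format=json` parameters. The helper names are made up, and the metric path is the one that appears in the PetaSAN.log traceback.

```python
import json
import urllib.parse
import urllib.request


def render_url(base_url, target, since="-5min"):
    """Build a Graphite render API URL that asks for JSON output."""
    query = urllib.parse.urlencode(
        {"target": target, "from": since, "format": "json"}
    )
    return f"{base_url.rstrip('/')}/render?{query}"


def fetch_series(url, timeout=5.0):
    """Fetch and decode the JSON series list; raises HTTPError on a 404."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Only builds the URL; fetch_series() performs the actual request.
print(render_url("http://localhost:8080",
                 "PetaSAN.NodeStats.peta2.cpu_all.percent_util"))
```

Running `fetch_series()` on the stats leader with that URL separates the two cases: a 404 here means Graphite is not being served at that address, while a valid JSON reply points back at the Grafana side.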