
Bad Gateway if 1st mgmt node is down


Hi admin,

If the first mgmt node is down, the graphs show "502 Bad Gateway". Can we fix that?

We're running 2.5.3.

Thank you and kind regards,

Reto

If you refresh the browser, do you still get this error? It may take a minute or so for the charts to work after failover.

No, it was still broken after 10 or 15 minutes.

Is this something you can reproduce now?

What is the output of

/opt/petasan/scripts/util/get_cluster_leader.py

 

Same issue here.

If I then log in via another node's IP, it works.

Can you run the command

/opt/petasan/scripts/util/get_cluster_leader.py

on the node that shows the error and on another one that works?
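
For example, something like this, run from any machine that has SSH access to both (the hostnames are just placeholders):

# run the leader check on the failing node and on a working node, then compare the output
for h in node-with-error node-that-works; do
  echo "== $h =="
  ssh root@$h /opt/petasan/scripts/util/get_cluster_leader.py
done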

Could it be load related? When you shut down one node there could be recovery traffic/load. Does it work after recovery finishes?

At the moment this hasn't happened again, so I can't test it. If it happens again, I'll try it.

I have had this same issue, to varying degrees, any time a node is rebooted or down. In some cases it seems to recover, but the amount of time is inconsistent. Sometimes it recovers within a few minutes, sometimes it takes longer. However...

Right now, after needing to take node 1 down for a short period of time, the graphs appear to be permanently broken. It has been more than 24 hours since this event, and the graphs have not recovered. I either get no data along with the red exclamation mark in the upper left corner (Request Error), or I get the 502 Bad Gateway error. When I press Ctrl-F5 to force a browser refresh, I get the same errors. This happens regardless of which node IP I log into; all of them show the same issue. At times it also flashes a large red error "Annotation Query Failed" (bad gateway) over the graph area.

They also all show the same results for: /opt/petasan/scripts/util/get_cluster_leader.py

This is on PetaSAN 3.1.0.

Since this is currently a lab environment, I have some time to experiment. Does anyone have suggestions on which logs to check, or other troubleshooting steps? It would be nice to get the graphs to recover properly after a node reboot. I want to say that at one point, when it was stuck like this (broken across all nodes for more than a few hours), I shut down the entire cluster. When I brought all nodes back up with a fresh boot, the graphs were restored. However, that would not be an acceptable fix in a production environment.

-Greg-

Hello @rootcurry, we are having this issue on our production cluster. Last Sunday I updated from 3.0.1 to 3.1.0 and statistics stopped working on all nodes. I opened this thread [1] in General Discussion, but I guess I should have posted it here. Unfortunately I don't have the time to reverse engineer how PetaSAN handles statistics, so for now we are on our own with this issue. I can't reboot the nodes either. If you find anything while troubleshooting your test cluster and don't mind, please share it. It's a pity; we have been using PetaSAN for a little more than 2 years and the issues have been minimal so far.

[1] http://www.petasan.org/forums/?view=thread&id=1091

1) Is the issue that the charts are not visible, or are they visible but with no data or with an error? If there is no data, is that the case for both cluster stats and node stats?
2) Do you see any recurring stats errors in /opt/petasan/log/PetaSAN.log ?

3) Is the shared file system mounted on all nodes?
mount | grep shared

4) Get the stats server IP from
/opt/petasan/scripts/util/get_cluster_leader.py

On that server, what is the status of the graphite service?
systemctl status carbon-cache

5) What is the output of

gluster vol status gfs-vol
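
If it helps, here is a rough sketch that collects the checks above so they can be run on a node in one go. The grep filter is only a guess at how the stats errors are labelled in the log, so adjust it as needed, and run the carbon-cache check on the node that get_cluster_leader.py reports as the stats server.

# sketch: gather the stats-related checks from the list above on one node
grep -i stats /opt/petasan/log/PetaSAN.log | tail -n 20   # recent stats-related log lines, if any
mount | grep shared                                       # is the shared file system mounted?
/opt/petasan/scripts/util/get_cluster_leader.py           # which node is the current stats server
systemctl status carbon-cache --no-pager                  # graphite carbon-cache service status
gluster vol status gfs-vol                                # gluster volume health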
