
Bad Gateway if 1st mgmt node is down


Hi admin,

If the first mgmt node is down, the graphs show "502 Bad Gateway". Can we fix that?

We're running 2.5.3.

Thank you and kind regards,

Reto

If you refresh the browser, do you still get this error? It may take a minute or so for the charts to work after failover.

No, it was still broken after 10 or 15 minutes.

Is this something you can reproduce now?

What is the output of

/opt/petasan/scripts/util/get_cluster_leader.py

 

Same issue here.

If I then log in via another node's IP, it works.

Can you run the command

/opt/petasan/scripts/util/get_cluster_leader.py

on the node that shows the error and on another one that works?
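
For example, something like this, run from any machine that has SSH access to both (the hostnames are just placeholders):

# run the leader check on the failing node and on a working node, then compare the output
for h in node-with-error node-that-works; do
  echo "== $h =="
  ssh root@$h /opt/petasan/scripts/util/get_cluster_leader.py
done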

Could it be load related? When you shut down one node there could be recovery traffic/load. Does it work after recovery finishes?

At the moment this hasn't happened again, so I can't test it. If it happens again, I'll try it.

I have had this same issue, to varying degrees, any time a node is rebooted or down. In some cases it seems to recover, but the amount of time is inconsistent. Sometimes it recovers within a few minutes, sometimes it takes longer. However...

Right now, after needing to take node 1 down for a short period of time, the graphs appear to be permanently broken. It has been more than 24 hours since this event, and the graphs have not recovered. I either get no data along with the red exclamation mark in the upper left corner (Request Error), or I get the 502 Bad Gateway error. When I press Ctrl-F5 to force a browser refresh, I get the same errors. This happens regardless of which node IP I log into; all of them show the same issue. At times it also flashes a large red error "Annotation Query Failed" (bad gateway) over the graph area.

They also all show the same results for: /opt/petasan/scripts/util/get_cluster_leader.py

This is on PetaSAN 3.1.0.

Since this is currently a lab environment, I have some time to experiment. Does anyone have suggestions on which logs to check, or other troubleshooting steps? It would be nice to get the graphs to recover properly after a node reboot. I want to say that at one point, when it was stuck like this (broken across all nodes for more than a few hours), I shut down the entire cluster. When I brought all nodes back up with a fresh boot, the graphs were restored. However, that would not be an acceptable fix in a production environment.

-Greg-

Hello @rootcurry, we are having this issue on our production cluster. Last Sunday I updated from 3.0.1 to 3.1.0 and statistics stopped working on all nodes. I opened this thread [1] in General Discussion, but I guess I should have posted it here. Unfortunately I don't have the time to reverse engineer how PetaSAN handles statistics, so for now we are on our own with this issue. I can't reboot the nodes either. If you find anything while troubleshooting your test cluster and don't mind, please share it. It's a pity; we have been using PetaSAN for a little more than 2 years and the issues have been minimal so far.

[1] http://www.petasan.org/forums/?view=thread&id=1091

1) Is the issue that the charts are not visible, or are they visible but with no data or with an error? If there is no data, is that the case for both cluster stats and node stats?
2) Do you see any recurring stats errors in /opt/petasan/log/PetaSAN.log ?

3) Is the shared file system mounted on all nodes?
mount | grep shared

4) Get the stats server IP from
/opt/petasan/scripts/util/get_cluster_leader.py

On that server, what is the status of the graphite service?
systemctl status carbon-cache

5) What is the output of

gluster vol status gfs-vol
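
If it helps, here is a rough sketch that collects the checks above so they can be run on a node in one go. The grep filter is only a guess at how the stats errors are labelled in the log, so adjust it as needed, and run the carbon-cache check on the node that get_cluster_leader.py reports as the stats server.

# sketch: gather the stats-related checks from the list above on one node
grep -i stats /opt/petasan/log/PetaSAN.log | tail -n 20   # recent stats-related log lines, if any
mount | grep shared                                       # is the shared file system mounted?
/opt/petasan/scripts/util/get_cluster_leader.py           # which node is the current stats server
systemctl status carbon-cache --no-pager                  # graphite carbon-cache service status
gluster vol status gfs-vol                                # gluster volume health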
