Bad Gateway if 1st mgmt node is down
RST
17 Posts
July 24, 2020, 6:37 am
Hi admin,
If the first mgmt node is down, the graph shows "502 Bad Gateway". Can we fix that?
We're running 2.5.3.
Thank you and kind regards,
Reto
admin
2,930 Posts
July 28, 2020, 4:01 pm
If you refresh the browser, do you still get this error? It may take about a minute for the charts to work after a failover.
RST
17 Posts
July 28, 2020, 4:09 pm
No, it was still broken after 10 or 15 minutes.
admin
2,930 Posts
July 28, 2020, 5:10 pm
Is this something you can reproduce now?
What is the output of
/opt/petasan/scripts/util/get_cluster_leader.py
exitsys
43 Posts
September 28, 2020, 11:56 pm
Same issue here.
If I then log in via another node IP, it works.
Last edited on September 29, 2020, 12:16 am by exitsys · #5
admin
2,930 Posts
September 29, 2020, 10:49 am
Can you run the command
/opt/petasan/scripts/util/get_cluster_leader.py
on the node that shows the error and on another one that works?
Could it be load related? When you shut down one node there could be recovery traffic/load. Does it work after recovery finishes?
exitsys
43 Posts
October 1, 2020, 8:52 pm
At the moment this isn't happening, so I can't test it. If it happens again, I'll try it.
rootcurry
3 Posts
July 26, 2022, 1:31 pm
I have had this same issue, to varying degrees, any time a node is rebooted or is down. In some cases it seems to recover, but the amount of time is inconsistent. Sometimes it recovers within a few minutes, sometimes it takes longer. However...
Right now, after needing to take node 1 down for a short period of time, graphs appear to be permanently broken. It has been more than 24 hours since this event, and graphs have not recovered. I either get no data and a red exclamation mark in the upper left corner (Request Error), or I get the 502 Bad Gateway error. When I press Ctrl-F5 to force a browser refresh, I get the same errors. This happens regardless of which node IP I log into; all of them show the same issue. At times it also flashes a large red error, "Annotation Query Failed" (bad gateway), over the graph area.
All nodes also show the same results for: /opt/petasan/scripts/util/get_cluster_leader.py
This is on PetaSAN 3.1.0.
Since this is currently a lab environment, I have some time to experiment. Does anyone have suggestions for logs to check or other troubleshooting steps? It would be nice to get graphs to recover properly after a node reboot. I want to say that at one point, when it was stuck like this (broken across all nodes for more than a few hours), I shut down the entire cluster. When I brought all nodes back up with a fresh boot, graphs were restored. However, that would not be an acceptable fix in a production environment.
-Greg-
JG
26 Posts
July 26, 2022, 7:35 pm
Hello @rootcurry, we are having this issue on our production cluster. Last Sunday I updated from 3.0.1 to 3.1.0 and statistics stopped working on all nodes. I opened this thread [1] in General Discussion, but I guess I should have put it here. Unfortunately I don't have the time to reverse engineer how PetaSAN handles statistics, so for now we are on our own with this issue. I can't reboot the nodes either. If you find anything while troubleshooting your test cluster and don't mind, please share it. It's sad; we have been using PetaSAN for a little more than 2 years and the issues have been minimal so far.
[1] http://www.petasan.org/forums/?view=thread&id=1091
Last edited on July 26, 2022, 7:35 pm by JG · #9
admin
2,930 Posts
July 26, 2022, 11:49 pm
1) Is the issue that the charts are not visible, or are they visible with no data or with an error? If there is no data, is it for both cluster stats and node stats?
2) Do you see any recurring stats errors in /opt/petasan/log/PetaSAN.log?
3) Is the shared file system mounted on all nodes?
mount | grep shared
4) Get the stats server IP from
/opt/petasan/scripts/util/get_cluster_leader.py
On that server, what is the status of the graphite service?
systemctl status carbon-cache
5) What is the output of
gluster vol status gfs-vol
Last edited on July 26, 2022, 11:52 pm by admin · #10
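For anyone collecting the output requested in the checklist above, the commands can be gathered into one script and run on each node. This is only a rough sketch and not something shipped with PetaSAN: the helper name run_check is made up for illustration, and grepping the log for the word "stats" is an assumption about what the relevant lines look like. It needs bash, and the carbon-cache result only matters on the node that get_cluster_leader.py reports as the stats server.

```shell
#!/usr/bin/env bash
# Sketch: run the stats checks from the post above on one node and print
# each command's output and exit status. run_check is an illustrative
# helper name, not part of PetaSAN.

run_check() {
    # $1 is a label; the remaining arguments are the command to run.
    local label="$1"
    shift
    echo "== ${label}"
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "   skipped: $1 not found on this node"
        return 0
    fi
    # Indent the command's output, then report the command's own exit status.
    "$@" 2>&1 | sed 's/^/   /'
    echo "   exit status: ${PIPESTATUS[0]}"
}

# 2) Assumption: the stats errors mention the word "stats" in the log.
run_check "recent stats lines in PetaSAN.log" \
    sh -c 'grep -i stats /opt/petasan/log/PetaSAN.log | tail -n 20'
# 3) Shared file system mount.
run_check "shared file system mount" sh -c 'mount | grep shared'
# 4) Stats server, then the graphite service (meaningful on that server).
run_check "cluster leader (stats server)" /opt/petasan/scripts/util/get_cluster_leader.py
run_check "carbon-cache service" systemctl status carbon-cache
# 5) Gluster volume behind the shared file system.
run_check "gluster volume status" gluster vol status gfs-vol
```

Running it on a node that shows the error and on one that works, then comparing the two outputs, covers the comparison asked for earlier in the thread as well.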