High memory usage of glusterfs process
mmenzo
8 Posts
December 29, 2017, 8:53 pm
Hello,
For a while now I've been having weird issues with our PetaSAN cluster. The cluster consists of 3 physical servers, all running PetaSAN 1.4.0. A while ago our monitoring server raised an alert for one of those servers because of high memory usage. After investigating, it turned out a single process was consuming about 5.5 GB of memory, while on the other servers in the cluster the same process consumed only about 400 MB. The command consuming the memory is:
/usr/sbin/glusterfs --volfile-server=192.168.162.180 --volfile-server=192.168.162.181 --volfile-id=gfs-vol /opt/petasan/config/shared
According to the memory graph, the server has slowly been consuming more memory over the last 4 weeks. As a quick fix I rebooted the server, and after it came back up the memory usage seemed stable. A few weeks later, our monitoring raised another alert, this time for the second storage server in the cluster. Since I rebooted the first server, the problem has moved to the second one, which has slowly been consuming more and more memory. After rebooting that server too, the third server in the cluster is now giving an alert. Since I don't really see any other posts on the forum about this specific issue, I suspect something is wrong with our configuration.
We are basically running the default PetaSAN configuration; the only manual change we've made is to the network configuration. Since we only had two physical uplinks at the time, we set up VLANs (by manually installing the 'vlan' package, enabling the 8021q kernel module, and updating /etc/network/interfaces and the /opt/petasan/config/cluster_info.json file on all servers). We know this isn't really supported out of the box, but it worked fine in PetaSAN 1.3.0; only after updating to 1.4.0 did we start seeing these memory issues (the storage cluster itself still works fine).
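For reference, the per-node change boils down to roughly the following; the interface name and VLAN ID below are placeholders rather than our exact values:

apt-get install vlan
modprobe 8021q
echo 8021q >> /etc/modules    # also load the 802.1Q module at boot

# excerpt from /etc/network/interfaces: VLAN sub-interface for the backend 1 network
auto eth0.100
iface eth0.100 inet static
    address 192.168.162.180
    netmask 255.255.255.0
    vlan-raw-device eth0

After that, /opt/petasan/config/cluster_info.json on every node gets the matching interface names for the backend networks.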
Here are some screenshots of the memory usage of our three storage servers and the process list of the affected server.
First server (which was restarted):
Second server (current issue):
Process list of the second server:
192.168.162.180 is the backend 1 interface of the server itself, and 192.168.162.181 is the backend 1 interface of the second server. The third server is 192.168.162.182.
I hope you can help us investigate this issue. I have not yet restarted the server that is currently showing the memory issue, in case you want me to run specific commands on it. It would be a huge hassle to reinstall the full cluster, since it is used as the storage layer for our office VMware cluster. Thanks in advance!
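In case you want the raw numbers before I do anything else, a couple of standard commands that show the memory of the gluster client process (nothing PetaSAN-specific, and they assume a single glusterfs client process per node):

ps -C glusterfs -o pid,rss,vsz,etime,args      # resident/virtual memory and uptime of the gluster mount process
grep Vm /proc/$(pgrep -x glusterfs)/status     # the same figures read straight from /proc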
Last edited on December 29, 2017, 8:54 pm by mmenzo · #1
admin
2,930 Posts
December 29, 2017, 10:32 pm
This is the client component of Gluster. It is used to mount the share /opt/petasan/config/shared.
For now, a simple fix could be to unmount this share:
umount /opt/petasan/config/shared
This should terminate the process. The PetaSAN services will automatically remount the share within about 30 s, and the new process should come up with the memory freed.
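To confirm the share came back and the memory was actually released, something along these lines should be enough (standard commands, nothing specific to PetaSAN):

mount | grep /opt/petasan/config/shared    # the gfs-vol mount should reappear within ~30 s
df -h /opt/petasan/config/shared           # confirms the share is mounted and reachable again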
This share is currently used only to store the collected stats data that is displayed on the dashboard, so a 30 s gap is not important. v1.3 collected only Ceph cluster stats; v1.4 adds load stats for every node, so there is much more data. Only one management node is the active stats collector, which may explain why memory is high on only one node at a time.
We will look into it, but hopefully this serves as a quick fix for now. Also, 16 GB of RAM is quite low for your nodes. If the unmount works, you can put it in a cron job.
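For example, a root crontab entry along these lines would do it (the weekly schedule here is only a suggestion):

# unmount the shared stats mount once a week; PetaSAN remounts it automatically within ~30 s
0 3 * * 0 /bin/umount /opt/petasan/config/shared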
Last edited on December 29, 2017, 10:46 pm by admin · #2
mmenzo
8 Posts
December 30, 2017, 11:44 am
Thank you for your quick response! The quick fix you posted worked, and the memory usage is back to the same level as on the other servers. I'll see if I have a few spare memory sticks lying around so I can increase the memory to about 32 GB (according to the recommended hardware PDF, that should be enough for our environment).
Thanks again! 🙂