High memory usage of glusterfs process
mmenzo
8 Posts
December 29, 2017, 8:53 pm
Hello,
For a while now I've been having weird issues with our PetaSAN cluster. The cluster consists of 3 physical servers, all running PetaSAN 1.4.0. A while ago our monitoring server raised an alert for one of those servers because of high memory usage. After investigating, it turned out a single process was consuming about 5.5 GB of memory, while on the other servers in the cluster the same process consumed only about 400 MB. The command consuming the memory is:
/usr/sbin/glusterfs --volfile-server=192.168.162.180 --volfile-server=192.168.162.181 --volfile-id=gfs-vol /opt/petasan/config/shared
According to the memory graph, the server has slowly been consuming more memory over the last 4 weeks. As a quick fix I rebooted the server, and after it came back up the memory usage seemed stable. A few weeks later, our monitoring raised another alert, this time for the second storage server in the cluster. Since I rebooted the first server, the problem has moved to the second one, which has slowly been consuming more and more memory. After rebooting that server too, the third server in the cluster is now giving an alert. Since I don't really see any other posts on the forum about this specific issue, I suspect something is wrong with our configuration.
We are basically running the default PetaSAN configuration; the only manual change we've made is to the network configuration. Since we only had two physical uplinks at the time, we set up VLANs (by manually installing the 'vlan' package, enabling the 8021q kernel module, and updating /etc/network/interfaces and the /opt/petasan/config/cluster_info.json file on all servers). We know this isn't really supported out of the box, but it worked fine in PetaSAN 1.3.0; only after updating to 1.4.0 did we start seeing these memory issues (the storage cluster itself still works fine).
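For reference, the per-node change boils down to roughly the following; the interface name and VLAN ID below are placeholders rather than our exact values:

apt-get install vlan
modprobe 8021q
echo 8021q >> /etc/modules    # also load the 802.1Q module at boot

# excerpt from /etc/network/interfaces: VLAN sub-interface for the backend 1 network
auto eth0.100
iface eth0.100 inet static
    address 192.168.162.180
    netmask 255.255.255.0
    vlan-raw-device eth0

After that, /opt/petasan/config/cluster_info.json on every node gets the matching interface names for the backend networks.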
Here are some screenshots of the memory usage of our three storage servers and the process list of the affected server.
First server (which was restarted):
Second server (current issue):
Process list of the second server:
192.168.162.180 is the backend 1 interface of the server itself, and 192.168.162.181 is the backend 1 interface of the second server. The third server is 192.168.162.182.
I hope you can help us investigate this issue. I have not yet restarted the server that is currently showing the memory issue, in case you want me to run specific commands on it. It would be a huge hassle to reinstall the full cluster, since it is used as the storage layer for our office VMware cluster. Thanks in advance!
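In case you want the raw numbers before I do anything else, a couple of standard commands that show the memory of the gluster client process (nothing PetaSAN-specific, and they assume a single glusterfs client process per node):

ps -C glusterfs -o pid,rss,vsz,etime,args      # resident/virtual memory and uptime of the gluster mount process
grep Vm /proc/$(pgrep -x glusterfs)/status     # the same figures read straight from /proc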
Last edited on December 29, 2017, 8:54 pm by mmenzo · #1
admin
2,930 Posts
December 29, 2017, 10:32 pm
This is the client component of Gluster. It is used to mount the share /opt/petasan/config/shared.
For now, a simple fix could be to unmount this share:
umount /opt/petasan/config/shared
This should terminate the process. The PetaSAN services will automatically remount the share within about 30 s, and the new process should come up with the memory freed.
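To confirm the share came back and the memory was actually released, something along these lines should be enough (standard commands, nothing specific to PetaSAN):

mount | grep /opt/petasan/config/shared    # the gfs-vol mount should reappear within ~30 s
df -h /opt/petasan/config/shared           # confirms the share is mounted and reachable again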
This share is currently used only to store the collected stats data that is displayed on the dashboard, so a 30 s gap is not important. v1.3 collected only Ceph cluster stats; v1.4 adds load stats for every node, so there is much more data. Only one management node is the active stats collector, which may explain why memory is high on only one node at a time.
We will look into it, but hopefully this serves as a quick fix for now. Also, 16 GB of RAM is quite low for your nodes. If the unmount works, you can put it in a cron job.
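For example, a root crontab entry along these lines would do it (the weekly schedule here is only a suggestion):

# unmount the shared stats mount once a week; PetaSAN remounts it automatically within ~30 s
0 3 * * 0 /bin/umount /opt/petasan/config/shared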
Last edited on December 29, 2017, 10:46 pm by admin · #2
mmenzo
8 Posts
December 30, 2017, 11:44 am
Thank you for your quick response! The quick fix you posted worked, and the memory usage is back to the same level as on the other servers. I'll see if I have a few spare memory sticks lying around so I can increase the memory to about 32 GB (according to the recommended hardware PDF, that should be enough for our environment).
Thanks again! 🙂