Memory leak
netatvision
10 Posts
March 30, 2019, 4:51 pm
Hello Hatem,
It seems that there is a memory leak. We have four HP G9 servers, each with 64 GB RAM, 6 SSD OSDs and two 2×10 Gb NICs. The free memory is decreasing every day. When we started with PetaSAN we had 32 GB, and after a few days some OSDs were crashing; we could see the problem was too little memory. We increased the memory to 64 GB, but we are now worried because we keep losing megabytes of free RAM every day and the OSD crashes will start again.
Regards, Carsten
Last edited on March 30, 2019, 5:26 pm by netatvision · #1
admin
2,930 Posts
March 30, 2019, 5:47 pm
See what processes are taking memory:
atop -m
32 GB is on the low side. If you find your OSDs are taking too much memory, lower the cache via
bluestore_cache_size_ssd = 1073741824
in your conf file (the default is 3 GB) and restart the OSDs.
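For illustration, a minimal sketch of where such an override could live, assuming the cluster uses a standard /etc/ceph/&lt;cluster&gt;.conf and regular ceph-osd@&lt;id&gt; systemd units (adjust both to your PetaSAN cluster name and OSD ids):

# /etc/ceph/<cluster>.conf -- add under the [osd] section
[osd]
bluestore_cache_size_ssd = 1073741824    # 1 GiB instead of the 3 GiB default

# then restart the OSDs one at a time so the cluster stays healthy
systemctl restart ceph-osd@0

1073741824 bytes is 1 GiB, so with 6 OSDs per node this would free roughly 12 GB of cache compared to the 3 GiB default.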
netatvision
10 Posts
March 30, 2019, 6:18 pm
Attached are a screenshot and a free.log from the past days. It seems that every OSD is taking about 5.5 GB. At the moment there is still enough free memory.
admin
2,930 Posts
March 30, 2019, 6:43 pm
If it stops increasing, you are OK.
There have been recent changes to OSD caching to improve performance by increasing the cache size, especially for SSDs: in 2.2 it is 3 GB, in 2.3 it is 4 GB for the cache size, on top of the OSD's own memory. This should be reflected in the docs, which still state 2 GB.
You can adjust it via the conf value posted earlier.
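To check which value a running OSD actually picked up, one option (assuming OSD id 0 and your cluster name; run it on the node hosting that OSD, since it queries the local admin socket) is:

ceph daemon osd.0 config get bluestore_cache_size_ssd --cluster XX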
Last edited on March 30, 2019, 6:45 pm by admin · #4
netatvision
10 Posts
April 1, 2019, 1:21 pm
Hello,
The system currently has 64 GB RAM, but the free memory keeps decreasing.
top - 14:37:52 up 8 days, 15:36, 1 user, load average: 0.27, 0.35, 0.41
Tasks: 407 total, 1 running, 406 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.4 us, 0.9 sy, 0.0 ni, 97.4 id, 0.2 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 65826440 total, 10335688 free, 52682724 used, 2808028 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 12097032 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3361 ceph 20 0 5742732 4.490g 29416 S 0.0 7.2 811:15.15 ceph-osd
2245 ceph 20 0 5787616 4.483g 29588 S 0.0 7.1 879:18.55 ceph-osd
2161 ceph 20 0 5755288 4.421g 29332 S 0.0 7.0 743:29.45 ceph-osd
2504 ceph 20 0 5862844 4.389g 29288 S 6.2 7.0 917:50.79 ceph-osd
2340 ceph 20 0 5804500 4.341g 29396 S 6.2 6.9 785:43.14 ceph-osd
2856 ceph 20 0 5796268 4.244g 29452 S 0.0 6.8 717:14.39 ceph-osd
1957 root 20 0 4232972 3.483g 6344 S 0.0 5.5 340:16.43 glusterfs
top - 06:37:57 up 9 days, 7:36, 1 user, load average: 0.53, 0.56, 0.58
Tasks: 417 total, 2 running, 415 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 0.8 sy, 0.0 ni, 97.4 id, 0.2 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 65826440 total, 9377700 free, 53625548 used, 2823192 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 11153976 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2340 ceph 20 0 5804500 4.536g 29396 S 6.9 7.2 820:04.66 ceph-osd
3361 ceph 20 0 5841036 4.489g 29416 S 6.3 7.2 843:55.98 ceph-osd
2161 ceph 20 0 5820824 4.475g 29332 S 5.9 7.1 776:23.41 ceph-osd
2504 ceph 20 0 5895612 4.458g 29288 S 8.2 7.1 954:37.42 ceph-osd
2245 ceph 20 0 5885920 4.375g 29588 S 6.6 7.0 916:32.48 ceph-osd
2856 ceph 20 0 5796268 4.283g 29452 S 6.2 6.8 750:19.23 ceph-osd
1957 root 20 0 4495116 3.758g 6344 S 2.7 6.0 366:16.19 glusterfs
top - 11:22:58 up 9 days, 12:21, 1 user, load average: 2.67, 2.69, 2.58
Tasks: 416 total, 3 running, 413 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.5 us, 2.5 sy, 0.0 ni, 91.9 id, 0.5 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem : 65826440 total, 9206748 free, 53747900 used, 2871792 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 11026764 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2504 ceph 20 0 5895612 4.600g 29288 S 28.6 7.3 1013:52 ceph-osd
2245 ceph 20 0 5918688 4.546g 29588 S 25.9 7.2 971:24.57 ceph-osd
2340 ceph 20 0 5804500 4.422g 29396 S 24.9 7.0 871:51.41 ceph-osd
2856 ceph 20 0 5796268 4.384g 29452 S 20.2 7.0 794:34.80 ceph-osd
2161 ceph 20 0 5820824 4.346g 29332 S 22.7 6.9 825:44.07 ceph-osd
3361 ceph 20 0 5939340 4.295g 29416 S 24.3 6.8 895:14.77 ceph-osd
1957 root 20 0 4626188 3.839g 6344 S 2.7 6.1 374:18.16 glusterfs
top - 15:07:59 up 9 days, 16:06, 1 user, load average: 1.32, 1.46, 1.58
Tasks: 412 total, 2 running, 410 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.6 us, 2.0 sy, 0.0 ni, 93.6 id, 0.4 wa, 0.0 hi, 0.5 si, 0.0 st
KiB Mem : 65826440 total, 8921292 free, 53994352 used, 2910796 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 10775720 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2161 ceph 20 0 5820824 4.508g 29332 S 17.6 7.2 871:03.41 ceph-osd
2504 ceph 20 0 5961148 4.470g 29288 S 21.5 7.1 1068:31 ceph-osd
2340 ceph 20 0 5870036 4.451g 29396 S 19.8 7.1 920:06.44 ceph-osd
2856 ceph 20 0 5960108 4.450g 29452 S 16.0 7.1 835:20.51 ceph-osd
2245 ceph 20 0 5951456 4.386g 29588 S 19.8 7.0 1022:26 ceph-osd
3361 ceph 20 0 5939340 4.367g 29416 S 18.2 7.0 942:55.13 ceph-osd
1957 root 20 0 4626188 3.907g 6344 S 3.0 6.2 381:07.24 glusterfs
The glusterfs process is also using more and more memory.
When is the maximum memory use per OSD reached?
When is the maximum for glusterfs reached?
We are still using the default cache size of 3 GB per OSD.
If we increase the memory to 80 GB RAM, will this definitely be enough for our configuration?
Thx for your help, Carsten
admin
2,930 Posts
April 2, 2019, 1:09 pm
For gluster, do a
umount /opt/petasan/config/shared
This will clear the gluster memory. It is safe to unmount, as we automatically remount it; the gluster share is used for stats.
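As a small sketch of checking the effect, assuming the memory is held by the glusterfs fuse client for this share (the client is restarted when the share is remounted, which releases its memory):

ps -C glusterfs -o pid,rss,cmd            # note the RSS (in KiB) before unmounting
umount /opt/petasan/config/shared
mount | grep /opt/petasan/config/shared   # run again after a minute; PetaSAN remounts it automatically
ps -C glusterfs -o pid,rss,cmd            # the new client process should start with a much smaller RSS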
If the ceph daemons do not stabilize, do a
ceph daemon osd.X dump_mempools --cluster XX
to see where the memory is taken by the OSD. Typically the OSD will take more than the cache assigned to it, which you can lower as per the prior post (3 GB for SSD). If you use compression or EC pools, or the OSD is doing backfills, it could overshoot this by a large amount.
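As an illustration of reading that output, assuming an OSD id of 0 and a cluster named "ceph" (replace both with your own; the command has to run on the node hosting that OSD, since it talks to the local admin socket):

ceph daemon osd.0 dump_mempools --cluster ceph
# each mempool category (e.g. bluestore_cache_data, bluestore_cache_onode,
# buffer_anon, osd_pglog) reports items and bytes, and a "total" block at the
# end sums the bytes tracked by the OSD

If the bluestore_cache_* categories dominate, lowering bluestore_cache_size_ssd will help; if buffer_anon or osd_pglog is the largest consumer, the growth is coming from in-flight I/O buffers or PG logs rather than the cache.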
netatvision
10 Posts
April 2, 2019, 2:14 pm
Thx a lot.
It seems to be a problem with glusterfs on only one node. On all the other nodes the glusterfs daemon uses ~500 MB; on the failing host glusterfs had used at least 5 GB. I have unmounted the volume on the problematic host, will monitor it, and will send you an update over the next days.
The OSD daemons are using between 4.5 GB and 4.9 GB of RAM. It looks like there is no further increase.
Question: does it make sense to increase the physical RAM so that, in case of a node failure, there are enough resources to avoid running into memory trouble? And what is your recommendation, maybe 128 GB?
Thx a lot, Carsten