Memory leak
netatvision
10 Posts
March 30, 2019, 4:51 pm
Hello Hatem,
It seems that there is a memory leak. We have four HP G9 servers, each with 64 GB RAM, 6 SSD OSDs and two 2×10 Gb NICs. The free memory is decreasing every day. When we started with PetaSAN we had 32 GB, and after a few days some OSDs were crashing; we could see the problem was too little memory. We increased the memory to 64 GB, but we are now worried because we keep losing megabytes of free RAM every day and the OSD crashes will start again.
Regards, Carsten
Last edited on March 30, 2019, 5:26 pm by netatvision · #1
admin
2,930 Posts
March 30, 2019, 5:47 pm
See what processes are taking memory:
atop -m
32 GB is on the low side. If you find your OSDs are taking too much memory, lower the cache via
bluestore_cache_size_ssd = 1073741824
in your conf file (the default is 3 GB) and restart the OSDs.
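For illustration, a minimal sketch of where such an override could live, assuming the cluster uses a standard /etc/ceph/&lt;cluster&gt;.conf and regular ceph-osd@&lt;id&gt; systemd units (adjust both to your PetaSAN cluster name and OSD ids):

# /etc/ceph/<cluster>.conf -- add under the [osd] section
[osd]
bluestore_cache_size_ssd = 1073741824    # 1 GiB instead of the 3 GiB default

# then restart the OSDs one at a time so the cluster stays healthy
systemctl restart ceph-osd@0

1073741824 bytes is 1 GiB, so with 6 OSDs per node this would free roughly 12 GB of cache compared to the 3 GiB default.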
netatvision
10 Posts
March 30, 2019, 6:18 pm
Attached are a screenshot and a free.log from the past days. It seems that every OSD is taking about 5.5 GB. At the moment there is still enough free memory.
admin
2,930 Posts
March 30, 2019, 6:43 pm
If it stops increasing, you are OK.
There have been recent changes to OSD caching to improve performance by increasing the cache size, especially for SSDs: in 2.2 it is 3 GB, in 2.3 it is 4 GB for the cache size, on top of the OSD's own memory. This should be reflected in the docs, which still state 2 GB.
You can adjust it via the conf value posted earlier.
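To check which value a running OSD actually picked up, one option (assuming OSD id 0 and your cluster name; run it on the node hosting that OSD, since it queries the local admin socket) is:

ceph daemon osd.0 config get bluestore_cache_size_ssd --cluster XX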
Last edited on March 30, 2019, 6:45 pm by admin · #4
netatvision
10 Posts
April 1, 2019, 1:21 pm
Hello,
The system currently has 64 GB RAM, but the free memory keeps decreasing.
top - 14:37:52 up 8 days, 15:36, 1 user, load average: 0.27, 0.35, 0.41
Tasks: 407 total, 1 running, 406 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.4 us, 0.9 sy, 0.0 ni, 97.4 id, 0.2 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 65826440 total, 10335688 free, 52682724 used, 2808028 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 12097032 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3361 ceph 20 0 5742732 4.490g 29416 S 0.0 7.2 811:15.15 ceph-osd
2245 ceph 20 0 5787616 4.483g 29588 S 0.0 7.1 879:18.55 ceph-osd
2161 ceph 20 0 5755288 4.421g 29332 S 0.0 7.0 743:29.45 ceph-osd
2504 ceph 20 0 5862844 4.389g 29288 S 6.2 7.0 917:50.79 ceph-osd
2340 ceph 20 0 5804500 4.341g 29396 S 6.2 6.9 785:43.14 ceph-osd
2856 ceph 20 0 5796268 4.244g 29452 S 0.0 6.8 717:14.39 ceph-osd
1957 root 20 0 4232972 3.483g 6344 S 0.0 5.5 340:16.43 glusterfs
top - 06:37:57 up 9 days, 7:36, 1 user, load average: 0.53, 0.56, 0.58
Tasks: 417 total, 2 running, 415 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 0.8 sy, 0.0 ni, 97.4 id, 0.2 wa, 0.0 hi, 0.2 si, 0.0 st
KiB Mem : 65826440 total, 9377700 free, 53625548 used, 2823192 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 11153976 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2340 ceph 20 0 5804500 4.536g 29396 S 6.9 7.2 820:04.66 ceph-osd
3361 ceph 20 0 5841036 4.489g 29416 S 6.3 7.2 843:55.98 ceph-osd
2161 ceph 20 0 5820824 4.475g 29332 S 5.9 7.1 776:23.41 ceph-osd
2504 ceph 20 0 5895612 4.458g 29288 S 8.2 7.1 954:37.42 ceph-osd
2245 ceph 20 0 5885920 4.375g 29588 S 6.6 7.0 916:32.48 ceph-osd
2856 ceph 20 0 5796268 4.283g 29452 S 6.2 6.8 750:19.23 ceph-osd
1957 root 20 0 4495116 3.758g 6344 S 2.7 6.0 366:16.19 glusterfs
top - 11:22:58 up 9 days, 12:21, 1 user, load average: 2.67, 2.69, 2.58
Tasks: 416 total, 3 running, 413 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.5 us, 2.5 sy, 0.0 ni, 91.9 id, 0.5 wa, 0.0 hi, 0.6 si, 0.0 st
KiB Mem : 65826440 total, 9206748 free, 53747900 used, 2871792 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 11026764 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2504 ceph 20 0 5895612 4.600g 29288 S 28.6 7.3 1013:52 ceph-osd
2245 ceph 20 0 5918688 4.546g 29588 S 25.9 7.2 971:24.57 ceph-osd
2340 ceph 20 0 5804500 4.422g 29396 S 24.9 7.0 871:51.41 ceph-osd
2856 ceph 20 0 5796268 4.384g 29452 S 20.2 7.0 794:34.80 ceph-osd
2161 ceph 20 0 5820824 4.346g 29332 S 22.7 6.9 825:44.07 ceph-osd
3361 ceph 20 0 5939340 4.295g 29416 S 24.3 6.8 895:14.77 ceph-osd
1957 root 20 0 4626188 3.839g 6344 S 2.7 6.1 374:18.16 glusterfs
top - 15:07:59 up 9 days, 16:06, 1 user, load average: 1.32, 1.46, 1.58
Tasks: 412 total, 2 running, 410 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.6 us, 2.0 sy, 0.0 ni, 93.6 id, 0.4 wa, 0.0 hi, 0.5 si, 0.0 st
KiB Mem : 65826440 total, 8921292 free, 53994352 used, 2910796 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 10775720 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2161 ceph 20 0 5820824 4.508g 29332 S 17.6 7.2 871:03.41 ceph-osd
2504 ceph 20 0 5961148 4.470g 29288 S 21.5 7.1 1068:31 ceph-osd
2340 ceph 20 0 5870036 4.451g 29396 S 19.8 7.1 920:06.44 ceph-osd
2856 ceph 20 0 5960108 4.450g 29452 S 16.0 7.1 835:20.51 ceph-osd
2245 ceph 20 0 5951456 4.386g 29588 S 19.8 7.0 1022:26 ceph-osd
3361 ceph 20 0 5939340 4.367g 29416 S 18.2 7.0 942:55.13 ceph-osd
1957 root 20 0 4626188 3.907g 6344 S 3.0 6.2 381:07.24 glusterfs
The glusterfs process is also using more and more memory.
When is the maximum memory use per OSD reached?
When is the maximum for glusterfs reached?
We are still using the default cache size of 3 GB per OSD.
If we increase the memory to 80 GB RAM, will this definitely be enough for our configuration?
Thx for your help, Carsten
admin
2,930 Posts
April 2, 2019, 1:09 pm
For gluster, do a
umount /opt/petasan/config/shared
This will clear the gluster memory. It is safe to unmount, as we automatically remount it; the gluster share is used for stats.
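As a small sketch of checking the effect, assuming the memory is held by the glusterfs fuse client for this share (the client is restarted when the share is remounted, which releases its memory):

ps -C glusterfs -o pid,rss,cmd            # note the RSS (in KiB) before unmounting
umount /opt/petasan/config/shared
mount | grep /opt/petasan/config/shared   # run again after a minute; PetaSAN remounts it automatically
ps -C glusterfs -o pid,rss,cmd            # the new client process should start with a much smaller RSS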
If the ceph daemons do not stabilize, do a
ceph daemon osd.X dump_mempools --cluster XX
to see where the memory is taken by the OSD. Typically the OSD will take more than the cache assigned to it, which you can lower as per the prior post (3 GB for SSD). If you use compression or EC pools, or the OSD is doing backfills, it could overshoot this by a large amount.
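As an illustration of reading that output, assuming an OSD id of 0 and a cluster named "ceph" (replace both with your own; the command has to run on the node hosting that OSD, since it talks to the local admin socket):

ceph daemon osd.0 dump_mempools --cluster ceph
# each mempool category (e.g. bluestore_cache_data, bluestore_cache_onode,
# buffer_anon, osd_pglog) reports items and bytes, and a "total" block at the
# end sums the bytes tracked by the OSD

If the bluestore_cache_* categories dominate, lowering bluestore_cache_size_ssd will help; if buffer_anon or osd_pglog is the largest consumer, the growth is coming from in-flight I/O buffers or PG logs rather than the cache.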
netatvision
10 Posts
April 2, 2019, 2:14 pm
Thx a lot.
It seems to be a problem with glusterfs on only one node. On all the other nodes the glusterfs daemon uses ~500 MB; on the failing host glusterfs had used at least 5 GB. I have unmounted the volume on the problematic host, will monitor it, and will send you an update over the next days.
The OSD daemons are using between 4.5 GB and 4.9 GB of RAM. It looks like there is no further increase.
Question: does it make sense to increase the physical RAM so that, in case of a node failure, there are enough resources to avoid running into memory trouble? And what is your recommendation, maybe 128 GB?
Thx a lot, Carsten