More memory (RAM) better performance?
trexman
60 Posts
July 25, 2019, 7:37 am
Hi,
we are planning to build a new PetaSAN cluster and/or maybe improve the currently running one.
The current nodes have 8 SSDs each (going to be 10 or 12 in the next months).
At the moment every node has 64 GB RAM, which according to your hardware requirements document is enough:
4 GB RAM per OSD (8 × 4 = 32 GB)
+ 16 GB RAM iSCSI
+ 2 GB RAM management and monitoring
= 50 GB RAM --> OK
But with the planned extension to 12 OSDs per node...
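Running the same numbers for 12 OSDs (my rough estimate):
4 GB × 12 OSDs = 48 GB
+ 16 GB RAM iSCSI
+ 2 GB RAM management and monitoring
= 66 GB RAM --> already more than the 64 GB we have per node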
I checked the memory statistics of every node.
Node1: 75% RAM util.
Node2: 50% RAM util.
Node3: 50% RAM util.
The conclusion from the calculation and the statistics is that we need more. The question is: does more RAM mean more performance, as in the slogan "the more, the better"?
We could easily double the RAM of every node. That would be enough even if we use every drive slot.
Or would it increase performance if we built every node out to 192 or 256 GB?
As I read in the Ceph docs and learned from ZFS: more is better. But can PetaSAN really make use of it?
Thanks
Trexman
admin
2,930 Posts
July 25, 2019, 9:14 am
The memory is controlled by the osd_memory_target config parameter, which we set to 4 GB (unless you chose the low-end profile when building the cluster, in which case it is set to 1 GB, or if you built your cluster a long time ago, where we upgrade it to 2 GB). 4 GB has been the default value in Ceph for some time now.
This memory is used as a cache for data as well as metadata; there are config values to control this ratio, but the defaults are good. Caching is required with BlueStore since it cannot rely on the file system cache as FileStore did.
Caching of data will improve read performance. Of course it depends on your workload pattern, but if it is not completely random and does access frequently used objects, then increasing the cache will increase your hit ratio, so reads are served from RAM rather than disk.
Caching of metadata will improve both reads and writes: every io does a metadata lookup (where the object is stored, crc, ...) from the RocksDB database. A cache hit avoids a small read io and hence improves latency in most setups. In setups where you have a flash journal/db and slower HDDs for data, a write op to the HDDs takes much longer than the latency saved by the metadata lookup, so it will not make a difference in that case.
So it depends on your workload and setup. If you have the cash, you can shoot for 8 GB; some drive vendors who benchmark disks use 32 GB, which is too much for real cases. If you do have the cash, it is probably better to invest in enterprise-class SSDs, a 40 Gb network, or more powerful CPUs.
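If you want to see what an OSD is actually running with, the standard Ceph admin socket commands should work on any node (osd.0 is just an example id; the value is reported in bytes):
ceph daemon osd.0 config get osd_memory_target
ceph daemon osd.0 dump_mempools
The second command should give a rough idea of how that memory is currently split between the data and metadata caches.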
Last edited on July 25, 2019, 9:19 am by admin · #2
trexman
60 Posts
July 30, 2019, 2:34 pm
Thanks for your detailed answer!
I see that putting money into the "cheaper" solution and buying a lot of memory just moves the bottleneck to a different part like CPU, network, etc.
I appreciate that you clarified this!
About the osd_memory_target option:
On our cluster it is set to 2 GB, so I'm going to switch this to 4 GB. Would 6 GB also be a possible value, or do I have to use a "power of two" value?
I guess I can play a little bit with this value to see if there is any noticeable impact.
I read in a different thread that you have to change this in the config file on every node.
Is there no global way to push this new option value to all nodes?
You also explained the caching of data and the caching of metadata. Does PetaSAN handle the RAM split between these two, or do I have to (or can I) change the RAM usage for data caching vs. metadata caching?
Last question:
As I mentioned before, one node is using 25% more RAM than the other two.
The top process on this node is this one (the others don't have it at the top):
/usr/sbin/glusterfs --volfile-server=172.30.11.101 --volfile-server=172.30.11.102 --volfile-id=gfs-vol /opt/petasan/config/shared
Why is the memory usage of this process on one node so high? Is it always the first node or the same node? (Then I would provide more RAM for that node.)
Thanks
Trexman
admin
2,930 Posts
July 30, 2019, 5:09 pm
Hi there,
You can use 6 GB RAM; it does not have to be 8.
You can use injectargs to set the parameter at run time, but you still need to change the config file on all nodes to make the change persistent. PetaSAN 2.3.1 (which will be released soon) will include the latest Ceph Nautilus, which supports a distributed configuration system.
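For example, for a 6 GB target (the value is given in bytes, and osd.* addresses all OSDs), the runtime change could look like:
ceph tell osd.* injectargs '--osd_memory_target 6442450944'
and to make it persistent, the same value goes under the [osd] section of the Ceph config file on every node:
[osd]
osd_memory_target = 6442450944
This is just a sketch using standard Ceph tooling; adjust the byte value to whatever target you settle on.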
You do not need to configure the data/metadata ratio; the default should be good for the vast majority of cases.
One of the first 3 management nodes will run extra services such as statistics collection and notifications, so it can use more memory. Just monitor that the RAM usage of the gluster system is not increasing over time.
trexman
60 Posts
July 31, 2019, 10:10 am
OK, sounds good. I'm looking forward to the new version 🙂
About the node with the extra services: can I control which one it is? If yes, how?
About the RAM usage monitoring: it increased over the last month by over 25% (of 64 GB total).
Is this something I have to be concerned about? The other nodes don't show such a memory rise.
Last edited on July 31, 2019, 10:13 am by trexman · #5
admin
2,930 Posts
July 31, 2019, 1:16 pm
Try to check which process is taking memory via atop -m
If it is the gluster client, do a umount /opt/petasan/config/shared
which restarts the gluster client and clears its cache. If this fixes it, you can do this in a cron job (version 2.3.0 should have this already).
No, you cannot control which management node is running the stats services; it is determined by Consul and is dynamic to achieve high availability.
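If you want to script it yourself, a minimal cron entry could look like the one below (the file name is only an example; PetaSAN 2.3.0 already ships its own daily job, so only add this if that one turns out not to run):
# /etc/cron.d/gluster-remount (example name), run daily at 03:00
0 3 * * * root umount /opt/petasan/config/shared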
trexman
60 Posts
August 1, 2019, 8:05 am
OK, the memory usage is fine now after the umount.
There is no need for a remount, right?
We are using PetaSAN 2.3.0 but I can't find the cron job. Where can I check this?
admin
2,930 Posts
August 1, 2019, 9:19 am
Yep, the gluster client component that comes with Ubuntu 16 can leak memory over time in some cases.
No need to re-mount, it will be done automatically. The stats will freeze for 30 sec or so; other than this there is no impact.
It is good to run the unmount periodically; you can put it in a cron job yourself. PetaSAN version 2.3.0 should have this run daily in /etc/cron.daily/cron-1d.py, but maybe it is not running due to a bug.
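To check whether that script exists and whether cron actually ran it, something along these lines should do on an Ubuntu-based node (the syslog grep simply looks for the daily run-parts activity, so the exact log lines may differ):
ls -l /etc/cron.daily/cron-1d.py
grep -i 'cron.daily' /var/log/syslog | tail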