
Caching using SSDs or NVRAM cards


Before reading this I went with 4 SSDs per server, totalling 4% of raw (remembering that size=2). But I'm not sure how the data on the SSDs is handled by the new cache method. I'm more used to NAS or object stores, where the cache holds metadata and the calculation is per file or object; looking back over previous work, it's usually 2.5% of raw (size=3) for object storage or 2.5% of data for NAS. I have also used SSD caching layers at the pool level, where again 2.5% of data seems to work well, although many vendors recommend up to 10% to be safe.

Are those figures of 20-25% for SSD and 5-6% for NVMe a percentage of raw capacity? They're clearly performance-based, since the ratio differs between SSD and NVMe, and when it comes to performance there are other variables too, such as the number of SSD devices making up the capacity.

Can anyone tell me more about how it's actually calculated for the new caching method?

Thanks, Jim

The ratios refer to the number of physical drives, not to storage capacity. So if you had 8 HDDs, you'd need 2 SSDs for your write journals. They are based on the ratio of raw device speed/throughput.
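To make the drive-count ratio concrete, here is a rough sketch (my throughput numbers are assumptions, not measurements or an official Ceph formula): one SSD can absorb journal writes for roughly as many HDDs as its sustained write throughput exceeds a single HDD's, which is where figures like 20-25% come from.

    # Rough sketch only: assumed sustained sequential write throughputs,
    # not measured figures for any particular drive model.
    hdd_write_mb_s = 120    # assumed throughput of one HDD data device
    ssd_write_mb_s = 500    # assumed throughput of one SATA SSD journal device

    hdds_per_ssd = ssd_write_mb_s // hdd_write_mb_s
    print(f"~{hdds_per_ssd} HDD journals per SSD "
          f"(about {100 // hdds_per_ssd}% SSDs relative to HDD count)")
    # -> ~4 HDD journals per SSD (about 25% SSDs relative to HDD count)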

For the current Filestore, the typical size of the journal partition is about 5-10 GB. Technically it should be large enough to hold your highest expected device throughput multiplied by the maximum sync time for flushing the journal.
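As a back-of-the-envelope check on that sizing rule (the throughput and sync-interval values below are assumptions; the Ceph docs additionally suggest doubling the result for headroom):

    # Sketch of the journal sizing rule above: throughput x max sync interval,
    # doubled per the Ceph docs' rule of thumb. Numbers below are assumed.
    throughput_mb_s = 120        # assumed peak throughput hitting this journal
    max_sync_interval_s = 5      # filestore_max_sync_interval defaults to 5 s

    journal_mb = 2 * throughput_mb_s * max_sync_interval_s
    print(f"journal partition: ~{journal_mb} MB (~{journal_mb / 1024:.1f} GB)")
    # -> ~1200 MB (~1.2 GB), so a 5-10 GB partition leaves plenty of headroom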

For the upcoming Bluestore, there are no journals; instead there is a RocksDB database for metadata and transactions, which is recommended to be placed on SSD/NVMe. So far I have not seen a recommendation for its size, but it will depend on the size of the HDD's data partition.

It's Bluestore that I'm interested in. I found this from Sage Weil at http://events.linuxfoundation.org/sites/events/files/slides/20170323%20bluestore.pdf

This is the bit I was looking for (page 16). It looks like Bluestore caching doesn't take a lot of SSD capacity.

  • A few GB of SSD
    • bluefs db.wal/ (rocksdb wal)
    • bluefs db/ (warm sst files)
  • Big device
    • bluefs db.slow/ (cold sst files)
    • object data blobs
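Going purely by the "a few GB of SSD" figure on slide 16, a rough per-server estimate looks like this (both the per-OSD size and the OSD count are my assumptions, not recommendations from the slides):

    # Rough per-server SSD budget for Bluestore's bluefs db.wal/ + db/ partitions.
    # Both numbers below are assumptions for illustration only.
    gb_per_osd = 4          # "a few GB" of SSD per OSD for the WAL + warm SST files
    osds_per_server = 8     # assumed number of HDD-backed OSDs in the server

    ssd_gb_total = gb_per_osd * osds_per_server
    print(f"~{ssd_gb_total} GB of SSD for {osds_per_server} Bluestore OSDs")
    # -> ~32 GB of SSD for 8 Bluestore OSDs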

 

Excellent link, thanks 🙂

Note you may also be interested in page 46: in the future the plan is to add a block-level tier on top of the data block device using dm-cache or bcache. This will be more in line with the caching you had in mind, and will involve recommendations for ratios based on capacity. We hope to support this once it is supported in Ceph.
