Bluestore_Block_DB_Size falls back to default value 60 GB

I'm running PetaSAN 2.6.2 and am currently adding OSD drives to the cluster manually. Because my journal SSDs are a bit small, I changed the above value to 28 GB. Adding disks with the journal on SSD then went well, up to a point where the value suddenly changed back to 60 GB, and later I got the message that the journal drive is full.

To be more precise: I had already added 7 OSDs with 28 GB journal partitions on the SSD, paused for around 90 minutes and then continued adding OSDs. In the logs I can see that the system suddenly uses a 60 GB partition for the journal, but I can't see a reason for it.

Is this a known behaviour?

Regards

28 GB is too low and will most probably result in only about 3 GB of the partition being used, with the rest wasted; this comes from how RocksDB sizes its levels. You need at least 30 GB plus some more space to be used during RocksDB database compaction, which is why we use 60 GB.
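
To illustrate the tiers (a rough sketch only, not PetaSAN code; it assumes BlueStore's default RocksDB sizing of a 256 MiB base level with a 10x multiplier, and real behaviour also depends on compaction headroom):

# Rough sketch: why a 28 GB DB partition tends to hold only the ~3 GiB tier.
# Assumes BlueStore's default RocksDB sizing (256 MiB base level, 10x multiplier).

GIB = 1024 ** 3

def usable_db_bytes(partition_bytes, base=256 * 1024 ** 2, multiplier=10, levels=4):
    """Largest cumulative RocksDB level size that fully fits on the partition.
    A level only stays on the fast device if all levels up to it fit."""
    level, cumulative, usable = base, 0, 0
    for _ in range(levels):
        cumulative += level
        if cumulative <= partition_bytes:
            usable = cumulative
        level *= multiplier
    return usable

for gb in (28, 30, 60, 300):
    usable = usable_db_bytes(gb * 1000 ** 3)  # partition sizes given in decimal GB
    print(f"{gb:>4} GB partition -> ~{usable / GIB:.1f} GiB usable by RocksDB")

# 28 GB -> ~2.8 GiB; 30 GB -> ~27.8 GiB with no headroom left for compaction;
# 60 GB -> ~27.8 GiB plus compaction headroom; 300 GB -> ~277.8 GiB.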

Did you change the value via the UI under Ceph Configuration, or did you change the conf file on all nodes?

When you say you are adding OSDs manually, you mean via the UI and not the CLI, correct?

What is the output of

parted /dev/sdX print

on the journal device?

Note that for a SATA SSD journal serving HDD OSDs, you should use 4 or 5 partitions max.

I know that the size and number of partitions are not optimal, but for a test/dev environment it works well.

I changed the value in the UI under Ceph Configuration. The OSDs were added via the UI.

Here is the output:

root@petasan-04:~# parted /dev/sdv print
Model: SAMSUNG MZ6ER400HAGL/0K0 (scsi)
Disk /dev/sdv: 400GB
Sector size (logical/physical): 512B/4096B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name             Flags
 1      1049kB  30.1GB  30.1GB               ceph-journal-db
 2      30.1GB  60.1GB  30.1GB               ceph-journal-db
 3      60.1GB  90.2GB  30.1GB               ceph-journal-db
 4      90.2GB  120GB   30.1GB               ceph-journal-db
 5      120GB   150GB   30.1GB               ceph-journal-db
 6      150GB   180GB   30.1GB               ceph-journal-db
 7      180GB   210GB   30.1GB               ceph-journal-db
 8      210GB   241GB   30.1GB               ceph-journal-db
 9      241GB   271GB   30.1GB               ceph-journal-db
10      271GB   301GB   30.1GB               ceph-journal-db
11      301GB   331GB   30.1GB               ceph-journal-db
12      331GB   361GB   30.1GB               ceph-journal-db
13      361GB   391GB   30.1GB               ceph-journal-db

The partitions are the correct size you specified. However, there are 13 of them rather than 7; did you try to add OSDs 13 times, and were the OSDs added successfully each time?

In the logs I can see that the system suddenly uses a 60 GB partition for the journal.

Did you see this log entry as an error after the journal was already filled, or do you see it while you are adding each 30G partition?

I saw that phenomenon a couple of times. Especially after a reboot of a node, the bluestore_block_db_size value went back to 60 GB. This time it happened while adding the OSDs, without a reboot in between. I created 7 OSDs, paused for 90 minutes and created two more. Those two got 60 GB partitions.

I deleted the two partitions on both the HDD and the SSD, changed the value back to 28 GB in the UI and created the next OSD. It's a 1:13 ratio, just for test/dev, so no performance expectations. All Ceph changes were done via the UI.

On the one hand, I saw the error come up when the journal was filled. On the other hand, because I know there are some problems, I check the journal from time to time to see what's going on.

I had that problem twice today, but I also saw it previously when I re-installed the cluster or added another node.

Things are a bit clearer than in the original post, but still not totally clear, so let me state some points that may help.

If the cluster is not healthy, it is possible that the custom partition size cannot be read from the monitor database; in this case we fall back to using the default 60G. Given the kind of tests you are doing, it is quite possible this is what is happening, and it is also consistent with your observation that it happens especially after reboots. So before testing the add-OSD function, I recommend making sure the cluster is healthy with OK status.
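
As a minimal pre-flight sketch (assuming the standard ceph CLI is available on the node; this is not part of PetaSAN):

# Sketch: only add OSDs when the cluster reports HEALTH_OK, since an unhealthy
# cluster can make the custom DB size unreadable and trigger the 60G fallback.
import json
import subprocess
import sys

def cluster_is_healthy():
    out = subprocess.check_output(["ceph", "health", "--format", "json"])
    return json.loads(out).get("status") == "HEALTH_OK"

if not cluster_is_healthy():
    sys.exit("Cluster is not HEALTH_OK; fix health before adding OSDs, "
             "otherwise the DB size may fall back to the 60 GB default.")
print("Cluster healthy, OK to add OSDs.")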

Since you re-installed more than once: either wipe the journals before creating a new cluster, or correctly delete the journals (and OSDs) of the old cluster before re-installing a new one. This marks the already created partitions as free so they can be re-used; otherwise the old journal partitions stay reserved and will not be re-used. If you end up in such a case, we have 2 maintenance scripts which can help you flag old journal partitions as free:
/opt/petasan/scripts/util/make_journal_partition_free.py
/opt/petasan/scripts/util/journal_active_partitions.py
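
The scripts above do the actual flagging. As a rough illustration only (not those scripts; it simply parses parted output and assumes journal partitions are named ceph-journal-db, as in the listing earlier in this thread), you can see what already sits on a journal device like this:

# Illustration: list GPT partitions on a journal device and mark the ones
# named ceph-journal-db, to see what is already allocated before re-use.
import subprocess
import sys

def list_partitions(device):
    """Parse 'parted -m <device> unit GB print' machine-readable output."""
    out = subprocess.check_output(
        ["parted", "-m", device, "unit", "GB", "print"], text=True)
    rows = []
    for line in out.strip().splitlines()[2:]:        # skip "BYT;" and disk line
        fields = line.rstrip(";").split(":")
        number, start, end, size = fields[:4]
        name = fields[5] if len(fields) > 5 else ""
        rows.append((int(number), start, end, size, name))
    return rows

if __name__ == "__main__":
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/sdv"
    for num, start, end, size, name in list_partitions(device):
        tag = "  <- journal partition" if name == "ceph-journal-db" else ""
        print(f"{num:>3}  {start:>9} {end:>9} {size:>9}  {name}{tag}")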

If with the above you still see an issue, you can describe the case in more detail, but please make sure your cluster is OK, you are not using journals already flagged as active from a previous cluster build, and you have enough resources such as RAM, so we can try to reproduce it.

Lastly, as stated earlier, the 60G size and the 1:4 ratio are what we recommend; please use them in a real deployment.

Thanks for the last answer. I guess the problem was that the cluster was not in a healthy state and too many things were going on.

Quote from admin on September 10, 2020, 7:29 pm

Lastly, as stated earlier, the 60G size and the 1:4 ratio are what we recommend; please use them in a real deployment.

Unfortunately, the setting always jumps back to 60 GB. We also use 2.6.2 and our setting of 300 GB is back to 60 GB after a few hours.

Can you check if it is defined in /etc/ceph/ceph.conf on any of the nodes?

Is the Ceph cluster state healthy/OK? If not, it will use the default 60 GB value, as the custom value cannot be read.

Note the value affects new OSDs to be created, not existing ones.
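
As a quick sketch of how you might check this on a node (it looks in /etc/ceph/ceph.conf and asks the Ceph mon config database; where PetaSAN itself stores its custom value is not covered here, so treat it as an illustration under those assumptions):

# Sketch: report where (if anywhere) bluestore_block_db_size is defined on a node.
import re
import subprocess

CONF = "/etc/ceph/ceph.conf"

def value_in_conf(path=CONF):
    """Return the value set in ceph.conf, or None if the option is absent."""
    pattern = re.compile(r"\s*bluestore[ _]block[ _]db[ _]size\s*=\s*(\S+)")
    with open(path) as f:
        for line in f:
            m = pattern.match(line)
            if m:
                return m.group(1)
    return None

def value_in_mon_config_db():
    """Value Ceph would hand to a new OSD (mon config db or compiled default)."""
    return subprocess.check_output(
        ["ceph", "config", "get", "osd", "bluestore_block_db_size"],
        text=True).strip()

print("ceph.conf:     ", value_in_conf() or "not set")
print("mon config db: ", value_in_mon_config_db())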

Hello,

Unfortunately it is not stored in ceph.conf. The cluster is, of course, clean. And yes, I installed 12 disks yesterday and today. This morning the value was back to 60 GB.

We have 3 clusters running here with 2.6.2, and the problem is the same everywhere. The value is not saved permanently; after a few hours it jumps back again. It is of course unpleasant to have to keep checking the NVMes with fdisk.

But I have another problem. I'm currently trying to migrate a VM from one Ceph cluster to another. Offline this works wonderfully, but online it does not work.

2021-07-15T14:16:08.184Z cpu32:5148113)WARNING: NMP: nmpCompleteRetryForPath:358: Retry cmd 0x83 (0x45c0c0665440) to dev "naa.60014054166abbe00002000000000000" failed on path "vmhba64:C0:T9:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x8 0x0.
2021-07-15T14:16:08.184Z cpu32:5148113)WARNING: NMP: nmpCompleteRetryForPath:388: Logical device "naa.60014054166abbe00002000000000000": awaiting fast path state update before retrying failed command again...
2021-07-15T14:16:09.184Z cpu12:2098166)WARNING: NMP: nmpDeviceAttemptFailover:641: Retry world failover device "naa.60014054166abbe00002000000000000" - issuing command 0x45c0c0665440
2021-07-15T14:16:09.185Z cpu12:5148113)WARNING: NMP: nmpCompleteRetryForPath:358: Retry cmd 0x83 (0x45c0c0665440) to dev "naa.60014054166abbe00002000000000000" failed on path "vmhba64:C0:T9:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x8 0x0.
2021-07-15T14:16:09.185Z cpu12:5148113)WARNING: NMP: nmpCompleteRetryForPath:388: Logical device "naa.60014054166abbe00002000000000000": awaiting fast path state update before retrying failed command again...
2021-07-15T14:16:10.184Z cpu10:2098166)WARNING: NMP: nmpDeviceAttemptFailover:641: Retry world failover device "naa.60014054166abbe00002000000000000" - issuing command 0x45c0c0665440
2021-07-15T14:16:10.184Z cpu10:5148113)WARNING: NMP: nmpCompleteRetryForPath:358: Retry cmd 0x83 (0x45c0c0665440) to dev "naa.60014054166abbe00002000000000000" failed on path "vmhba64:C0:T9:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x8 0x0.
2021-07-15T14:16:10.184Z cpu10:5148113)WARNING: NMP: nmpCompleteRetryForPath:388: Logical device "naa.60014054166abbe00002000000000000": awaiting fast path state update before retrying failed command again...
2021-07-15T14:16:11.184Z cpu36:2098166)WARNING: NMP: nmpDeviceAttemptFailover:641: Retry world failover device "naa.60014054166abbe00002000000000000" - issuing command 0x45c0c0665440
2021-07-15T14:16:11.185Z cpu30:5148113)WARNING: NMP: nmpCompleteRetryForPath:358: Retry cmd 0x83 (0x45c0c0665440) to dev "naa.60014054166abbe00002000000000000" failed on path "vmhba64:C0:T9:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x8 0x0.
2021-07-15T14:16:11.185Z cpu30:5148113)WARNING: NMP: nmpCompleteRetryForPath:388: Logical device "naa.60014054166abbe00002000000000000": awaiting fast path state update before retrying failed command again...
2021-07-15T14:16:11.412Z cpu57:2097294)ScsiDeviceIO: 4257: Cmd(0x45e0c4549d00) 0x89, cmdId.initiator=0x4305ea494500 CmdSN 0x111b1 from world 2097225 to dev "naa.60014054166abbe00002000000000000" failed H:0x5 D:0x0 P:0x0 Aborted at device layer. Cmd
2021-07-15T14:16:11.412Z cpu57:2097294)count Active:16 Queued:4
2021-07-15T14:16:12.184Z cpu62:2098166)WARNING: NMP: nmpDeviceAttemptFailover:641: Retry world failover device "naa.60014054166abbe00002000000000000" - issuing command 0x45c0c0665440
2021-07-15T14:16:12.184Z cpu30:5148113)WARNING: NMP: nmpCompleteRetryForPath:358: Retry cmd 0x83 (0x45c0c0665440) to dev "naa.60014054166abbe00002000000000000" failed on path "vmhba64:C0:T9:L0" H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x8 0x0.
2021-07-15T14:16:12.184Z cpu30:5148113)WARNING: NMP: nmpCompleteRetryForPath:388: Logical device "naa.60014054166abbe00002000000000000": awaiting fast path state update before retrying failed command again...
2021-07-15T14:16:12.408Z cpu10:2098864)ScsiDeviceIO: 4205: Cmd(0x45c0c0665440) 0x83, cmdId.initiator=0x4305ea494500 CmdSN 0x111a0 from world 5165224 to dev "naa.60014054166abbe00002000000000000" failed H:0x8 D:0x0 P:0x0 Aborted at NMP layer
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)ScsiDeviceIO: 4205: Cmd(0x45c0c06d6640) 0x83, cmdId.initiator=0x4305ea494500 CmdSN 0x111a1 from world 5165224 to dev "naa.60014054166abbe00002000000000000" failed H:0x8 D:0x0 P:0x0 Aborted at NMP layer
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)ScsiDeviceIO: 4205: Cmd(0x45c0c07eaa40) 0x83, cmdId.initiator=0x4305ea494500 CmdSN 0x111a2 from world 5165224 to dev "naa.60014054166abbe00002000000000000" failed H:0x8 D:0x0 P:0x0 Aborted at NMP layer
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)ScsiDeviceIO: 4205: Cmd(0x45c0c07e5540) 0x83, cmdId.initiator=0x4305ea494500 CmdSN 0x111a3 from world 5165224 to dev "naa.60014054166abbe00002000000000000" failed H:0x8 D:0x0 P:0x0 Aborted at NMP layer
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)ScsiDeviceIO: 4205: Cmd(0x45c0c064ce40) 0x83, cmdId.initiator=0x4305ea494500 CmdSN 0x111a4 from world 5165224 to dev "naa.60014054166abbe00002000000000000" failed H:0x8 D:0x0 P:0x0 Aborted at NMP layer
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu10:2098864)WARNING: NMP: nmp_DeviceStartLoop:740: NMP Device "naa.60014054166abbe00002000000000000" is blocked. Not starting I/O from device.
2021-07-15T14:16:12.408Z cpu12:5171243)FS3DM: 2322: status Busy copying 1 extents between two files, bytesTransferred = 0 extentsTransferred: 0
2021-07-15T14:16:12.408Z cpu57:2097294)ScsiDeviceIO: 4257: Cmd(0x45e0cee682c0) 0x89, cmdId.initiator=0x4305ea494500 CmdSN 0x111b5 from world 2097225 to dev "naa.60014054166abbe00002000000000000" failed H:0x5 D:0x0 P:0x0 Aborted at device layer. Cmd
2021-07-15T14:16:12.408Z cpu57:2097294)count Active:0 Queued:3
2021-07-15T14:16:12.408Z cpu12:5171243)SVM: 2907: scsi0:0 Completed copy in 160028 ms. vmmLeaderID = 5165224.
2021-07-15T14:16:12.408Z cpu18:5165240)SVM: 2569: SVM Mirrored mode IO stats for device: 6b6f19fd-a7880475-svmmirror
2021-07-15T14:16:12.408Z cpu18:5165240)SVM: 2573: Total # IOs mirrored: 0, Total # IOs sent only to source: 367, Total # IO deferred by lock: 0
2021-07-15T14:16:12.408Z cpu18:5165240)SVM: 2577: Deferred IO stats - Max: 0, Total: 0, Avg: 1 (msec)
2021-07-15T14:16:12.408Z cpu18:5165240)SVM: 2591: Destroyed device 6b6f19fd-a7880475-svmmirror
2021-07-15T14:16:12.408Z cpu8:2097244)ScsiDeviceIO: 4205: Cmd(0x45d0cc58e740) 0x28, cmdId.initiator=0x4305ea494500 CmdSN 0x111b0 from world 2100504 to dev "naa.60014054166abbe00002000000000000" failed H:0x8 D:0x0 P:0x0 Aborted at device layer
2021-07-15T14:16:12.408Z cpu8:2097244)ScsiDeviceIO: 4205: Cmd(0x45d8c3886700) 0x2a, cmdId.initiator=0x4305ea494500 CmdSN 0x111b2 from world 5167949 to dev "naa.60014054166abbe00002000000000000" failed H:0x8 D:0x0 P:0x0 Aborted at device layer
2021-07-15T14:16:12.409Z cpu8:2097244)ScsiDeviceIO: 4205: Cmd(0x45d8c38d0d00) 0x2a, cmdId.initiator=0x4305ea494500 CmdSN 0x111b3 from world 2100242 to dev "naa.60014054166abbe00002000000000000" failed H:0x8 D:0x0 P:0x0 Aborted at device layer
2021-07-15T14:16:12.409Z cpu33:5167949)J6: 2360: FS3_FSSyncIO failed on vol ceph3-vsphere-disk1 with IO was aborted by VMFS via a virt-reset on the device
2021-07-15T14:16:12.409Z cpu33:5167949)HBX: 3056: 'ceph3-vsphere-disk1': HB at offset 3866624 - Waiting for timed out HB:
2021-07-15T14:16:12.409Z cpu33:5167949) [HB state abcdef02 offset 3866624 gen 31 stampUS 12695924067309 uuid 602e8968-5bf3269d-fb9d-ac1f6b63df30 jrnl <FB 0> drv 24.82 lockImpl 4 ip 172.16.150.41]
2021-07-15T14:16:12.409Z cpu8:2097244)ScsiDeviceIO: 4205: Cmd(0x45c0cb257340) 0x28, cmdId.initiator=0x4305ea494500 CmdSN 0x111b4 from world 0 to dev "naa.60014054166abbe00002000000000000" failed H:0x8 D:0x0 P:0x0 Aborted at device layer
2021-07-15T14:16:12.507Z cpu18:5165240)SVM: 2569: SVM Mirrored mode IO stats for device: 72121a1a-5f620477-svmmirror
2021-07-15T14:16:12.507Z cpu18:5165240)SVM: 2573: Total # IOs mirrored: 0, Total # IOs sent only to source: 0, Total # IO deferred by lock: 0
2021-07-15T14:16:12.507Z cpu18:5165240)SVM: 2577: Deferred IO stats - Max: 0, Total: 0, Avg: 1 (msec)
2021-07-15T14:16:12.507Z cpu18:5165240)SVM: 2591: Destroyed device 72121a1a-5f620477-svmmirror
2021-07-15T14:16:12.507Z cpu12:5165231)HBX: 3056: 'ceph3-vsphere-disk1': HB at offset 3866624 - Waiting for timed out HB:
2021-07-15T14:16:12.507Z cpu12:5165231) [HB state abcdef02 offset 3866624 gen 31 stampUS 12695924067309 uuid 602e8968-5bf3269d-fb9d-ac1f6b63df30 jrnl <FB 0> drv 24.82 lockImpl 4 ip 172.16.150.41]
2021-07-15T14:16:12.520Z cpu16:5165240)VSCSI: 4180: handle 8262(vscsi0:0):Creating Virtual Device for world 5165224 (FSS handle 1802443261) numBlocks=209715200 (bs=512)
2021-07-15T14:16:12.520Z cpu16:5165240)VSCSI: 273: handle 8262(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000
2021-07-15T14:16:12.520Z cpu16:5165240)Vmxnet3: 19530: indLROPktToGuest: 1, vcd->umkShared->vrrsSelected: 3 port 0x4000047
2021-07-15T14:16:12.520Z cpu16:5165240)Vmxnet3: 19798: Using default queue delivery for vmxnet3 for port 0x4000047
2021-07-15T14:16:13.183Z cpu62:2098166)WARNING: NMP: nmpDeviceAttemptFailover:604: Retry world restore device "naa.60014054166abbe00002000000000000" - no more commands to retry
2021-07-15T14:16:13.192Z cpu12:2098864)HBX: 5758: Reclaiming HB at 3866624 on vol 'ceph3-vsphere-disk1' replayHostHB: 0 replayHostHBgen: 0 replayHostUUID: (00000000-00000000-0000-000000000000).
