
Newbie questions on drives/speed/redundancy

I am happy to say that I got a new PetaSAN cluster up and running (version 2.6.0).   Upon the original configuration attempt, I got:

Disk sdl prepare failure on node nc-san3.
Disk sdh prepare failure on node nc-san3.
Disk sdg prepare failure on node nc-san1.
Disk sde prepare failure on node nc-san1.
Disk sdj prepare failure on node nc-san1.
Disk sdi prepare failure on node nc-san1.
Disk sdr prepare failure on node nc-san1.
Disk sdg prepare failure on node nc-san2.
Disk sdc prepare failure on node nc-san2.
Disk sdl prepare failure on node nc-san2.

I was able to add all of these drives after the fact. Their health seems fine. Should I be concerned, or did it maybe just time out due to the 54 large drives I was adding? I made them all OSDs. Coming from experience with RAID and ZFS (RAIDZ), I am trying to figure out how speed and redundancy work. In this case, we have 3 nodes, each with 18x 2TB SSD disks. Should I be using any for journaling or cache? I didn't think so unless some were different (better performance) than others.

How can I calculate the overall capacity and how many drives can fail before data would be threatened? What is the process to replace a failed drive? Also, with RAID and RAIDZ, we can use multiple drives to increase I/O performance. I need to read up on the inner workings of Ceph, I suppose, but at first glance I do not see how to predict this with PetaSAN. Any direction towards useful reading on the subject would be appreciated!

Thanks once more in advance!

Unlike RAID, when you do 1 read operation you access 1 OSD disk; when writing, you first write to 1 disk and then to its replicas. The speed will be less than the speed of a single raw disk.

Adding a lot of disks to the cluster does allow many operations/threads to run concurrently, so the total bandwidth and IOPS of all these operations increase with more disks, but it does not increase the performance of a single operation. The system is good at scaling with many reads/writes at the same time.

Redundancy is very flexible; you can define your own. The default pool in PetaSAN uses 3x replication, so data is not lost if up to 2 storage nodes die; the system will heal itself.
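For a rough capacity estimate with your 3 nodes x 18 x 2TB disks, a back-of-the-envelope sketch in Python (this assumes the default 3x replicated pool and Ceph's usual ~85% nearfull warning ratio; check your actual pool size and ratios):

```python
# Rough usable-capacity estimate: assumes the default 3x replicated pool
# and a ~0.85 nearfull ratio (both assumptions; verify against your cluster).
raw_tb = 3 * 18 * 2          # 3 nodes x 18 OSDs x 2 TB = 108 TB raw
replica_count = 3            # default PetaSAN pool keeps 3 copies of each object
usable_tb = raw_tb / replica_count
nearfull_ratio = 0.85        # stay below this to leave headroom for self-healing
practical_tb = usable_tb * nearfull_ratio

print(f"Raw: {raw_tb} TB, usable at 3x: {usable_tb:.0f} TB, "
      f"practical target: {practical_tb:.1f} TB")
# Raw: 108 TB, usable at 3x: 36 TB, practical target: 30.6 TB
```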

Replacing disks: when an OSD fails you can delete it from the UI. You can add new OSD(s) manually just like you did. It is not a true replacement; it is more like the cluster disk count shrank and was later augmented. If you do not replace/add more OSDs, things will still keep working as if the existing OSD count is the correct value.

If your cluster is showing OK status now and things are working well, there is nothing to worry about. If you want to look into the earlier OSD-add issues, check /opt/petasan/log/ceph-volume.log and see what it was complaining about, but generally things are OK now.
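If you just want to pull the suspicious lines out of that log, something like this small Python sketch works (the path is from the post above; the filter keywords are only a guess at what is useful):

```python
# Minimal sketch: print lines from ceph-volume.log that look like failures.
# Keywords are assumptions; adjust to what your log actually contains.
log_path = "/opt/petasan/log/ceph-volume.log"

with open(log_path, errors="replace") as log:
    for line in log:
        if any(word in line.lower() for word in ("error", "fail", "traceback")):
            print(line.rstrip())
```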

The storage cluster will be used as the primary storage for OpenNebula KVM VPS (and also some LXD container hosting). I was assuming an iSCSI target is the way to go for this, but I'd love to hear any other thoughts on it. How would you set it up? What is the best way to utilize the storage from my PetaSAN cluster? Should we be using large iSCSI volumes split into individual LVM logical volumes for KVM use (this is how it worked in SolusVM and OnApp), or should each VPS block device be a separate pool? Can it skip iSCSI and utilize RBD directly somehow? Excuse my ignorance on this. I am learning as I go.

I heard there was a PetaSAN plugin for OpenNebula.   I was unable to find it though.

I am not aware of an OpenNebula-specific plugin. You can connect kvm/qemu/libvirt to iSCSI, or you can have them directly access an rbd image. I do not see much benefit in the LVM approach; creating individual images probably makes more sense.
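Creating one rbd image per VPS can be scripted with the Ceph Python bindings. A minimal sketch, assuming a hypothetical pool "kvm-images" and image name, and that the host has a working /etc/ceph/ceph.conf and keyring:

```python
import rados
import rbd

# Sketch: create one rbd image per VM instead of carving LVM volumes out of
# a big iSCSI LUN. Pool and image names are placeholders.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("kvm-images")
    try:
        rbd.RBD().create(ioctx, "vps-100-disk0", 50 * 1024**3)  # 50 GiB image
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```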

Which would be preferred in terms of performance and reliability, iSCSI or rbd? Over which network would I typically be accessing rbd? I currently have 2x 10GigE devoted to iSCSI 1 and 2.


Thanks again!

Accessing rbd directly from KVM makes more sense, as iSCSI is an extra gateway step. You can also have KVM access the rbd image directly and additionally map the image as iSCSI in case you need to access the data from some other system.
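For illustration, attaching an rbd image straight to a KVM guest through libvirt looks roughly like the sketch below. The pool/image names, monitor address, guest name and secret UUID are placeholders, and it assumes a "ceph" type libvirt secret has already been defined for the cephx user:

```python
import libvirt

# Hypothetical guest and image names; replace with your own values.
disk_xml = """
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>
  <source protocol='rbd' name='kvm-images/vps-100-disk0'>
    <host name='10.0.2.11' port='6789'/>
  </source>
  <auth username='libvirt'>
    <secret type='ceph' uuid='00000000-0000-0000-0000-000000000000'/>
  </auth>
  <target dev='vdb' bus='virtio'/>
</disk>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("vps-100")  # placeholder guest name
dom.attachDeviceFlags(disk_xml, libvirt.VIR_DOMAIN_AFFECT_LIVE)
conn.close()
```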

Talking to an rbd image uses the backend network; even if you use iSCSI, the gateway will access the rbd via the backend network. The backend network needs to be at least as fast as your iSCSI. You can have both the iSCSI and backend subnets share the same interface/bond if you want.

I wonder if I should bond 4x 10GigE for the backend and also share it for iSCSI and anything else NFS/CIFS related (though I don't plan to use it). I might as well use rbd for everything, no? The major purpose will be KVM virtual machines and container hosting (though I may reserve some portion for client backup storage on-demand, etc.).

The backend should be at least as wide as iSCSI 1 and 2 combined.

So either, as stated, create a 4x10G bond and have all subnets on it, or have a 2x10G bond for the backend and 2x10G for iSCSI. The latter is better for segregation; the former is better if most of your traffic will be direct rbd.