
Thoughts on deploying inside of ESXi 5.5

So I have a few Dell R610 servers; each has six 500GB 7.2k SATA drives behind a RAID controller that can be configured as needed. I would like to run a PetaSAN VM inside the ESXi host on each box so we can use the local storage, perhaps as a cold storage array for the time being, while our high-speed SAN serves VMs to each ESXi host. So my question is: how can this best be implemented for speed and redundancy?

For performance, the local disks should be accessed by the VMs as pass-through storage. Ceph manages its own redundancy and performs better the more disks it has, so it is best to configure them as JBOD if the controller supports it; otherwise use RAID 0, either single disk (if supported) or 2 disks. If you have a battery-backed controller you can enable write-back caching; otherwise disable write caching.
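As a rough example only: on a MegaRAID-based PERC controller (which is what an R610 typically ships with) single-disk RAID 0 volumes and the cache policy could be set up along these lines with the MegaCli utility. The enclosure:slot IDs below are just placeholders for your own drive layout.

# one single-disk RAID 0 volume per data drive (enclosure:slot IDs are examples)
MegaCli -CfgLdAdd -r0 [32:2] -a0
MegaCli -CfgLdAdd -r0 [32:3] -a0
# with a battery-backed cache, enable write-back on all logical drives
MegaCli -LDSetProp WB -LAll -a0
# without a battery, use write-through instead
MegaCli -LDSetProp WT -LAll -a0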

For redundancy, if a data disk (non-system disk) fails it can be replaced with a new one from the UI. For an ESX node failure: it may be useful to know that in Ceph you can remove data disks from one node and put them into another (cold or hotswap) and the disks will continue serving data from the new node. The new node automatically detects the disks and spawns the correct processes to serve them, and the PetaSAN UI will also pick up this re-assignment automatically. So for an ESX node failure, just remove the data disks and add them to another ESX node (new or existing).
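If you want to double-check that a moved disk came back cleanly, a quick look from any node with the standard Ceph CLI (nothing PetaSAN-specific) would be something like:

# confirm the OSDs now show up under the new host and are up/in
ceph osd tree
# overall health and any recovery/backfill activity
ceph -s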

The exception is redundancy for the system disk, which contains the PetaSAN and Ceph software: if you can afford 2 disks for a RAID 1 system disk, then do so. Also, the first 3 nodes (the management nodes) contain the brains of the cluster, and at least 2 of them need to be up for the entire cluster to work, so if you can set up VM replication using an external tool (such as Veeam) that would be best. Note that even if you do not do this the cluster is quite safe, since PetaSAN allows you to replace a failed management node with a fresh machine via the deployment wizard, but a concurrent failure of 2 management nodes will result in downtime until the cluster is restarted manually.
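If you want to keep an eye on this, the Ceph monitor quorum (which is what needs 2 of the 3 management nodes up) can be checked from any node with the standard Ceph tools, for example:

# list the monitors and show which ones are currently in quorum
ceph mon stat
# more detail, including the quorum leader
ceph quorum_status --format json-pretty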

To clarify my previous statement

it may be useful to know that in Ceph you can remove data disks from one node and put them into another (cold or hotswap)

should be:

it may be useful to know that in Ceph you can remove data disks from one node and put them into other nodes (cold or hotswap)

Thanks for your reply. Based on your input, it seems the best way to go about this would be to have the first two disks handle the ESXi base as RAID 1 and the other four in JBOD.

For anyone curious about mapping physical drives to a VM in ESXi 5.5, here is a good video: https://www.youtube.com/watch?v=XchoXCwpNGw&ytbChannel=Andrew%20Q%20Power

For whatever reason, my PetaSAN nodes kept dropping disks when the storage disks were raw mapped to the VM. I am not sure if this is an ESX thing or an issue with PetaSAN, so in the meantime I have gone with RAID 5 on ESX and have a large virtual disk mapped to each VM. I am getting roughly 6.5TB of space in a three-node environment with 7.2k spinners and was able to install a Windows 7 VM from the PetaSAN datastore in under 10 minutes.

Glad you have it working and are happy with the performance. Note that the more OSDs you add, the faster it gets. Also, a Windows installation is a single-threaded task; Ceph excels at concurrent load, such as many installs going on at the same time, serving requests for a hypervisor with many VMs, etc.
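If you want to see that difference for yourself, fio runs along these lines against a test volume would compare a single outstanding writer with many concurrent ones (the /dev/sdX path is just a placeholder for an iSCSI disk mapped from the cluster, and writing to it directly is destructive):

# one outstanding write at a time, similar to a single installer copying files
fio --name=single --filename=/dev/sdX --rw=randwrite --bs=64k --iodepth=1 --numjobs=1 --runtime=60 --time_based --direct=1
# many concurrent writers, closer to a hypervisor running many busy VMs
fio --name=concurrent --filename=/dev/sdX --rw=randwrite --bs=64k --iodepth=16 --numjobs=8 --runtime=60 --time_based --direct=1 --group_reporting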

I am not sure why you had issues with the raw drive mapping setup. I will try to test this here, but not in the near future before the 1.3 release. Some things that might help:

For the SCSI controller type, use VMware Paravirtual (instead of LSI Logic Parallel, which is slow). For the network adapter, use VMXNET3 (instead of E1000, which is slow). Also, when you create your raw device mapping, can you add "-a pvscsi" at the end:

vmkfstools -z /vmfs/devices/disks/<device> /vmfs/volumes/<datastore>/<vm>/<file>.vmdk -a pvscsi
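In case it helps anyone following along, the <device> part of that command can be found by listing the raw disks on the host first, for example:

# list the physical disks the host can see; the naa.* names are what goes in <device> above
ls -l /vmfs/devices/disks/
# or, with more detail (vendor, model, size) to identify the right drives
esxcli storage core device list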

Aside from raw device mapping, it is also possible to configure your RAID controller as PCI pass-through so it is managed directly by the VM.

One of the advantages of raw mapping or PCI pass-through over your existing setup, apart from better performance, is that in case of an ESX failure you can remove the disks from the failed node and place them in the other nodes. Of course your setup will still work in case of failure, but Ceph will have to generate a complete replica of the data that was on the lost disks (instead of just updating them with the changes made during their downtime).

FYI, the "-a pvscsi" switch doesn't work with ESXi 6.5; I got this message: "Option --adaptertype is deprecated and hence will be ignored". I also found that with the vSphere 6.5 web client you don't need to run those commands at all. When adding a disk to a VM there is an option named "RDM Disk"; you select that and it gives you a list of raw SSDs in the server, so you can keep repeating this until all are added. Each time you add an RDM Disk this way it creates another .vmdk file in the same folder as the VM, just with _1, _2, _3, etc. appended.

Some screenshots of this are here: http://www.vkernel.ro/blog/adding-a-raw-device-mapping-rdm-to-a-virtual-machine

 

Instructions from VMware are here: http://pubs.vmware.com/vsphere-60/index.jsp#com.vmware.vsphere.vm_admin.doc/GUID-4236E44E-E11F-4EDD-8CC0-12BA664BB811.html

 

Thanks for sharing this info. We are currently testing a PetaSAN hyper-converged setup on ESXi 6.0; hopefully we will have some results/recommendations soon. Also, if you have any recommendations from your setup, it would be great if you could share them.

I got everything up and working on 3 Dell PowerEdge 730s using a 24-port switch for the networking, but all the NICs are 1 Gb since this was a POC: 3 PetaSAN nodes, one on each ESXi 6.5 host, with SSD drives. Performance wasn't that good in this POC, probably because of the 1G networking limitation, but everything seems to work as expected. If we go with this for a production environment then everything will be on 10G networking. I look forward to checking out version 1.4 as well; keep up the great work.