
Thoughts on deploying inside of ESXi 5.5

So I have a few Dell R610 servers; each has six 500GB 7.2k SATA drives behind a RAID controller that can be configured as needed. I would like to run a PetaSAN VM inside the ESXi host on each box so we can use the local storage, perhaps as a cold storage array for the time being, while our high-speed SAN serves VMs to each ESXi host. So my question is: how can this best be implemented for speed and redundancy?

For performance, the local disks should be accessed by the VMs as pass-through storage. Ceph manages its own redundancy and performs better the more disks it has, so it is best to configure them as JBOD if the controller supports it; otherwise use RAID 0, either single disk (if supported) or 2 disks. If you have a battery-backed controller you can enable write-back caching; otherwise disable write caching.
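As a rough example only: on a MegaRAID-based PERC controller (which is what an R610 typically ships with) single-disk RAID 0 volumes and the cache policy could be set up along these lines with the MegaCli utility. The enclosure:slot IDs below are just placeholders for your own drive layout.

# one single-disk RAID 0 volume per data drive (enclosure:slot IDs are examples)
MegaCli -CfgLdAdd -r0 [32:2] -a0
MegaCli -CfgLdAdd -r0 [32:3] -a0
# with a battery-backed cache, enable write-back on all logical drives
MegaCli -LDSetProp WB -LAll -a0
# without a battery, use write-through instead
MegaCli -LDSetProp WT -LAll -a0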

For redundancy, if a data disk (non-system disk) fails it can be replaced with a new one from the UI. For an ESX node failure: it may be useful to know that in Ceph you can remove data disks from one node and put them into another (cold or hotswap) and the disks will continue serving data from the new node. The new node automatically detects the disks and spawns the correct processes to serve them, and the PetaSAN UI will also pick up this re-assignment automatically. So for an ESX node failure, just remove the data disks and add them to another ESX node (new or existing).
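If you want to double-check that a moved disk came back cleanly, a quick look from any node with the standard Ceph CLI (nothing PetaSAN-specific) would be something like:

# confirm the OSDs now show up under the new host and are up/in
ceph osd tree
# overall health and any recovery/backfill activity
ceph -s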

The exception is redundancy for the system disk, which contains the PetaSAN and Ceph software: if you can afford 2 disks for a RAID 1 system disk, then do so. Also, the first 3 nodes (the management nodes) contain the brains of the cluster, and at least 2 of them need to be up for the entire cluster to work, so if you can set up VM replication using an external tool (such as Veeam) that would be best. Note that even if you do not do this the cluster is quite safe, since PetaSAN allows you to replace a failed management node with a fresh machine via the deployment wizard, but a concurrent failure of 2 management nodes will result in downtime until the cluster is restarted manually.
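If you want to keep an eye on this, the Ceph monitor quorum (which is what needs 2 of the 3 management nodes up) can be checked from any node with the standard Ceph tools, for example:

# list the monitors and show which ones are currently in quorum
ceph mon stat
# more detail, including the quorum leader
ceph quorum_status --format json-pretty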

To clarify my previous statement

it may be useful to know that in Ceph you can remove data disks from one node and put them into another (cold or hotswap)

should be:

it may be useful to know that in Ceph you can remove data disks from one node and put them into other nodes (cold or hotswap)

Thanks for your reply. Based on your input, it seems the best way to go about this would be to have the first two disks handle the ESXi base as RAID 1 and the other four in JBOD.

For anyone curious about mapping physical drives to a VM in ESXi 5.5, here is a good video: https://www.youtube.com/watch?v=XchoXCwpNGw&ytbChannel=Andrew%20Q%20Power

For whatever reason, my PetaSAN nodes kept dropping disks when the storage disks were raw mapped to the VM. I am not sure if this is an ESX thing or an issue with PetaSAN, so in the meantime I have gone with RAID 5 on ESX and have a large virtual disk mapped to each VM. I am getting roughly 6.5TB of space in a three-node environment with 7.2k spinners and was able to install a Windows 7 VM from the PetaSAN datastore in under 10 minutes.

Glad you have it working and are happy with the performance. Note that the more OSDs you add, the faster it gets. Also, a Windows installation is a single-threaded task; Ceph excels at concurrent load, such as many installs going on at the same time, serving requests for a hypervisor with many VMs, etc.
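If you want to see that difference for yourself, fio runs along these lines against a test volume would compare a single outstanding writer with many concurrent ones (the /dev/sdX path is just a placeholder for an iSCSI disk mapped from the cluster, and writing to it directly is destructive):

# one outstanding write at a time, similar to a single installer copying files
fio --name=single --filename=/dev/sdX --rw=randwrite --bs=64k --iodepth=1 --numjobs=1 --runtime=60 --time_based --direct=1
# many concurrent writers, closer to a hypervisor running many busy VMs
fio --name=concurrent --filename=/dev/sdX --rw=randwrite --bs=64k --iodepth=16 --numjobs=8 --runtime=60 --time_based --direct=1 --group_reporting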

I am not sure why you had issues with the raw drive mapping setup. I will try to test this here, but not in the near future before the 1.3 release. Some things that might help:

For the SCSI controller type, use VMware Paravirtual (instead of LSI Logic Parallel, which is slow). For the network adapter, use VMXNET3 (instead of E1000, which is slow). Also, when you create your raw device mapping, can you add "-a pvscsi" at the end:

vmkfstools -z /vmfs/devices/disks/<device> /vmfs/volumes/<datastore>/<vm>/<file>.vmdk -a pvscsi
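In case it helps anyone following along, the <device> part of that command can be found by listing the raw disks on the host first, for example:

# list the physical disks the host can see; the naa.* names are what goes in <device> above
ls -l /vmfs/devices/disks/
# or, with more detail (vendor, model, size) to identify the right drives
esxcli storage core device list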

Aside from raw device mapping, it is also possible to configure your RAID controller as PCI pass-through so it is managed directly by the VM.

One of the advantages of raw mapping or PCI pass-through over your existing setup, apart from better performance, is that in case of an ESX failure you can remove the disks from the failed node and place them in the other nodes. Of course your setup will still work in case of failure, but Ceph will have to generate a complete replica of the data that was on the lost disks (instead of just updating them with the changes made during their downtime).

FYI, the "-a pvscsi" switch doesn't work with ESXi 6.5; I got this message: "Option --adaptertype is deprecated and hence will be ignored". I also found that with the vSphere 6.5 web client you don't need to run those commands at all. When adding a disk to a VM there is an option named "RDM Disk"; you select that and it gives you a list of raw SSDs in the server, so you can keep repeating this until all are added. Each time you add an RDM Disk this way it creates another .vmdk file in the same folder as the VM, just with _1, _2, _3, etc. appended.

Some screenshots of this are here: http://www.vkernel.ro/blog/adding-a-raw-device-mapping-rdm-to-a-virtual-machine

 

Instructions from VMware are here: http://pubs.vmware.com/vsphere-60/index.jsp#com.vmware.vsphere.vm_admin.doc/GUID-4236E44E-E11F-4EDD-8CC0-12BA664BB811.html

 

Thanks for sharing this info. We are currently testing a PetaSAN hyper-converged setup on ESXi 6.0; hopefully we will have some results/recommendations soon. Also, if you have any recommendations from your setup, it would be great if you could share them.

I got everything up and working on 3 Dell PowerEdge 730s using a 24-port switch for the networking, but all the NICs are 1 Gb since this was a POC: 3 PetaSAN nodes, one on each ESXi 6.5 host, with SSD drives. Performance wasn't that good in this POC, probably because of the 1G networking limitation, but everything seems to work as expected. If we go with this for a production environment then everything will be on 10G networking. I look forward to checking out version 1.4 as well; keep up the great work.