
VLAN configuration


Great... happy it worked 🙂

Okay, time to upgrade to 1.5.0. As stated above, I've got a bunch of HP DL380 and DL385 machines in my cluster (along with various other Dell machines). With 1.4.0 I was able to use the method provided by admin to get CCISS support working.

However, when I try to boot the 1.5.0 install media (that I've modified in exactly the same way as I modified 1.4.0), I get the following at boot time:

 

PetaSAN 1.5.0

Booting  Linux Kernel 4.4.92-09-petasan

Loading kernel modules

Starting udev daemon for hotplug support.

Detecting PetaSAN CD/USB install device with volume label PETASAN.udevd[174]: rename '/dev/disk/by-partlabel/ceph\%20journal.udev.tmp' '/dev/disk/by-partlabel/ceph\%20journal' failed: No such file or directory

........

The dots appear to continue forever (I waited 30 minutes).  Any suggestions or hints about what might be going wrong?

I know the install media works because it was able to successfully upgrade several Dell machines in the cluster.

Thanks for any pointers.

It is either that your creation of the new install ISO/USB is missing something, or the new kernel does not support the previous tweaks. Can you double-check your steps, and does the new media have a volume label "PETASAN"? The installation is failing to find a disk with that label; it could also be that the disk driver is not working.
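(Not part of the original replies - a small sketch for the label check. It assumes you can plug the stick into any Linux box; lsblk reports filesystem labels, so it will show whether anything actually carries the PETASAN label the installer scans for.)

```python
# Sketch: list block devices and report any that carry the PETASAN volume label.
# Assumes a Linux host with lsblk available; device names will differ per system.
import json
import subprocess

def find_petasan_media(label="PETASAN"):
    # lsblk can emit JSON with filesystem labels for every block device
    out = subprocess.run(
        ["lsblk", "-J", "-o", "NAME,LABEL,FSTYPE,SIZE"],
        capture_output=True, text=True, check=True
    ).stdout
    devices = json.loads(out)["blockdevices"]

    def walk(devs):
        for d in devs:
            if d.get("label") == label:
                yield d["name"], d.get("fstype"), d.get("size")
            yield from walk(d.get("children", []))

    return list(walk(devices))

if __name__ == "__main__":
    hits = find_petasan_media()
    if hits:
        for name, fstype, size in hits:
            print(f"/dev/{name}: label=PETASAN fstype={fstype} size={size}")
    else:
        print("No block device with label PETASAN found - relabel the media.")
```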

One way to get things moving is to use the install USB stick that you modified for v1.4: keep the contents of the boot directory (v1.4 kernel) and replace the cde / packages / rootfs directories. This will update to the v1.5 packages but still use the 1.4 kernel. The 1.5 kernel does provide significant IOPS improvements for iSCSI targets with small block sizes, but for the Ceph OSD storage it is the same. You can point your demanding iSCSI clients at the Dell targets.

It would also be interesting to run another test: replace the kernel files vmlinuz and initrd.gz in the boot directory with the v1.5 versions, but leave the other directories such as isolinux/syslinux untouched. If it still works, there was an issue with media creation; otherwise the new kernel broke the tweak.
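For reference, here is a rough sketch of both experiments as a script rather than manual copying. The mount points /mnt/usb (the modified v1.4 stick) and /mnt/iso15 (the extracted v1.5 ISO) are assumptions; the directory and file names come from the two suggestions above.

```python
# Sketch of the two media experiments described above.
# /mnt/usb and /mnt/iso15 are assumed mount points - adjust for your system.
import shutil
from pathlib import Path

USB = Path("/mnt/usb")      # modified v1.4 install stick
ISO15 = Path("/mnt/iso15")  # loop-mounted or extracted v1.5 ISO

def replace_dirs(dirs=("cde", "packages", "rootfs")):
    """Experiment 1: keep the v1.4 kernel in boot/, pull in the v1.5 payload."""
    for d in dirs:
        target = USB / d
        if target.exists():
            shutil.rmtree(target)
        shutil.copytree(ISO15 / d, target)

def swap_kernel_files(files=("vmlinuz", "initrd.gz")):
    """Experiment 2: swap only the kernel files, leave everything else as-is."""
    for f in files:
        shutil.copy2(ISO15 / "boot" / f, USB / "boot" / f)

if __name__ == "__main__":
    replace_dirs()
    # swap_kernel_files()  # run the second experiment separately
```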

Okay, sorry for waking up a dead thread...

I was never able to successfully get a G5 machine upgraded to 1.5, so I currently have a hybrid 1.4/1.5 cluster.

I just took the 2.0 install media, patched the kernel boot parameters as outlined earlier in this thread, and tried an install on a test G5 machine; the install succeeded. This is a good thing. Do you know whether there were any deliberate changes between 1.5 and 2.0 that address this, or is it just a fluke?

Do you anticipate any issues upgrading a hybrid 1.4/1.5 cluster to 2.0?

Can I get a clarification of the upgrade document? My monitor nodes also provide storage, so I'm thinking of the 'express installer downtime is acceptable' route. When it says "In this case it is recommended to shut down all nodes, perform upgrade then restart all nodes together", is this referring to 'all management nodes' or 'all Ceph nodes'?

v1.5 and v2.0 have exactly the same kernel; it is actually the same build, so there should not be a difference in how they boot, but strange things do happen.

Since you are using relatively old hardware, the only concern I have is that bluestore may not give you better performance. If you have extra nodes I would give v2.0 a test install to make sure all is OK.

The recommendation is to shut down all Ceph/PetaSAN nodes together and restart them together, as if a power outage had occurred. If possible, "together" should mean relatively close to one another, e.g. all restarted within a 5-minute window or less.
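A minimal sketch of the "shut everything down together" step, assuming passwordless root SSH to the nodes; the hostnames below are placeholders, not actual PetaSAN names.

```python
# Sketch: issue a shutdown to every node at roughly the same time over SSH.
# Assumes root SSH key access; hostnames are hypothetical.
import subprocess

NODES = ["petasan-node1", "petasan-node2", "petasan-node3"]  # placeholder names

def shutdown_all(nodes=NODES):
    for host in nodes:
        # -f backgrounds ssh after auth so all nodes get the command close together;
        # check=False because the connection may drop as the node powers off.
        subprocess.run(["ssh", "-f", f"root@{host}", "shutdown", "-h", "now"],
                       check=False)

if __name__ == "__main__":
    shutdown_all()
```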

This array is secondary storage (backups of backups) and is the place where old hardware goes to die, so performance isn't a key requirement.

I read that bluestore allocates its cache RAM in userspace and that it is configurable, instead of relying on the OS's file system cache. How is PetaSAN configuring this value, and does this make the amount of RAM in each node more critical? Is there now a hard requirement for a certain amount of RAM?

... also, shutting down together is relatively easy to do remotely. However, re-powering them together is going to be tough - they are spread among two data centres in two adjoining buildings.

We are using the default values for cache. I was thinking of disk latency if you use old hardware with spinning HDDs; in some cases filestore could be faster, especially if you use an SSD journal - it double-writes but will smooth out latency. An external SSD in bluestore that stores the db/wal data will also reduce latency, but by a lesser factor.
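If you want to see the cache values your OSDs are actually running with, a quick sketch like the following works against the Ceph admin socket (run it on the node hosting the OSD; osd.0 is just an example id):

```python
# Sketch: dump the bluestore cache settings in effect on a running OSD,
# using the standard "ceph daemon <osd> config show" admin-socket command.
import json
import subprocess

def bluestore_cache_settings(osd_id=0):
    out = subprocess.run(
        ["ceph", "daemon", f"osd.{osd_id}", "config", "show"],
        capture_output=True, text=True, check=True
    ).stdout
    cfg = json.loads(out)
    # keep only the cache-related keys
    return {k: v for k, v in cfg.items() if "bluestore_cache" in k}

if __name__ == "__main__":
    for key, value in sorted(bluestore_cache_settings().items()):
        print(f"{key} = {value}")
```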

I would not worry about the time to restart nodes; if there is a delay between them, things will still work fine, it may just take longer for the cluster to report it is back to healthy - perhaps 10 minutes or so. I would try to start the management nodes first (or at least 2 of them) before the other nodes.
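For the re-powering problem, one hedged option is to use the servers' BMCs (iLO on the HP boxes, iDRAC on the Dells) over IPMI, bringing the management nodes up first and the rest later. The BMC addresses and credentials below are placeholders.

```python
# Sketch: power nodes back on in order (management nodes first) via IPMI.
# BMC IPs and credentials are placeholders; requires ipmitool on the admin host.
import subprocess
import time

MGMT_BMCS  = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # management node BMCs
OTHER_BMCS = ["10.0.0.21", "10.0.0.22"]                 # remaining OSD nodes

def power_on(bmc, user="admin", password="secret"):
    subprocess.run(["ipmitool", "-I", "lanplus", "-H", bmc,
                    "-U", user, "-P", password,
                    "chassis", "power", "on"], check=True)

if __name__ == "__main__":
    for bmc in MGMT_BMCS:
        power_on(bmc)
    time.sleep(300)  # give the management nodes a head start
    for bmc in OTHER_BMCS:
        power_on(bmc)
```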

 

Can I do this?

I have monitor nodes with storage. Can I shut down the entire cluster, then upgrade the monitor nodes? Can I then bring up the cluster with the OSD nodes still on 1.4, and take the OSD nodes down one at a time to re-install them? Otherwise it looks like I would have to take down the entire cluster for an extended period to upgrade everything, only to bring the cluster back up and take each OSD node down again to re-install (to upgrade the storage engine).

Yes you can do this.

Side note: when you re-install an OSD node you are upgrading from Ceph Jewel to Luminous; following this you will need to convert the storage engine one OSD at a time as per the upgrade guide.
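A small helper sketch for tracking that conversion: it lists which OSDs still report filestore as their objectstore, using the standard "ceph osd metadata" output (needs an admin keyring on the node where you run it).

```python
# Sketch: show the objectstore backend of every OSD so you can track the
# filestore -> bluestore conversion across the cluster.
import json
import subprocess

def osd_objectstores():
    out = subprocess.run(["ceph", "osd", "metadata", "-f", "json"],
                         capture_output=True, text=True, check=True).stdout
    return {m["id"]: m.get("osd_objectstore", "unknown") for m in json.loads(out)}

if __name__ == "__main__":
    for osd_id, store in sorted(osd_objectstores().items()):
        flag = "  <-- still needs conversion" if store == "filestore" else ""
        print(f"osd.{osd_id}: {store}{flag}")
```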
