VLAN configuration
admin
2,930 Posts
August 30, 2017, 11:41 am
Great, happy it worked 🙂
erazmus
40 Posts
January 17, 2018, 10:21 pm
Okay, upgrade to 1.5.0 time. As stated above, I've got a bunch of HP DL380 and DL385 machines in my cluster (along with various other Dell machines). With 1.4.0 I was able to use the method provided by admin to get CCISS support working.
However, when I try to boot the 1.5.0 install media (that I've modified in exactly the same way as I modified 1.4.0), I get the following at boot time:
PetaSAN 1.5.0
Booting Linux Kernel 4.4.92-09-petasan
Loading kernel modules
Starting udev daemon for hotplug support.
Detecting PetaSAN CD/USB install device with volume label PETASAN.udevd[174]: rename '/dev/disk/by-partlabel/ceph\%20journal.udev.tmp' '/dev/disk/by-partlabel/ceph\%20journal' failed: No such file or directory
........
The dots appear to continue forever (I waited 30 minutes). Any suggestions or hints about what might be going wrong?
I know the install media works because it was able to successfully upgrade several Dell machines in the cluster.
Thanks for any pointers.
admin
2,930 Posts
January 18, 2018, 12:49 pm
Either your creation of the new install ISO/USB is missing something, or the new kernel does not support the previous tweaks. Can you double-check your steps? Also, does the new media have a volume label "PETASAN"? The installation is failing to find a disk with that label; it could also be that the disk driver is not working.
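A quick way to sanity-check that last point is to read the label back from the re-created media. The sketch below is only an illustration, not a PetaSAN tool, and /dev/sdb1 is an assumed device path (substitute your USB partition):

#!/usr/bin/env python3
# Sketch only: check that the re-created install media still carries the
# PETASAN volume label the installer scans for.
import subprocess

DEVICE = "/dev/sdb1"  # assumption: the partition of the re-created install USB

# blkid -o value -s LABEL prints just the filesystem label of the device
result = subprocess.run(
    ["blkid", "-o", "value", "-s", "LABEL", DEVICE],
    capture_output=True, text=True,
)
label = result.stdout.strip()
if label == "PETASAN":
    print(f"{DEVICE}: label is PETASAN, the installer should find it")
else:
    print(f"{DEVICE}: label is {label!r}, the installer will keep scanning")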
One way to get things moving is to use the install USB stick that you modified with v1.4: leave the content of boot (the v1.4 kernel) as is and replace the cde / packages / rootfs directories. This will update to the v1.5 packages but still use the v1.4 kernel. The v1.5 kernel does provide significant IOPS improvements for iSCSI targets at small block sizes, but for the Ceph OSD storage it is the same, so you can point your demanding iSCSI clients at the Dell targets.
It would also be interesting to do another test and replace the kernel files vmlinuz and initrd.gz in the boot directory with the v1.5 versions, while leaving the other directories like isolinux/syslinux alone. If it still works, there was an issue with media creation; otherwise the new kernel broke the tweak.
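For reference, here is a rough sketch of the hybrid-media idea above: keep boot/ (and isolinux/syslinux) from the working v1.4 stick and copy cde/, packages/ and rootfs/ over from the v1.5 media. It is not an official PetaSAN script, and the two mount points are assumptions:

#!/usr/bin/env python3
# Sketch only: build the hybrid install media described above -- keep the
# v1.4 boot/ (and isolinux/syslinux) directories, replace cde/, packages/
# and rootfs/ with the v1.5 content.
import shutil
from pathlib import Path

V14_USB = Path("/mnt/petasan-usb")  # assumed mount point of the modified v1.4 stick
V15_ISO = Path("/mnt/petasan-1.5")  # assumed mount point of the v1.5 install ISO

for name in ("cde", "packages", "rootfs"):
    src = V15_ISO / name
    dst = V14_USB / name
    if dst.exists():
        shutil.rmtree(dst)        # drop the old v1.4 copy
    shutil.copytree(src, dst)     # bring in the v1.5 content
    print(f"replaced {name}/ from the v1.5 media")

# Second test suggested above: also swap in the v1.5 kernel files while
# keeping isolinux/syslinux from the v1.4 stick.
# for f in ("vmlinuz", "initrd.gz"):
#     shutil.copy2(V15_ISO / "boot" / f, V14_USB / "boot" / f)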
erazmus
40 Posts
March 6, 2018, 7:55 pm
Okay, sorry for waking up a dead thread...
I was never able to successfully get a G5 machine upgraded to 1.5, so I currently have a hybrid 1.4/1.5 cluster.
I just took the 2.0 install media, patched the kernel boot parameters as outlined earlier in this thread, and tried an install on a test G5 machine; the install succeeded. This is a good thing. Do you know if there were any deliberate changes between 1.5 and 2.0 that address this, or is it just a fluke?
Do you anticipate any issues upgrading a hybrid 1.4/1.5 cluster to 2.0?
Can I get a clarification of the upgrade document? My monitor nodes also provide storage, so I'm thinking of the 'express installer downtime is acceptable' route. When it says "In this case it is recommended to shut down all nodes, perform upgrade then restart all nodes together," is it referring to 'all management nodes' or 'all Ceph nodes'?
admin
2,930 Posts
March 6, 2018, 8:26 pm
v1.5 and v2.0 have the exact same kernel (it is actually the same build), so there should not be a difference in how they boot, but strange things do happen.
Since you are using relatively old hardware, the only concern I have is that bluestore may not give you better performance. If you have extra nodes, I would give v2.0 a test install to make sure all is OK.
The recommendation is to shut down all Ceph/PetaSAN nodes together and restart them together, as if a power outage had occurred. If possible, "together" should mean relatively close to one another, e.g. all restarted within a 5-minute window or less.
Last edited on March 6, 2018, 8:26 pm by admin · #15
erazmus
40 Posts
March 6, 2018, 9:11 pm
This array is secondary storage (backups of backups) and is the place where old hardware goes to die, so performance isn't a key requirement.
I read that bluestore allocates its cache RAM in userspace and that the amount is configurable, instead of relying on the OS's filesystem caching. How is PetaSAN configuring this value, and does this make the amount of RAM in each node more critical? Is there now a hard requirement for a certain amount of RAM?
erazmus
40 Posts
March 6, 2018, 9:13 pm
... also, shutting down together is relatively easy to do remotely. However, re-powering them together is going to be tough: they are spread across two data centres in two adjoining buildings.
admin
2,930 Posts
March 6, 2018, 10:02 pm
We are using the default values for the cache. I was thinking of disk latency if you use old hardware with spinning HDDs: in some cases filestore could be faster, especially if you use an SSD journal, which double-writes but smooths out latency. An external SSD in bluestore that stores the db/wal data will also reduce latency, but by a lesser factor.
I would not worry about the time it takes to restart the nodes; if there is a delay between them, things will still work fine. It may just take longer for the cluster to report it is back to healthy, perhaps 10 minutes or so. I would try to start the management nodes first (or at least 2 of them) before the other nodes.
Last edited on March 6, 2018, 10:03 pm by admin · #18
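For anyone who wants to see which cache values their OSDs are actually running with, they can be read back over the Ceph admin socket on the node hosting the OSD. A minimal sketch; osd.0 is an example ID and the option names are the Luminous-era bluestore ones, so verify them against your version:

#!/usr/bin/env python3
# Sketch only: read the bluestore cache settings an OSD is running with via
# the Ceph admin socket. Run this on the node hosting the OSD.
import json
import subprocess

OSD = "osd.0"  # assumption: adjust to an OSD hosted on this node

for opt in ("bluestore_cache_size",
            "bluestore_cache_size_hdd",
            "bluestore_cache_size_ssd"):
    out = subprocess.run(
        ["ceph", "daemon", OSD, "config", "get", opt],
        capture_output=True, text=True, check=True,
    )
    # "ceph daemon ... config get" returns JSON, e.g. {"bluestore_cache_size_hdd": "1073741824"}
    print(json.loads(out.stdout))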
erazmus
40 Posts
March 9, 2018, 2:57 pm
Can I do this?
I have monitor nodes with storage. Can I shut down the entire cluster, then upgrade the monitor nodes? Can I then bring the cluster up with the OSD nodes still on 1.4, and take the OSD nodes down one at a time to re-install them? Otherwise it looks like I have to take down the entire cluster for an extended period to upgrade everything, only to bring the cluster back up and take each OSD down to re-install (to upgrade the storage engine).
admin
2,930 Posts
March 9, 2018, 5:08 pm
Yes, you can do this.
Side note: when you re-install an OSD node you are upgrading from Ceph Jewel to Luminous; following this you will need to convert the storage engine one OSD at a time, as per the upgrade guide.
Last edited on March 9, 2018, 5:14 pm by admin · #20