Mellanox Infiniband ConnectX-2 VPI native support

Hello All,

WARNING: long-winded post follows, the author is a tad sadistic with his word bloat and chooses to inflict ocular pain in the following manner.

 

Before we start, a quote from PetaSAN-Admin: "PetaSAN does not support Infiniband." I take this to mean that the team does not officially support Infiniband, so I took the task on myself. After acquiring the headers for another set of drivers I needed to compile, I decided to try getting Infiniband working for my setup. BY FOLLOWING THIS POST'S INSTRUCTIONS, YOU ACCEPT ALL RESPONSIBILITY FOR ANY DAMAGES THAT OCCUR AND HOLD ME BLAMELESS OF ALL LIABILITY. (Sorry, required disclaimer to cover my butt in case someone bricks a card/system/etc.)

I thought I would share how to get your ConnectX-2 VPI cards/ports working in PetaSAN 1.5.0. Your cards must be running the latest 2.9.x Mellanox firmware and not the HP/Dell/other OEM firmware. The OEM firmwares have compatibility issues, both with the drivers and with fabric devices. It is best to just take the time and flash the cards to the Mellanox firmware. Mellanox has a wonderful how-to on their site, and there are several guides on Google that, though written for the older ConnectX-1 cards, still work for the newer cards. The correct firmware to download depends on the actual card and not on what it reports to the OS. You must use mstflint to get the information needed to pick the correct replacement firmware. WARNING: THIS PROCESS CAN BRICK YOUR CARD and I assume no liability or responsibility for your actions in this regard. If you want, I can flash your cards for you, but I do this on request only, so PM me.
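As a rough sketch of the mstflint step: the PSID (the board identifier stored in the firmware) is what tells you which firmware image matches the physical card. The helper below pulls the PSID line out of the query output. The function name psid_from_query and the PCI address 04:00.0 are placeholders, so take the real address from lspci on your own node.

```shell
# Hypothetical helper: print the PSID field from `mstflint ... query` output on stdin.
psid_from_query() {
    awk '$1 == "PSID:" { print $2 }'
}

# Example use on a real node (the PCI address is an assumption, take yours from lspci):
#   mstflint -d 04:00.0 query | psid_from_query
```

This only reads from the card; it does not flash anything.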

Normally, once you have access to the root login, you run apt-get update, then mount the OFED ISO and start compiling. You do not need to do this with PetaSAN 1.5.0, as the correct drivers and basic OFED support are already installed; you just have to turn them on.

Do note that RDMA/SRP/SDP support does not work with the basic OFED drivers.

I realize most who read this are experienced Linux users, some even experienced Debian/Ubuntu users. I do not mean to offend, but to be clear for those who will try to follow along, I am writing this as a hold-your-hand step-by-step and assume the user knows nothing at all.

First: install PetaSAN like normal and forget about Infiniband for now; just make sure that no network connections are plugged in during the install. For some reason the PetaSAN installer takes a long time to find the USB boot drive (which you must name PETASAN after you create it, or it won't work) and continue with the install.

Second: once PetaSAN has rebooted and you're able to see the information console, press Enter to get to the options, arrow down to the bash shell, arrow left once, then press Enter. This should give you a shell prompt.

Third: check that your card is detected with lspci | grep Mellanox. If you get no output, run plain lspci and look for your card. Make sure it is there or this won't work.
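If you would rather script the detection check than eyeball it, something like this should do. It is only a sketch, and has_mellanox is a made-up helper name; it simply wraps a case-insensitive grep so the same check works whether lspci prints "Mellanox" or "MELLANOX".

```shell
# Hypothetical helper: succeed if lspci-style output on stdin lists a Mellanox device.
has_mellanox() {
    grep -qi 'mellanox'
}

# Example use on a node:
#   lspci | has_mellanox && echo "card detected" || echo "no Mellanox card found"
```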

Fourth: run nano /etc/modules and add two lines, mlx4_ib and ib_ipoib (each on its own line, no quotes), then exit using Ctrl-X, y, Enter; the file is saved and you're at the prompt again. Type exit to get back to the information console, then in the options select reboot, arrow left once and press Enter. After the reboot your Infiniband ports will be available for cluster use in the web tools. Add each node the same way: you must enable Infiniband and reboot before trying to add the node to the cluster.
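For anyone doing this on several nodes, the nano edit can be replaced with a small helper that appends each module only if it is not already listed, so running it twice does no harm. This is a minimal sketch; add_module is a hypothetical name, not something PetaSAN provides.

```shell
# Hypothetical helper: append a module name to a modules file unless it is already listed.
# Usage: add_module <file> <module>
add_module() {
    file=$1
    module=$2
    # -x matches the whole line, -F treats the name as a literal string
    if ! grep -qxF "$module" "$file" 2>/dev/null; then
        printf '%s\n' "$module" >> "$file"
    fi
}

# On a node, as root:
#   add_module /etc/modules mlx4_ib
#   add_module /etc/modules ib_ipoib
```

A reboot is still needed afterwards, same as with the manual edit.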

This uses the low-latency, high-bandwidth Infiniband fabric as if it were an Ethernet fabric, but instead of ~1ms between nodes that rises to ~200/~400ms, you will see Infiniband latencies of 0.06ms or less that rise to ~10/20ms, and have 40Gbps of raw bandwidth (cable and switch dependent; real bandwidth after overhead is 8.0Gbps/lane, down from the raw speed of 10Gbps/lane, see https://en.wikipedia.org/wiki/InfiniBand for more information).
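To make the bandwidth arithmetic concrete, here it is worked through in shell. The per-lane figures are the ones quoted above; the lane count of 4 is inferred from 40Gbps divided by 10Gbps/lane, so adjust it if your links are not 4x.

```shell
# Per-lane figures quoted above; the lane count of 4 is inferred from 40 / 10.
lanes=4
raw_per_lane=10      # Gbps signalling rate per lane
data_per_lane=8      # Gbps usable per lane after encoding overhead

raw_total=$((lanes * raw_per_lane))
data_total=$((lanes * data_per_lane))
echo "raw: ${raw_total} Gbps, usable: ${data_total} Gbps"
```

With these numbers it prints "raw: 40 Gbps, usable: 32 Gbps", which is the ceiling before any IP-over-Infiniband overhead.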

Errata (you can skip this unless you feel you need to know some pointless information):

My test system: Node 1 is an HP xw9400 mainboard with two Opteron 22xx CPUs and 16GB RAM each, a 74GB SATA 7200 16MB cache HDD for server boot and 4 1TB SATA 7200 64MB cache HDDs for OSDs, no journal device installed. Nodes 2 and 3 are Sunfire X4100M2 servers with dual Opteron 22xx and 32GB total RAM, a pair of 74GB SAS 10k drives in RAID1 (no RAID pass-through, and RAID0 requires 2 drives on these servers) for the boot HDD, and an LSI SAS-HBA connected to a Rackable Systems JBOD enclosure with 4 1TB SATA 7200 64MB cache HDDs for OSDs. All nodes have a Mellanox MT26428 Rev B card. All nodes use one 1Gbps Ethernet port for management; all iSCSI and backend networks are over the Infiniband ports. No bonding was used due to issues with PetaSAN nodes joining the bonded master node. This mess of spare parts was a proof of concept that could not be done in a set of VMs. It also proves that PetaSAN can run on almost any junk. Yes, I will be testing this setup for some time to see what kind of performance I can get and what issues come up. That is a subject for another post.

ConnectX-1 cards are not supported with this version of the driver. They show up and initialize but do not configure properly. This is because the kernel driver does not support these older cards anymore. Ubuntu 15.x does support the older ConnectX-1 cards.

Unable to test ConnectX-3 cards as I do not have enough of them yet. I am willing to continue testing and even adding further support once I have enough cards to convert my entire rack to the same card. Donations welcome.

ConnectX-2 EN cards do not work with Infiniband fabric, even when properly flashed with the VPI firmware. There seems to be a difference in the chips that though they can be flashed to the VPI firmware, they do not join the fabric.

ConnectX-2 IB cards do work but have issues with this driver. They do work in Debian with the full MLNX_OFED package compiled and installed, so I am assuming that this is an issue with the installed driver.

Installing the MLNX_OFED package does work to a point, but the various steps cause some of the custom code to be overwritten with "updated" versions, and this breaks PetaSAN. Simply running the required "apt-get update && apt-get -f install" causes the majority of the issues: missing kernel headers and improperly installed headers (though in a nice deb package). After the MLNX_OFED package downloads the remaining dependencies and compiles and installs the drivers, PetaSAN is unable to control the node properly, forcing you to reinstall. (I had to wipe the drive's partition table to get PetaSAN to install a second time; this could be because of old partition table data still in the table. After this first clearing, PetaSAN found the previous installs and reinstalled over the top without issues.)

I have killed over 60 PetaSAN installs over the last week of working on this, mostly because of the minor differences between Ubuntu and its upstream Debian (not an Ubuntu user, but I do use Debian a lot, it is my preferred flavor), and because when I searched the available kernel modules I could not find the mlx4 module, so I went about this from the wrong end first. Once I stumbled on the fact that Ubuntu had a kernel module in the base install, everything just worked. Meanwhile, I learned a lot about how PetaSAN actually works and how it interconnects various subsystems. Still not enough to save these installs due to ignorance. I expect to have a petition sent against me for the amount of electrons I horribly inconvenienced and the unnecessary wear and tear of the hard drives and USB sticks involved.

Many thanks for this info. It is nice to know things worked in the end 🙂 This will be great for anyone thinking of using Infiniband.

It is not the best way to do Infiniband, but it does work. The best way would be to use SDP/SRP/RDMA, but the included driver and support tools do not support this, and compiling that set of drivers and tools on PetaSAN breaks the PetaSAN system.

Still working on it though. I am probably going to need to build a deb package with all the dependencies included so that the required apt-get requests do not have to be there. But this also limits who can use the packages I create, as I do not have any available Xeon systems to build on.

Upgraded my testing cluster to 2.0 and found an issue with the auto-updater: it does not handle Infiniband properly. On the bright side, after a clean install of 2.0 and following the same steps as for 1.5.0, everything works like it should. No change on the issue with compiling the OFED drivers.

 

On a side note, the hardware I had to compile drivers for in version 1.5.0 seems to work properly in 2.0, so there's a bonus 🙂

Yes, it will work, since we have not changed kernels between 1.5 and 2.0.

While updating, the auto installer will erase everything except for the configuration and data directories, so it is likely your changes were erased and you had to redo them. In your case it is probably better to use the update package and follow the steps in the update guide; this does a live upgrade of specific packages, so it will not overwrite your changes.

UPDATE:

Over 18 months later, my pile-o-parts test cluster is still working near flawlessly (losing OSDs due to service life limitations).

Been using this cluster in a semi-production environment, and yes, the test hardware is still the same junk that should have been recycled into paperclips. This cluster serves three BL465cg7 and two g8 servers out of 22 blades total. Average workload is 12.5k IOPS continuous over the mentioned Infiniband network. Still no journal OSD in any node, though we did limit the nodes to 1 OSD per GB of RAM, minus 3GB of RAM for the OS/network buffer. So a node with two 6-core CPUs and 24GB RAM can handle 21 OSDs, but we limited a node so configured to 20 OSDs. That 1GB of RAM breathing room has proven important.
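The RAM rule of thumb above is easy to turn into a one-liner for sizing other nodes. This is just a sketch of that rule (one OSD per GB of RAM after reserving 3GB); max_osds is a made-up helper name, and the extra 1GB of breathing room is left for you to subtract.

```shell
# Hypothetical helper: OSD ceiling for a node, using the rule of one OSD per GB of RAM
# after reserving 3 GB for the OS and network buffers.
max_osds() {
    ram_gb=$1
    reserve_gb=3
    if [ "$ram_gb" -le "$reserve_gb" ]; then
        echo 0
    else
        echo $((ram_gb - reserve_gb))
    fi
}

max_osds 24   # prints 21
```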

We are looking at rebuilding our production cluster to accommodate the different options that we have been exploring/destroying in our test cluster. Because the updater does not like Infiniband, a clean upgrade to 2.3.1 is planned once we have finished validating 2.3.1. 2.4.0 looks like it will have some nice features too. But now that we are tied to Infiniband for our cluster network, we must make sure it stays working and without issues. Starting in October 2019, we will be building the new 2.3.1 cluster and commissioning the newer ConnectX-3 VPI cards, a requirement because the newer kernels do not carry native support for the older, known-to-work cards.

These are simplified instructions to get Mellanox cards working in PetaSAN.
This works for both ConnectX cards and the older InfiniHost III cards.

First I want to say thank you to the PetaSAN guys! Great work!

Thank god that PetaSAN 2.3.1 is based on Ubuntu 18.04 LTS as it makes doing this much easier.

Install PetaSAN 2.3.1
Log in to the web interface at http://ip:5001
Complete Step 1 and stop (this sets the password so you can log in as root)
ssh to the node as root with the password you specified above
apt-get update
lspci | grep Mell - confirm that the OS sees your card

nano /etc/modules

It should look like the example below:

# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.

ib_mthca
ib_umad
ib_ipoib
The ib_mthca module is for older InfiniHost III cards (MT25208).
Change ib_mthca to mlx4_ib if using a newer ConnectX card.

apt-get install opensm
reboot

Now restart the configuration process by going to http://ip:5001 and this time, the IB
interfaces should be available for use.
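
If the IB interfaces do not show up after the reboot, a quick sanity check is to see whether the modules from /etc/modules actually loaded. The sketch below reads lsmod output and prints any named module that is missing; missing_modules is a hypothetical helper name, and the module list is whatever you put in /etc/modules.

```shell
# Hypothetical helper: read lsmod-style output on stdin and print any of the
# named modules that are not loaded.
# Usage: lsmod | missing_modules mlx4_ib ib_umad ib_ipoib
missing_modules() {
    # Skip the lsmod header line and keep only the module-name column.
    loaded=$(awk 'NR > 1 { print $1 }')
    for m in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF "$m" || echo "$m"
    done
}
```

No output means every module you named is loaded; ls /sys/class/net should then list the ib interfaces.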

That's it!

Jim

Thanks Jim,

I can confirm that in 2.3.1 this is all that is needed, though you don't need opensm if your switch already runs a subnet manager.
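One way to tell whether a subnet manager is already running on the fabric, sketched under assumptions: the kernel exposes each port's logical state in sysfs, and a port that has link but no subnet manager normally sits at INIT rather than ACTIVE. The helper name port_is_active, the device name mlx4_0 and the port number 1 are placeholders; check /sys/class/infiniband on your node for the real names.

```shell
# Hypothetical helper: succeed if a sysfs port state file reports ACTIVE.
# A port stuck at INIT with the cable connected usually means no subnet
# manager has configured it yet.
port_is_active() {
    grep -q 'ACTIVE' "$1"
}

# On a node (device name and port number are assumptions):
#   port_is_active /sys/class/infiniband/mlx4_0/ports/1/state || echo "no subnet manager seen, consider opensm"
```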

You can also set the root password from the console after the install does its reboot and at the same time run the appropriate commands to enable infiniband in PetaSAN.