
Benchmark fails with "Error loading benchmark test report" on a new PetaSAN 2.0.0 cluster


Hi there Admin,

Cluster reinstalled using "mid-range" this time. All settings double-checked. Sadly, same results:

  • Benchmark report fails.
  • Path Assignment List does not show any paths on the hosts.

The errors in the logs are exactly the same. The iSCSI disk was created at 300 GB with two paths, using IQN-based access.
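For reference, from a plain Linux initiator the IQN-based access works out to something like this (the portal IP and target IQN below are placeholders; our real clients are VMware):

  # discover targets on one of the iSCSI path IPs (placeholder address)
  iscsiadm -m discovery -t sendtargets -p 10.0.2.100
  # log in to the discovered target (placeholder target IQN)
  iscsiadm -m node -T iqn.2016-05.com.petasan:00001 -p 10.0.2.100 --login
  # with both paths logged in, multipath should list two active paths
  multipath -ll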

  1. Is the working lab system the exact same configuration? If not, what is different?
  2. Are your iSCSI disks started? Can you see them in the iSCSI Disk List? From the iSCSI Disk List, if you click on Active Paths, are the paths assigned to nodes? Do they remain on nodes, or do they frequently switch nodes?
  3. For each node: if you go to Node List -> Physical Disks, does it correctly show the disk OSDs as up?
  4. Do the iSCSI disks function correctly: can you connect from client initiators and perform IO? Do you see about the same performance as your lab system, as seen by client IO?
  5. Do you think everything is working apart from the benchmark and path re-assignment page, or is the system not stable?
  6. Can all nodes ping each other using their node names?
  7. Can you go to the path re-assign page, where it lists 0 paths while you know you have running paths, do a couple of page refreshes, then send us the PetaSAN logs from all machines? You can email them to contact-us at petasan.org.
  8. Please also send /etc/hosts and the output of ceph osd tree --cluster xx.
  1. Is the working lab system the exact same configuration? If not, what is different? In networking configuration yes; in hardware, not exactly. The lab is virtualized and is not the first one. We have been running virtualized labs on both OpenStack and VirtualBox since 1.5.0. With PetaSAN 2.0.0 this is our fourth lab. All of them worked flawlessly. That's why we finally decided to purchase the servers for PetaSAN and go full production. We tested many different configs, including spinning + journal and all-SSD setups. All of them worked in all scenarios, including MPIO.
  2. Are your iSCSI disks started? Can you see them in the iSCSI Disk List? From the iSCSI Disk List, if you click on Active Paths, are the paths assigned to nodes? Do they remain on nodes, or do they frequently switch nodes? Yup: iSCSI disks OK, and they remain on nodes. No switching unless we intentionally reboot a server (one of our many tests). Tested with more than one disk before reinstalling. Also, this is the third reinstall. In every try the iSCSI part worked flawlessly, but the same two issues are always there: the benchmark report and the path reassignment.
  3. For each node: if you go to Node List -> Physical Disks, does it correctly show the disk OSDs as up? Yup. All four on each node, plus the OS disk.
  4. Do the iSCSI disks function correctly: can you connect from client initiators and perform IO? Do you see about the same performance as your lab system, as seen by client IO? They connect OK to the VMware platform, but we have suspended all performance tests until we sort out the issues we are having.
  5. Do you think everything is working apart from the benchmark and path re-assignment page, or is the system not stable? The system is usable from a "ceph" perspective, but without path reassignment we can't go full production. We need iSCSI for the VMware platform (not native Ceph). Nevertheless, I've been working with Ceph and OpenStack for many years and I can tell you: your Ceph implementation is quite good. Also, the way you use Consul, Gluster and LIO is very smart. Whatever is happening, I'm beginning to think that Python is "exploding" somewhere.
  6. Can all nodes ping each other using their node names? Yup. Also checked /etc/hosts, and all 3 servers are there with their FQDNs.
  7. Can you go to the path re-assign page, where it lists 0 paths while you know you have running paths, do a couple of page refreshes, then send us the PetaSAN logs from all machines? You can email them to contact-us at petasan.org. Sure thing, I'll do it at once. I'll use my corporate email on UBXCloud.com to contact you. I was in contact a short time ago with one of your team members about this project and the commercial support option.
  8. Please also send /etc/hosts and the output of ceph osd tree --cluster xx. Sure thing. Anything you need. We are taking this project very seriously and plan to acquire full commercial support once we are up and running, so whatever info you need, please tell me! I'll gather the outputs with something like the commands below.
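Roughly what I plan to run on each node to collect that (node names below are placeholders; "xx" stands for whatever cluster name was chosen at deployment):

  # name resolution as the nodes see it
  cat /etc/hosts
  # OSD layout and up/down state for the cluster
  ceph osd tree --cluster xx
  # name-based reachability between nodes (placeholder node names)
  for n in node1 node2 node3; do ping -c 2 "$n"; done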

I'll send you all the information you requested and the general hardware setup.

One last thing: one of my workmates asked me to create two disks, both 300 GB, in order to run a full performance test (two paths each). I'll post the results here.
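On the initiator side we'll probably drive the test with fio, roughly like this (device path and parameters are just placeholders, not anything PetaSAN provides):

  # 4k random reads against the multipath device for 60 seconds (placeholder device)
  fio --name=randread --filename=/dev/mapper/mpatha --direct=1 \
      --rw=randread --bs=4k --iodepth=32 --numjobs=4 \
      --runtime=60 --time_based --group_reporting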

For all nodes, do you get charts for disk throughput/IOPS/% utilization?

Yup. That's working fine. Also, one of our tests was to ensure the reporting services and Ceph manager (ports 3000, 200x, 700x) moved from node to node after a node reboot. We did that test in the lab to make sure the PetaSAN cluster part was working OK, and reproduced it on the production machines. After an intentional reboot those services move to one of the surviving nodes. All metrics are working (CPU, IOPS, etc.).
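The check itself is nothing fancy; after rebooting a node we just probe the graphing port from a client to see which surviving node picked it up (node names are placeholders, and only port 3000 is shown here since the exact 200x/700x ports vary):

  # see which node is currently listening on the graphing port
  for n in node1 node2 node3; do
      printf '%s: ' "$n"
      nc -z -w 2 "$n" 3000 && echo "port 3000 open" || echo "closed"
  done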

Hi there again Admin!

I've just sent the mail from my "@ubxcloud.com" address with the information you asked for and some other relevant details. I hope this helps sort out all the issues. If there's anything else you need, please contact me at any time!

Hi there Forum,

With the invaluable help of the PetaSAN staff we finally tracked down and solved both issues (the benchmark and the path reassignment). When we deployed our nodes we used full FQDNs as the node names (xxxx.domain.tld) instead of short names (just xxxx). It turns out this breaks things in PetaSAN, so as a warning to anyone experiencing the same issues: use short hostnames (the "hostname -s" part of your server name) instead of full FQDNs for node names at install time.

The moral of the story: use short hostnames, without any domain or "." (dots), for your node names.
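If you want to double-check before deploying, a quick look on each box is enough (a minimal sketch, nothing PetaSAN-specific):

  # the short form to enter as the PetaSAN node name (no domain, no dots)
  hostname -s
  # the full FQDN (what we mistakenly used); everything from the first dot on should be left out
  hostname -f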

Thanks @petasan staff for all your help! Our cluster is running perfectly and is ready for production loads!
