
Monitoring.


Is this happening all the time? Is it consistent when joining from a specific node?

Well, it keeps happening, but I noticed that when I add the backend IPs it takes some time for the change to take effect, and that is why it fails. If I wait for a minute or so, it accepts them.

https://ibb.co/pXRG8GY

Another issue I keep noticing is that adding the disks in the backend always gives an error when running the dd command.

https://ibb.co/NF6RY66

The network delay does not sound normal. Even if we bypassed the connection check, the node would fail during deployment since it cannot connect. Waiting a minute may be masking a real networking issue that could resurface later, for example during path failover or node boot. Also, even after the ping responded, the latencies are higher than normal.
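A quick way to confirm whether the latencies are consistently high is to ping each backend IP a few times and compare the averages. This is only a sketch; the addresses below are placeholders for your actual backend subnet IPs:

    # Ping each backend IP (placeholder addresses) and report the average RTT.
    for ip in 10.0.2.11 10.0.2.12 10.0.2.13; do
        # -c 5: five probes, -W 1: one-second timeout per probe
        avg=$(ping -c 5 -W 1 "$ip" | awk -F'/' '/^rtt|^round-trip/ {print $5}')
        echo "$ip avg rtt: ${avg:-unreachable} ms"
    done

On a healthy local network the averages should stay well under a millisecond; values in the tens of milliseconds, or probes timing out, would point to the switch, cabling, or NIC.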

For the disk issue: if the disk gets added correctly as an OSD, I would ignore this. The dd command is known to output warning messages as errors, plus the disk is being wiped several times over and the OS may be updating partition table info while dd is running. So if the disk adds fine, just ignore this message.
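To illustrate the race, the wipe is roughly equivalent to the following sketch (/dev/sdX is a placeholder for a disk you intend to destroy, so double-check the device name before running anything like this):

    # Zero out the start of the disk, bypassing the page cache.
    dd if=/dev/zero of=/dev/sdX bs=1M count=20 oflag=direct

    # Ask the kernel to re-read the (now empty) partition table. If this
    # runs while dd still holds the device, warnings similar to the ones
    # in the screenshot can appear even though the wipe succeeded.
    partprobe /dev/sdX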

 

We only noticed this with the new 2.2; we installed the old version many times for our POC on the same servers and it worked fine.

Now I managed to add the nodes, but OSDs randomly drop and I cannot re-add them. We will boot into CentOS or another OS to wipe all the disks again and then retry the installation, hoping that will fix the issue.
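For reference, this is roughly what we plan to run against each data disk before re-installing (the device names are placeholders for our actual disks):

    # Clear filesystem/RAID/partition-table signatures, then zero the start
    # of each disk. /dev/sdb etc. are placeholders -- adjust to your layout.
    for disk in /dev/sdb /dev/sdc /dev/sdd; do
        wipefs --all "$disk"
        dd if=/dev/zero of="$disk" bs=1M count=100 oflag=direct
    done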

 

Dear PetaSAN user;

Cluster has one or more osd failures, please check the following osd(s):

- osd.20/srocceph2

Since the pings also have a problem, this is a low-level network issue: it is either something with the networking itself (NICs/cables/switches/switch setup) or, less likely if it only happens with 2.2, the NIC driver in the new kernel (v4.12, based on SUSE SLE 15).

Can you try the setup using a different switch that is isolated from your production network: connect only the PetaSAN nodes to it, check the cables, make sure the switch works when connected to something else, then do a fresh 2.2 setup with no jumbo frames/bonding/VLANs and see if you still have issues. If you do, do you see any errors in the kernel log via dmesg?
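Something along these lines would show the driver in use and any link errors (the interface name eth0 is just an example; substitute your backend interface):

    # Show which driver and firmware version the NIC is using.
    ethtool -i eth0

    # Scan the kernel log for link flaps, resets, or driver errors.
    dmesg | grep -iE 'eth0|link is (up|down)|error'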

Sure thing. I would like to start with the networking layer so we can be 100% sure.

I will need the steps from you to clean the cluster again, so I can reproduce the issue on both the old switch and the new one, please.

Thank you

I am not sure what you mean by "clean the cluster". If you want to start from scratch, you can re-install and select the option to install a new cluster. Otherwise, if you want to retain the cluster but wipe the data, you can just delete the pools and add new pools.
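If you prefer the command line over the UI, deleting and re-creating a pool looks roughly like this (assuming ceph CLI access on a management node; the pool name rbd and the PG counts are placeholders, and pool deletion may require mon_allow_pool_delete=true in the monitor config):

    # Delete the pool; the name must be given twice as a safety measure.
    ceph osd pool delete rbd rbd --yes-i-really-really-mean-it

    # Re-create it; size the placement-group counts for your cluster.
    ceph osd pool create rbd 256 256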
