Forums

Help start troubleshooting with iSCSI / ESXi

Hello,
we have a 3-node PetaSAN 3.1.0 system. It provides storage space to a 3-node ESXi system, with the storage mounted via iSCSI.

One recurring issue we have noticed is that (apparently exactly every month, i.e. every 30 days) storage access from one specific ESXi node gets stuck.
The only way to resolve the issue quickly is to restart the ESXi server.

I'm not asking for detailed troubleshooting, but I'd like some suggestions on which direction I should investigate to find the cause of this issue.
At least from the metrics shown in the PetaSAN dashboard, I can't spot any clear problem (there is an unusual peak in the commit time for some of the OSDs around the time when storage access is very slow, but I think it is more likely an effect of the outstanding access requests than their cause).

The periodicity of the issue also looks suspicious: does PetaSAN have some kind of monthly scrub/cleaning job that might interfere with disk access?

Thanks

We have a large number of ESXi installations that have been running for years with no issues. If you need to restart the ESXi side and not the PetaSAN side, I would think the PetaSAN side is working as expected. We do not have a monthly scheduled process.

In the dashboard Node Statistics, do you see high disk % busy during the time of the problem? If so, it could be high load compared to the available hardware. What type of disk setup do you have: SSD (model, enterprise/consumer type), pure HDD, or HDD with journal? How many disks in total? In Cluster Statistics, what are the cluster Throughput and IOPS load during that time? Did you set your scrub speed and/or backfill speed too high from the UI Maintenance page? Generally ESXi is less forgiving of I/O delay (latency) than other clients such as Windows or Linux and may stop the datastore if the delay is too high. If you do not have sufficient hardware, you may run into issues.
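
Before digging into load figures, it may help to confirm how regular the incidents really are. The following is only a minimal sketch in Python: the function names, the tolerance, and the incident timestamps are all hypothetical and would need to be replaced with times taken from your own ESXi logs or the PetaSAN dashboard.

```python
from datetime import datetime


def gaps_in_days(timestamps):
    """Return the gaps, in days, between consecutive incident timestamps."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 86400 for a, b in zip(ordered, ordered[1:])]


def looks_periodic(timestamps, period_days=30.0, tolerance_days=1.0):
    """True if every gap is within tolerance_days of period_days."""
    gaps = gaps_in_days(timestamps)
    return bool(gaps) and all(abs(g - period_days) <= tolerance_days for g in gaps)


# Hypothetical incident times, for illustration only (not from the real system).
incidents = [
    datetime(2023, 1, 10, 3, 0),
    datetime(2023, 2, 9, 3, 5),
    datetime(2023, 3, 11, 2, 50),
]

print(gaps_in_days(incidents))
print(looks_periodic(incidents))
```

If the gaps turn out to be a fixed number of days rather than following calendar months, that may point towards an interval-based timer on one side rather than a monthly scheduled job, but that is only a hint for where to look next.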