
NFS issues (closed port, ganesha container restarting) after update from 3.0.2 to 3.0.3


Did you post the PetaSAN.log file?

Yes, last link to pastebin 🙂

Is this the syslog or the PetaSAN.log? I would like the latter.

Also, it is not clear what the result is on this single node: is the service up but the IPs come on and off? Does the grace status show up?

Sorry, my mistake.

Summary:

With the service stopped on all nodes:
status -> Down, assigned IP -> blank

After starting the service on one node:
Status Up (no grace period), IP assigned.
I can ping this IP, assigned from the NFS pool.
The NFS port is closed.

It looks like podman / the container won't start the NFS service.

Logs from PetaSAN.log are below, correlated with the syslog entries from the pastebin.

27/05/2022 16:23:41 INFO     NFSServer : clean all local resources
27/05/2022 16:23:41 INFO     NFSServer : clean local resource : NFS-172-30-0-172
27/05/2022 16:23:41 INFO     NFSServer : clean local resource : 1) Stopping NFS Exports service of resource : NFS-172-30-0-172
27/05/2022 16:23:42 INFO     Stopping NFS Exports Service
27/05/2022 16:23:42 INFO     NFSServer : clean local resource : 2) Delete ip address of resource : NFS-172-30-0-172
27/05/2022 16:23:42 INFO     NFSServer : clean local resource : 3) Stop and delete container of resource : NFS-172-30-0-172
27/05/2022 16:23:43 INFO     LockBase : unlock_all_consul_resources : Unlock key of resource : NFS-172-30-0-172
27/05/2022 16:23:43 INFO     NFSServer : sync Consul settings
27/05/2022 16:23:43 INFO     NFSServer : sync Consul settings -> done
27/05/2022 16:23:54 INFO     Starting NFSServer Service.
27/05/2022 16:23:58 INFO     Container Manager : deleting all old containers
27/05/2022 16:23:58 INFO     Container Manager : loading /opt/petasan/container-images/petasan-nfs-ganesha-3.2.0.tar.gz image into podman
27/05/2022 16:24:10 INFO     LockBase : Dropping old sessions
27/05/2022 16:24:20 INFO     LockBase : Successfully dropped old sessions
27/05/2022 16:24:20 INFO     Clean all old local resources.
27/05/2022 16:24:22 INFO     NFSServer : sync Consul settings
27/05/2022 16:24:22 INFO     NFSServer : sync Consul settings -> done
27/05/2022 16:24:23 INFO     LockBase : Try to acquire the resource = NFS-172-30-0-172.
27/05/2022 16:24:23 INFO     LockBase : Succeeded on acquiring the resource = NFS-172-30-0-172
27/05/2022 16:24:24 INFO     NFSServer : waiting for the container NFS-172-30-0-172 to be up.
27/05/2022 16:24:24 INFO     Starting NFS Exports Service
27/05/2022 16:24:24 INFO     Container Manager : creating NFS-172-30-0-172 container
27/05/2022 16:24:27 INFO     NFSServer : sync Consul settings
27/05/2022 16:24:27 INFO     NFSServer : sync Consul settings -> done
27/05/2022 16:25:40 INFO     Stopping NFS Exports Service
27/05/2022 16:26:48 INFO     NFSServer : sync Consul settings
27/05/2022 16:26:48 INFO     NFSServer : sync Consul settings -> done
27/05/2022 16:26:48 INFO     LockBase : Try to acquire the resource = NFS-172-30-0-172.
27/05/2022 16:26:49 INFO     LockBase : Succeeded on acquiring the resource = NFS-172-30-0-172
27/05/2022 16:26:50 INFO     NFSServer : waiting for the container NFS-172-30-0-172 to be up.
27/05/2022 16:26:50 INFO     Starting NFS Exports Service
27/05/2022 16:26:51 INFO     Container Manager : creating NFS-172-30-0-172 container
27/05/2022 16:26:53 INFO     NFSServer : sync Consul settings
27/05/2022 16:26:53 INFO     NFSServer : sync Consul settings -> done
27/05/2022 16:28:05 INFO     Stopping NFS Exports Service
27/05/2022 16:29:12 INFO     NFSServer : sync Consul settings
27/05/2022 16:29:12 INFO     NFSServer : sync Consul settings -> done
27/05/2022 16:29:12 INFO     LockBase : Try to acquire the resource = NFS-172-30-0-172.
27/05/2022 16:29:14 INFO     LockBase : Succeeded on acquiring the resource = NFS-172-30-0-172
27/05/2022 16:29:14 INFO     NFSServer : waiting for the container NFS-172-30-0-172 to be up.
27/05/2022 16:29:15 INFO     Starting NFS Exports Service
27/05/2022 16:29:15 INFO     Container Manager : creating NFS-172-30-0-172 container
27/05/2022 16:29:19 INFO     NFSServer : sync Consul settings
27/05/2022 16:29:19 INFO     NFSServer : sync Consul settings -> done

The PetaSAN NFS service logs look normal. The ssh security issue from the syslog you posted previously looks strange. Note that we do test NFS connectivity during upgrades; I do not believe the issue is directly related to the upgrade.

1) Have you done any customisation to the system? Any added packages?

2) Did you enable Restricted Access for the export in the UI? If so, can you try with this disabled?

3) On the node with the service, can you test connecting to it? For example:
mkdir -p /mnt/export1
mount -t nfs -o nfsvers=4.1,proto=tcp 10.0.3.100:/export1 /mnt/export1/
echo "Hello" > /mnt/export1/hi.log
cat /mnt/cephfs/nfs/export1/hi.log
where 10.0.3.100 is one of the NFS public IPs.
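If the mount is refused, re-running it with verbose output can show at which step it fails, using the same example IP and export as above:
mount -vv -t nfs -o nfsvers=4.1,proto=tcp 10.0.3.100:/export1 /mnt/export1/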
4) On the same node, check that the containers are running:
podman ps -a
Do you see the NFS containers running?
Do they map port 2049 from the host IP, such as:
10.0.3.100:2049->2049/tcp
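If a container keeps exiting and being recreated, its console output from the host may already show why ganesha stops. Assuming the container name matches the public IP as in the example above:
podman logs NFS-10-0-3-100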
5) Are the NFS public IPs listening on port 2049?

netstat -nl | grep 2049
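If netstat is not installed, ss shows the same listening sockets:
ss -ltn | grep 2049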
6) Can you grab the ganesha config file and the log file from the container? For example:
podman cp NFS-10-0-3-100:/etc/ganesha/ganesha.conf .
podman cp NFS-10-0-3-100:/var/log/ganesha/ganesha.log .
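Once the log is copied out of the container, a quick search for severe entries is usually the fastest way to spot a startup problem, for example:
grep -iE 'crit|fatal|error' ganesha.log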

 

Hi,

 

It worked, NFS is alive.

The logs from inside the container helped. What a pity they are not redirected to PetaSAN.log 🙁
That would have saved three days of searching. (I know, I could have looked there myself, but I didn't know where the logs were yet.)

Cause:
While the system was in R/O mode after the upgrade, I added another IP to 'Read/Write Clients' and used the wrong format.
It should be 10.0.0.1, 10.0.0.2, but I entered 10.0.0.1; 10.0.0.2; (semicolons instead of commas).
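For anyone else hitting this: the 'Read/Write Clients' field presumably ends up as a client list in the generated ganesha.conf, where values are comma-separated and a semicolon terminates a statement. Roughly like this (the exact block PetaSAN generates may differ):

EXPORT {
    # ... other export settings (Export_Id, Path, Pseudo, FSAL) omitted
    CLIENT {
        Clients = 10.0.0.1, 10.0.0.2;   # comma-separated list, one trailing semicolon
        Access_Type = RW;
    }
}

With 10.0.0.1; 10.0.0.2; pasted into the field, the config presumably no longer parses, ganesha exits shortly after start, and the container is recreated over and over, which matches what I was seeing.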

Since I already had the answers ready, I'm pasting them here in case someone else digs through these logs:

1. No, there are no changes other than installing the 'htop' and 'glances' packages. (I just uninstalled glances for testing.)

2. Yes, I had it enabled; disabling it didn't help. (I'm leaving the restriction disabled for testing.)

3. I am getting 'connection refused' already at the mount stage; I added -vv.

4. There is a container that lives for a minute and is then created again.

5. Yes, as long as the container lives for that minute 😉

Thank you for your help 🙂
