CIFS services not up following cluster powerdown.

Hello, I'm really enjoying PetaSAN and digging into learning Ceph through its more approachable interface. I've set up an 8-node cluster with the MGMT/iSCSI/NFS/CIFS roles each spread across 3 of the 8 nodes, with some obvious overlaps. So far it's been working great, but last night I had an unexpected power outage. With very little battery remaining on the UPS, I decided to initiate shutdowns on all nodes via a power button press. After power was restored, the cluster was brought back up and returned to all green within minutes. The one exception is the CIFS service, which remains unavailable. The CIFS Status page displays a red banner with "Cannot get CIFS Status." and any attempt to add a CIFS share displays the banner "CIFS services not up."

The nodes running the CIFS role repeat two events over and over in their logs: "ERROR WatchBase Exception :" and "INFO CIFSService key change action." The petasan-cifs service shows as running on the CIFS nodes, and I've already placed the cluster in maintenance mode and cleanly rebooted each node one at a time. Where else can I start troubleshooting to restore the service?
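For anyone following along, the repeating entries can be pulled with something like the following (assuming the default PetaSAN log location, /opt/petasan/log/PetaSAN.log; adjust the path if your install differs):

# last few occurrences of the two repeating messages in the PetaSAN log
grep -E "WatchBase Exception|CIFSService key change" /opt/petasan/log/PetaSAN.log | tail -n 20

# recent entries from the petasan-cifs unit itself
journalctl -u petasan-cifs -n 100 --no-pager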

On a CIFS node, what is the output of:

ceph status
ceph fs status
mount | grep mnt
mount | grep shared
ctdb status

ceph status

  cluster:
    id:     987c9aea-fc2f-4a20-88a8-ac7bf626e7e9
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph-8,ceph-1,ceph-4 (age 11h)
    mgr: ceph-8(active, since 11h), standbys: ceph-4, ceph-1
    mds: cephfs:1 {0=ceph-4=up:active} 2 up:standby
    osd: 58 osds: 58 up (since 109m), 58 in (since 6d)

  task status:
    scrub status:
        mds.ceph-4: idle

  data:
    pools:   7 pools, 576 pgs
    objects: 423.80k objects, 1.5 TiB
    usage:   4.4 TiB used, 159 TiB / 164 TiB avail
    pgs:     576 active+clean

  io:
    client:   19 KiB/s rd, 3.8 KiB/s wr, 11 op/s rd, 3 op/s wr

ceph fs status

cephfs - 39 clients
======
+------+--------+--------+---------------+-------+-------+
| Rank | State  |  MDS   |    Activity   |  dns  |  inos |
+------+--------+--------+---------------+-------+-------+
|  0   | active | ceph-4 | Reqs:    0 /s | 19.8k | 13.5k |
+------+--------+--------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  727M | 10.6T |
|   cephfs_root   |   data   | 12.1k | 10.6T |
|  cephfs_ec_hdd  |   data   |  247M | 76.9T |
|  cephfs_ec_ssd  |   data   | 2368G | 21.3T |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|    ceph-1   |
|    ceph-8   |
+-------------+
MDS version: ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)

mount | grep mnt

10.1.110.81,10.1.110.84,10.1.110.88:/ on /mnt/cephfs type ceph (rw,relatime,name=admin,secret=<hidden>,acl,mds_namespace=cephfs)

mount | grep shared

10.1.110.81:gfs-vol on /opt/petasan/config/shared type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)

ctdb status

connect() failed, errno=2
Failed to connect to CTDB daemon (/var/run/ctdb/ctdbd.socket)
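That errno=2 (ENOENT) just means the socket file does not exist, which normally means ctdbd itself is not running. A quick way to confirm on the node (standard systemd/ctdb commands, nothing PetaSAN-specific):

# is the ctdb daemon actually up, and does its socket exist?
systemctl status ctdb --no-pager
ls -l /var/run/ctdb/ctdbd.socket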

On one of the CIFS nodes:

systemctl stop petasan-cifs
systemctl start ctdb

Wait 1 minute, then:

systemctl status smbd
systemctl status ctdb
ctdb status

If you get an error on screen, what error do you get?
You can get more logs from:
/var/log/samba/log.ctdb
/var/log/samba/log.smbd
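If nothing obvious shows on screen, the tail end of those two logs is usually the quickest way to spot what failed, for example:

# last lines of the ctdb and smbd logs, plus any explicit errors
tail -n 100 /var/log/samba/log.ctdb /var/log/samba/log.smbd
grep -iE "error|fail" /var/log/samba/log.ctdb | tail -n 20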

Thanks Admin!

That was the nudge I needed. I found the ctdb service wasn't running, so I started it, which allowed the CIFS Status page to show "down" on all three nodes. Once I reapplied the CIFS settings from the Configuration section, all nodes came online and started serving. Now ctdb status displays:

Number of nodes:3
pnn:0 10.1.110.82 OK (THIS NODE)
pnn:1 10.1.110.85 OK
pnn:2 10.1.110.87 OK
Generation:63243887
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:2
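As a final sanity check (assuming guest listing is allowed; otherwise pass -U with a valid user), Samba can be asked to enumerate its shares locally:

# anonymous share listing against the local smbd
smbclient -L localhost -N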

@admin How do I avoid this? I notice it occurs every time there is a power-down. I am doing this in a homelab and testing environment, and this is definitely not ideal; technically it should reconnect by itself. Is there anything I can do to resolve this? No worries if it is too much for you guys, but thanks a lot if there are any suggestions!

Not really sure, since we do not see this. It could be related to your environment; if you can test in a different setup, that would be great. Else, make sure that ceph/cephfs have no issues when you restart and, if all is OK, look at the ctdb/samba logs.
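A minimal post-boot checklist along those lines, using the same commands already shown in this thread plus a service check, might look like:

# storage layer healthy?
ceph health
ceph fs status

# are the relevant services actually running?
systemctl is-active ctdb smbd petasan-cifs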

The commands you suggested definitely work; it's just that every time a reboot occurs I need to run the two commands below. Just wondering, is this because iSCSI, CIFS, NFS, and S3 are all on the same subnet? Should I split the subnet? The backend and management networks are each on their own subnets, so in total I have 3 subnets.

systemctl stop petasan-cifs
systemctl start ctdb
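Purely as a homelab stopgap, and not an official PetaSAN fix: if the two commands above really are all that is needed after a boot, a one-shot systemd unit (hypothetical name below) could run the same sequence automatically. Whether this interferes with how petasan-cifs manages ctdb is untested, so treat it as a sketch:

# /etc/systemd/system/ctdb-restart-workaround.service  (hypothetical unit, homelab workaround only)
[Unit]
Description=Workaround: restart CTDB after PetaSAN CIFS starts
After=petasan-cifs.service

[Service]
Type=oneshot
# give petasan-cifs time to finish its own startup before cycling ctdb
ExecStartPre=/bin/sleep 60
ExecStart=/bin/systemctl stop petasan-cifs
ExecStart=/bin/systemctl start ctdb

[Install]
WantedBy=multi-user.target

Enable it once with:

systemctl daemon-reload
systemctl enable ctdb-restart-workaround.service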