CIFS services not up following cluster powerdown.
sparxalt
3 Posts
December 30, 2020, 2:50 pm
Hello, I'm really enjoying PetaSAN and digging into learning Ceph through its more approachable interface. I've set up an 8-node cluster with the MGMT/iSCSI/NFS/CIFS roles each spread across 3 of the 8 nodes, with some obvious overlaps. So far it's been working great, but last night I had an unexpected power outage. With very little battery remaining on the UPS, I decided to initiate shutdowns on all nodes via a power-button press. After power was restored, the cluster was brought back up and returned to all green within minutes. The one exception is the CIFS service, which remains unavailable. The CIFS Status page displays the red banner "Cannot get CIFS Status." and any attempt to add a CIFS share displays the banner "CIFS services not up."
The nodes which run the CIFS role repeat two events over and over in their logs: "ERROR WatchBase Exception :" and "INFO CIFSService key change action." The petasan-cifs service shows as running on the CIFS nodes, and I've placed the cluster in maintenance and cleanly rebooted each node one at a time. Where else can I begin troubleshooting this to restore the service?
admin
2,930 Posts
December 30, 2020, 3:31 pm
On a CIFS node, what is the output of:
ceph status
ceph fs status
mount | grep mnt
mount | grep shared
ctdb status
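For reference, a minimal sketch that bundles these checks into one pass on a CIFS node. The script name and structure are just an illustration; it assumes the ceph and ctdb client tools are on the PATH, as they normally are on a PetaSAN CIFS node.
#!/bin/bash
# Hypothetical helper: run the diagnostic commands above in one pass on a CIFS node.
# Assumes the standard PetaSAN tools (ceph, ctdb) and mounts are present.
set -u

for cmd in "ceph status" "ceph fs status" "mount | grep mnt" "mount | grep shared" "ctdb status"; do
    echo "=== $cmd ==="
    bash -c "$cmd" 2>&1    # keep going even if one check (e.g. ctdb) fails
    echo
done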
sparxalt
3 Posts
December 30, 2020, 3:54 pm
ceph status
cluster:
id: 987c9aea-fc2f-4a20-88a8-ac7bf626e7e9
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph-8,ceph-1,ceph-4 (age 11h)
mgr: ceph-8(active, since 11h), standbys: ceph-4, ceph-1
mds: cephfs:1 {0=ceph-4=up:active} 2 up:standby
osd: 58 osds: 58 up (since 109m), 58 in (since 6d)
task status:
scrub status:
mds.ceph-4: idle
data:
pools: 7 pools, 576 pgs
objects: 423.80k objects, 1.5 TiB
usage: 4.4 TiB used, 159 TiB / 164 TiB avail
pgs: 576 active+clean
io:
client: 19 KiB/s rd, 3.8 KiB/s wr, 11 op/s rd, 3 op/s wr
ceph fs status
cephfs - 39 clients
======
+------+--------+--------+---------------+-------+-------+
| Rank | State | MDS | Activity | dns | inos |
+------+--------+--------+---------------+-------+-------+
| 0 | active | ceph-4 | Reqs: 0 /s | 19.8k | 13.5k |
+------+--------+--------+---------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool | type | used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 727M | 10.6T |
| cephfs_root | data | 12.1k | 10.6T |
| cephfs_ec_hdd | data | 247M | 76.9T |
| cephfs_ec_ssd | data | 2368G | 21.3T |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
| ceph-1 |
| ceph-8 |
+-------------+
MDS version: ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
mount | grep mnt
10.1.110.81,10.1.110.84,10.1.110.88:/ on /mnt/cephfs type ceph (rw,relatime,name=admin,secret=<hidden>,acl,mds_namespace=cephfs)
mount | grep shared
10.1.110.81:gfs-vol on /opt/petasan/config/shared type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
ctdb status
connect() failed, errno=2
Failed to connect to CTDB daemon (/var/run/ctdb/ctdbd.socket)
admin
2,930 Posts
December 30, 2020, 4:22 pm
On one of the CIFS nodes:
systemctl stop petasan-cifs
systemctl start ctdb
Wait 1 minute, then:
systemctl status smbd
systemctl status ctdb
ctdb status
If you get an error on screen, what error do you get?
You can get more logs from:
/var/log/samba/log.ctdb
/var/log/samba/log.smbd
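A minimal sketch that strings these recovery steps together on a CIFS node. The 60-second wait and the log paths follow the post above; treat it as a starting point under those assumptions, not an official PetaSAN procedure.
#!/bin/bash
# Hypothetical recovery helper based on the steps above: hand CIFS management
# back to ctdb directly, then report service state and recent CTDB/Samba logs.
set -u

systemctl stop petasan-cifs
systemctl start ctdb

sleep 60    # give ctdb a minute to start and join its cluster

systemctl --no-pager status smbd
systemctl --no-pager status ctdb
ctdb status

# Recent log lines, useful if anything above shows errors
tail -n 50 /var/log/samba/log.ctdb
tail -n 50 /var/log/samba/log.smbd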
sparxalt
3 Posts
December 30, 2020, 4:26 pm
Thanks Admin!
That was the nudge I needed. I found the ctdb service wasn't running, so I started it, which allowed the CIFS Status page to show "down" on all three nodes. Once I reapplied the CIFS settings from the Configuration section, all nodes came online and started serving. Now ctdb status displays:
Number of nodes:3
pnn:0 10.1.110.82 OK (THIS NODE)
pnn:1 10.1.110.85 OK
pnn:2 10.1.110.87 OK
Generation:63243887
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:2
the.only.chaos.lucifer
31 Posts
December 21, 2023, 8:33 am
@admin How do I avoid this? I notice it occurs every time there is a power-down. I am doing this in a homelab and testing environment, and this is definitely not ideal. Technically it should reconnect by itself. Is there anything I can do to resolve this? No worries if it is too much for you guys, but thanks a lot if there are any suggestions!
Last edited on December 21, 2023, 8:34 am by the.only.chaos.lucifer · #6
admin
2,930 Posts
December 21, 2023, 3:58 pm
Not really sure, since we do not see this. It could be related to your environment; if you can test in a different setup it would be great. Else make sure that ceph/cephfs have no issues when you restart and, if all is OK, look at the ctdb/samba logs.
the.only.chaos.lucifer
31 Posts
December 22, 2023, 6:36 pm
The suggested commands definitely work; it's just that every time a reboot occurs I need to run the commands below. Just wondering, is this because iSCSI, CIFS, NFS, and S3 are all on the same subnet? Should I split the subnet? The backend and management subnets are each on their own subnet, so in total I have 3 subnets.
systemctl stop petasan-cifs
systemctl start ctdb
Last edited on December 23, 2023, 9:06 am by the.only.chaos.lucifer · #8
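As a stopgap until the root cause is found, one possible workaround is to repeat those two manual steps once per boot. This is not an official PetaSAN mechanism; the script path is hypothetical, the 120-second delay is a guess at how long ceph/cephfs and the shared config mount need to settle, and it papers over the symptom rather than fixing the cause.
#!/bin/bash
# Hypothetical workaround, e.g. /usr/local/bin/ctdb-after-boot.sh, run once per boot.
# It simply repeats the manual recovery steps from this thread.
set -u

sleep 120                   # let ceph/cephfs and the shared config mount settle first
systemctl stop petasan-cifs
systemctl start ctdb
It could be wired up with a cron entry such as "@reboot root /usr/local/bin/ctdb-after-boot.sh" in /etc/cron.d (again, the path and filename are just placeholders).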