
How to add mds service

Hi,

We've expanded our cluster with some additional 16TB disks and wanted to create another EC CephFS pool, but I got a warning that there are not enough MDS servers. Before this there were 3 MDS daemons installed (still are), 1 active / 2 standby; I've since stopped the new pool, but there are still 2 active + 1 standby. At the time of the warning all 3 were in the active state.
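For reference, a second EC-backed CephFS is generally created along these lines; the pool names, EC profile, k/m values and pg counts below are just placeholders, not our actual ones:

ceph osd erasure-code-profile set ec_profile k=4 m=2 crush-failure-domain=host
ceph osd pool create cephfs_ec_data 128 128 erasure ec_profile
ceph osd pool set cephfs_ec_data allow_ec_overwrites true        # required before CephFS can use an EC pool
ceph osd pool create cephfs_ec_metadata 64 64 replicated         # the metadata pool must stay replicated
ceph fs new cephfs_ec cephfs_ec_metadata cephfs_ec_data --force  # --force because the default data pool is EC

Each additional filesystem also wants an MDS of its own, which I suspect is where the warning comes from.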

I've tried to restart the installation on one of the new (and unused) nodes to add the MDS service, but the management services option was grayed out. I didn't destroy the node beforehand, just re-ran the installation on the previously joined node (after setting noout + norebalance). Is there a way to add the MDS service to an already installed node?

Thanks..

Jure

Having 1 active and 2 standby is correct. Is your existing CephFS working now? What is the exact warning you get?

That was the case until I added another CephFS. The original one is REP3 and the new one is meant to be a low-performance/big-capacity EC filesystem for testing first. Our existing one is working OK; I've stopped the new one for the moment. Currently I have:

# ceph fs status
cephfs - 32 clients
======
RANK  STATE    MDS       ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  CEPH02  Reqs:    0 /s  1504k  1496k  4124   4838
 1    active  CEPH01  Reqs:    0 /s  3139   3121     78    327
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata  12.0G  63.4T
  cephfs_data      data     156T  63.4T
cephfs16 - 0 clients
========
           POOL               TYPE     USED  AVAIL
cephfs_16t_nocach_metadata  metadata   760k  63.4T
    cephfs_16t_nocache        data       0   61.4T
STANDBY MDS
   CEPH03
MDS version: ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)

The cephfs_16t pools are the new ones. When I wanted to mount the new fs on a client I got this error on the client:

# mount error: no mds server is up or the cluster is laggy

and this in ceph -s:

# ceph -s
  cluster:
    id:     2e7a0a56-89a1-481d-b78b-7ed5a44f1881
    health: HEALTH_WARN
            insufficient standby MDS daemons available
            207 pgs not deep-scrubbed in time

  services:
    mon: 3 daemons, quorum CEPH03,CEPH01,CEPH02 (age 3M)
    mgr: CEPH01(active, since 3M), standbys: CEPH02, CEPH03
    mds: 3/3 daemons up
    osd: 70 osds: 70 up (since 5d), 70 in (since 5d); 1374 remapped pgs
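If I understand the docs correctly, that warning appears because each filesystem wants a standby of its own (the standby_count_wanted setting, default 1), so once both filesystems held active ranks all three daemons were active and no standby was left. It looks like it could also be silenced per filesystem instead of adding another daemon, something like this (not tested on our cluster):

ceph fs get cephfs16 | grep standby_count_wanted   # show what the new fs currently wants
ceph fs set cephfs16 standby_count_wanted 0        # 0 = don't warn about missing standbys for this fs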

At the moment (with the new fs stopped) I have:

# ceph -s
  cluster:
    id:     2e7a0a56-89a1-481d-b78b-7ed5a44f1881
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum CEPH03,CEPH01,CEPH02 (age 3M)
    mgr: CEPH01(active, since 3M), standbys: CEPH02, CEPH03
    mds: 2/2 daemons up, 1 standby
    osd: 70 osds: 70 up (since 3d), 70 in (since 11d)

What useful info can I provide?

Thanx

 

Jure

Hi, I've been digging into this a bit. The Ceph docs say: "Each CephFS file system requires at least one MDS"...

So having 2 active at the moment seems OK. I followed the instructions and added another MDS on a new host (our hosts are named CEPH{id} for location reference):

mkdir -p /var/lib/ceph/mds/ceph-CEPH{id}
ceph auth get-or-create mds.CEPH{id} mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-CEPH{id}/keyring
chown -R ceph:ceph /var/lib/ceph/mds
chmod 600 /var/lib/ceph/mds/ceph-CEPH{id}/keyring
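To sanity-check the auth step, the new key should now be visible from any monitor node:

ceph auth get mds.CEPH{id}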

Afterwards you need to start the service:

systemctl start ceph-mds@CEPH{id}

At first it wouldn't start, failing with "Start request repeated too quickly."

But after a while I tried again and it started. I don't know why or how, which is rather annoying, but there it is.
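For anyone hitting the same thing: as far as I can tell that message is just systemd rate-limiting the unit after a few quick failures, so the real error has to be read from the journal and the failed state cleared before retrying (plain systemd commands, nothing PetaSAN-specific):

journalctl -u ceph-mds@CEPH{id} -n 50     # see why the daemon actually died
systemctl reset-failed ceph-mds@CEPH{id}  # clear the "start request repeated too quickly" state
systemctl start ceph-mds@CEPH{id}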

Afterwards you have to enable the service so it survives a reboot, I presume:

systemctl enable ceph-mds.target
systemctl enable ceph-mds@CEPH{id}.service
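A quick way to confirm the new daemon has registered with the cluster, before looking at the full status:

ceph mds stat    # the extra daemon should show up as up:standby
ceph fs status   # the new host should be listed under STANDBY MDS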

And now I have 4 MDSs running:

# ceph -s
  cluster:
    id:     2e7a0a56-89a1-481d-b78b-7ed5a44f1881
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum CEPH03,CEPH01,CEPH02 (age 3M)
    mgr: CEPH01(active, since 3M), standbys: CEPH02, CEPH03
    mds: 2/2 daemons up, 2 standby
    osd: 70 osds: 70 up (since 2d), 70 in (since 2d)

  data:
    volumes: 1/2 healthy, 1 stopped
    pools:   6 pools, 2241 pgs
    objects: 37.06M objects, 52 TiB
    usage:   161 TiB used, 398 TiB / 560 TiB avail
    pgs:     2238 active+clean
             3    active+clean+scrubbing+deep

I think I still need to fiddle a bit with the fs settings, as ceph fs dump shows some differences in a few parameters (mainly MDS-related).
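To compare the two filesystems' settings side by side, something along these lines should do; the /tmp paths are just an example:

ceph fs get cephfs   > /tmp/fs_cephfs.txt
ceph fs get cephfs16 > /tmp/fs_cephfs16.txt
diff /tmp/fs_cephfs.txt /tmp/fs_cephfs16.txt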

Which seems OK, I think. Do you think I have missed something regarding PetaSAN distro compliance (e.g. will this survive future upgrades)?

Thanks a lot..

Jure