
GlusterFS mount attempt every 30 seconds

Hey,

I see a GlusterFS mount attempt in the logs every 30 seconds.

Is this expected behavior?

Where can I see what triggers such a mount, and whether the mount succeeded or failed with an error?

Since the last update, I have been having a recurring problem with:
1 clients failing to respond to cache pressure

I checked, and cache usage is at 100 MB.
Increasing the cache from 4 GB to 8 GB did not help.

I also gave the client more time to empty its caches:
mds_cache_trim_interval = 2

But this had no effect.
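
For reference, this is roughly how I applied the changes (assuming the settings go through ceph config; the values are my own):

# raise the MDS cache memory limit from 4 GB to 8 GB (value in bytes)
ceph config set mds mds_cache_memory_limit 8589934592
# allow 2 seconds between MDS cache trims
ceph config set mds mds_cache_trim_interval 2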

Interestingly, only NFS is exposed from the cluster, and only the NFS clients connect to it.

/opt/petasan/log/PetaSAN.log
06/03/2023 17:52:55 INFO     GlusterFS mount attempt
06/03/2023 17:53:25 INFO     GlusterFS mount attempt
06/03/2023 17:53:55 INFO     GlusterFS mount attempt
06/03/2023 17:54:25 INFO     GlusterFS mount attempt
06/03/2023 17:54:55 INFO     GlusterFS mount attempt
06/03/2023 17:55:25 INFO     GlusterFS mount attempt
06/03/2023 17:55:55 INFO     GlusterFS mount attempt
06/03/2023 17:56:25 INFO     GlusterFS mount attempt
06/03/2023 17:56:55 INFO     GlusterFS mount attempt
06/03/2023 17:57:25 INFO     GlusterFS mount attempt
06/03/2023 17:57:55 INFO     GlusterFS mount attempt
06/03/2023 17:58:25 INFO     GlusterFS mount attempt
06/03/2023 17:58:55 INFO     GlusterFS mount attempt
06/03/2023 17:59:25 INFO     GlusterFS mount attempt
06/03/2023 17:59:56 INFO     GlusterFS mount attempt
06/03/2023 18:00:26 INFO     GlusterFS mount attempt
06/03/2023 18:00:56 INFO     GlusterFS mount attempt
06/03/2023 18:01:26 INFO     GlusterFS mount attempt
06/03/2023 18:01:56 INFO     GlusterFS mount attempt
06/03/2023 18:02:26 INFO     GlusterFS mount attempt
06/03/2023 18:02:56 INFO     GlusterFS mount attempt
06/03/2023 18:03:26 INFO     GlusterFS mount attempt
06/03/2023 18:03:56 INFO     GlusterFS mount attempt
06/03/2023 18:04:26 INFO     GlusterFS mount attempt
06/03/2023 18:04:56 INFO     GlusterFS mount attempt
06/03/2023 18:05:26 INFO     GlusterFS mount attempt
06/03/2023 18:05:56 INFO     GlusterFS mount attempt

ceph tell mds.* client ls | grep hostname
2023-03-06T18:16:50.990+0100 7f3dca7fc700  0 client.67696816 ms_handle_reset on v2:172.30.0.43:6800/352678348
2023-03-06T18:16:51.018+0100 7f3dcb7fe700  0 client.67696822 ms_handle_reset on v2:172.30.0.43:6800/352678348
Error ENOSYS:
2023-03-06T18:16:51.022+0100 7f3dca7fc700  0 client.67696828 ms_handle_reset on v2:172.30.0.41:6800/1134846623
2023-03-06T18:16:51.126+0100 7f3dcb7fe700  0 client.67696834 ms_handle_reset on v2:172.30.0.41:6800/1134846623
"hostname": "NFS-172-30-0-142",
"hostname": "NFS-172-30-0-142",
"hostname": "NFS-172-30-0-141",
"hostname": "NFS-172-30-0-142",
"hostname": "NFS-172-30-0-141",
"hostname": "NFS-172-30-0-141",
"hostname": "ceph03",
"hostname": "NFS-172-30-0-143",
"hostname": "NFS-172-30-0-143",
"hostname": "NFS-172-30-0-143",
"hostname": "ceph01",
2023-03-06T18:16:51.262+0100 7f3dca7fc700  0 client.67696840 ms_handle_reset on v2:172.30.0.42:6800/1102300439
2023-03-06T18:16:51.290+0100 7f3dcb7fe700  0 client.67696852 ms_handle_reset on v2:172.30.0.42:6800/1102300439
"hostname": "ceph02",
"hostname": "NFS-172-30-0-142",
"hostname": "NFS-172-30-0-141",
"hostname": "NFS-172-30-0-142",
"hostname": "NFS-172-30-0-141",
"hostname": "NFS-172-30-0-142",
"hostname": "NFS-172-30-0-141",
"hostname": "NFS-172-30-0-143",
"hostname": "NFS-172-30-0-143",
"hostname": "NFS-172-30-0-143",
"hostname": "ceph01",

 

GlusterFS is used as a shared file system for the shared configuration and for storing stats for the graphs.
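
To check whether the shared file system is actually mounted and healthy on a node, something like this should do (a rough sketch; /opt/petasan/config/shared is the mount point for the shared config):

# is the shared config directory currently a mount point?
mountpoint /opt/petasan/config/shared
# is a GlusterFS mount present at all?
mount | grep -i gluster
# client-side mount errors end up under /var/log/glusterfs/
tail -n 50 /var/log/glusterfs/*.log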

On any one of the first 3 nodes, what is the output of:
gluster vol status
gluster peer status

On the first 3 nodes, what is the output of:
systemctl status glusterd

It's a 3-node cluster.

You made a perfect shot, as always 🙂

gluster vol status
Staging failed on ceph03. Please check log file for details.

gluster peer status
Number of Peers: 2

Hostname: 172.30.0.122
Uuid: fda17bf0-1fa8-4874-ae87-4be8f1f503a8
State: Peer in Cluster (Connected)

Hostname: ceph03
Uuid: 5197c531-dff4-44e8-bbd3-9ac4090bef4f
State: Peer in Cluster (Connected)

tail -f /var/log/glusterfs/glusterd.log
[2023-03-06 21:15:44.066365] E [MSGID: 106165] [glusterd-handshake.c:2060:__glusterd_mgmt_hndsk_version_cbk] 0-management: failed to get the 'versions' from peer (172.30.0.122:24007) [Invalid argument]
[2023-03-06 21:15:44.066416] I [MSGID: 106004] [glusterd-handler.c:6200:__glusterd_peer_rpc_notify] 0-management: Peer <172.30.0.122> (<fda17bf0-1fa8-4874-ae87-4be8f1f503a8>), in state <Peer in Cluster>, has disconnected from glusterd.
[2023-03-06 21:15:47.022417] E [MSGID: 106170] [glusterd-handshake.c:1264:gd_validate_mgmt_hndsk_req] 0-management: Request from peer 172.30.0.43:49139 has an entry in peerinfo, but uuid does not match
[2023-03-06 21:15:47.022471] E [MSGID: 106170] [glusterd-handshake.c:1274:gd_validate_mgmt_hndsk_req] 0-management: Rejecting management handshake request from unknown peer 172.30.0.43:49139
[2023-03-06 21:15:47.022595] E [MSGID: 106165] [glusterd-handshake.c:2060:__glusterd_mgmt_hndsk_version_cbk] 0-management: failed to get the 'versions' from peer (172.30.0.43:24007) [Invalid argument]
[2023-03-06 21:15:47.022696] I [MSGID: 106004] [glusterd-handler.c:6200:__glusterd_peer_rpc_notify] 0-management: Peer <ceph03> (<5197c531-dff4-44e8-bbd3-9ac4090bef4f>), in state <Peer Rejected>, has disconnected from glusterd.
[2023-03-06 21:15:47.022834] E [MSGID: 106165] [glusterd-handshake.c:2060:__glusterd_mgmt_hndsk_version_cbk] 0-management: failed to get the 'versions' from peer (172.30.0.121:24007) [Invalid argument]
[2023-03-06 21:15:47.022879] I [MSGID: 106004] [glusterd-handler.c:6200:__glusterd_peer_rpc_notify] 0-management: Peer <172.30.0.121> (<1331e99e-83bb-45c7-a726-bdc99483b70a>), in state <Peer in Cluster>, has disconnected from glusterd.
[2023-03-06 21:15:47.067863] E [MSGID: 106165] [glusterd-handshake.c:2060:__glusterd_mgmt_hndsk_version_cbk] 0-management: failed to get the 'versions' from peer (172.30.0.122:24007) [Invalid argument]
[2023-03-06 21:15:47.067946] I [MSGID: 106004] [glusterd-handler.c:6200:__glusterd_peer_rpc_notify] 0-management: Peer <172.30.0.122> (<fda17bf0-1fa8-4874-ae87-4be8f1f503a8>), in state <Peer in Cluster>, has disconnected from glusterd.

ceph01
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; disabled; vendor preset: enabled)
Active: active (running) since Sun 2022-12-04 14:36:31 CET; 3 months 1 days ago
Docs: man:glusterd(8)
Main PID: 2389 (glusterd)
Tasks: 9 (limit: 115873)
Memory: 18.2M
CGroup: /system.slice/glusterd.service
└─2389 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Mar 06 22:10:55 ceph01 glusterd[2389]: [2023-03-06 21:10:55.958491] C [MSGID: 106147] [glusterd-syncop.c:783:_gd_syncop_stage_op_cbk] 0-management: Staging response for 'Volume Status' received from unknown peer: 9f85c001-5f22-4868-aea9-c8c9d85b0b5b
Mar 06 22:11:12 ceph01 glusterd[2389]: [2023-03-06 21:11:12.243447] C [MSGID: 106147] [glusterd-syncop.c:783:_gd_syncop_stage_op_cbk] 0-management: Staging response for 'Volume Status' received from unknown peer: 9f85c001-5f22-4868-aea9-c8c9d85b0b5b
Mar 06 22:11:32 ceph01 glusterd[2389]: [2023-03-06 21:11:32.804939] C [MSGID: 106147] [glusterd-syncop.c:783:_gd_syncop_stage_op_cbk] 0-management: Staging response for 'Volume Status' received from unknown peer: 9f85c001-5f22-4868-aea9-c8c9d85b0b5b
Mar 06 22:11:34 ceph01 glusterd[2389]: [2023-03-06 21:11:34.397162] C [MSGID: 106147] [glusterd-syncop.c:783:_gd_syncop_stage_op_cbk] 0-management: Staging response for 'Volume Status' received from unknown peer: 9f85c001-5f22-4868-aea9-c8c9d85b0b5b
Mar 06 22:11:34 ceph01 glusterd[2389]: [2023-03-06 21:11:34.964503] C [MSGID: 106147] [glusterd-syncop.c:783:_gd_syncop_stage_op_cbk] 0-management: Staging response for 'Volume Status' received from unknown peer: 9f85c001-5f22-4868-aea9-c8c9d85b0b5b
Mar 06 22:11:35 ceph01 glusterd[2389]: [2023-03-06 21:11:35.428406] C [MSGID: 106147] [glusterd-syncop.c:783:_gd_syncop_stage_op_cbk] 0-management: Staging response for 'Volume Status' received from unknown peer: 9f85c001-5f22-4868-aea9-c8c9d85b0b5b
Mar 06 22:11:35 ceph01 glusterd[2389]: [2023-03-06 21:11:35.845524] C [MSGID: 106147] [glusterd-syncop.c:783:_gd_syncop_stage_op_cbk] 0-management: Staging response for 'Volume Status' received from unknown peer: 9f85c001-5f22-4868-aea9-c8c9d85b0b5b

ceph02

● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; disabled; vendor preset: enabled)
Active: active (running) since Fri 2023-01-13 18:04:59 CET; 1 months 21 days ago
Docs: man:glusterd(8)
Main PID: 3956 (glusterd)
Tasks: 9 (limit: 115873)
Memory: 24.8M
CGroup: /system.slice/glusterd.service
└─3956 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Jan 13 18:04:56 ceph02 systemd[1]: Starting GlusterFS, a clustered file-system server...
Jan 13 18:04:59 ceph02 systemd[1]: Started GlusterFS, a clustered file-system server.

ceph03

● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; disabled; vendor preset: enabled)
Active: active (running) since Mon 2023-03-06 22:14:49 CET; 3min 54s ago
Docs: man:glusterd(8)
Process: 52546 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 52547 (glusterd)
Tasks: 9 (limit: 115614)
Memory: 6.6M
CGroup: /system.slice/glusterd.service
└─52547 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Mar 06 22:14:47 ceph03 systemd[1]: Starting GlusterFS, a clustered file-system server...
Mar 06 22:14:49 ceph03 systemd[1]: Started GlusterFS, a clustered file-system server.

 

 

Can you run:

gluster vol status
gluster peer status

on all 3 nodes?
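
For example, something like this from any node with SSH access to all three (assuming the hostnames resolve):

for h in ceph01 ceph02 ceph03; do
    echo "=== [ $h ] ==="
    ssh "$h" "gluster vol status; gluster peer status"
done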

=== [ ceph01 ] ===
Staging failed on ceph03. Please check log file for details.

Number of Peers: 2

Hostname: 172.30.0.122
Uuid: fda17bf0-1fa8-4874-ae87-4be8f1f503a8
State: Peer in Cluster (Connected)

Hostname: ceph03
Uuid: 5197c531-dff4-44e8-bbd3-9ac4090bef4f
State: Peer in Cluster (Connected)

=== [ ceph02 ] ===
Staging failed on ceph03. Please check log file for details.

Number of Peers: 2

Hostname: ceph03
Uuid: 5197c531-dff4-44e8-bbd3-9ac4090bef4f
State: Peer in Cluster (Connected)

Hostname: 172.30.0.121
Uuid: 1331e99e-83bb-45c7-a726-bdc99483b70a
State: Peer in Cluster (Connected)

=== [ ceph03 ] ===
No volumes present
Number of Peers: 3

Hostname: 172.30.0.122
Uuid: fda17bf0-1fa8-4874-ae87-4be8f1f503a8
State: Peer in Cluster (Disconnected)

Hostname: ceph03
Uuid: 5197c531-dff4-44e8-bbd3-9ac4090bef4f
State: Peer Rejected (Disconnected)

Hostname: 172.30.0.121
Uuid: 1331e99e-83bb-45c7-a726-bdc99483b70a
State: Peer in Cluster (Disconnected)

Hi,

is my information sufficient for a diagnosis?

Thank you

 

I assume the errors you posted are from running the 2 commands I had posted. If so, the Gluster server configuration is screwed up. The error says to look at the log, which is located in:

/var/log/glusterfs/

It may be better to just re-configure the Gluster server on the first 3 nodes using the backend IP. Delete the existing configuration in:

/var/lib/glusterd/peers
/var/lib/glusterd/vols/

and set up a 3x replicated volume following the Gluster documentation (a rough sketch follows below):

volume name: gfs-vol
brick path: /opt/petasan/config/gfs-brick

Make sure you run Gluster on the backend IP.
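
A rough sketch of the steps (the backend IPs below are taken from your peer list where possible; ceph03's backend IP is a guess, so substitute the real addresses):

# on each of the first 3 nodes: stop glusterd and clear the old configuration
systemctl stop glusterd
rm -rf /var/lib/glusterd/peers/* /var/lib/glusterd/vols/*
systemctl start glusterd

# from one node: probe the other two peers on the backend network
gluster peer probe 172.30.0.122
gluster peer probe 172.30.0.123

# create and start the 3x replicated volume on the brick path
# (if the brick directory was part of the old volume, it may need to be
#  cleared or have its gluster xattrs removed first)
gluster volume create gfs-vol replica 3 \
    172.30.0.121:/opt/petasan/config/gfs-brick \
    172.30.0.122:/opt/petasan/config/gfs-brick \
    172.30.0.123:/opt/petasan/config/gfs-brick force
gluster volume start gfs-vol

Once the volume is started, the 30-second mount attempts in PetaSAN.log should start succeeding and /opt/petasan/config/shared should come up on its own.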

In case you need the old statistics/charts data, you may want to back up /opt/petasan/config/gfs-brick on one server before deleting the brick. You can then write the old data back, once the servers are up, to the mount point:

/opt/petasan/config/shared
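
For the backup and restore, a simple copy is enough; for example (the backup path /root/gfs-brick-backup is just an example):

# before wiping anything: keep a copy of the old brick on one server
cp -a /opt/petasan/config/gfs-brick /root/gfs-brick-backup

# after the new volume is mounted: write the old data back through the mount
# point, skipping the brick's internal .glusterfs directory
rsync -a --exclude '.glusterfs' /root/gfs-brick-backup/ /opt/petasan/config/shared/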

You can also buy support from us if you need it. Good luck.

It works, thank you!

 

Glad things are working 🙂