No graph statistics on all nodes after upgrade from 3.0.1 to 3.1.0 (RESOLVED)
JG
26 Posts
July 25, 2022, 11:39 am
Hi,
I updated our 3-node cluster yesterday from 3.0.1 to 3.1.0. The upgrade was smooth and I rebooted all nodes (one at a time) with no issues; however, I noticed that the graph statistics do not work after the upgrade (only statistics from before the upgrade, now more than 1 day old, are shown). The cluster is in production, so unfortunately I can't reboot the nodes. Posting here for awareness of the issue; thanks in advance for any idea about where to look.
Currently the cluster leader is node #3; all of the following outputs are from this node.
tail /opt/petasan/log/PetaSAN.log
24/07/2022 10:17:53 INFO Start reassigning resource "S3-x-x-x-9" from node "node2" to node "node3".
24/07/2022 10:17:55 INFO S3Server : sync Consul settings
24/07/2022 10:17:55 INFO S3Server : sync Consul settings -> done
24/07/2022 10:17:55 INFO LockBase : Try to acquire the resource = S3-x-x-x-9.
24/07/2022 10:17:59 INFO LockBase : Succeeded on acquiring the resource = S3-x-x-x-9
24/07/2022 10:18:02 INFO S3Server : sync Consul settings
24/07/2022 10:18:02 INFO S3Server : sync Consul settings -> done
24/07/2022 10:18:06 INFO LockBase : Reassignment job finished.
24/07/2022 10:18:46 INFO Success saving application config
25/07/2022 06:25:03 INFO GlusterFS mount attempt
./check_interfaces_match.py
cluster management interface eth0
node management interface eth0
management interface match
cluster eth count 6
node eth count 8
Error: eth count mis-match !!
detected interfaces
eth0
eth1
eth2
eth3
eth4
eth5
bond0
bond1
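(For illustration only: a rough Python sketch of the kind of comparison the output above implies, not the actual check_interfaces_match.py. It shows why bond0/bond1 push the node count to 8 against the 6 interfaces in the cluster config.)
import os

EXPECTED_CLUSTER_ETH_COUNT = 6  # assumption: the value stored in the cluster config

detected = sorted(os.listdir('/sys/class/net'))  # eth0..eth5, bond0, bond1, lo, ...
node_count = len([i for i in detected if i != 'lo'])

print('cluster eth count', EXPECTED_CLUSTER_ETH_COUNT)
print('node eth count', node_count)
if node_count != EXPECTED_CLUSTER_ETH_COUNT:
    print('Error: eth count mis-match !!')
    print('detected interfaces')
    for iface in detected:
        if iface != 'lo':
            print(iface)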
cat /etc/udev/rules.d/70-persistent-net.rules
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp6s0f0, ASSIGNED_NAME=eth0
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp6s0f1, ASSIGNED_NAME=eth1
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp1s0f0, ASSIGNED_NAME=eth2
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp1s0f1, ASSIGNED_NAME=eth3
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp129s0f0, ASSIGNED_NAME=eth4
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp129s0f1, ASSIGNED_NAME=eth5
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="24:6e:96:EDITED:f4", ATTR{type}=="1", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="24:6e:96:EDITED:f5", ATTR{type}=="1", NAME="eth1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="24:6e:96:EDITED:f0", ATTR{type}=="1", NAME="eth2"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="24:6e:96:EDITED:f2", ATTR{type}=="1", NAME="eth3"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="a0:36:9f:EDITED:c4", ATTR{type}=="1", NAME="eth4"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="a0:36:9f:EDITED:c6", ATTR{type}=="1", NAME="eth5"
./get_cluster_leader.py
{'node3': 'MGMT_IP_NODE_#3'}
systemctl list-units -t service | grep petasan
petasan-admin.service loaded active running PetaSAN Web Management and Administration
petasan-cifs.service loaded active running PetaSAN CIFS Service
petasan-cluster-leader.service loaded active running PetaSAN Cluster Leader
petasan-config-upload.service loaded active exited PetaSAN Config Upload
petasan-console.service loaded active running PetaSAN Node Console
petasan-deploy.service loaded active running PetaSAN Node Deployment
petasan-file-sync.service loaded active running PetaSAN File Sync Service
petasan-iscsi.service loaded active running PetaSAN iSCSI Service
petasan-mount-sharedfs.service loaded active running PetaSAN Mount SharedFS
petasan-node-stats.service loaded active running PetaSAN Node Stats Service
petasan-notification.service loaded active running PetaSAN Notification Service
petasan-qperf.service loaded active running PetaSAN qperf server
petasan-s3.service loaded active running PetaSAN S3 Service
petasan-start-osds.service loaded active exited PetaSAN Start All OSDs service
petasan-start-services.service loaded active exited PetaSAN Start Services
petasan-sync-replication-node.service loaded active exited PetaSAN Sync Replication Node
petasan-tuning.service loaded active exited PetaSAN Tuning Service
petasan-update-node-info.service loaded active exited PetaSAN Update Node Info
systemctl status petasan-node-stats
● petasan-node-stats.service - PetaSAN Node Stats Service
Loaded: loaded (/lib/systemd/system/petasan-node-stats.service; static; vendor preset: enabled)
Active: active (running) since Mon 2022-07-25 10:54:43 EDT; 1h 56min ago
Main PID: 1390975 (node_stats.py)
Tasks: 1 (limit: 154527)
Memory: 34.5M
CGroup: /system.slice/petasan-node-stats.service
└─1390975 /usr/bin/python3 /opt/petasan/scripts/node_stats.py
Jul 25 10:54:43 node3 systemd[1]: Started PetaSAN Node Stats Service.
Also I noticed I started to get the following winbindd errors after the upgrade:
journalctl -u winbind
Jul 24 09:09:29 node3 winbindd[1561334]: [2022/07/24 09:09:29.094735, 0] ../../source3/winbindd/winbindd_cache.c:3203(initialize_winbindd_cache)
Jul 24 09:09:29 node3 winbindd[1561334]: initialize_winbindd_cache: clearing cache and re-creating with version number 2
Jul 24 09:09:29 node3 winbindd[1561334]: [2022/07/24 09:09:29.111228, 0] ../../lib/util/become_daemon.c:135(daemon_ready)
Jul 24 09:09:29 node3 winbindd[1561334]: daemon_ready: daemon 'winbindd' finished starting up and ready to serve connections
Jul 24 09:09:29 node3 winbindd[1561336]: [2022/07/24 09:09:29.116379, 0] ../../source3/winbindd/winbindd_cm.c:1873(wb_open_internal_pipe)
Jul 24 09:09:29 node3 winbindd[1561336]: open_internal_pipe: Could not connect to dssetup pipe: NT_STATUS_RPC_INTERFACE_NOT_FOUND
Jul 24 09:09:29 node3 winbindd[1561336]: [2022/07/24 09:09:29.116603, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:09:29 node3 winbindd[1561336]: rpcint_dispatch: DCE/RPC fault in call lsarpc:2E - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:09:29 node3 winbindd[1561336]: [2022/07/24 09:09:29.119170, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:09:29 node3 winbindd[1561336]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:09:29 node3 winbindd[1561336]: [2022/07/24 09:09:29.119949, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:09:29 node3 winbindd[1561336]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:09:46 node3 winbindd[1561334]: [2022/07/24 09:09:46.370550, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jul 24 09:09:46 node3 winbindd[1561334]: Got sig[15] terminate (is_parent=1)
Jul 24 09:10:30 node3 winbindd[1570749]: [2022/07/24 09:10:30.866997, 0] ../../source3/winbindd/winbindd_cache.c:3203(initialize_winbindd_cache)
Jul 24 09:10:30 node3 winbindd[1570749]: initialize_winbindd_cache: clearing cache and re-creating with version number 2
Jul 24 09:10:30 node3 winbindd[1570749]: [2022/07/24 09:10:30.868686, 0] ../../lib/util/become_daemon.c:135(daemon_ready)
Jul 24 09:10:30 node3 winbindd[1570749]: daemon_ready: daemon 'winbindd' finished starting up and ready to serve connections
Jul 24 09:10:30 node3 winbindd[1570752]: [2022/07/24 09:10:30.882919, 0] ../../source3/winbindd/winbindd_cm.c:1873(wb_open_internal_pipe)
Jul 24 09:10:30 node3 winbindd[1570752]: open_internal_pipe: Could not connect to dssetup pipe: NT_STATUS_RPC_INTERFACE_NOT_FOUND
Jul 24 09:10:30 node3 winbindd[1570752]: [2022/07/24 09:10:30.883118, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:10:30 node3 winbindd[1570752]: rpcint_dispatch: DCE/RPC fault in call lsarpc:2E - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:10:30 node3 winbindd[1570752]: [2022/07/24 09:10:30.883937, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:10:30 node3 winbindd[1570752]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:10:30 node3 winbindd[1570752]: [2022/07/24 09:10:30.884747, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:10:30 node3 winbindd[1570752]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:15:30 node3 winbindd[1570752]: [2022/07/24 09:15:30.887676, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:15:30 node3 winbindd[1570752]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:15:30 node3 winbindd[1570752]: [2022/07/24 09:15:30.888530, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:15:30 node3 winbindd[1570752]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:16:02 node3 winbindd[1570749]: [2022/07/24 09:16:02.944054, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jul 24 09:16:02 node3 winbindd[1570752]: [2022/07/24 09:16:02.944059, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jul 24 09:16:02 node3 winbindd[1570794]: [2022/07/24 09:16:02.944054, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
...
Jul 24 10:05:38 node3 winbindd[57625]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57625]: [2022/07/24 10:05:38.120613, 0] ../../source3/lib/util.c:500(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57625]: messaging_reinit() failed: NT_STATUS_IO_DEVICE_ERROR
Jul 24 10:05:38 node3 winbindd[57625]: [2022/07/24 10:05:38.120754, 0] ../../source3/lib/ctdbd_conn.c:494(ctdbd_init_connection)
Jul 24 10:05:38 node3 winbindd[57625]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57625]: [2022/07/24 10:05:38.120792, 0] ../../source3/lib/dbwrap/dbwrap_ctdb.c:103(ctdb_async_ctx_init_internal)
Jul 24 10:05:38 node3 winbindd[57625]: ctdb_async_ctx_init_internal: ctdbd_init_connection failed
Jul 24 10:05:38 node3 winbindd[57625]: [2022/07/24 10:05:38.120817, 0] ../../source3/lib/util.c:508(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57625]: reinit_after_fork: db_ctdb_async_ctx_reinit failed: No such file or directory
Jul 24 10:05:38 node3 winbindd[57625]: [2022/07/24 10:05:38.120842, 0] ../../source3/winbindd/winbindd_dual.c:1566(winbindd_reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57625]: reinit_after_fork() failed
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.122917, 0] ../../source3/lib/ctdbd_conn.c:494(ctdbd_init_connection)
Jul 24 10:05:38 node3 winbindd[57628]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.123088, 0] ../../source3/lib/util.c:500(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57628]: messaging_reinit() failed: NT_STATUS_IO_DEVICE_ERROR
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.123226, 0] ../../source3/lib/ctdbd_conn.c:494(ctdbd_init_connection)
Jul 24 10:05:38 node3 winbindd[57628]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.123263, 0] ../../source3/lib/dbwrap/dbwrap_ctdb.c:103(ctdb_async_ctx_init_internal)
Jul 24 10:05:38 node3 winbindd[57628]: ctdb_async_ctx_init_internal: ctdbd_init_connection failed
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.123282, 0] ../../source3/lib/util.c:508(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57628]: reinit_after_fork: db_ctdb_async_ctx_reinit failed: No such file or directory
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.123300, 0] ../../source3/winbindd/winbindd_dual.c:1566(winbindd_reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57628]: reinit_after_fork() failed
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125280, 0] ../../source3/lib/ctdbd_conn.c:494(ctdbd_init_connection)
Jul 24 10:05:38 node3 winbindd[57631]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125442, 0] ../../source3/lib/util.c:500(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57631]: messaging_reinit() failed: NT_STATUS_IO_DEVICE_ERROR
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125539, 0] ../../source3/lib/ctdbd_conn.c:494(ctdbd_init_connection)
Jul 24 10:05:38 node3 winbindd[57631]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125565, 0] ../../source3/lib/dbwrap/dbwrap_ctdb.c:103(ctdb_async_ctx_init_internal)
Jul 24 10:05:38 node3 winbindd[57631]: ctdb_async_ctx_init_internal: ctdbd_init_connection failed
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125583, 0] ../../source3/lib/util.c:508(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57631]: reinit_after_fork: db_ctdb_async_ctx_reinit failed: No such file or directory
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125600, 0] ../../source3/winbindd/winbindd_dual.c:1566(winbindd_reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57631]: reinit_after_fork() failed
...
Jul 25 12:35:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:40:38 node3 winbindd[73959]: [2022/07/25 12:40:38.712010, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:40:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:40:38 node3 winbindd[73959]: [2022/07/25 12:40:38.712873, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:40:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:45:38 node3 winbindd[73959]: [2022/07/25 12:45:38.713649, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:45:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:45:38 node3 winbindd[73959]: [2022/07/25 12:45:38.714526, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:45:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:50:38 node3 winbindd[73959]: [2022/07/25 12:50:38.715140, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:50:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:50:38 node3 winbindd[73959]: [2022/07/25 12:50:38.716001, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:50:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:55:38 node3 winbindd[73959]: [2022/07/25 12:55:38.715959, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:55:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:55:38 node3 winbindd[73959]: [2022/07/25 12:55:38.716811, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:55:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
I also tested writing a fake stat:
echo "PetaSAN.NodeStats.node3.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 IP_NODE3 2003
Connection to IP_NODE3 2003 port [tcp/cfinger] succeeded!
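(For reference, the same test in Python in case nc is not available; it just writes one line of Graphite's plaintext protocol, "metric.path value unix_timestamp", to the carbon-cache listener on port 2003:)
import socket
import time

STATS_SERVER = 'IP_NODE3'  # placeholder, as in the nc command above

line = 'PetaSAN.NodeStats.node3.cpu_all.percent_util 50 %d\n' % int(time.time())
with socket.create_connection((STATS_SERVER, 2003), timeout=5) as sock:
    sock.sendall(line.encode('ascii'))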
Edit 1: Found the following stats-related log on one of the nodes, in /opt/petasan/log/stats.log:
[2022-07-24 10:01:50] plugin_load: plugin "write_graphite" successfully loaded.
[2022-07-24 10:01:50] plugin_load: plugin "python" successfully loaded.
[2022-07-24 10:01:50] python plugin: Found a configuration for the "ceph_latency_plugin" plugin, but the plugin isn't loaded or didn't register a configuration callback.
[2022-07-24 10:01:50] Systemd detected, trying to signal readiness.
Edit 2: I noticed the following in the collectd config at /opt/petasan/config/stats/collectd/collectd.conf.
On line #39 the import is commented out, as shown below; however, I double-checked my backup of /opt/petasan from before the upgrade, and the config files seem to be the same.
<Plugin "python">
ModulePath "/usr/lib/collectd/plugins/ceph"
Import "ceph_pool_plugin"
Import "ceph_monitor_plugin"
Import "ceph_osd_plugin"
Import "ceph_pg_plugin"
#Import "ceph_latency_plugin"
<Module "ceph_pool_plugin">
Verbose "False"
Cluster "CLUSTER_NAME"
Interval "60"
</Module>
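(A quick helper, my own sketch and not a PetaSAN tool, to list which ceph plugin imports in that file are active versus commented out:)
CONF = '/opt/petasan/config/stats/collectd/collectd.conf'

with open(CONF) as f:
    for n, raw in enumerate(f, 1):
        line = raw.strip()
        if 'Import' in line and 'ceph' in line:
            state = 'commented' if line.startswith('#') else 'active'
            print('line %d: %s: %s' % (n, state, line))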
Edit 3: I followed these threads: [1] [2], but neither one resolved the issue.
[1] http://www.petasan.org/forums/?view=thread&id=1043
[2] http://www.petasan.org/forums/?view=thread&id=1022
Last edited on July 27, 2022, 12:43 am by JG · #1
admin
2,930 Posts
July 26, 2022, 11:49 pm
1) Is the issue that the charts are not visible, or are they visible but with no data or with an error? If there is no data, is it for both cluster stats and node stats?
2) Do you see any recurring stats errors in /opt/petasan/log/PetaSAN.log?
3) Is the shared file system mounted on all nodes?
mount | grep shared
4) Get the stats server IP from
/opt/petasan/scripts/util/get_cluster_leader.py
On that server, what is the status of the graphite service?
systemctl status carbon-cache
5) What is the output of
gluster vol status gfs-vol
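(A convenience wrapper, my own sketch and not a PetaSAN tool, that runs the checks above on a node in one go and prints each result:)
import subprocess

CHECKS = [
    'mount | grep shared',
    '/opt/petasan/scripts/util/get_cluster_leader.py',
    'systemctl status carbon-cache --no-pager',
    'gluster vol status gfs-vol',
]

for cmd in CHECKS:
    print('==>', cmd)
    # shell=True so the pipe in "mount | grep shared" works as written
    subprocess.run(cmd, shell=True, check=False)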
Last edited on July 26, 2022, 11:51 pm by admin · #2
JG
26 Posts
July 26, 2022, 11:59 pm
Quote from admin on July 26, 2022, 11:49 pm
1) Is the issue that the charts are not visible, or are they visible but with no data or with an error? If there is no data, is it for both cluster stats and node stats?
The chart is visible, with no error but no data. If I choose a different date/time with the dropdown menus, I can see data for the time before the upgrade.
2) Do you see any recurring stats errors in /opt/petasan/log/PetaSAN.log?
No recurring errors at all; in fact the logs are a lot cleaner (no weird error messages) than before the upgrade. The messages below are from the cluster leader node.
24/07/2022 10:18:06 INFO LockBase : Reassignment job finished.
24/07/2022 10:18:46 INFO Success saving application config
25/07/2022 06:25:03 INFO GlusterFS mount attempt
26/07/2022 06:25:17 INFO GlusterFS mount attempt
3) Is the shared file system mounted on all nodes?
mount | grep shared
It's mounted on all nodes.
4) Get the stats server IP from
/opt/petasan/scripts/util/get_cluster_leader.py
{'node3': 'MGMT_IP_NODE3'}
On that server, what is the status of the graphite service?
systemctl status carbon-cache
Service is up and running.
carbon-cache.service - Graphite Carbon Cache
Loaded: loaded (/lib/systemd/system/carbon-cache.service; disabled; vendor preset: enabled)
Active: active (running) since Sun 2022-07-24 10:01:49 EDT; 2 days ago
Docs: https://graphite.readthedocs.io
Process: 54843 ExecStart=/usr/bin/carbon-cache --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-cache.pid --logdir=/var/log/carbon/ start (code=exited, status=0/SUCCESS)
Main PID: 54857 (carbon-cache)
Tasks: 3 (limit: 154527)
Memory: 1.8G
CGroup: /system.slice/carbon-cache.service
└─54857 /usr/bin/python3 /usr/bin/carbon-cache --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-cache.pid --logdir=/var/log/carbon/ start
Jul 24 10:00:48 node3 systemd[1]: Starting Graphite Carbon Cache...
Jul 24 10:01:49 node3 systemd[1]: Started Graphite Carbon Cache.
5) What is the output of
gluster vol status gfs-vol
Status of volume: gfs-vol
Gluster process                                       TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick x.x.x.1:/opt/petasan/config/gfs-brick           49152     0          Y       2013
Brick x.x.x.2:/opt/petasan/config/gfs-brick           49153     0          Y       2018
Brick x.x.x.3:/opt/petasan/config/gfs-brick           N/A       N/A        N       N/A
Self-heal Daemon on localhost                         N/A       N/A        Y       2019
Self-heal Daemon on x.x.x.1                           N/A       N/A        N       N/A
Self-heal Daemon on x.x.x.2                           N/A       N/A        N       N/A

Task Status of Volume gfs-vol
------------------------------------------------------------------------------
There are no active volume tasks
Last edited on July 27, 2022, 12:17 am by JG · #3
admin
2,930 Posts
July 27, 2022, 12:32 am
So if I understand correctly: the chart background is showing, there are no errors, the old data points from cluster and node stats show up, but new data points from both cluster and node stats do not show up. Correct me if wrong.
1) On the node which is the stats server, can you try
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-setup.sh
/opt/petasan/scripts/stats-start.sh
2) If this does not fix the issue, on all nodes do a
systemctl restart petasan-node-stats
3) Do you see any log errors in
/var/log/carbon/
4) Try to manually write a fake 50% CPU stat from any node and see if you get errors or if it shows up on the chart (see also the optional read-back sketch after this list).
Send the command via netcat; the syntax is
echo "PetaSAN.NodeStats.NODE_NAME.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 STATS_SERVER_IP 2003
example
echo "PetaSAN.NodeStats.Node01.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 10.0.1.13 2003
Does the CPU value show up on the graph (wait about 1 min)? Do you get errors?
5) What is the output of
dpkg -l | grep netcat
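(As an optional read-back check, my own sketch and not from the thread: the standard Graphite render API can confirm the fake point was stored, assuming graphite-web is reachable on the stats server on port 80; STATS_SERVER_IP is a placeholder and the port may differ on your setup:)
import json
import urllib.request

url = ('http://STATS_SERVER_IP/render'
       '?target=PetaSAN.NodeStats.Node01.cpu_all.percent_util'
       '&from=-10min&format=json')
with urllib.request.urlopen(url, timeout=10) as resp:
    print(json.load(resp))  # a non-empty "datapoints" list means carbon stored it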
Last edited on July 27, 2022, 12:33 am by admin · #4
JG
26 Posts
July 27, 2022, 12:40 am
Quote from admin on July 27, 2022, 12:32 am
So if I understand correctly: the chart background is showing, there are no errors, the old data points from cluster and node stats show up, but new data points from both cluster and node stats do not show up. Correct me if wrong.
This is correct.
1) On the node which is the stats server, can you try
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-setup.sh
/opt/petasan/scripts/stats-start.sh
This did the trick and fixed the issue. Thanks a lot for all the work.
2) If this does not fix the issue, on all nodes do a
systemctl restart petasan-node-stats
3) Do you see any log errors in
/var/log/carbon/
4) Try to manually write a fake 50% CPU stat from any node and see if you get errors or if it shows up on the chart.
Send the command via netcat; the syntax is
echo "PetaSAN.NodeStats.NODE_NAME.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 STATS_SERVER_IP 2003
example
echo "PetaSAN.NodeStats.Node01.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 10.0.1.13 2003
Does the CPU value show up on the graph (wait about 1 min)? Do you get errors?
5) What is the output of
dpkg -l | grep netcat
admin
2,930 Posts
July 27, 2022, 1:05 am
Excellent 🙂 glad things worked.
The commands that did the trick should have been run by the online update script. For example, if you upgraded from 3.0.1, the upgrade script
/opt/petasan/scripts/online-updates/3.0.1/update
should already do it:
if (systemctl is-active --quiet carbon-cache)
then
echo "Restarting stats services"
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-setup.sh
/opt/petasan/scripts/stats-start.sh
fi
We do test upgrades ourselves and it does work, but maybe there are corner cases; if others see this issue, let us know.
Last edited on July 27, 2022, 1:06 am by admin · #6
No graphs statistics on all nodes after upgrade from 3.0.1 to 3.1.0 (RESOLVED)
JG
26 Posts
Quote from JG on July 25, 2022, 11:39 amHi,
I updated our 3 nodes cluster yesterday from 3.0.1 to 3.1.0, upgrade was smoth and I rebooted all nodes (one at a time) and no issues, however I noticed that the graph statistics do not work after the upgrade (statistics of more than 1 day are shown). Cluster is on production so unfortunately I can't reboot the nodes. Posting here for awareness of the issue, thanks in advance for any idea about where to look.
Currently the Node leader of the cluster is Node #3 all the following outputs are from this node.
tail /opt/petasan/log/PetaSAN.log
24/07/2022 10:17:53 INFO Start reassigning resource "S3-x-x-x-9" from node "node2" to node "node3".
24/07/2022 10:17:55 INFO S3Server : sync Consul settings
24/07/2022 10:17:55 INFO S3Server : sync Consul settings -> done
24/07/2022 10:17:55 INFO LockBase : Try to acquire the resource = S3-x-x-x-9.
24/07/2022 10:17:59 INFO LockBase : Succeeded on acquiring the resource = S3-x-x-x-9
24/07/2022 10:18:02 INFO S3Server : sync Consul settings
24/07/2022 10:18:02 INFO S3Server : sync Consul settings -> done
24/07/2022 10:18:06 INFO LockBase : Reassignment job finished.
24/07/2022 10:18:46 INFO Success saving application config
25/07/2022 06:25:03 INFO GlusterFS mount attempt./check_interfaces_match.py
cluster management interface eth0
node management interface eth0
management interface match
cluster eth count 6
node eth count 8
Error: eth count mis-match !!
detected interfaces
eth0
eth1
eth2
eth3
eth4
eth5
bond0
bond1cat /etc/udev/rules.d/70-persistent-net.rules
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp6s0f0, ASSIGNED_NAME=eth0
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp6s0f1, ASSIGNED_NAME=eth1
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp1s0f0, ASSIGNED_NAME=eth2
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp1s0f1, ASSIGNED_NAME=eth3
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp129s0f0, ASSIGNED_NAME=eth4
# ADDED BY PETASAN, DO NOT MODIFY : DEFAULT_NAME=enp129s0f1, ASSIGNED_NAME=eth5SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="24:6e:96:EDITED:f4", ATTR{type}=="1", NAME="eth0"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="24:6e:96:EDITED:f5", ATTR{type}=="1", NAME="eth1"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="24:6e:96:EDITED:f0", ATTR{type}=="1", NAME="eth2"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="24:6e:96:EDITED:f2", ATTR{type}=="1", NAME="eth3"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="a0:36:9f:EDITED:c4", ATTR{type}=="1", NAME="eth4"
SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="a0:36:9f:EDITED:c6", ATTR{type}=="1", NAME="eth5"./get_cluster_leader.py
{'node3': 'MGMT_IP_NODE_#3'}systemctl list-units -t service | grep petasan
petasan-admin.service loaded active running PetaSAN Web Management and Administration
petasan-cifs.service loaded active running PetaSAN CIFS Service
petasan-cluster-leader.service loaded active running PetaSAN Cluster Leader
petasan-config-upload.service loaded active exited PetaSAN Config Upload
petasan-console.service loaded active running PetaSAN Node Console
petasan-deploy.service loaded active running PetaSAN Node Deployment
petasan-file-sync.service loaded active running PetaSAN File Sync Service
petasan-iscsi.service loaded active running PetaSAN iSCSI Service
petasan-mount-sharedfs.service loaded active running PetaSAN Mount SharedFS
petasan-node-stats.service loaded active running PetaSAN Node Stats Service
petasan-notification.service loaded active running PetaSAN Notification Service
petasan-qperf.service loaded active running PetaSAN qperf server
petasan-s3.service loaded active running PetaSAN S3 Service
petasan-start-osds.service loaded active exited PetaSAN Start All OSDs service
petasan-start-services.service loaded active exited PetaSAN Start Services
petasan-sync-replication-node.service loaded active exited PetaSAN Sync Replication Node
petasan-tuning.service loaded active exited PetaSAN Tuning Service
petasan-update-node-info.service loaded active exited PetaSAN Update Node Info
systemctl status petasan-node-stats
● petasan-node-stats.service - PetaSAN Node Stats Service
Loaded: loaded (/lib/systemd/system/petasan-node-stats.service; static; vendor preset: enabled)
Active: active (running) since Mon 2022-07-25 10:54:43 EDT; 1h 56min ago
Main PID: 1390975 (node_stats.py)
Tasks: 1 (limit: 154527)
Memory: 34.5M
CGroup: /system.slice/petasan-node-stats.service
└─1390975 /usr/bin/python3 /opt/petasan/scripts/node_stats.pyJul 25 10:54:43 node3 systemd[1]: Started PetaSAN Node Stats Service.
Also I noticed I started to get the following winbindd errors after the upgrade:
journalctl -u winbind
Jul 24 09:09:29 node3 winbindd[1561334]: [2022/07/24 09:09:29.094735, 0] ../../source3/winbindd/winbindd_cache.c:3203(initialize_winbindd_cache)
Jul 24 09:09:29 node3 winbindd[1561334]: initialize_winbindd_cache: clearing cache and re-creating with version number 2
Jul 24 09:09:29 node3 winbindd[1561334]: [2022/07/24 09:09:29.111228, 0] ../../lib/util/become_daemon.c:135(daemon_ready)
Jul 24 09:09:29 node3 winbindd[1561334]: daemon_ready: daemon 'winbindd' finished starting up and ready to serve connections
Jul 24 09:09:29 node3 winbindd[1561336]: [2022/07/24 09:09:29.116379, 0] ../../source3/winbindd/winbindd_cm.c:1873(wb_open_internal_pipe)
Jul 24 09:09:29 node3 winbindd[1561336]: open_internal_pipe: Could not connect to dssetup pipe: NT_STATUS_RPC_INTERFACE_NOT_FOUND
Jul 24 09:09:29 node3 winbindd[1561336]: [2022/07/24 09:09:29.116603, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:09:29 node3 winbindd[1561336]: rpcint_dispatch: DCE/RPC fault in call lsarpc:2E - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:09:29 node3 winbindd[1561336]: [2022/07/24 09:09:29.119170, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:09:29 node3 winbindd[1561336]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:09:29 node3 winbindd[1561336]: [2022/07/24 09:09:29.119949, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:09:29 node3 winbindd[1561336]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:09:46 node3 winbindd[1561334]: [2022/07/24 09:09:46.370550, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jul 24 09:09:46 node3 winbindd[1561334]: Got sig[15] terminate (is_parent=1)
Jul 24 09:10:30 node3 winbindd[1570749]: [2022/07/24 09:10:30.866997, 0] ../../source3/winbindd/winbindd_cache.c:3203(initialize_winbindd_cache)
Jul 24 09:10:30 node3 winbindd[1570749]: initialize_winbindd_cache: clearing cache and re-creating with version number 2
Jul 24 09:10:30 node3 winbindd[1570749]: [2022/07/24 09:10:30.868686, 0] ../../lib/util/become_daemon.c:135(daemon_ready)
Jul 24 09:10:30 node3 winbindd[1570749]: daemon_ready: daemon 'winbindd' finished starting up and ready to serve connections
Jul 24 09:10:30 node3 winbindd[1570752]: [2022/07/24 09:10:30.882919, 0] ../../source3/winbindd/winbindd_cm.c:1873(wb_open_internal_pipe)
Jul 24 09:10:30 node3 winbindd[1570752]: open_internal_pipe: Could not connect to dssetup pipe: NT_STATUS_RPC_INTERFACE_NOT_FOUND
Jul 24 09:10:30 node3 winbindd[1570752]: [2022/07/24 09:10:30.883118, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:10:30 node3 winbindd[1570752]: rpcint_dispatch: DCE/RPC fault in call lsarpc:2E - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:10:30 node3 winbindd[1570752]: [2022/07/24 09:10:30.883937, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:10:30 node3 winbindd[1570752]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:10:30 node3 winbindd[1570752]: [2022/07/24 09:10:30.884747, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:10:30 node3 winbindd[1570752]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:15:30 node3 winbindd[1570752]: [2022/07/24 09:15:30.887676, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:15:30 node3 winbindd[1570752]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:15:30 node3 winbindd[1570752]: [2022/07/24 09:15:30.888530, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 24 09:15:30 node3 winbindd[1570752]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 24 09:16:02 node3 winbindd[1570749]: [2022/07/24 09:16:02.944054, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jul 24 09:16:02 node3 winbindd[1570752]: [2022/07/24 09:16:02.944059, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
Jul 24 09:16:02 node3 winbindd[1570794]: [2022/07/24 09:16:02.944054, 0] ../../source3/winbindd/winbindd.c:247(winbindd_sig_term_handler)
...
Jul 24 10:05:38 node3 winbindd[57625]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57625]: [2022/07/24 10:05:38.120613, 0] ../../source3/lib/util.c:500(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57625]: messaging_reinit() failed: NT_STATUS_IO_DEVICE_ERROR
Jul 24 10:05:38 node3 winbindd[57625]: [2022/07/24 10:05:38.120754, 0] ../../source3/lib/ctdbd_conn.c:494(ctdbd_init_connection)
Jul 24 10:05:38 node3 winbindd[57625]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57625]: [2022/07/24 10:05:38.120792, 0] ../../source3/lib/dbwrap/dbwrap_ctdb.c:103(ctdb_async_ctx_init_internal)
Jul 24 10:05:38 node3 winbindd[57625]: ctdb_async_ctx_init_internal: ctdbd_init_connection failed
Jul 24 10:05:38 node3 winbindd[57625]: [2022/07/24 10:05:38.120817, 0] ../../source3/lib/util.c:508(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57625]: reinit_after_fork: db_ctdb_async_ctx_reinit failed: No such file or directory
Jul 24 10:05:38 node3 winbindd[57625]: [2022/07/24 10:05:38.120842, 0] ../../source3/winbindd/winbindd_dual.c:1566(winbindd_reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57625]: reinit_after_fork() failed
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.122917, 0] ../../source3/lib/ctdbd_conn.c:494(ctdbd_init_connection)
Jul 24 10:05:38 node3 winbindd[57628]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.123088, 0] ../../source3/lib/util.c:500(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57628]: messaging_reinit() failed: NT_STATUS_IO_DEVICE_ERROR
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.123226, 0] ../../source3/lib/ctdbd_conn.c:494(ctdbd_init_connection)
Jul 24 10:05:38 node3 winbindd[57628]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.123263, 0] ../../source3/lib/dbwrap/dbwrap_ctdb.c:103(ctdb_async_ctx_init_internal)
Jul 24 10:05:38 node3 winbindd[57628]: ctdb_async_ctx_init_internal: ctdbd_init_connection failed
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.123282, 0] ../../source3/lib/util.c:508(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57628]: reinit_after_fork: db_ctdb_async_ctx_reinit failed: No such file or directory
Jul 24 10:05:38 node3 winbindd[57628]: [2022/07/24 10:05:38.123300, 0] ../../source3/winbindd/winbindd_dual.c:1566(winbindd_reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57628]: reinit_after_fork() failed
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125280, 0] ../../source3/lib/ctdbd_conn.c:494(ctdbd_init_connection)
Jul 24 10:05:38 node3 winbindd[57631]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125442, 0] ../../source3/lib/util.c:500(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57631]: messaging_reinit() failed: NT_STATUS_IO_DEVICE_ERROR
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125539, 0] ../../source3/lib/ctdbd_conn.c:494(ctdbd_init_connection)
Jul 24 10:05:38 node3 winbindd[57631]: ctdbd_init_connection: ctdbd_init_connection_internal failed (Input/output error)
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125565, 0] ../../source3/lib/dbwrap/dbwrap_ctdb.c:103(ctdb_async_ctx_init_internal)
Jul 24 10:05:38 node3 winbindd[57631]: ctdb_async_ctx_init_internal: ctdbd_init_connection failed
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125583, 0] ../../source3/lib/util.c:508(reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57631]: reinit_after_fork: db_ctdb_async_ctx_reinit failed: No such file or directory
Jul 24 10:05:38 node3 winbindd[57631]: [2022/07/24 10:05:38.125600, 0] ../../source3/winbindd/winbindd_dual.c:1566(winbindd_reinit_after_fork)
Jul 24 10:05:38 node3 winbindd[57631]: reinit_after_fork() failed
...
Jul 25 12:35:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:40:38 node3 winbindd[73959]: [2022/07/25 12:40:38.712010, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:40:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:40:38 node3 winbindd[73959]: [2022/07/25 12:40:38.712873, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:40:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:45:38 node3 winbindd[73959]: [2022/07/25 12:45:38.713649, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:45:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:45:38 node3 winbindd[73959]: [2022/07/25 12:45:38.714526, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:45:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:50:38 node3 winbindd[73959]: [2022/07/25 12:50:38.715140, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:50:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:50:38 node3 winbindd[73959]: [2022/07/25 12:50:38.716001, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:50:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:55:38 node3 winbindd[73959]: [2022/07/25 12:55:38.715959, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:55:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
Jul 25 12:55:38 node3 winbindd[73959]: [2022/07/25 12:55:38.716811, 0] ../../source3/rpc_server/rpc_ncacn_np.c:454(rpcint_dispatch)
Jul 25 12:55:38 node3 winbindd[73959]: rpcint_dispatch: DCE/RPC fault in call lsarpc:32 - DCERPC_NCA_S_OP_RNG_ERROR
I also tested writing a fake stat:
echo "PetaSAN.NodeStats.node3.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 IP_NODE3 2003
Connection to IP_NODE3 2003 port [tcp/cfinger] succeeded!
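For completeness, one can also confirm on the stats server that carbon-cache flushed the test value to disk. A minimal sketch; the whisper storage root below is the Debian/Ubuntu default and the tool may be named whisper-fetch.py depending on packaging, so treat both as assumptions:
# Read back the last 10 minutes of the metric written above; the .wsp path
# mirrors the dotted metric name under the assumed whisper storage root.
whisper-fetch --from=$(date -d '-10 minutes' +%s) \
  /var/lib/graphite/whisper/PetaSAN/NodeStats/node3/cpu_all/percent_util.wsp
If values show up here but not in the charts, the write path is fine and the problem sits on the rendering side.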
Edit1: Found the following stats-related log entries on one of the nodes, in /opt/petasan/log/stats.log:
[2022-07-24 10:01:50] plugin_load: plugin "write_graphite" successfully loaded.
[2022-07-24 10:01:50] plugin_load: plugin "python" successfully loaded.
[2022-07-24 10:01:50] python plugin: Found a configuration for the "ceph_latency_plugin" plugin, but the plugin isn't loaded or didn't register a configuration callback.
[2022-07-24 10:01:50] Systemd detected, trying to signal readiness.
Edit2: I noticed the following in the collectd config at /opt/petasan/config/stats/collectd/collectd.conf.
On line #39 the import is commented out as shown below; however, I double-checked my backup of "/opt/petasan" from before the upgrade and the config files appear to be the same.
<Plugin "python">
ModulePath "/usr/lib/collectd/plugins/ceph"
Import "ceph_pool_plugin"
Import "ceph_monitor_plugin"
Import "ceph_osd_plugin"
Import "ceph_pg_plugin"
#Import "ceph_latency_plugin"<Module "ceph_pool_plugin">
Verbose "False"
Cluster "CLUSTER_NAME"
Interval "60"
</Module>
Edit3: I followed these threads: [1] [2], but neither resolved the issue.
[1] http://www.petasan.org/forums/?view=thread&id=1043
[2] http://www.petasan.org/forums/?view=thread&id=1022
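For reference, a quick way to rule out config drift between the backup and the live tree; the backup location below is hypothetical, substitute your own:
# Compare the backed-up collectd config with the live one; an empty diff
# means the commented-out ceph_latency_plugin import predates the upgrade.
diff -u /path/to/backup/opt/petasan/config/stats/collectd/collectd.conf \
  /opt/petasan/config/stats/collectd/collectd.conf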
admin
2,930 Posts
Quote from admin on July 26, 2022, 11:49 pm
1) Is the issue that the charts are not visible, or are they visible but with no data or with an error? If no data, is it for both cluster stats and node stats?
2) Do you see any recurring stats errors in /opt/petasan/log/PetaSAN.log ?
3) Is the shared file system mounted on all nodes?
mount | grep shared
4) Get the stats server ip from
/opt/petasan/scripts/util/get_cluster_leader.py
On that server, what is the status of the graphite service?
systemctl status carbon-cache
5) What is the output of
gluster vol status gfs-vol
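For convenience, the checks above can be run in one pass; a minimal consolidated sketch, using only the commands listed (run it on the stats server, and repeat the mount check on every node):
#!/bin/bash
# Checks 3-5 from the list above, consolidated.
mount | grep shared || echo "WARN: shared file system not mounted"
/opt/petasan/scripts/util/get_cluster_leader.py
systemctl is-active carbon-cache || echo "WARN: carbon-cache not active"
gluster vol status gfs-vol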
JG
26 Posts
Quote from JG on July 26, 2022, 11:59 pm
Quote from admin on July 26, 2022, 11:49 pm
1) Is the issue that the charts are not visible, or are they visible but with no data or with an error? If no data, is it for both cluster stats and node stats?
The chart is visible, no error, but no data. If I choose a different date/time with the dropdown menus, I can see data for the time before the upgrade.
2) Do you see any recurring stats errors in /opt/petasan/log/PetaSAN.log ?
No recurring errors at all; in fact the logs are a lot cleaner (no weird error messages) than before the upgrade. The messages below are from the cluster's leader node.
24/07/2022 10:18:06 INFO LockBase : Reassignment job finished.
24/07/2022 10:18:46 INFO Success saving application config
25/07/2022 06:25:03 INFO GlusterFS mount attempt
26/07/2022 06:25:17 INFO GlusterFS mount attempt
3) Is the shared file system mounted on all nodes?
mount | grep shared
It's mounted on all nodes.
4) Get the stats server ip from
/opt/petasan/scripts/util/get_cluster_leader.py
{'node3': 'MGMT_IP_NODE3'}
On that server, what is the status of the graphite service?
systemctl status carbon-cache
The service is up and running.
carbon-cache.service - Graphite Carbon Cache
Loaded: loaded (/lib/systemd/system/carbon-cache.service; disabled; vendor preset: enabled)
Active: active (running) since Sun 2022-07-24 10:01:49 EDT; 2 days ago
Docs: https://graphite.readthedocs.io
Process: 54843 ExecStart=/usr/bin/carbon-cache --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-cache.pid --logdir=/var/log/carbon/ start (code=exited, status=0/SUCCESS)
Main PID: 54857 (carbon-cache)
Tasks: 3 (limit: 154527)
Memory: 1.8G
CGroup: /system.slice/carbon-cache.service
└─54857 /usr/bin/python3 /usr/bin/carbon-cache --config=/etc/carbon/carbon.conf --pidfile=/var/run/carbon-cache.pid --logdir=/var/log/carbon/ start
Jul 24 10:00:48 node3 systemd[1]: Starting Graphite Carbon Cache...
Jul 24 10:01:49 node3 systemd[1]: Started Graphite Carbon Cache.
5) What is the output of
gluster vol status gfs-vol
Status of volume: gfs-vol
Gluster process                              TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick x.x.x.1:/opt/petasan/config/gfs-brick  49152     0          Y       2013
Brick x.x.x.2:/opt/petasan/config/gfs-brick  49153     0          Y       2018
Brick x.x.x.3:/opt/petasan/config/gfs-brick  N/A       N/A        N       N/A
Self-heal Daemon on localhost                N/A       N/A        Y       2019
Self-heal Daemon on x.x.x.1                  N/A       N/A        N       N/A
Self-heal Daemon on x.x.x.2                  N/A       N/A        N       N/A

Task Status of Volume gfs-vol
------------------------------------------------------------------------------
There are no active volume tasks
admin
2,930 Posts
Quote from admin on July 27, 2022, 12:32 am
So I understand: the chart background is showing, there are no errors, and the old data points for both cluster and node stats show up, but new data points do not. Correct me if wrong.
1) On the node which is the stats server, can you try
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-setup.sh
/opt/petasan/scripts/stats-start.sh
2) If this does not fix the issue, on all nodes do a
systemctl restart petasan-node-stats
(a combined sketch of steps 1 and 2 follows after this list)
3) Do you see any log errors in
/var/log/carbon/
4) Try to manually write a fake 50% CPU stat from any node and see if you get errors or if it shows up on the chart.
Then send the command via netcat; the syntax is
echo "PetaSAN.NodeStats.NODE_NAME.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 STATS_SERVER_IP 2003
example
echo "PetaSAN.NodeStats.Node01.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 10.0.1.13 2003
Does the cpu value show up on the graph (wait about 1 min)? Do you get errors?
5) What is the output of
dpkg -l | grep netcat
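As referenced above, a combined sketch of steps 1 and 2; it assumes step 1 runs on the stats server node and that the restart in step 2 is then repeated on every node:
#!/bin/bash
# Step 1: rebuild and restart the stats stack on the stats server.
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-setup.sh
/opt/petasan/scripts/stats-start.sh
# Step 2: restart the local node stats writer; repeat on each node.
systemctl restart petasan-node-stats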
JG
26 Posts
Quote from JG on July 27, 2022, 12:40 am
Quote from admin on July 27, 2022, 12:32 am
So I understand: the chart background is showing, there are no errors, and the old data points for both cluster and node stats show up, but new data points do not. Correct me if wrong.
This is correct.
1) On the node which is the stats server, can you try
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-setup.sh
/opt/petasan/scripts/stats-start.sh
This did the trick and fixed the issue. Thanks a lot for all the work.
2) If this does not fix the issue, on all nodes do a
systemctl restart petasan-node-stats
3) Do you see any log errors in
/var/log/carbon/
4) Try to manually write a fake 50% CPU stat from any node and see if you get errors or if it shows up on the chart.
Then send the command via netcat; the syntax is
echo "PetaSAN.NodeStats.NODE_NAME.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 STATS_SERVER_IP 2003
example
echo "PetaSAN.NodeStats.Node01.cpu_all.percent_util 50 `date +%s`" | nc -v -q0 10.0.1.13 2003
Does the cpu value show up on the graph (wait about 1 min)? Do you get errors?
5) What is the output of
dpkg -l | grep netcat
admin
2,930 Posts
Quote from admin on July 27, 2022, 1:05 am
Excellent 🙂 glad things worked.
The commands that did the trick should have been run by the online update script. For example, if you upgraded from 3.0.1, the upgrade script
/opt/petasan/scripts/online-updates/3.0.1/update
should already do it:
if (systemctl is-active --quiet carbon-cache)
then
echo "Restarting stats services"
/opt/petasan/scripts/stats-stop.sh
/opt/petasan/scripts/stats-setup.sh
/opt/petasan/scripts/stats-start.sh
fi
We do test upgrades ourselves and it does work, but maybe there are corner conditions; if others see this, let us know.
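For anyone checking the same corner case, the guard can be tested by hand; a minimal sketch of the same condition the update script uses:
# If this prints "skipped", the update hook would not have restarted the
# stats services on this node, and the manual stats-stop/setup/start
# sequence from earlier in this thread applies.
systemctl is-active --quiet carbon-cache && echo "guard passes: stats restart would run" || echo "skipped"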