8 OSD's went down
philip.shannon
37 Posts
July 25, 2017, 12:33 pm
I noticed that 8 OSDs went down (out of 10 on this system). We have 4 nodes, each with 10 OSDs, so there are usually 40 OSDs up, but for some reason when I logged in on Monday 8 were down. Here is the info from the PetaSAN log for this node:
21/07/2017 10:41:15 ERROR
21/07/2017 10:41:19 WARNING , retrying in 1 seconds...
21/07/2017 10:41:19 WARNING , retrying in 1 seconds...
21/07/2017 10:41:19 WARNING , retrying in 1 seconds...
21/07/2017 10:41:27 WARNING , retrying in 2 seconds...
21/07/2017 10:41:27 WARNING , retrying in 2 seconds...
21/07/2017 10:41:27 WARNING , retrying in 2 seconds...
21/07/2017 10:41:32 WARNING , retrying in 1 seconds...
21/07/2017 10:41:35 INFO GlusterFS mount attempt
21/07/2017 10:41:36 WARNING , retrying in 4 seconds...
21/07/2017 10:41:36 WARNING , retrying in 4 seconds...
21/07/2017 10:41:36 WARNING , retrying in 4 seconds...
21/07/2017 10:41:40 WARNING , retrying in 2 seconds...
21/07/2017 10:41:47 WARNING , retrying in 8 seconds...
21/07/2017 10:41:47 WARNING , retrying in 8 seconds...
21/07/2017 10:41:47 WARNING , retrying in 8 seconds...
21/07/2017 10:41:50 WARNING , retrying in 4 seconds...
21/07/2017 10:42:01 WARNING , retrying in 8 seconds...
21/07/2017 10:42:02 WARNING , retrying in 16 seconds...
21/07/2017 10:42:02 WARNING , retrying in 16 seconds...
21/07/2017 10:42:03 WARNING , retrying in 16 seconds...
21/07/2017 10:42:08 INFO GlusterFS mount attempt
21/07/2017 10:42:16 WARNING , retrying in 16 seconds...
21/07/2017 10:42:25 ERROR
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/iscsi_service.py", line 561, in handle_cluster_startup
result = consul_api.set_leader_startup_time(current_node_name, str(i))
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 305, in set_leader_startup_time
return consul_obj.kv.put(ConfigAPI().get_consul_leaders_path()+node_name,minutes)
File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 459, in put
'/v1/kv/%s' % key, params=params, data=value)
File "/usr/local/lib/python2.7/dist-packages/retry/compat.py", line 16, in wrapper
return caller(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 74, in retry_decorator
logger)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/ps_consul.py", line 82, in put
raise RetryConsulException()
RetryConsulException
21/07/2017 10:42:26 ERROR Error during __proces.
21/07/2017 10:42:26 ERROR
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/iscsi_service.py", line 84, in start
self.__session = ConsulAPI().get_new_session_ID(self.__session_name,self.__node_info.name)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 38, in get_new_session_ID
self.drop_all_node_sessions(session_name,node_name)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 282, in drop_all_node_sessions
sessions = self.get_sessions_dict(session_name,node_name)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 250, in get_sessions_dict
for sess in consul_obj.session.list()[1]:
File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 1440, in list
'/v1/session/list', params=params)
File "/usr/local/lib/python2.7/dist-packages/retry/compat.py", line 16, in wrapper
return caller(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 74, in retry_decorator
logger)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/ps_consul.py", line 71, in get
raise RetryConsulException()
RetryConsulException
21/07/2017 10:42:26 ERROR
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/file_sync_manager.py", line 75, in sync
index, data = base.watch(self.root_path, current_index)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/base.py", line 49, in watch
return cons.kv.get(key, index=current_index, recurse=True)
File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 391, in get
callback, '/v1/kv/%s' % key, params=params)
File "/usr/local/lib/python2.7/dist-packages/retry/compat.py", line 16, in wrapper
return caller(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 74, in retry_decorator
logger)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/ps_consul.py", line 71, in get
raise RetryConsulException()
RetryConsulException
21/07/2017 10:42:34 WARNING , retrying in 1 seconds...
21/07/2017 10:42:35 WARNING , retrying in 1 seconds...
21/07/2017 10:42:37 WARNING , retrying in 1 seconds...
21/07/2017 10:42:39 ERROR
21/07/2017 10:42:41 INFO GlusterFS mount attempt
21/07/2017 10:42:43 WARNING , retrying in 2 seconds...
21/07/2017 10:42:43 WARNING , retrying in 2 seconds...
21/07/2017 10:42:45 WARNING , retrying in 2 seconds...
21/07/2017 10:42:52 WARNING , retrying in 4 seconds...
21/07/2017 10:42:53 WARNING , retrying in 4 seconds...
21/07/2017 10:42:54 WARNING , retrying in 4 seconds...
21/07/2017 10:42:57 WARNING , retrying in 1 seconds...
21/07/2017 10:43:03 WARNING , retrying in 8 seconds...
21/07/2017 10:43:04 WARNING , retrying in 8 seconds...
21/07/2017 10:43:05 WARNING , retrying in 2 seconds...
21/07/2017 10:43:05 WARNING , retrying in 8 seconds...
21/07/2017 10:43:14 WARNING , retrying in 4 seconds...
21/07/2017 10:43:14 INFO GlusterFS mount attempt
21/07/2017 10:43:18 WARNING , retrying in 16 seconds...
21/07/2017 10:43:19 WARNING , retrying in 16 seconds...
21/07/2017 10:43:20 WARNING , retrying in 16 seconds...
21/07/2017 10:43:25 WARNING , retrying in 8 seconds...
21/07/2017 10:43:40 WARNING , retrying in 16 seconds...
21/07/2017 10:43:41 ERROR
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/iscsi_service.py", line 561, in handle_cluster_startup
result = consul_api.set_leader_startup_time(current_node_name, str(i))
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 305, in set_leader_startup_time
return consul_obj.kv.put(ConfigAPI().get_consul_leaders_path()+node_name,minutes)
File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 459, in put
'/v1/kv/%s' % key, params=params, data=value)
File "/usr/local/lib/python2.7/dist-packages/retry/compat.py", line 16, in wrapper
return caller(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 74, in retry_decorator
logger)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/ps_consul.py", line 82, in put
raise RetryConsulException()
RetryConsulException
21/07/2017 10:43:42 ERROR
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/file_sync_manager.py", line 75, in sync
index, data = base.watch(self.root_path, current_index)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/base.py", line 49, in watch
return cons.kv.get(key, index=current_index, recurse=True)
File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 391, in get
callback, '/v1/kv/%s' % key, params=params)
File "/usr/local/lib/python2.7/dist-packages/retry/compat.py", line 16, in wrapper
return caller(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 74, in retry_decorator
logger)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/ps_consul.py", line 71, in get
raise RetryConsulException()
RetryConsulException
21/07/2017 10:43:44 ERROR Error during __proces.
21/07/2017 10:43:44 ERROR
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/iscsi_service.py", line 84, in start
self.__session = ConsulAPI().get_new_session_ID(self.__session_name,self.__node_info.name)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 38, in get_new_session_ID
self.drop_all_node_sessions(session_name,node_name)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 282, in drop_all_node_sessions
sessions = self.get_sessions_dict(session_name,node_name)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 250, in get_sessions_dict
for sess in consul_obj.session.list()[1]:
File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 1440, in list
'/v1/session/list', params=params)
File "/usr/local/lib/python2.7/dist-packages/retry/compat.py", line 16, in wrapper
return caller(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 74, in retry_decorator
logger)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/ps_consul.py", line 71, in get
raise RetryConsulException()
RetryConsulException
admin
2,930 Posts
July 25, 2017, 1:12 pm
This looks like a network connectivity problem. Several services (Ceph, Consul and Gluster) all seem to be running in the cluster, yet on the problem node they are all failing. Most of the log entries come from the Consul code being unable to communicate with the rest of the cluster: it keeps retrying, then aborts.
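To make the "keeps retrying, then aborts" pattern in the log easier to read, here is a minimal Python sketch of a retry loop with a doubling delay. It is not the PetaSAN code: `call_with_backoff` and its parameters are hypothetical, the `RetryConsulException` class is a stand-in for the one named in the traceback, and the `tries=6` default is an assumption chosen only so the waits come out as the 1, 2, 4, 8, 16 seconds seen above.

```python
import time


class RetryConsulException(Exception):
    """Stand-in for the exception named in the traceback."""


def call_with_backoff(func, tries=6, delay=1, backoff=2,
                      sleep=time.sleep, log=print):
    """Call func; after each failure wait, then double the wait.

    The exception from the final attempt is re-raised, which is the
    point where the log switches from WARNING lines to a traceback.
    """
    wait = delay
    for attempt in range(1, tries + 1):
        try:
            return func()
        except RetryConsulException as exc:
            if attempt == tries:
                raise
            # An exception with no message prints as an empty string,
            # which gives lines that start with ", retrying in ...".
            log("%s, retrying in %s seconds..." % (exc, wait))
            sleep(wait)
            wait *= backoff
```

Read this way, each run of 1, 2, 4, 8, 16 second warnings followed by a traceback is one call that never got an answer from Consul, and the interleaved runs are probably several services retrying at the same time.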
Try pinging the other nodes from that node, via ssh or the console menu, on each of the subnets, and specifically on backend 1, which the above services use. If the ping is OK, I would try a reboot and see whether that fixes things. Also, if you suspect network issues, you can use bonded interfaces. Please let me know how things work out.
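If you want something slightly stronger than ping, the sketch below tries a TCP connection to each node instead, which also catches the case where the host answers but a service port is unreachable. This swaps ping for a TCP check and is only an illustration: the function names, the IP addresses and the port list in the usage comment are assumptions (6789 is Ceph's default monitor port and 8301 is Consul's default Serf LAN port; your cluster may differ).

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks and timeouts.
        return False


def check_nodes(hosts, ports, timeout=2.0):
    """Map each (host, port) pair to whether it accepted a connection."""
    return {(h, p): can_connect(h, p, timeout)
            for h in hosts for p in ports}


# Hypothetical backend 1 addresses and ports; replace with your own.
# results = check_nodes(["10.0.1.11", "10.0.1.12"], [6789, 8301])
# for (host, port), ok in sorted(results.items()):
#     print(host, port, "ok" if ok else "UNREACHABLE")
```

Run it from the problem node against the other nodes' backend 1 addresses. If every pair fails from that node but works from a healthy one, that points at the node's own interface, cable or switch port.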
21/07/2017 10:42:35 WARNING , retrying in 1 seconds...
21/07/2017 10:42:37 WARNING , retrying in 1 seconds...
21/07/2017 10:42:39 ERROR
21/07/2017 10:42:41 INFO GlusterFS mount attempt
21/07/2017 10:42:43 WARNING , retrying in 2 seconds...
21/07/2017 10:42:43 WARNING , retrying in 2 seconds...
21/07/2017 10:42:45 WARNING , retrying in 2 seconds...
21/07/2017 10:42:52 WARNING , retrying in 4 seconds...
21/07/2017 10:42:53 WARNING , retrying in 4 seconds...
21/07/2017 10:42:54 WARNING , retrying in 4 seconds...
21/07/2017 10:42:57 WARNING , retrying in 1 seconds...
21/07/2017 10:43:03 WARNING , retrying in 8 seconds...
21/07/2017 10:43:04 WARNING , retrying in 8 seconds...
21/07/2017 10:43:05 WARNING , retrying in 2 seconds...
21/07/2017 10:43:05 WARNING , retrying in 8 seconds...
21/07/2017 10:43:14 WARNING , retrying in 4 seconds...
21/07/2017 10:43:14 INFO GlusterFS mount attempt
21/07/2017 10:43:18 WARNING , retrying in 16 seconds...
21/07/2017 10:43:19 WARNING , retrying in 16 seconds...
21/07/2017 10:43:20 WARNING , retrying in 16 seconds...
21/07/2017 10:43:25 WARNING , retrying in 8 seconds...
21/07/2017 10:43:40 WARNING , retrying in 16 seconds...
21/07/2017 10:43:41 ERROR
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/iscsi_service.py", line 561, in handle_cluster_startup
result = consul_api.set_leader_startup_time(current_node_name, str(i))
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 305, in set_leader_startup_time
return consul_obj.kv.put(ConfigAPI().get_consul_leaders_path()+node_name,minutes)
File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 459, in put
'/v1/kv/%s' % key, params=params, data=value)
File "/usr/local/lib/python2.7/dist-packages/retry/compat.py", line 16, in wrapper
return caller(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 74, in retry_decorator
logger)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/ps_consul.py", line 82, in put
raise RetryConsulException()
RetryConsulException
21/07/2017 10:43:42 ERROR
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/file_sync_manager.py", line 75, in sync
index, data = base.watch(self.root_path, current_index)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/base.py", line 49, in watch
return cons.kv.get(key, index=current_index, recurse=True)
File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 391, in get
callback, '/v1/kv/%s' % key, params=params)
File "/usr/local/lib/python2.7/dist-packages/retry/compat.py", line 16, in wrapper
return caller(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 74, in retry_decorator
logger)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/ps_consul.py", line 71, in get
raise RetryConsulException()
RetryConsulException
21/07/2017 10:43:44 ERROR Error during __proces.
21/07/2017 10:43:44 ERROR
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/iscsi_service.py", line 84, in start
self.__session = ConsulAPI().get_new_session_ID(self.__session_name,self.__node_info.name)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 38, in get_new_session_ID
self.drop_all_node_sessions(session_name,node_name)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 282, in drop_all_node_sessions
sessions = self.get_sessions_dict(session_name,node_name)
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/api.py", line 250, in get_sessions_dict
for sess in consul_obj.session.list()[1]:
File "/usr/local/lib/python2.7/dist-packages/consul/base.py", line 1440, in list
'/v1/session/list', params=params)
File "/usr/local/lib/python2.7/dist-packages/retry/compat.py", line 16, in wrapper
return caller(f, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 74, in retry_decorator
logger)
File "/usr/local/lib/python2.7/dist-packages/retry/api.py", line 33, in __retry_internal
return f()
File "/usr/lib/python2.7/dist-packages/PetaSAN/core/consul/ps_consul.py", line 71, in get
raise RetryConsulException()
RetryConsulException
admin
2,930 Posts
Quote from admin on July 25, 2017, 1:12 pm
This looks like a network connectivity problem. Services such as Ceph, Consul and Gluster all seem to be running in the rest of the cluster, yet on the problem node they are all failing. Most of the log entries come from the Consul code failing to communicate with the rest of the cluster: it keeps retrying, then aborts.
Try pinging, via ssh or the console menu, from that node to the other nodes on the different subnets, specifically on backend 1, which the above services use. If the pings succeed, I would try a reboot and see whether that fixes things. Also, if you suspect network issues, you can use bonded interfaces. Please let me know how things work out.
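If pinging each node by hand gets tedious, the check suggested above could be scripted roughly like this. It is only a sketch: the node names and addresses in `PEERS` are hypothetical placeholders (substitute the real backend 1 IPs of your other nodes), `ping` and `check_peers` are names made up for this example, and the flags assume the Linux iputils `ping`.

```python
import subprocess

# Hypothetical backend 1 addresses of the other three nodes.
# These are placeholders, not values taken from this cluster.
PEERS = {
    "node-02": "10.0.2.12",
    "node-03": "10.0.2.13",
    "node-04": "10.0.2.14",
}


def ping(host, count=3, timeout=2):
    """Return True if `host` answers ICMP echo requests.

    Uses Linux iputils flags: -c = number of probes,
    -W = seconds to wait for each reply.
    """
    cmd = ["ping", "-c", str(count), "-W", str(timeout), host]
    result = subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return result.returncode == 0


def check_peers(peers, probe=ping):
    """Probe every peer and return {node_name: reachable}."""
    return {name: probe(addr) for name, addr in peers.items()}


def report(results):
    """Format one status line per node."""
    return [
        "%-10s %s" % (name, "reachable" if ok else "UNREACHABLE")
        for name, ok in sorted(results.items())
    ]
```

Run on the problem node, `print("\n".join(report(check_peers(PEERS))))` would list which peers answer on that subnet. Repeating it with the addresses of each of the other subnets shows whether only one network is affected.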
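For reference, the doubling delays in the log (1, 2, 4, 8, 16 seconds, followed by an ERROR and a RetryConsulException) are consistent with an exponential backoff retry loop that gives up after a fixed number of attempts. The sketch below only illustrates that pattern. It is not PetaSAN's code and does not use the `retry` package seen in the traceback. `RetryExhausted`, `call_with_backoff` and the default of six tries are assumptions chosen to reproduce the delays in the log.

```python
import time


class RetryExhausted(Exception):
    """Hypothetical stand-in for an error such as RetryConsulException."""


def call_with_backoff(func, tries=6, delay=1, backoff=2,
                      sleep=time.sleep, log=print):
    """Call func(), retrying on RetryExhausted with growing waits.

    Waits delay, delay*backoff, delay*backoff**2, ... between attempts.
    After `tries` failed attempts the last exception is re-raised,
    which is the "keeps retrying then aborts" behaviour in the log.
    """
    wait = delay
    for attempt in range(1, tries + 1):
        try:
            return func()
        except RetryExhausted as exc:
            if attempt == tries:
                raise  # out of attempts: let the caller see the failure
            log("WARNING %s, retrying in %d seconds..." % (exc, wait))
            sleep(wait)
            wait *= backoff
```

With these defaults, a call that never succeeds waits 31 seconds in total before the exception reaches the caller. As long as the node cannot reach the rest of the cluster, the same cycle simply repeats, which would explain the log pattern above.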