iSCSI disk list gone
wailer
75 Posts
October 28, 2020, 1:10 pm
UPDATE:
OK, I managed to get everything back to normal after waiting for all recovery tasks to finish and restarting ceph-mon on the node that had the high load and one stuck slow op.
Now we simulated a node failure (replica 3, min size 2), and everything works while one node is down. But when that node comes back up, the iSCSI disks stop and ceph status shows inactive PGs, which recover after a few minutes. Then we have to start the iSCSI disks manually.
Is that expected behaviour? A service interruption when the failed node comes back?
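In case it helps anyone reproducing this, the checks involved look roughly like the following (the exact node is whichever one shows the high load and the slow op; the monitor id is typically the short hostname):
# Overall cluster state; watch for inactive/peering PGs
ceph status
ceph health detail
# List PGs stuck in an inactive state
ceph pg dump_stuck inactive
# Restart the monitor on the affected node (run on that node)
systemctl restart ceph-mon@$(hostname -s)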
admin
2,930 Posts
October 28, 2020, 1:46 pm
No, when the node is back up, the PGs should not be in an inactive state. The PGs being inactive is what caused the iSCSI disks to go down.
You need to look in more detail at what causes the PGs to go inactive. If it is a load-related issue, try lowering the recovery and backfill speeds when you bring the node back, but it could be hardware as well.
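For reference, a minimal sketch of throttling recovery and backfill using the standard Ceph knobs; the values here are just a conservative starting point, not tuned for any particular cluster (ceph config set needs a recent Ceph release; injectargs also works on older ones):
# Lower backfill/recovery concurrency so client I/O keeps priority
ceph config set osd osd_max_backfills 1
ceph config set osd osd_recovery_max_active 1
# Or apply at runtime to all OSDs without persisting
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
# Remove the overrides once the cluster is healthy again
ceph config rm osd osd_max_backfills
ceph config rm osd osd_recovery_max_active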
wailer
75 Posts
October 30, 2020, 9:28 am
Finally solved by upgrading to 2.6.2. Thanks!