
iSCSI Disk Freeze

We have a cluster of 3 nodes, the data has no redundancy, and 2 iSCSI disks. The two disks are reachable by ping and by telnet on the iSCSI port, but on our iSCSI initiator the status of one disk is "reconnecting". When we try to stop the disk, it stays stuck in "stopping". We have to restart it every time, and this happens every day.

How can I resolve this?

Can you provide more info?
Did you rule out hardware and network issues?
What is the hardware configuration?
How many OSDs?
Is the cluster status OK, or is it in error?
Do the PG Status charts show all PGs active at the time of failure?
Do the charts for mons and OSDs show any down at the time of failure?
Does it happen every day under some specific load condition, such as backup jobs?
Are the disk, CPU, and memory % utilization charts OK at the time of failure, or saturated?
Do you see any errors in the logs (PetaSAN/syslog/Ceph)?
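If it helps, here is a rough sketch of commands for pulling most of that from a node (the PetaSAN log path is an assumption and may differ on your version):

ceph status                                    # overall cluster health
ceph health detail                             # details of any warnings or errors
ceph osd tree                                  # are any OSDs down or out?
ceph osd df                                    # OSD utilization
grep -i error /opt/petasan/log/PetaSAN.log     # PetaSAN log (path assumed)
grep -iE "slow|fail" /var/log/ceph/ceph.log    # cluster log on a monitor node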

What do you mean by "the data has no redundancy"?

No, we haven't ruled out hardware and network. We have 48 OSDs. Here is the Ceph status.

ceph health shows this error:

1/2129038 objects unfound (0.000%)
Possible data damage: 1 pg recovery_unfound
Degraded data redundancy: 1/2129088 objects degraded (0.000%), 1 pg degraded
1 pgs not deep-scrubbed in time
1 pgs not scrubbed in time
1 slow ops, oldest one blocked for 62569 sec, mon.NODEO2 has slow ops

cluster:
id: bf167be6-46ed-4c6d-bb3e-72e466994805
health: HEALTH_ERR
1/2129038 objects unfound (0.000%)
Possible data damage: 1 pg recovery_unfound
Degraded data redundancy: 1/2129088 objects degraded (0.000%), 1 pg degraded
1 pgs not deep-scrubbed in time
1 pgs not scrubbed in time
1 slow ops, oldest one blocked for 63836 sec, mon.NODEO2 has slow ops

services:
mon: 3 daemons, quorum NODEO3,NODEO1,NODEO2 (age 17h)
mgr: NODEO2(active, since 17h), standbys: NODEO3, NODEO1
mds: cephfs:1 {0=NODEO3=up:active} 2 up:standby
osd: 48 osds: 48 up (since 17h), 48 in (since 17h)

data:
pools: 4 pools, 1216 pgs
objects: 2.13M objects, 8.0 TiB
usage: 8.0 TiB used, 78 TiB / 86 TiB avail
pgs: 1/2129088 objects degraded (0.000%)
1/2129038 objects unfound (0.000%)
1215 active+clean
1 active+recovery_unfound+degraded

io:
client: 6.4 KiB/s rd, 7 op/s rd, 0 op/s wr

1) What do you mean by "the data has no redundancy"?

2) The cluster state is in error (HEALTH_ERR): there is 1 PG with recovery_unfound, and this is what is preventing the iSCSI disks from working. Can you trace from the PG Status charts when this happened, and do you recall anything that occurred at that time?
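If the charts do not go back far enough, the cluster log on a monitor node may also show when the object first went unfound (a sketch, assuming the default log location):

grep -i unfound /var/log/ceph/ceph.log
grep -i degraded /var/log/ceph/ceph.log | tail -n 20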

3) Find the PG with this error:

ceph health detail

then show the output of:
ceph pg PG list_unfound
ceph pg PG query
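For example, if ceph health detail were to report the degraded PG as 2.1f (a hypothetical id), that would be:

ceph pg 2.1f list_unfound
ceph pg 2.1f query

As a last resort, once the query output confirms the object cannot be recovered from any OSD, Ceph also provides ceph pg PG mark_unfound_lost revert|delete, but use it with care since it can discard data.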