
iSCSI not starting


Hi,

When I added the new node, PetaSAN showed OSD 2 as down, and the only option it offered was to remove it.

The new node48 shows OSDs 3 and 7, and all of its OSDs are up and running.

root@node48:~# ceph health detail --cluster loks
HEALTH_WARN Reduced data availability: 55 pgs inactive, 55 pgs incomplete; Degraded data redundancy: 55 pgs unclean
PG_AVAILABILITY Reduced data availability: 55 pgs inactive, 55 pgs incomplete
pg 1.7 is incomplete, acting [7,4]
pg 1.a is incomplete, acting [5,7]
pg 1.10 is incomplete, acting [3,5]
pg 1.15 is incomplete, acting [3,5]
pg 1.1c is incomplete, acting [5,3]
pg 1.22 is incomplete, acting [7,5]
pg 1.24 is incomplete, acting [5,7]
pg 1.2d is incomplete, acting [3,5]
pg 1.2e is incomplete, acting [3,5]
pg 1.32 is incomplete, acting [7,4]
pg 1.34 is incomplete, acting [4,3]
pg 1.38 is incomplete, acting [4,7]
pg 1.41 is incomplete, acting [3,4]
pg 1.42 is incomplete, acting [4,3]
pg 1.4e is incomplete, acting [3,5]
pg 1.55 is incomplete, acting [4,3]
pg 1.56 is incomplete, acting [3,4]
pg 1.5e is incomplete, acting [7,4]
pg 1.64 is incomplete, acting [4,7]
pg 1.6a is incomplete, acting [3,4]
pg 1.6e is incomplete, acting [5,7]
pg 1.70 is incomplete, acting [5,7]
pg 1.72 is incomplete, acting [3,4]
pg 1.81 is incomplete, acting [4,7]
pg 1.84 is incomplete, acting [4,3]
pg 1.8d is incomplete, acting [4,3]
pg 1.92 is incomplete, acting [5,3]
pg 1.94 is incomplete, acting [3,5]
pg 1.97 is incomplete, acting [3,4]
pg 1.9d is incomplete, acting [4,3]
pg 1.a1 is incomplete, acting [5,7]
pg 1.a3 is incomplete, acting [4,3]
pg 1.a4 is incomplete, acting [3,5]
pg 1.a7 is incomplete, acting [4,3]
pg 1.ab is incomplete, acting [7,5]
pg 1.ac is incomplete, acting [3,4]
pg 1.b2 is stuck inactive for 308250.991831, current state incomplete, last acting [5,3]
pg 1.c8 is incomplete, acting [5,7]
pg 1.ca is incomplete, acting [7,4]
pg 1.ce is incomplete, acting [3,5]
pg 1.d3 is incomplete, acting [5,3]
pg 1.d4 is incomplete, acting [5,7]
pg 1.d5 is incomplete, acting [3,5]
pg 1.d7 is incomplete, acting [7,5]
pg 1.d8 is incomplete, acting [7,4]
pg 1.d9 is incomplete, acting [3,5]
pg 1.e4 is incomplete, acting [4,3]
pg 1.e5 is incomplete, acting [4,7]
pg 1.e8 is incomplete, acting [3,4]
pg 1.ea is incomplete, acting [7,6]
pg 1.fc is incomplete, acting [7,4]
PG_DEGRADED Degraded data redundancy: 55 pgs unclean
pg 1.7 is stuck unclean since forever, current state incomplete, last acting [7,4]
pg 1.a is stuck unclean for 338320.011063, current state incomplete, last acting [5,7]
pg 1.10 is stuck unclean since forever, current state incomplete, last acting [3,5]
pg 1.15 is stuck unclean since forever, current state incomplete, last acting [3,5]
pg 1.1c is stuck unclean for 339842.526709, current state incomplete, last acting [5,3]
pg 1.22 is stuck unclean since forever, current state incomplete, last acting [7,5]
pg 1.24 is stuck unclean for 337308.425823, current state incomplete, last acting [5,7]
pg 1.2d is stuck unclean since forever, current state incomplete, last acting [3,5]
pg 1.2e is stuck unclean since forever, current state incomplete, last acting [3,5]
pg 1.32 is stuck unclean since forever, current state incomplete, last acting [7,4]
pg 1.34 is stuck unclean for 337069.758337, current state incomplete, last acting [4,3]
pg 1.38 is stuck unclean for 337601.590150, current state incomplete, last acting [4,7]
pg 1.41 is stuck unclean since forever, current state incomplete, last acting [3,4]
pg 1.42 is stuck unclean for 337060.232791, current state incomplete, last acting [4,3]
pg 1.4e is stuck unclean since forever, current state incomplete, last acting [3,5]
pg 1.55 is stuck unclean for 337017.317347, current state incomplete, last acting [4,3]
pg 1.56 is stuck unclean since forever, current state incomplete, last acting [3,4]
pg 1.5e is stuck unclean since forever, current state incomplete, last acting [7,4]
pg 1.64 is stuck unclean for 350982.214792, current state incomplete, last acting [4,7]
pg 1.6a is stuck unclean since forever, current state incomplete, last acting [3,4]
pg 1.6e is stuck unclean for 337086.227150, current state incomplete, last acting [5,7]
pg 1.70 is stuck unclean for 337088.813288, current state incomplete, last acting [5,7]
pg 1.72 is stuck unclean since forever, current state incomplete, last acting [3,4]
pg 1.81 is stuck unclean for 337200.926400, current state incomplete, last acting [4,7]
pg 1.84 is stuck unclean for 337072.863306, current state incomplete, last acting [4,3]
pg 1.8d is stuck unclean for 337026.276733, current state incomplete, last acting [4,3]
pg 1.92 is stuck unclean for 337286.897884, current state incomplete, last acting [5,3]
pg 1.94 is stuck unclean since forever, current state incomplete, last acting [3,5]
pg 1.97 is stuck unclean since forever, current state incomplete, last acting [3,4]
pg 1.9d is stuck unclean for 337027.837860, current state incomplete, last acting [4,3]
pg 1.a1 is stuck unclean for 337019.656995, current state incomplete, last acting [5,7]
pg 1.a3 is stuck unclean for 337896.060862, current state incomplete, last acting [4,3]
pg 1.a4 is stuck unclean since forever, current state incomplete, last acting [3,5]
pg 1.a7 is stuck unclean for 339452.951245, current state incomplete, last acting [4,3]
pg 1.ab is stuck unclean since forever, current state incomplete, last acting [7,5]
pg 1.ac is stuck unclean since forever, current state incomplete, last acting [3,4]
pg 1.b2 is stuck unclean for 337076.245823, current state incomplete, last acting [5,3]
pg 1.c8 is stuck unclean for 337027.407565, current state incomplete, last acting [5,7]
pg 1.ca is stuck unclean since forever, current state incomplete, last acting [7,4]
pg 1.ce is stuck unclean since forever, current state incomplete, last acting [3,5]
pg 1.d3 is stuck unclean for 338267.489514, current state incomplete, last acting [5,3]
pg 1.d4 is stuck unclean for 337923.960109, current state incomplete, last acting [5,7]
pg 1.d5 is stuck unclean since forever, current state incomplete, last acting [3,5]
pg 1.d7 is stuck unclean since forever, current state incomplete, last acting [7,5]
pg 1.d8 is stuck unclean since forever, current state incomplete, last acting [7,4]
pg 1.d9 is stuck unclean since forever, current state incomplete, last acting [3,5]
pg 1.e4 is stuck unclean for 337071.145460, current state incomplete, last acting [4,3]
pg 1.e5 is stuck unclean for 337885.022022, current state incomplete, last acting [4,7]
pg 1.e8 is stuck unclean since forever, current state incomplete, last acting [3,4]
pg 1.ea is stuck unclean since forever, current state incomplete, last acting [7,6]
pg 1.fc is stuck unclean since forever, current state incomplete, last acting [7,4]

root@node48:~# ceph pg 1.10  query --cluster loks

"recovery_state": [
{
"name": "Started/Primary/Peering/Incomplete",
"enter_time": "2018-12-03 23:01:24.176814",
"comment": "not enough complete instances of this PG"
},
{
"name": "Started/Primary/Peering",
"enter_time": "2018-12-03 23:01:24.140892",
"past_intervals": [
{
"first": "7293",
"last": "8250",
"all_participants": [
{
"osd": 0
},
{
"osd": 1
},
{
"osd": 2
},
{
"osd": 3
},
{
"osd": 5
},
{
"osd": 6
}
],
"intervals": [
{
"first": "7613",
"last": "7625",
"acting": "2"
},
{
"first": "7957",
"last": "7958",
"acting": "0"
},
{
"first": "8224",
"last": "8225",
"acting": "3"
},
{
"first": "8248",
"last": "8250",
"acting": "5"
}
]
}
],
"probing_osds": [
"0",
"1",
"3",
"5",
"6"
],
"down_osds_we_would_probe": [
2
],
"peering_blocked_by": [],
"peering_blocked_by_detail": [
{
"detail": "peering_blocked_by_history_les_bound"
}
]
},
{
"name": "Started",
"enter_time": "2018-12-03 23:01:24.140852"
}
],

 

root@node48:~# ceph pg 1.b2  query --cluster loks

],
"recovery_state": [
    {
        "name": "Started/Primary/Peering/Incomplete",
        "enter_time": "2018-12-03 23:01:24.101226",
        "comment": "not enough complete instances of this PG"
    },
    {
        "name": "Started/Primary/Peering",
        "enter_time": "2018-12-03 23:01:24.098373",
        "past_intervals": [
            {
                "first": "7293",
                "last": "8250",
                "all_participants": [
                    { "osd": 2 },
                    { "osd": 3 },
                    { "osd": 5 },
                    { "osd": 6 }
                ],
                "intervals": [
                    { "first": "7613", "last": "7625", "acting": "2" },
                    { "first": "7957", "last": "7960", "acting": "6" },
                    { "first": "8224", "last": "8225", "acting": "3" },
                    { "first": "8248", "last": "8250", "acting": "5" }
                ]
            }
        ],
        "probing_osds": [ "3", "5", "6" ],
        "down_osds_we_would_probe": [ 2 ],
        "peering_blocked_by": [],
        "peering_blocked_by_detail": [
            { "detail": "peering_blocked_by_history_les_bound" }
        ]
    },
    {
        "name": "Started",
        "enter_time": "2018-12-03 23:01:24.098337"
    }
],
"agent_state": {}
}

It is not clear to me whether OSD 2 is still around or not; it is not listed in the ceph osd tree output, which will happen if it has been deleted, but I need you to confirm this. Also, is OSD 7 another disk, or was OSD 2 deleted and re-added? OSD 3 has not been deleted, correct?

If OSD 2 is still physically around, then the next best thing, aside from trying to restart it, is to attempt to extract the objects on it, so please clarify this so I can help you with the extraction.
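For reference, that kind of extraction is typically done with ceph-objectstore-tool against the offline OSD's data directory. A minimal sketch only; the data path assumes the default /var/lib/ceph/osd/&lt;cluster&gt;-&lt;id&gt; layout and the PG id is illustrative:

# Stop the OSD daemon first; the tool needs exclusive access to the store.
systemctl stop ceph-osd@2

# Export one incomplete PG from the old OSD's store to a file.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/loks-2 \
    --pgid 1.10 --op export --file /tmp/pg.1.10.export

The export file can later be brought into a surviving OSD with --op import.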

If, however, OSD 2 was deleted, then we can proceed with a riskier approach that may result in some data loss. Hopefully it will only be the data that was being written the moment the power failure occurred, but it may be more, so it is not without risk. We will make two attempts:

First, in /etc/ceph/loks.conf on all nodes, add this at the bottom:

osd_find_best_info_ignore_history_les=true

Apply this at the bottom of the file on all nodes, then reboot.
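A rough sketch of pushing this out, assuming passwordless SSH between the nodes (the host names below are placeholders for your actual nodes):

# Append the override to the cluster config on every node, then reboot.
for n in node46 node47 node48; do
    ssh $n "echo osd_find_best_info_ignore_history_les=true >> /etc/ceph/loks.conf && reboot"
done

If you want to keep the monitors quorate throughout, reboot the nodes one after another rather than all at once.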

If after 20 minutes the cluster is stuck and not recovering or changing (check ceph status to see whether the pg states are still changing, even slowly), run:

ceph osd lost 2 --cluster loks --yes-i-really-mean-it

Again, do not do this unless you know that OSD 2 is no longer available. Even if it only exists as a physical disk outside the cluster, we may be able to extract data from it.
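One way to double-check before marking it lost (standard Ceph queries, shown here as a suggestion):

# OSD 2 should be absent from the tree and the osdmap if it was deleted.
ceph osd tree --cluster loks
ceph osd dump --cluster loks | grep osd.2

# Watch recovery progress; only consider 'ceph osd lost' if the pg
# states stop changing for ~20 minutes.
watch -n 30 "ceph status --cluster loks"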

Hi,

OSD 2 really was deleted, so I made the change in /etc/ceph/loks.conf and it worked; iSCSI is OK again.

Thank you so much for your time and help!

 

 

Excellent 🙂 Remember to delete this setting from the config file and restart the nodes one at a time once the cluster is active/clean. If you have I/O running, it is better to re-assign the iSCSI paths to other nodes while rebooting.
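A sketch of that cleanup, one node at a time (commands are illustrative):

# On each node, once ceph status shows the pgs active+clean:
sed -i '/osd_find_best_info_ignore_history_les/d' /etc/ceph/loks.conf
reboot

# After the node is back, wait for HEALTH_OK before moving to the next one:
ceph health --cluster loks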
