iSCSI not starting
Pages: 1 2
Lerner
7 Posts
December 1, 2018, 9:29 pm
Good afternoon,
I have a problem starting the iSCSI disks after a power outage.
When I click on the iSCSI Disk List, it does not open.
Can you help us with this problem? Is there a way to copy the files from the servers?
ceph health
2018-12-01 18:26:22.724485 7efe3ad8f700 -1 Errors while parsing config file!
2018-12-01 18:26:22.724506 7efe3ad8f700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-12-01 18:26:22.724508 7efe3ad8f700 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-12-01 18:26:22.724509 7efe3ad8f700 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
consul members
Node Address Status Type Build Protocol DC
node41 192.168.184.41:8301 alive server 0.7.3 2 petasan
node42 192.168.184.42:8301 alive client 0.7.3 2 petasan
node44 192.168.184.44:8301 alive server 0.7.3 2 petasan
node48 192.168.184.48:8301 failed server 0.7.3 2 petasan
Thanks
admin
2,930 Posts
December 2, 2018, 11:43 am
It is probably a Ceph layer issue rather than iSCSI.
You need to check Ceph: if Ceph is recovering, it may take time before it becomes active again; if it is stuck, you need to find out why via CLI commands. Note that you need to add the --cluster XX parameter to your commands, where XX is the name of your cluster.
ceph health detail --cluster XX
For PGs that are stuck, try to find out why via
ceph pg <pg-id> query --cluster XX
and look at the "recovery_state" sections.
What type/model of disks do you have?
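A minimal sketch of how these checks could be scripted (assuming a bash shell on a management node, and that pg dump_stuck on this Ceph version lists one PG id per row; XX is the cluster name as above):
# Print the health summary, then the recovery_state section of every stuck PG.
ceph health detail --cluster XX
for pg in $(ceph pg dump_stuck inactive --cluster XX 2>/dev/null | awk 'NR>1 && $1 ~ /^[0-9]+\./ {print $1}'); do
    echo "=== $pg ==="
    ceph pg "$pg" query --cluster XX | grep -A 10 '"recovery_state"'
done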
Lerner
7 Posts
December 2, 2018, 4:22 pm
Hi,
Thank you for your answer.
The disks are mostly SATA Seagate drives.
Below is the output of the commands (Ceph does not start; I don't know why):
ceph health detail --cluster loks
HEALTH_ERR Reduced data availability: 55 pgs inactive, 27 pgs down, 28 pgs incomplete; Degraded data redundancy: 55 pgs unclean; 7 stuck requests are blocked > 4096 sec; 1/3 mons down, quorum node41,node44
PG_AVAILABILITY Reduced data availability: 55 pgs inactive, 27 pgs down, 28 pgs incomplete
pg 1.7 is incomplete, acting [5,6]
pg 1.a is down, acting [5,6]
pg 1.10 is down, acting [0,5]
pg 1.15 is incomplete, acting [6,1]
pg 1.1c is incomplete, acting [5,6]
pg 1.22 is down, acting [5,6]
pg 1.24 is down, acting [5,0]
pg 1.2d is incomplete, acting [6,5]
pg 1.2e is down, acting [5,6]
pg 1.32 is down, acting [4,1]
pg 1.34 is incomplete, acting [4,6]
pg 1.38 is down, acting [4,0]
pg 1.41 is incomplete, acting [1,4]
pg 1.42 is incomplete, acting [4,0]
pg 1.4e is incomplete, acting [4,1]
pg 1.55 is incomplete, acting [4,1]
pg 1.56 is down, acting [1,4]
pg 1.5e is incomplete, acting [6,4]
pg 1.64 is down, acting [4,1]
pg 1.6a is down, acting [6,4]
pg 1.6e is down, acting [5,6]
pg 1.70 is down, acting [5,0]
pg 1.72 is down, acting [0,4]
pg 1.81 is incomplete, acting [4,1]
pg 1.84 is incomplete, acting [4,0]
pg 1.8d is incomplete, acting [4,6]
pg 1.92 is incomplete, acting [5,0]
pg 1.94 is incomplete, acting [1,5]
pg 1.97 is down, acting [0,4]
pg 1.9d is incomplete, acting [4,6]
pg 1.a1 is incomplete, acting [5,0]
pg 1.a3 is incomplete, acting [4,6]
pg 1.a4 is incomplete, acting [5,6]
pg 1.a7 is incomplete, acting [4,0]
pg 1.ab is down, acting [0,5]
pg 1.ac is down, acting [5,0]
pg 1.b2 is stuck inactive for 147747.228969, current state down, last acting [5,6]
pg 1.c8 is down, acting [5,6]
pg 1.ca is down, acting [5,6]
pg 1.ce is incomplete, acting [5,1]
pg 1.d3 is down, acting [5,0]
pg 1.d4 is down, acting [5,6]
pg 1.d5 is incomplete, acting [4,1]
pg 1.d7 is down, acting [1,5]
pg 1.d8 is down, acting [1,4]
pg 1.d9 is incomplete, acting [1,5]
pg 1.e4 is incomplete, acting [4,6]
pg 1.e5 is incomplete, acting [4,1]
pg 1.e8 is down, acting [0,6]
pg 1.ea is incomplete, acting [6,5]
pg 1.fc is incomplete, acting [1,5]
PG_DEGRADED Degraded data redundancy: 55 pgs unclean
pg 1.7 is stuck unclean since forever, current state incomplete, last acting [5,6]
pg 1.a is stuck unclean for 177816.248200, current state down, last acting [5,6]
pg 1.10 is stuck unclean since forever, current state down, last acting [0,5]
pg 1.15 is stuck unclean since forever, current state incomplete, last acting [6,1]
pg 1.1c is stuck unclean for 179338.763846, current state incomplete, last acting [5,6]
pg 1.22 is stuck unclean for 176573.873492, current state down, last acting [5,6]
pg 1.24 is stuck unclean for 176804.662961, current state down, last acting [5,0]
pg 1.2d is stuck unclean since forever, current state incomplete, last acting [6,5]
pg 1.2e is stuck unclean for 177810.929634, current state down, last acting [5,6]
pg 1.32 is stuck unclean for 178030.266384, current state down, last acting [4,1]
pg 1.34 is stuck unclean for 176565.995475, current state incomplete, last acting [4,6]
pg 1.38 is stuck unclean for 177097.827288, current state down, last acting [4,0]
pg 1.41 is stuck unclean since forever, current state incomplete, last acting [1,4]
pg 1.42 is stuck unclean for 176556.469928, current state incomplete, last acting [4,0]
pg 1.4e is stuck unclean since forever, current state incomplete, last acting [4,1]
pg 1.55 is stuck unclean for 176513.554485, current state incomplete, last acting [4,1]
pg 1.56 is stuck unclean since forever, current state down, last acting [1,4]
pg 1.5e is stuck unclean since forever, current state incomplete, last acting [6,4]
pg 1.64 is stuck unclean for 190478.451929, current state down, last acting [4,1]
pg 1.6a is stuck unclean since forever, current state down, last acting [6,4]
pg 1.6e is stuck unclean for 176582.464287, current state down, last acting [5,6]
pg 1.70 is stuck unclean for 176585.050426, current state down, last acting [5,0]
pg 1.72 is stuck unclean since forever, current state down, last acting [0,4]
pg 1.81 is stuck unclean for 176697.163538, current state incomplete, last acting [4,1]
pg 1.84 is stuck unclean for 176569.100444, current state incomplete, last acting [4,0]
pg 1.8d is stuck unclean for 176522.513870, current state incomplete, last acting [4,6]
pg 1.92 is stuck unclean for 176783.135021, current state incomplete, last acting [5,0]
pg 1.94 is stuck unclean since forever, current state incomplete, last acting [1,5]
pg 1.97 is stuck unclean since forever, current state down, last acting [0,4]
pg 1.9d is stuck unclean for 176524.074997, current state incomplete, last acting [4,6]
pg 1.a1 is stuck unclean for 176515.894133, current state incomplete, last acting [5,0]
pg 1.a3 is stuck unclean for 177392.298000, current state incomplete, last acting [4,6]
pg 1.a4 is stuck unclean for 176521.983648, current state incomplete, last acting [5,6]
pg 1.a7 is stuck unclean for 178949.188383, current state incomplete, last acting [4,0]
pg 1.ab is stuck unclean since forever, current state down, last acting [0,5]
pg 1.ac is stuck unclean since forever, current state down, last acting [5,0]
pg 1.b2 is stuck unclean for 176572.482961, current state down, last acting [5,6]
pg 1.c8 is stuck unclean for 176523.644703, current state down, last acting [5,6]
pg 1.ca is stuck unclean since forever, current state down, last acting [5,6]
pg 1.ce is stuck unclean for 177798.955780, current state incomplete, last acting [5,1]
pg 1.d3 is stuck unclean for 177763.726652, current state down, last acting [5,0]
pg 1.d4 is stuck unclean for 177420.197246, current state down, last acting [5,6]
pg 1.d5 is stuck unclean since forever, current state incomplete, last acting [4,1]
pg 1.d7 is stuck unclean since forever, current state down, last acting [1,5]
pg 1.d8 is stuck unclean since forever, current state down, last acting [1,4]
pg 1.d9 is stuck unclean since forever, current state incomplete, last acting [1,5]
pg 1.e4 is stuck unclean for 176567.382597, current state incomplete, last acting [4,6]
pg 1.e5 is stuck unclean for 177381.259159, current state incomplete, last acting [4,1]
pg 1.e8 is stuck unclean since forever, current state down, last acting [0,6]
pg 1.ea is stuck unclean since forever, current state incomplete, last acting [6,5]
pg 1.fc is stuck unclean since forever, current state incomplete, last acting [1,5]
REQUEST_STUCK 7 stuck requests are blocked > 4096 sec
7 ops are blocked > 134218 sec
osd.5 has stuck requests > 134218 sec
MON_DOWN 1/3 mons down, quorum node41,node44
mon.node48 (rank 2) addr 192.168.184.48:6789/0 is down (out of quorum)
root@node41:~# ceph pg loks query --cluster loks
no valid command found; 10 closest matches:
pg force_create_pg <pgid>
pg set_nearfull_ratio <float[0.0-1.0]>
pg set_full_ratio <float[0.0-1.0]>
pg map <pgid>
pg ls {<int>} {<states> [<states>...]}
pg dump_stuck {inactive|unclean|stale|undersized|degraded [inactive|unclean|stale|undersized|degraded...]} {<int>}
pg ls-by-primary <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg ls-by-osd <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg dump_pools_json
pg ls-by-pool <poolstr> {<states> [<states>...]}
Error EINVAL: invalid command
root@node41:~# ceph pg query --cluster loks
no valid command found; 10 closest matches:
pg force_create_pg <pgid>
pg set_nearfull_ratio <float[0.0-1.0]>
pg set_full_ratio <float[0.0-1.0]>
pg map <pgid>
pg ls {<int>} {<states> [<states>...]}
pg dump_stuck {inactive|unclean|stale|undersized|degraded [inactive|unclean|stale|undersized|degraded...]} {<int>}
pg ls-by-primary <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg ls-by-osd <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg dump_pools_json
pg ls-by-pool <poolstr> {<states> [<states>...]}
Error EINVAL: invalid command
root@node41:~# ceph query --cluster loks
no valid command found; 10 closest matches:
mon dump {<int[0-]>}
mon stat
fs set_default <fs_name>
fs set-default <fs_name>
fs add_data_pool <fs_name> <pool>
fs rm_data_pool <fs_name> <pool>
fs set <fs_name> max_mds|max_file_size|allow_new_snaps|inline_data|cluster_down|allow_multimds|allow_dirfrags|balancer|standby_count_wanted <val> {<confirm>}
fs flag set enable_multiple <val> {--yes-i-really-mean-it}
fs ls
fs get <fs_name>
Error EINVAL: invalid command
ceph pg loks query
2018-12-02 13:20:07.052092 7faabef80700 -1 Errors while parsing config file!
2018-12-02 13:20:07.052097 7faabef80700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-12-02 13:20:07.052098 7faabef80700 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-12-02 13:20:07.052099 7faabef80700 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
admin
2,930 Posts
December 2, 2018, 7:22 pm
How many OSDs do you have? How many are up? Any down?
run:
ceph pg 1.10 query --cluster loks
ceph pg 1.15 query --cluster loks
Look at the "recovery_state" sections to see what is stuck.
Are the disks SSDs or HDDs?
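Since the full query output is long, a quick sketch (assuming a python interpreter is present on the node and that the JSON key is recovery_state) to print only the relevant section:
ceph pg 1.10 query --cluster loks | python -c 'import json,sys; print(json.dumps(json.load(sys.stdin)["recovery_state"], indent=2))'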
Last edited on December 2, 2018, 9:42 pm by admin · #4
Lerner
7 Posts
December 2, 2018, 10:43 pm
Hi,
We have 5 OSDs up and 1 down.
All disks are SATA HDDs.
ceph pg 1.10 query --cluster loks
"recovery_state": [
{
"name": "Started/Primary/Peering/Down",
"enter_time": "2018-12-01 18:08:56.573719",
"comment": "not enough up instances of this PG to go active"
},
{
"name": "Started/Primary/Peering",
"enter_time": "2018-12-01 18:08:56.573649",
"past_intervals": [
{
"first": "7293",
"last": "7963",
"all_participants": [
{
"osd": 0
},
{
"osd": 1
},
{
"osd": 2
},
{
"osd": 5
},
{
"osd": 6
}
],
"intervals": [
{
"first": "7613",
"last": "7625",
"acting": "2"
},
{
"first": "7957",
"last": "7958",
"acting": "0"
},
{
"first": "7961",
"last": "7963",
"acting": "5"
}
]
}
],
"probing_osds": [
"0",
"1",
"5",
"6"
],
"blocked": "peering is blocked due to down osds",
"down_osds_we_would_probe": [
2
],
"peering_blocked_by": [
{
"osd": 2,
"current_lost_at": 0,
"comment": "starting or marking this osd lost may let us proceed"
}
]
},
{
"name": "Started",
"enter_time": "2018-12-01 18:08:56.573578"
}
],
ceph pg 1.15 query --cluster loks
"recovery_state": [
{
"name": "Started/Primary/Peering/Incomplete",
"enter_time": "2018-12-01 18:27:49.462944",
"comment": "not enough complete instances of this PG"
},
{
"name": "Started/Primary/Peering",
"enter_time": "2018-12-01 18:27:49.422286",
"past_intervals": [
{
"first": "7293",
"last": "7963",
"all_participants": [
{
"osd": 1
},
{
"osd": 3
},
{
"osd": 5
},
{
"osd": 6
}
],
"intervals": [
{
"first": "7613",
"last": "7620",
"acting": "3"
},
{
"first": "7672",
"last": "7673",
"acting": "5"
},
{
"first": "7817",
"last": "7819",
"acting": "1,5"
},
{
"first": "7962",
"last": "7963",
"acting": "6"
}
]
}
],
"probing_osds": [
"1",
"5",
"6"
],
"down_osds_we_would_probe": [
3
],
"peering_blocked_by": [],
"peering_blocked_by_detail": [
{
"detail": "peering_blocked_by_history_les_bound"
}
]
},
{
"name": "Started",
"enter_time": "2018-12-01 18:27:49.422188"
}
],
admin
2,930 Posts
December 3, 2018, 7:16 am
From the logs you sent, it seems 2 out of 6 OSDs are down: OSD 2 and OSD 3.
Can you check:
ceph status --cluster loks
ceph osd tree --cluster loks
The best thing to try now is to start these 2 OSDs. On their nodes:
systemctl restart ceph-osd@2
systemctl restart ceph-osd@3
systemctl status ceph-osd@2
systemctl status ceph-osd@3
I noticed you also have one monitor down. If the 2 OSDs are not starting, edit the conf file on their node:
/etc/ceph/loks.conf
and temporarily modify the "mon_host = " line to exclude the IP address of the failed mon, then try again to restart the 2 failed OSDs. Note that you should later revert this change to the conf file.
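For example (an assumption based on the monitor addresses earlier in this thread, where mon.node48 at 192.168.184.48 is the one that is down), the edited line might look like:
mon_host = 192.168.184.41,192.168.184.44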
If they still do not start, try to start them manually and see what error you get on the console:
/usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 2
/usr/bin/ceph-osd -f --cluster loks --id 2 --setuser ceph --setgroup ceph
/usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 3
/usr/bin/ceph-osd -f --cluster loks --id 3 --setuser ceph --setgroup ceph
You can also find further error logs in:
/var/log/ceph/loks-osd.2.log
/var/log/ceph/loks-osd.3.log
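A small sketch combining the above steps (assuming both OSDs sit on the node you are logged into; otherwise run the relevant id on each node):
for id in 2 3; do
    systemctl restart ceph-osd@$id
    systemctl status ceph-osd@$id --no-pager | tail -n 5   # show the most recent status lines
    tail -n 20 /var/log/ceph/loks-osd.$id.log               # and the latest OSD log entries
done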
Last edited on December 3, 2018, 7:20 am by admin · #6
Lerner
7 Posts
December 3, 2018, 11:54 am
Hi,
Is it possible to just remove these OSDs from the cluster? We have 3 other PetaSAN instances that it could work with.
Right now I just need to get the files off the iSCSI storage, because they are client VMs.
The result of the commands:
root@node44:~# systemctl restart ceph-osd@3
Job for ceph-osd@3.service failed because the control process exited with error code. See "systemctl status ceph-osd@3.service" and "journalctl -xe" for details.
root@node44:~# /usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 3
OSD data directory /var/lib/ceph/osd/loks-3 does not exist; bailing out.
root@node44:~# /usr/bin/ceph-osd -f --cluster loks --id 3 --setuser ceph --setgroup ceph
2018-12-03 08:51:50.169527 7fde0ecb9e00 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/loks-3: (2) No such file or directory
The same output for the two OSDs.
Thanks!
admin
2,930 Posts
December 3, 2018, 2:43 pm
No, you cannot move the 2 OSDs to another cluster. You can remove them from their hosts and put them in other hosts within the same PetaSAN cluster if you think the problem is with the host node rather than the disks themselves.
Can you send me the output of the first 2 commands in my previous post? Are OSD 2 and 3 on the same host or on different hosts?
Edit /etc/ceph/loks.conf
and temporarily modify the "mon_host = " line to exclude the IP address of the failed mon.
Find the disks /dev/sdX where OSD 2 and OSD 3 reside, via the UI or via the CLI:
ceph-disk list
Mount their first (metadata) partition:
mount /dev/sdX1 /var/lib/ceph/osd/loks-2
where sdX is the disk for OSD 2
mount /dev/sdY1 /var/lib/ceph/osd/loks-3
where sdY is the disk for OSD 3
After mounting, try to start the OSDs manually and see what error you get on the console:
/usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 2
/usr/bin/ceph-osd -f --cluster loks --id 2 --setuser ceph --setgroup ceph
/usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 3
/usr/bin/ceph-osd -f --cluster loks --id 3 --setuser ceph --setgroup ceph
You can also find further error logs in:
/var/log/ceph/loks-osd.2.log
/var/log/ceph/loks-osd.3.log
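A condensed sketch of these steps for OSD 2 (assuming its data disk turns out to be /dev/sdX, as in the placeholders above; repeat with the correct device and id for OSD 3):
ceph-disk list                        # identify which /dev/sdX holds osd.2 and osd.3
mkdir -p /var/lib/ceph/osd/loks-2     # create the mount point if it is missing, as reported earlier
mount /dev/sdX1 /var/lib/ceph/osd/loks-2
/usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 2
/usr/bin/ceph-osd -f --cluster loks --id 2 --setuser ceph --setgroup ceph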
Lerner
7 Posts
December 4, 2018, 1:42 am
Hi,
I replaced the node that was giving problems (node48).
Here is the output:
root@node41:~# ceph status --cluster loks
cluster:
id: c8497f53-4077-4359-8d50-7439c0d2760f
health: HEALTH_WARN
Reduced data availability: 55 pgs inactive, 55 pgs incomplete
Degraded data redundancy: 55 pgs unclean
services:
mon: 3 daemons, quorum node41,node44,node48
mgr: node44(active), standbys: node41, node48
osd: 7 osds: 7 up, 7 in
data:
pools: 1 pools, 256 pgs
objects: 151k objects, 604 GB
usage: 1216 GB used, 5482 GB / 6699 GB avail
pgs: 21.484% pgs not active
198 active+clean
55 incomplete
2 active+clean+scrubbing+deep
1 active+clean+scrubbing
root@node41:~# ceph osd tree --cluster loks
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 6.54225 root default
-7 1.81918 host node41
4 hdd 0.90959 osd.4 up 1.00000 1.00000
5 hdd 0.90959 osd.5 up 1.00000 1.00000
-9 0.90709 host node42
6 hdd 0.90709 osd.6 up 1.00000 1.00000
-3 1.81619 host node44
0 hdd 0.90810 osd.0 up 1.00000 1.00000
1 hdd 0.90810 osd.1 up 1.00000 1.00000
-5 1.99979 host node48
3 hdd 0.99989 osd.3 up 1.00000 1.00000
7 hdd 0.99989 osd.7 up 1.00000 1.00000
root@node41:~#
Thanks!
admin
2,930 Posts
December 4, 2018, 8:56 am
Hi,
What happened to OSD 2? Did you try to start OSDs 2 and 3 from the command line? What was the output? As stated, starting those OSDs is the best thing to do; even if they do not start, we can retrieve the data as long as the physical disks are not corrupt.
Now we need to see why there are inactive/incomplete PGs. Can you do what was done previously: query such PGs, look at the recovery_state section as before, and post a couple of them if they differ?
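A one-liner sketch (assuming the remaining incomplete PGs are still reported in health detail in the same "pg X.Y is incomplete" form as above) to list which PGs to query:
ceph health detail --cluster loks | grep ' is incomplete' | awk '{print $2}'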
Pages: 1 2
Iscsi not start
Lerner
7 Posts
Quote from Lerner on December 1, 2018, 9:29 pmGood afternoon,
I have a problem to start iscsi disk after a power outage.
When i click in iscsi disk list, not open.
Can help us with this problem? Have a way to copy the files from servers?ceph health
2018-12-01 18:26:22.724485 7efe3ad8f700 -1 Errors while parsing config file!
2018-12-01 18:26:22.724506 7efe3ad8f700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-12-01 18:26:22.724508 7efe3ad8f700 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-12-01 18:26:22.724509 7efe3ad8f700 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)consul members
Node Address Status Type Build Protocol DC
node41 192.168.184.41:8301 alive server 0.7.3 2 petasan
node42 192.168.184.42:8301 alive client 0.7.3 2 petasan
node44 192.168.184.44:8301 alive server 0.7.3 2 petasan
node48 192.168.184.48:8301 failed server 0.7.3 2 petasanThanks
Good afternoon,
I have a problem to start iscsi disk after a power outage.
When i click in iscsi disk list, not open.
Can help us with this problem? Have a way to copy the files from servers?
ceph health
2018-12-01 18:26:22.724485 7efe3ad8f700 -1 Errors while parsing config file!
2018-12-01 18:26:22.724506 7efe3ad8f700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-12-01 18:26:22.724508 7efe3ad8f700 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-12-01 18:26:22.724509 7efe3ad8f700 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
consul members
Node Address Status Type Build Protocol DC
node41 192.168.184.41:8301 alive server 0.7.3 2 petasan
node42 192.168.184.42:8301 alive client 0.7.3 2 petasan
node44 192.168.184.44:8301 alive server 0.7.3 2 petasan
node48 192.168.184.48:8301 failed server 0.7.3 2 petasan
Thanks
admin
2,930 Posts
Quote from admin on December 2, 2018, 11:43 amit is probably a Ceph layer issue rather than iscsi
You need to check Ceph: if Ceph is recovering, then it may take time before it gets active again, , else if it is stuck then need to find out why via cli commands. Note you need to add the --cluster XX parameter to your commands, where XX is the name of your cluster.
ceph health detail --cluster XX
For PGs that are stuck, try to find out why via
ceph pg XX query --cluster XX
and look at the “recovery_status” sections.What type/model of disks do you have ?
it is probably a Ceph layer issue rather than iscsi
You need to check Ceph: if Ceph is recovering, then it may take time before it gets active again, , else if it is stuck then need to find out why via cli commands. Note you need to add the --cluster XX parameter to your commands, where XX is the name of your cluster.
ceph health detail --cluster XX
For PGs that are stuck, try to find out why via
ceph pg XX query --cluster XX
and look at the “recovery_status” sections.
What type/model of disks do you have ?
Lerner
7 Posts
Quote from Lerner on December 2, 2018, 4:22 pmHi,
Thank you for you answer.
The disks are the most SATA/Seagate.
Follow the output of commands (ceph not start i dont know why)
ceph health detail --cluster loks
HEALTH_ERR Reduced data availability: 55 pgs inactive, 27 pgs down, 28 pgs incomplete; Degraded data redundancy: 55 pgs unclean; 7 stuck requests are blocked > 4096 sec; 1/3 mons down, quorum node41,node44
PG_AVAILABILITY Reduced data availability: 55 pgs inactive, 27 pgs down, 28 pgs incomplete
pg 1.7 is incomplete, acting [5,6]
pg 1.a is down, acting [5,6]
pg 1.10 is down, acting [0,5]
pg 1.15 is incomplete, acting [6,1]
pg 1.1c is incomplete, acting [5,6]
pg 1.22 is down, acting [5,6]
pg 1.24 is down, acting [5,0]
pg 1.2d is incomplete, acting [6,5]
pg 1.2e is down, acting [5,6]
pg 1.32 is down, acting [4,1]
pg 1.34 is incomplete, acting [4,6]
pg 1.38 is down, acting [4,0]
pg 1.41 is incomplete, acting [1,4]
pg 1.42 is incomplete, acting [4,0]
pg 1.4e is incomplete, acting [4,1]
pg 1.55 is incomplete, acting [4,1]
pg 1.56 is down, acting [1,4]
pg 1.5e is incomplete, acting [6,4]
pg 1.64 is down, acting [4,1]
pg 1.6a is down, acting [6,4]
pg 1.6e is down, acting [5,6]
pg 1.70 is down, acting [5,0]
pg 1.72 is down, acting [0,4]
pg 1.81 is incomplete, acting [4,1]
pg 1.84 is incomplete, acting [4,0]
pg 1.8d is incomplete, acting [4,6]
pg 1.92 is incomplete, acting [5,0]
pg 1.94 is incomplete, acting [1,5]
pg 1.97 is down, acting [0,4]
pg 1.9d is incomplete, acting [4,6]
pg 1.a1 is incomplete, acting [5,0]
pg 1.a3 is incomplete, acting [4,6]
pg 1.a4 is incomplete, acting [5,6]
pg 1.a7 is incomplete, acting [4,0]
pg 1.ab is down, acting [0,5]
pg 1.ac is down, acting [5,0]
pg 1.b2 is stuck inactive for 147747.228969, current state down, last acting [5,6]
pg 1.c8 is down, acting [5,6]
pg 1.ca is down, acting [5,6]
pg 1.ce is incomplete, acting [5,1]
pg 1.d3 is down, acting [5,0]
pg 1.d4 is down, acting [5,6]
pg 1.d5 is incomplete, acting [4,1]
pg 1.d7 is down, acting [1,5]
pg 1.d8 is down, acting [1,4]
pg 1.d9 is incomplete, acting [1,5]
pg 1.e4 is incomplete, acting [4,6]
pg 1.e5 is incomplete, acting [4,1]
pg 1.e8 is down, acting [0,6]
pg 1.ea is incomplete, acting [6,5]
pg 1.fc is incomplete, acting [1,5]
PG_DEGRADED Degraded data redundancy: 55 pgs unclean
pg 1.7 is stuck unclean since forever, current state incomplete, last acting [5,6]
pg 1.a is stuck unclean for 177816.248200, current state down, last acting [5,6]
pg 1.10 is stuck unclean since forever, current state down, last acting [0,5]
pg 1.15 is stuck unclean since forever, current state incomplete, last acting [6,1]
pg 1.1c is stuck unclean for 179338.763846, current state incomplete, last acting [5,6]
pg 1.22 is stuck unclean for 176573.873492, current state down, last acting [5,6]
pg 1.24 is stuck unclean for 176804.662961, current state down, last acting [5,0]
pg 1.2d is stuck unclean since forever, current state incomplete, last acting [6,5]
pg 1.2e is stuck unclean for 177810.929634, current state down, last acting [5,6]
pg 1.32 is stuck unclean for 178030.266384, current state down, last acting [4,1]
pg 1.34 is stuck unclean for 176565.995475, current state incomplete, last acting [4,6]
pg 1.38 is stuck unclean for 177097.827288, current state down, last acting [4,0]
pg 1.41 is stuck unclean since forever, current state incomplete, last acting [1,4]
pg 1.42 is stuck unclean for 176556.469928, current state incomplete, last acting [4,0]
pg 1.4e is stuck unclean since forever, current state incomplete, last acting [4,1]
pg 1.55 is stuck unclean for 176513.554485, current state incomplete, last acting [4,1]
pg 1.56 is stuck unclean since forever, current state down, last acting [1,4]
pg 1.5e is stuck unclean since forever, current state incomplete, last acting [6,4]
pg 1.64 is stuck unclean for 190478.451929, current state down, last acting [4,1]
pg 1.6a is stuck unclean since forever, current state down, last acting [6,4]
pg 1.6e is stuck unclean for 176582.464287, current state down, last acting [5,6]
pg 1.70 is stuck unclean for 176585.050426, current state down, last acting [5,0]
pg 1.72 is stuck unclean since forever, current state down, last acting [0,4]
pg 1.81 is stuck unclean for 176697.163538, current state incomplete, last acting [4,1]
pg 1.84 is stuck unclean for 176569.100444, current state incomplete, last acting [4,0]
pg 1.8d is stuck unclean for 176522.513870, current state incomplete, last acting [4,6]
pg 1.92 is stuck unclean for 176783.135021, current state incomplete, last acting [5,0]
pg 1.94 is stuck unclean since forever, current state incomplete, last acting [1,5]
pg 1.97 is stuck unclean since forever, current state down, last acting [0,4]
pg 1.9d is stuck unclean for 176524.074997, current state incomplete, last acting [4,6]
pg 1.a1 is stuck unclean for 176515.894133, current state incomplete, last acting [5,0]
pg 1.a3 is stuck unclean for 177392.298000, current state incomplete, last acting [4,6]
pg 1.a4 is stuck unclean for 176521.983648, current state incomplete, last acting [5,6]
pg 1.a7 is stuck unclean for 178949.188383, current state incomplete, last acting [4,0]
pg 1.ab is stuck unclean since forever, current state down, last acting [0,5]
pg 1.ac is stuck unclean since forever, current state down, last acting [5,0]
pg 1.b2 is stuck unclean for 176572.482961, current state down, last acting [5,6]
pg 1.c8 is stuck unclean for 176523.644703, current state down, last acting [5,6]
pg 1.ca is stuck unclean since forever, current state down, last acting [5,6]
pg 1.ce is stuck unclean for 177798.955780, current state incomplete, last acting [5,1]
pg 1.d3 is stuck unclean for 177763.726652, current state down, last acting [5,0]
pg 1.d4 is stuck unclean for 177420.197246, current state down, last acting [5,6]
pg 1.d5 is stuck unclean since forever, current state incomplete, last acting [4,1]
pg 1.d7 is stuck unclean since forever, current state down, last acting [1,5]
pg 1.d8 is stuck unclean since forever, current state down, last acting [1,4]
pg 1.d9 is stuck unclean since forever, current state incomplete, last acting [1,5]
pg 1.e4 is stuck unclean for 176567.382597, current state incomplete, last acting [4,6]
pg 1.e5 is stuck unclean for 177381.259159, current state incomplete, last acting [4,1]
pg 1.e8 is stuck unclean since forever, current state down, last acting [0,6]
pg 1.ea is stuck unclean since forever, current state incomplete, last acting [6,5]
pg 1.fc is stuck unclean since forever, current state incomplete, last acting [1,5]
REQUEST_STUCK 7 stuck requests are blocked > 4096 sec
7 ops are blocked > 134218 sec
osd.5 has stuck requests > 134218 sec
MON_DOWN 1/3 mons down, quorum node41,node44
mon.node48 (rank 2) addr 192.168.184.48:6789/0 is down (out of quorum)
root@node41:~# ceph pg loks query --cluster loks
no valid command found; 10 closest matches:
pg force_create_pg <pgid>
pg set_nearfull_ratio <float[0.0-1.0]>
pg set_full_ratio <float[0.0-1.0]>
pg map <pgid>
pg ls {<int>} {<states> [<states>...]}
pg dump_stuck {inactive|unclean|stale|undersized|degraded [inactive|unclean|stale|undersized|degraded...]} {<int>}
pg ls-by-primary <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg ls-by-osd <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg dump_pools_json
pg ls-by-pool <poolstr> {<states> [<states>...]}
Error EINVAL: invalid command
root@node41:~# ceph pg query --cluster loks
no valid command found; 10 closest matches:
pg force_create_pg <pgid>
pg set_nearfull_ratio <float[0.0-1.0]>
pg set_full_ratio <float[0.0-1.0]>
pg map <pgid>
pg ls {<int>} {<states> [<states>...]}
pg dump_stuck {inactive|unclean|stale|undersized|degraded [inactive|unclean|stale|undersized|degraded...]} {<int>}
pg ls-by-primary <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg ls-by-osd <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg dump_pools_json
pg ls-by-pool <poolstr> {<states> [<states>...]}
Error EINVAL: invalid command
root@node41:~# ceph query --cluster loks
no valid command found; 10 closest matches:
mon dump {<int[0-]>}
mon stat
fs set_default <fs_name>
fs set-default <fs_name>
fs add_data_pool <fs_name> <pool>
fs rm_data_pool <fs_name> <pool>
fs set <fs_name> max_mds|max_file_size|allow_new_snaps|inline_data|cluster_down|allow_multimds|allow_dirfrags|balancer|standby_count_wanted <val> {<confirm>}
fs flag set enable_multiple <val> {--yes-i-really-mean-it}
fs ls
fs get <fs_name>
Error EINVAL: invalid command
ceph pg loks query
2018-12-02 13:20:07.052092 7faabef80700 -1 Errors while parsing config file!
2018-12-02 13:20:07.052097 7faabef80700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-12-02 13:20:07.052098 7faabef80700 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-12-02 13:20:07.052099 7faabef80700 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
Hi,
Thank you for you answer.
The disks are the most SATA/Seagate.
Follow the output of commands (ceph not start i dont know why)
ceph health detail --cluster loks
HEALTH_ERR Reduced data availability: 55 pgs inactive, 27 pgs down, 28 pgs incomplete; Degraded data redundancy: 55 pgs unclean; 7 stuck requests are blocked > 4096 sec; 1/3 mons down, quorum node41,node44
PG_AVAILABILITY Reduced data availability: 55 pgs inactive, 27 pgs down, 28 pgs incomplete
pg 1.7 is incomplete, acting [5,6]
pg 1.a is down, acting [5,6]
pg 1.10 is down, acting [0,5]
pg 1.15 is incomplete, acting [6,1]
pg 1.1c is incomplete, acting [5,6]
pg 1.22 is down, acting [5,6]
pg 1.24 is down, acting [5,0]
pg 1.2d is incomplete, acting [6,5]
pg 1.2e is down, acting [5,6]
pg 1.32 is down, acting [4,1]
pg 1.34 is incomplete, acting [4,6]
pg 1.38 is down, acting [4,0]
pg 1.41 is incomplete, acting [1,4]
pg 1.42 is incomplete, acting [4,0]
pg 1.4e is incomplete, acting [4,1]
pg 1.55 is incomplete, acting [4,1]
pg 1.56 is down, acting [1,4]
pg 1.5e is incomplete, acting [6,4]
pg 1.64 is down, acting [4,1]
pg 1.6a is down, acting [6,4]
pg 1.6e is down, acting [5,6]
pg 1.70 is down, acting [5,0]
pg 1.72 is down, acting [0,4]
pg 1.81 is incomplete, acting [4,1]
pg 1.84 is incomplete, acting [4,0]
pg 1.8d is incomplete, acting [4,6]
pg 1.92 is incomplete, acting [5,0]
pg 1.94 is incomplete, acting [1,5]
pg 1.97 is down, acting [0,4]
pg 1.9d is incomplete, acting [4,6]
pg 1.a1 is incomplete, acting [5,0]
pg 1.a3 is incomplete, acting [4,6]
pg 1.a4 is incomplete, acting [5,6]
pg 1.a7 is incomplete, acting [4,0]
pg 1.ab is down, acting [0,5]
pg 1.ac is down, acting [5,0]
pg 1.b2 is stuck inactive for 147747.228969, current state down, last acting [5,6]
pg 1.c8 is down, acting [5,6]
pg 1.ca is down, acting [5,6]
pg 1.ce is incomplete, acting [5,1]
pg 1.d3 is down, acting [5,0]
pg 1.d4 is down, acting [5,6]
pg 1.d5 is incomplete, acting [4,1]
pg 1.d7 is down, acting [1,5]
pg 1.d8 is down, acting [1,4]
pg 1.d9 is incomplete, acting [1,5]
pg 1.e4 is incomplete, acting [4,6]
pg 1.e5 is incomplete, acting [4,1]
pg 1.e8 is down, acting [0,6]
pg 1.ea is incomplete, acting [6,5]
pg 1.fc is incomplete, acting [1,5]
PG_DEGRADED Degraded data redundancy: 55 pgs unclean
pg 1.7 is stuck unclean since forever, current state incomplete, last acting [5,6]
pg 1.a is stuck unclean for 177816.248200, current state down, last acting [5,6]
pg 1.10 is stuck unclean since forever, current state down, last acting [0,5]
pg 1.15 is stuck unclean since forever, current state incomplete, last acting [6,1]
pg 1.1c is stuck unclean for 179338.763846, current state incomplete, last acting [5,6]
pg 1.22 is stuck unclean for 176573.873492, current state down, last acting [5,6]
pg 1.24 is stuck unclean for 176804.662961, current state down, last acting [5,0]
pg 1.2d is stuck unclean since forever, current state incomplete, last acting [6,5]
pg 1.2e is stuck unclean for 177810.929634, current state down, last acting [5,6]
pg 1.32 is stuck unclean for 178030.266384, current state down, last acting [4,1]
pg 1.34 is stuck unclean for 176565.995475, current state incomplete, last acting [4,6]
pg 1.38 is stuck unclean for 177097.827288, current state down, last acting [4,0]
pg 1.41 is stuck unclean since forever, current state incomplete, last acting [1,4]
pg 1.42 is stuck unclean for 176556.469928, current state incomplete, last acting [4,0]
pg 1.4e is stuck unclean since forever, current state incomplete, last acting [4,1]
pg 1.55 is stuck unclean for 176513.554485, current state incomplete, last acting [4,1]
pg 1.56 is stuck unclean since forever, current state down, last acting [1,4]
pg 1.5e is stuck unclean since forever, current state incomplete, last acting [6,4]
pg 1.64 is stuck unclean for 190478.451929, current state down, last acting [4,1]
pg 1.6a is stuck unclean since forever, current state down, last acting [6,4]
pg 1.6e is stuck unclean for 176582.464287, current state down, last acting [5,6]
pg 1.70 is stuck unclean for 176585.050426, current state down, last acting [5,0]
pg 1.72 is stuck unclean since forever, current state down, last acting [0,4]
pg 1.81 is stuck unclean for 176697.163538, current state incomplete, last acting [4,1]
pg 1.84 is stuck unclean for 176569.100444, current state incomplete, last acting [4,0]
pg 1.8d is stuck unclean for 176522.513870, current state incomplete, last acting [4,6]
pg 1.92 is stuck unclean for 176783.135021, current state incomplete, last acting [5,0]
pg 1.94 is stuck unclean since forever, current state incomplete, last acting [1,5]
pg 1.97 is stuck unclean since forever, current state down, last acting [0,4]
pg 1.9d is stuck unclean for 176524.074997, current state incomplete, last acting [4,6]
pg 1.a1 is stuck unclean for 176515.894133, current state incomplete, last acting [5,0]
pg 1.a3 is stuck unclean for 177392.298000, current state incomplete, last acting [4,6]
pg 1.a4 is stuck unclean for 176521.983648, current state incomplete, last acting [5,6]
pg 1.a7 is stuck unclean for 178949.188383, current state incomplete, last acting [4,0]
pg 1.ab is stuck unclean since forever, current state down, last acting [0,5]
pg 1.ac is stuck unclean since forever, current state down, last acting [5,0]
pg 1.b2 is stuck unclean for 176572.482961, current state down, last acting [5,6]
pg 1.c8 is stuck unclean for 176523.644703, current state down, last acting [5,6]
pg 1.ca is stuck unclean since forever, current state down, last acting [5,6]
pg 1.ce is stuck unclean for 177798.955780, current state incomplete, last acting [5,1]
pg 1.d3 is stuck unclean for 177763.726652, current state down, last acting [5,0]
pg 1.d4 is stuck unclean for 177420.197246, current state down, last acting [5,6]
pg 1.d5 is stuck unclean since forever, current state incomplete, last acting [4,1]
pg 1.d7 is stuck unclean since forever, current state down, last acting [1,5]
pg 1.d8 is stuck unclean since forever, current state down, last acting [1,4]
pg 1.d9 is stuck unclean since forever, current state incomplete, last acting [1,5]
pg 1.e4 is stuck unclean for 176567.382597, current state incomplete, last acting [4,6]
pg 1.e5 is stuck unclean for 177381.259159, current state incomplete, last acting [4,1]
pg 1.e8 is stuck unclean since forever, current state down, last acting [0,6]
pg 1.ea is stuck unclean since forever, current state incomplete, last acting [6,5]
pg 1.fc is stuck unclean since forever, current state incomplete, last acting [1,5]
REQUEST_STUCK 7 stuck requests are blocked > 4096 sec
7 ops are blocked > 134218 sec
osd.5 has stuck requests > 134218 sec
MON_DOWN 1/3 mons down, quorum node41,node44
mon.node48 (rank 2) addr 192.168.184.48:6789/0 is down (out of quorum)
root@node41:~# ceph pg loks query --cluster loks
no valid command found; 10 closest matches:
pg force_create_pg <pgid>
pg set_nearfull_ratio <float[0.0-1.0]>
pg set_full_ratio <float[0.0-1.0]>
pg map <pgid>
pg ls {<int>} {<states> [<states>...]}
pg dump_stuck {inactive|unclean|stale|undersized|degraded [inactive|unclean|stale|undersized|degraded...]} {<int>}
pg ls-by-primary <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg ls-by-osd <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg dump_pools_json
pg ls-by-pool <poolstr> {<states> [<states>...]}
Error EINVAL: invalid command
root@node41:~# ceph pg query --cluster loks
no valid command found; 10 closest matches:
pg force_create_pg <pgid>
pg set_nearfull_ratio <float[0.0-1.0]>
pg set_full_ratio <float[0.0-1.0]>
pg map <pgid>
pg ls {<int>} {<states> [<states>...]}
pg dump_stuck {inactive|unclean|stale|undersized|degraded [inactive|unclean|stale|undersized|degraded...]} {<int>}
pg ls-by-primary <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg ls-by-osd <osdname (id|osd.id)> {<int>} {<states> [<states>...]}
pg dump_pools_json
pg ls-by-pool <poolstr> {<states> [<states>...]}
Error EINVAL: invalid command
root@node41:~# ceph query --cluster loks
no valid command found; 10 closest matches:
mon dump {<int[0-]>}
mon stat
fs set_default <fs_name>
fs set-default <fs_name>
fs add_data_pool <fs_name> <pool>
fs rm_data_pool <fs_name> <pool>
fs set <fs_name> max_mds|max_file_size|allow_new_snaps|inline_data|cluster_down|allow_multimds|allow_dirfrags|balancer|standby_count_wanted <val> {<confirm>}
fs flag set enable_multiple <val> {--yes-i-really-mean-it}
fs ls
fs get <fs_name>
Error EINVAL: invalid command
ceph pg loks query
2018-12-02 13:20:07.052092 7faabef80700 -1 Errors while parsing config file!
2018-12-02 13:20:07.052097 7faabef80700 -1 parse_file: cannot open /etc/ceph/ceph.conf: (2) No such file or directory
2018-12-02 13:20:07.052098 7faabef80700 -1 parse_file: cannot open ~/.ceph/ceph.conf: (2) No such file or directory
2018-12-02 13:20:07.052099 7faabef80700 -1 parse_file: cannot open ceph.conf: (2) No such file or directory
Error initializing cluster client: ObjectNotFound('error calling conf_read_file',)
admin
2,930 Posts
Quote from admin on December 2, 2018, 7:22 pm
How many OSDs do you have? How many are up? Any down?
Run:
ceph pg 1.10 query --cluster loks
ceph pg 1.15 query --cluster loks
Look at the "recovery_state" sections to see what is stuck.
Are the disks SSDs or HDDs?
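If many PGs are stuck, a small loop can pull just the blocking reason from each. This is only a sketch: the PG IDs are taken from the health output above, and grep is used instead of a JSON parser so nothing extra needs to be installed.
for pg in 1.10 1.15 1.ce 1.d3; do
    echo "== pg $pg =="
    ceph pg "$pg" query --cluster loks | grep -A 3 '"recovery_state"'
done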
Lerner
7 Posts
Quote from Lerner on December 2, 2018, 10:43 pm
Hi,
We have 5 OSDs up and 1 down.
All disks are SATA HDDs.
ceph pg 1.10 query --cluster loks
"recovery_state": [
{
"name": "Started/Primary/Peering/Down",
"enter_time": "2018-12-01 18:08:56.573719",
"comment": "not enough up instances of this PG to go active"
},
{
"name": "Started/Primary/Peering",
"enter_time": "2018-12-01 18:08:56.573649",
"past_intervals": [
{
"first": "7293",
"last": "7963",
"all_participants": [
{
"osd": 0
},
{
"osd": 1
},
{
"osd": 2
},
{
"osd": 5
},
{
"osd": 6
}
],
"intervals": [
{
"first": "7613",
"last": "7625",
"acting": "2"
},
{
"first": "7957",
"last": "7958",
"acting": "0"
},
{
"first": "7961",
"last": "7963",
"acting": "5"
}
]
}
],
"probing_osds": [
"0",
"1",
"5",
"6"
],
"blocked": "peering is blocked due to down osds",
"down_osds_we_would_probe": [
2
],
"peering_blocked_by": [
{
"osd": 2,
"current_lost_at": 0,
"comment": "starting or marking this osd lost may let us proceed"
}
]
},
{
"name": "Started",
"enter_time": "2018-12-01 18:08:56.573578"
}
],
ceph pg 1.15 query --cluster loks
"recovery_state": [
{
"name": "Started/Primary/Peering/Incomplete",
"enter_time": "2018-12-01 18:27:49.462944",
"comment": "not enough complete instances of this PG"
},
{
"name": "Started/Primary/Peering",
"enter_time": "2018-12-01 18:27:49.422286",
"past_intervals": [
{
"first": "7293",
"last": "7963",
"all_participants": [
{
"osd": 1
},
{
"osd": 3
},
{
"osd": 5
},
{
"osd": 6
}
],
"intervals": [
{
"first": "7613",
"last": "7620",
"acting": "3"
},
{
"first": "7672",
"last": "7673",
"acting": "5"
},
{
"first": "7817",
"last": "7819",
"acting": "1,5"
},
{
"first": "7962",
"last": "7963",
"acting": "6"
}
]
}
],
"probing_osds": [
"1",
"5",
"6"
],
"down_osds_we_would_probe": [
3
],
"peering_blocked_by": [],
"peering_blocked_by_detail": [
{
"detail": "peering_blocked_by_history_les_bound"
}
]
},
{
"name": "Started",
"enter_time": "2018-12-01 18:27:49.422188"
}
],
admin
2,930 Posts
Quote from admin on December 3, 2018, 7:16 am
From the logs you sent, it seems 2 out of 6 OSDs are down: OSDs 2 and 3.
Can you check:
ceph status --cluster loks
ceph osd tree --cluster loks
The best thing to try now is to start these 2 OSDs. On their nodes:
systemctl restart ceph-osd@2
systemctl restart ceph-osd@3
systemctl status ceph-osd@2
systemctl status ceph-osd@3
I noticed you also have 1 downed monitor. If the 2 OSDs are not starting, on their node edit the conf file:
/etc/ceph/loks.conf
and temporarily modify the "mon_host = " line to exclude the IP address of the failed mon, then try again to restart the 2 failed OSDs. Note you should later revert the changes to the conf file.
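For illustration only, the edit might look like the sketch below. The addresses are assumed from the node addresses seen earlier in the thread; use whatever is actually in loks.conf.
# before (assumed addresses)
mon_host = 192.168.184.41, 192.168.184.44, 192.168.184.48
# after - failed mon on node48 temporarily removed
mon_host = 192.168.184.41, 192.168.184.44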
If they still do not start, try starting them manually and see what error you get on the console:
/usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 2
/usr/bin/ceph-osd -f --cluster loks --id 2 --setuser ceph --setgroup ceph
/usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 3
/usr/bin/ceph-osd -f --cluster loks --id 3 --setuser ceph --setgroup ceph
You can also find additional error logs in:
/var/log/ceph/loks-osd.2.log
/var/log/ceph/loks-osd.3.log
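To pull just the recent errors out of those logs, something like this sketch works:
tail -n 200 /var/log/ceph/loks-osd.2.log | grep -iE 'error|fail|abort'
tail -n 200 /var/log/ceph/loks-osd.3.log | grep -iE 'error|fail|abort'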
Lerner
7 Posts
Quote from Lerner on December 3, 2018, 11:54 am
Hi,
Is it possible to just remove these OSDs from the cluster? We have 3 other PetaSAN instances that we can work with.
Right now I just need to get the files off the iSCSI storage, because they are clients' VMs.
The result of the commands:
root@node44:~# systemctl restart ceph-osd@3
Job for ceph-osd@3.service failed because the control process exited with error code. See "systemctl status ceph-osd@3.service" and "journalctl -xe" for details.
root@node44:~# /usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 3
OSD data directory /var/lib/ceph/osd/loks-3 does not exist; bailing out.
root@node44:~# /usr/bin/ceph-osd -f --cluster loks --id 3 --setuser ceph --setgroup ceph
2018-12-03 08:51:50.169527 7fde0ecb9e00 -1 ** ERROR: unable to open OSD superblock on /var/lib/ceph/osd/loks-3: (2) No such file or directory
The output is the same for both OSDs.
Thanks!
admin
2,930 Posts
Quote from admin on December 3, 2018, 2:43 pm
No, you cannot move the 2 OSDs to another cluster. You can remove them from their hosts and put them in other hosts within the same PetaSAN cluster if you think the problem is with the host node rather than the disks themselves.
Can you send me the output of the first 2 commands in my previous post? Are OSDs 2 and 3 on the same host or on different hosts?
Edit /etc/ceph/loks.conf
and temporarily modify the "mon_host = " line to exclude the IP address of the failed mon.
Find the disk /dev/sdX that OSD 2 and OSD 3 are on, via the UI or via the CLI:
ceph-disk list
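To narrow that listing down to the data partitions and their OSD IDs, a rough filter (the exact wording of the ceph-disk output can vary between releases):
ceph-disk list | grep -i 'ceph data'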
Mount their first (metadata) partition:
mount /dev/sdX1 /var/lib/ceph/osd/loks-2
where sdX is the disk for OSD 2
mount /dev/sdY1 /var/lib/ceph/osd/loks-3
where sdY is the disk for OSD 3
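Once mounted, a quick sanity check that the right partition is in place (whoami and fsid are standard files in an OSD data directory):
ls /var/lib/ceph/osd/loks-2            # should list files such as whoami, fsid and superblock
cat /var/lib/ceph/osd/loks-2/whoami    # should print 2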
After mounting, try to start the OSDs manually and see what error you get on the console:
/usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 2
/usr/bin/ceph-osd -f --cluster loks --id 2 --setuser ceph --setgroup ceph
/usr/lib/ceph/ceph-osd-prestart.sh --cluster loks --id 3
/usr/bin/ceph-osd -f --cluster loks --id 3 --setuser ceph --setgroup ceph
You can also find additional error logs in:
/var/log/ceph/loks-osd.2.log
/var/log/ceph/loks-osd.3.log
Lerner
7 Posts
Quote from Lerner on December 4, 2018, 1:42 am
Hi,
I replaced the node that was giving problems (node 48).
Here is the output:
root@node41:~# ceph status --cluster loks
cluster:
id: c8497f53-4077-4359-8d50-7439c0d2760f
health: HEALTH_WARN
Reduced data availability: 55 pgs inactive, 55 pgs incomplete
Degraded data redundancy: 55 pgs unclean
services:
mon: 3 daemons, quorum node41,node44,node48
mgr: node44(active), standbys: node41, node48
osd: 7 osds: 7 up, 7 in
data:
pools: 1 pools, 256 pgs
objects: 151k objects, 604 GB
usage: 1216 GB used, 5482 GB / 6699 GB avail
pgs: 21.484% pgs not active
198 active+clean
55 incomplete
2 active+clean+scrubbing+deep
1 active+clean+scrubbing
root@node41:~# ceph osd tree --cluster loks
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 6.54225 root default
-7 1.81918 host node41
4 hdd 0.90959 osd.4 up 1.00000 1.00000
5 hdd 0.90959 osd.5 up 1.00000 1.00000
-9 0.90709 host node42
6 hdd 0.90709 osd.6 up 1.00000 1.00000
-3 1.81619 host node44
0 hdd 0.90810 osd.0 up 1.00000 1.00000
1 hdd 0.90810 osd.1 up 1.00000 1.00000
-5 1.99979 host node48
3 hdd 0.99989 osd.3 up 1.00000 1.00000
7 hdd 0.99989 osd.7 up 1.00000 1.00000
root@node41:~#
Thanks!
admin
2,930 Posts
Quote from admin on December 4, 2018, 8:56 am
Hi
What happened to OSD 2? Did you try to start OSDs 2 & 3 from the command line? What was the output? As stated, starting these OSDs is the best thing to do. Even if they do not start, we can retrieve the data as long as the physical disks are not corrupt.
Now we need to see why there are inactive/incomplete PGs. Can you do what was done previously: query such PGs, look at the recovery_state section as before, and post a couple of them if they are different.
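A minimal sketch for doing that, using the dump_stuck subcommand shown in the help output earlier (1.15 is just an example ID; pick IDs from the dump_stuck listing):
ceph pg dump_stuck inactive --cluster loks
ceph pg 1.15 query --cluster loks | grep -A 6 '"recovery_state"'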