Production nodes go down randomly!
pradeep suresh
32 Posts
September 3, 2022, 2:00 pm
Hi Team,
Recently our PetaSAN nodes went down at random. We have a 4-node cluster, and the entries below were found in the kernel logs.
We also observed that a few of the SCSI disks restarted, which ideally should not happen because the data is replicated on the other nodes.
I mainly looked into the kernel logs. Which other logs should be checked to identify the issue?
Logs:
Sep 2 14:09:57 ps-node-04 kernel: [81719.499480] libceph: mon2 (1)10.62.0.12:6789 session established
Sep 2 14:09:57 ps-node-04 kernel: [81719.500736] libceph: osd5 down
Sep 2 14:09:57 ps-node-04 kernel: [81719.500736] libceph: osd6 down
Sep 2 14:09:57 ps-node-04 kernel: [81719.500737] libceph: osd7 down
Sep 2 14:09:57 ps-node-04 kernel: [81719.500737] libceph: osd8 down
Sep 2 14:09:57 ps-node-04 kernel: [81719.500737] libceph: osd9 down
Sep 2 14:09:57 ps-node-04 kernel: [81719.500738] libceph: osd20 down
Sep 2 14:09:57 ps-node-04 kernel: [81719.500738] libceph: osd22 down
Sep 2 14:09:57 ps-node-04 kernel: [81719.500739] libceph: osd29 down
Sep 2 14:09:57 ps-node-04 kernel: [81719.500835] libceph: osd27 down
Sep 2 14:10:11 ps-node-04 kernel: [81733.223594] rbd1: p1
Sep 2 14:10:11 ps-node-04 kernel: [81733.223598] rbd1: p1 size 629137409 extends beyond EOD, truncated
Sep 2 14:10:11 ps-node-04 kernel: [81733.223933] rbd: rbd1: capacity 322122547200 features 0x1
Sep 2 14:10:42 ps-node-04 kernel: [81764.116729] Alternate GPT is invalid, using primary GPT.
Sep 2 14:10:42 ps-node-04 kernel: [81764.116734] rbd2: p1
Sep 2 14:10:42 ps-node-04 kernel: [81764.117107] rbd: rbd2: capacity 268435456000 features 0x1
Sep 2 14:11:13 ps-node-04 kernel: [81795.162411] Alternate GPT is invalid, using primary GPT.
Sep 2 14:11:13 ps-node-04 kernel: [81795.162417] rbd3: p1
Sep 2 14:11:13 ps-node-04 kernel: [81795.162793] rbd: rbd3: capacity 4398046511104 features 0x1
Sep 2 14:11:43 ps-node-04 kernel: [81825.825304] rbd4: p1
Sep 2 14:11:43 ps-node-04 kernel: [81825.825308] rbd4: p1 size 2147475457 extends beyond EOD, truncated
Sep 2 14:11:43 ps-node-04 kernel: [81825.825678] rbd: rbd4: capacity 1099511627776 features 0x1
Sep 2 14:12:12 ps-node-04 kernel: [81854.905845] rbd: rbd5: capacity 2199023255552 features 0x1
Sep 2 14:12:41 ps-node-04 kernel: [81883.824586] rbd6: p1
Sep 2 14:12:41 ps-node-04 kernel: [81883.824589] rbd6: p1 size 524279809 extends beyond EOD, truncated
Sep 2 14:12:41 ps-node-04 kernel: [81883.824945] rbd: rbd6: capacity 268435456000 features 0x1
Sep 2 14:13:12 ps-node-04 kernel: [81914.802699] rbd7: p1
Sep 2 14:13:12 ps-node-04 kernel: [81914.802704] rbd7: p1 size 524279809 extends beyond EOD, truncated
Sep 2 14:13:12 ps-node-04 kernel: [81914.803224] rbd: rbd7: capacity 268435456000 features 0x1
Sep 2 14:13:41 ps-node-04 kernel: [81943.510862] rbd8: p1
Sep 2 14:13:41 ps-node-04 kernel: [81943.510867] rbd8: p1 size 629137409 extends beyond EOD, truncated
Sep 2 14:13:41 ps-node-04 kernel: [81943.511261] rbd: rbd8: capacity 322122547200 features 0x1
Sep 2 14:14:12 ps-node-04 kernel: [81974.193978] rbd9: p1
Sep 2 14:14:12 ps-node-04 kernel: [81974.193982] rbd9: p1 size 314564609 extends beyond EOD, truncated
Sep 2 14:14:12 ps-node-04 kernel: [81974.194343] rbd: rbd9: capacity 161061273600 features 0x1
Sep 2 14:14:43 ps-node-04 kernel: [82005.879924] rbd10: p1
Sep 2 14:14:43 ps-node-04 kernel: [82005.879927] rbd10: p1 size 524279809 extends beyond EOD, truncated
Sep 2 14:14:43 ps-node-04 kernel: [82005.880274] rbd: rbd10: capacity 268435456000 features 0x1
Sep 2 14:15:13 ps-node-04 kernel: [82035.425422] rbd11: p1
Sep 2 14:15:13 ps-node-04 kernel: [82035.425426] rbd11: p1 size 314564609 extends beyond EOD, truncated
Sep 2 14:15:13 ps-node-04 kernel: [82035.425793] rbd: rbd11: capacity 161061273600 features 0x1
Sep 2 14:15:44 ps-node-04 kernel: [82066.376598] Alternate GPT is invalid, using primary GPT.
Sep 2 14:15:44 ps-node-04 kernel: [82066.376602] rbd12:
Sep 2 14:15:44 ps-node-04 kernel: [82066.376975] rbd: rbd12: capacity 966367641600 features 0x1
Sep 2 14:16:16 ps-node-04 kernel: [82098.085441] rbd13: p1
Sep 2 14:16:16 ps-node-04 kernel: [82098.085446] rbd13: p1 size 2147475457 extends beyond EOD, truncated
Sep 2 14:16:16 ps-node-04 kernel: [82098.085839] rbd: rbd13: capacity 1099511627776 features 0x1
Sep 2 14:16:44 ps-node-04 kernel: [82126.845416] Alternate GPT is invalid, using primary GPT.
Sep 2 14:16:44 ps-node-04 kernel: [82126.845422] rbd14: p1
Sep 2 14:16:44 ps-node-04 kernel: [82126.845798] rbd: rbd14: capacity 322122547200 features 0x1
Sep 2 14:17:15 ps-node-04 kernel: [82157.251991] rbd15: p1
Sep 2 14:17:15 ps-node-04 kernel: [82157.252362] rbd: rbd15: capacity 2199023255552 features 0x1
Sep 2 14:18:00 ps-node-04 kernel: [82202.144512] rbd16: p1
Sep 2 14:18:00 ps-node-04 kernel: [82202.144516] rbd16: p1 size 1048567809 extends beyond EOD, truncated
Sep 2 14:18:00 ps-node-04 kernel: [82202.144833] rbd: rbd16: capacity 536870912000 features 0x1
Sep 2 14:18:29 ps-node-04 kernel: [82231.742030] sd 0:0:2:0: [sdc] tag#0 Sense Key : Recovered Error [current] [descriptor]
Sep 2 14:18:29 ps-node-04 kernel: [82231.742033] sd 0:0:2:0: [sdc] tag#0 Add. Sense: Defect list not found
Sep 2 14:18:31 ps-node-04 kernel: [82233.448213] rbd17: p1
Sep 2 14:18:31 ps-node-04 kernel: [82233.448217] rbd17: p1 size 1048567809 extends beyond EOD, truncated
Sep 2 14:18:31 ps-node-04 kernel: [82233.448706] rbd: rbd17: capacity 536870912000 features 0x1
Sep 2 14:19:02 ps-node-04 kernel: [82264.784578] rbd18: p1
Sep 2 14:19:02 ps-node-04 kernel: [82264.784582] rbd18: p1 size 2147475457 extends beyond EOD, truncated
Sep 2 14:19:02 ps-node-04 kernel: [82264.784944] rbd: rbd18: capacity 1099511627776 features 0x1
Sep 2 14:19:32 ps-node-04 kernel: [82294.412074] rbd19: p1
Sep 2 14:19:32 ps-node-04 kernel: [82294.412079] rbd19: p1 size 1258283009 extends beyond EOD, truncated
Sep 2 14:19:32 ps-node-04 kernel: [82294.412465] rbd: rbd19: capacity 644245094400 features 0x1
Sep 2 14:19:39 ps-node-04 kernel: [82301.201325] libceph: osd5 weight 0x0 (out)
Sep 2 14:19:39 ps-node-04 kernel: [82301.201327] libceph: osd6 weight 0x0 (out)
Sep 2 14:19:39 ps-node-04 kernel: [82301.201328] libceph: osd7 weight 0x0 (out)
Sep 2 14:19:39 ps-node-04 kernel: [82301.201328] libceph: osd8 weight 0x0 (out)
Sep 2 14:19:39 ps-node-04 kernel: [82301.201328] libceph: osd9 weight 0x0 (out)
Sep 2 14:19:39 ps-node-04 kernel: [82301.201329] libceph: osd20 weight 0x0 (out)
Sep 2 14:19:39 ps-node-04 kernel: [82301.201330] libceph: osd22 weight 0x0 (out)
Sep 2 14:19:39 ps-node-04 kernel: [82301.201330] libceph: osd27 weight 0x0 (out)
Sep 2 14:19:39 ps-node-04 kernel: [82301.201331] libceph: osd29 weight 0x0 (out)
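As a first pass over a kernel log like the one above, the libceph lines can be grouped by OSD to see which OSDs were marked down and when they were later marked out. The following is only a minimal Python sketch, assuming the syslog line layout shown in this post; the function name `summarize_osd_events` and the three sample lines are illustrative and not part of PetaSAN or Ceph.

```python
import re
from collections import defaultdict

# Matches libceph OSD state lines in the layout seen in the log above, e.g.
#   "... kernel: [81719.500736] libceph: osd5 down"
#   "... kernel: [82301.201325] libceph: osd5 weight 0x0 (out)"
OSD_EVENT = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] libceph: osd(?P<osd>\d+) "
    r"(?P<event>down|weight 0x0 \(out\))"
)

def summarize_osd_events(lines):
    """Return {osd_id: {"down": uptime, "out": uptime}} from kernel log lines."""
    events = defaultdict(dict)
    for line in lines:
        match = OSD_EVENT.search(line)
        if not match:
            continue  # rbd, GPT and SCSI lines are ignored here
        kind = "down" if match.group("event") == "down" else "out"
        # Keep the first time each state was seen for this OSD.
        events[int(match.group("osd"))].setdefault(kind, float(match.group("uptime")))
    return dict(events)

# Three lines copied from the log in this post, used as sample input.
sample = [
    "Sep 2 14:09:57 ps-node-04 kernel: [81719.500736] libceph: osd5 down",
    "Sep 2 14:10:11 ps-node-04 kernel: [81733.223594] rbd1: p1",
    "Sep 2 14:19:39 ps-node-04 kernel: [82301.201325] libceph: osd5 weight 0x0 (out)",
]
for osd, seen in sorted(summarize_osd_events(sample).items()):
    print(f"osd{osd}: down at {seen.get('down')}, out at {seen.get('out')}")
```

In the log above, the OSDs are marked out roughly 580 seconds after they were marked down. That is close to Ceph's default `mon_osd_down_out_interval` of 600 seconds, so the "out" lines are probably the automatic follow-up to the "down" event rather than a second failure.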
admin
2,930 Posts
September 4, 2022, 10:27 am
Check the Cluster Status on the dashboard: is it OK, or does it show errors?
Please also post the output of:
ceph status
ceph health detail
If your disks are stressed by recovery, disable iSCSI fencing or reduce the backfill speed in the Maintenance tab.
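If the status needs to be checked repeatedly, the same information can be read in machine-readable form with `ceph status --format json`. The sketch below is only an illustration: it assumes the JSON layout used by recent Ceph releases (a `health` object with `status` and `checks`), the helper names are hypothetical, and `sample_status` is made-up example data, not output from this cluster.

```python
import json
import subprocess

def health_summary(status):
    """Return (overall health, sorted names of active health checks)
    from a parsed `ceph status --format json` document."""
    health = status.get("health", {})
    return health.get("status", "UNKNOWN"), sorted(health.get("checks", {}))

def read_ceph_status():
    """Run `ceph status --format json`; needs a node with admin access to the cluster."""
    out = subprocess.run(
        ["ceph", "status", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)

# Made-up example input (assumption), shaped like the JSON described above.
sample_status = {
    "health": {
        "status": "HEALTH_WARN",
        "checks": {"OSD_DOWN": {}, "PG_DEGRADED": {}},
    }
}
print(health_summary(sample_status))
```

On a live node, `health_summary(read_ceph_status())` would give the same kind of summary for the real cluster; `ceph health detail` remains the place to read the full explanation of each check.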