Vlan configuration
erazmus
40 Posts
March 13, 2018, 6:26 pm
Quote from admin on March 13, 2018, 6:08 pm
Hi,
Can you run the du command to see how much data these OSDs have for the problem PG?
Sadly, 0 bytes for all of them.
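For reference, the kind of check being asked for here could look like this on one of the assigned OSD nodes (assuming a FileStore data directory layout; the OSD id and path are illustrative):
du -sh /var/lib/ceph/osd/CLUSTER_NAME-2/current/1.e0e_head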
admin
2,930 Posts
March 13, 2018, 6:52 pm
The original copies of PG 1.e0e were on OSDs 52, 56 and 65, so they are all gone. The currently assigned OSDs 2, 35 and 23 do not have any copies. It does not look good. We can tell Ceph to forget about trying to find the data, but we will end up with empty data for this PG: the disks will have 1/4096 of their data lost and the file system may or may not repair.
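Telling Ceph to forget the data usually means marking the lost OSDs and force-recreating the PG, along these lines for each of 52, 56 and 65 (illustrative only; as noted further down, this does not take effect while the missing OSD ids cannot be probed):
ceph osd lost 52 --yes-i-really-mean-it --cluster CLUSTER_NAME
ceph pg force_create_pg 1.e0e --cluster CLUSTER_NAME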
erazmus
40 Posts
March 13, 2018, 7:06 pm
I'm okay with data loss. This is just a backup of a backup.
admin
2,930 Posts
March 13, 2018, 7:40 pm
Do you want to start with a fresh new pool (erase all), or keep the existing one with some PG data gone?
erazmus
40 Posts
March 13, 2018, 8:06 pm
I wouldn't mind attempting a repair, because it will reduce the amount of data I need to ship, but if it doesn't work, then an erase-all will be option 2.
admin
2,930 Posts
March 13, 2018, 9:00 pm
According to http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-September/012771.html:
The general consensus from those threads is that as long as down_osds_we_would_probe is pointing to any OSD that can't be reached, those PGs will remain stuck incomplete and can't be cured by force_create_pg or even "ceph osd lost".
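To check this on the cluster, querying the PG shows that field in its peering state, e.g.:
ceph pg 1.e0e query --cluster CLUSTER_NAME
and then looking for down_osds_we_would_probe in the recovery_state section of the output.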
So we need to create new, empty OSDs with the same IDs as the stuck OSDs 52, 56 and 65.
The commands to create an OSD with a specific id are quite lengthy; see the ADDING OSDS -> LONG FORM section of:
http://docs.ceph.com/docs/jewel/install/manual-deployment/
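A rough, non-PetaSAN-specific sketch of that long form for one of the ids, say 52 (Jewel-era FileStore steps based on the page above; the device, host name and crush weight are placeholders):
uuidgen   # note the output, used below as OSD_UUID
ceph osd create OSD_UUID 52 --cluster CLUSTER_NAME
mkdir /var/lib/ceph/osd/CLUSTER_NAME-52
mkfs -t xfs /dev/sdX
mount /dev/sdX /var/lib/ceph/osd/CLUSTER_NAME-52
ceph-osd -i 52 --mkfs --mkkey --osd-uuid OSD_UUID --cluster CLUSTER_NAME
ceph auth add osd.52 osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/CLUSTER_NAME-52/keyring --cluster CLUSTER_NAME
ceph osd crush add osd.52 1.0 host=HOSTNAME --cluster CLUSTER_NAME
and then start the OSD service for id 52 on that node.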
The fresh approach (delete all) is:
ceph osd pool delete rbd rbd --yes-i-really-really-mean-it --cluster CLUSTER_NAME
ceph osd pool create rbd 4096 4096 --cluster CLUSTER_NAME
ceph osd pool set rbd size 3 --cluster CLUSTER_NAME
ceph osd pool set rbd min_size 2 --cluster CLUSTER_NAME
ceph osd pool application enable rbd rbd --cluster CLUSTER_NAME
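Afterwards, something like the following should confirm the pool came back with the intended settings:
ceph osd pool ls detail --cluster CLUSTER_NAME
ceph -s --cluster CLUSTER_NAME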
Last edited on March 13, 2018, 9:01 pm by admin · #46
erazmus
40 Posts
March 13, 2018, 9:36 pm
Thanks. I looked at the manual creation option, but there were too many unknowns in there for me, so I've opted for the delete-all.
Question - I'm unable to create any OSDs - it says they are creating, but they don't show up. Is this because of the hybrid 1.4/2.0 state I'm in? Should I finish migrating all my nodes to 2.0 before creating any OSDs (all of the deleted drives), or do I have another problem stopping me from creating OSDs?
admin
2,930 Posts
March 13, 2018, 10:07 pm
Is this happening on 2.0 nodes or 1.4 nodes?
When you upgrade an existing 1.4 node to 2.0: do all OSDs come up, some, or none?
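To see which OSDs have actually registered with the cluster and whether their services started on the node, something like the following can help (the unit name assumes a standard systemd ceph-osd@ template, and ID is a placeholder):
ceph osd tree --cluster CLUSTER_NAME
systemctl status ceph-osd@ID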
Last edited on March 13, 2018, 10:09 pm by admin · #48