Error parsing config using cli
admin
2,930 Posts
March 30, 2018, 1:21 pm
It does look like this is a bug, maybe timing related.
If this happens only on large images, roughly what sizes are they?
If you wait 20 min after stopping a large disk and then delete it, does the problem still happen?
When you get the add disk / "already exists" error, can you please show the output of:
consul kv get -recurse PetaSAN/Disks
If you see the id of a disk you just deleted (such as 00001), try force-deleting it from the consul system:
consul kv delete -recurse PetaSAN/Disks/00001
Does this fix it? Can you add disks now?
Thanks for your help reporting this.
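If it helps, the two consul steps above can be wrapped in a small helper (a minimal sketch, not part of PetaSAN; it assumes the consul binary is on the PATH of a management node and that you pass the id of the disk you just deleted):
#!/bin/bash
# cleanup_stale_disk.sh <disk-id>   e.g. ./cleanup_stale_disk.sh 00001
# Hypothetical helper: removes leftover PetaSAN disk keys from consul.
DISK_ID="$1"
KEYS=$(consul kv get -recurse "PetaSAN/Disks/${DISK_ID}" 2>/dev/null)
if [ -n "$KEYS" ]; then
    echo "Stale consul entries found for disk ${DISK_ID}:"
    echo "$KEYS"
    consul kv delete -recurse "PetaSAN/Disks/${DISK_ID}"
else
    echo "No consul entries found for disk ${DISK_ID}"
fi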
Last edited on March 30, 2018, 1:23 pm by admin · #11
protocol6v
85 Posts
March 30, 2018, 4:06 pm
OK, so I can't seem to remove the current 100TB disk that I created, which means I can't recreate the "Disk already exists" issue. Meanwhile, I created a small 30GB test disk and was able to mount, unmount, and delete it with no problem.
I went back, rebooted all nodes, and the 100TB disk is still listed in the WebUI, but I am unable to delete it; I get the error deleting disk. consul kv get -recurse PetaSAN/Disks produces no output. Interestingly, after running this command, the WebUI removed the Name and IQN and now reports active paths as N/A.
I then tried "detaching" the disk, which succeeded, but I still cannot delete it; I get the generic "Alert! Error Deleting disk."
I then went back, created a 150GB disk with 8 paths, and was able to start it, stop it, and delete it. It took much longer to delete the 150GB disk than the 30GB one, but it did eventually complete.
Again, I activated all maintenance settings (set them all to off), rebooted all four nodes, and set them all back to "on"; the 100TB disk still shows in the list as detached with no name, IQN, or path status. The consul kv get -recurse command still has no output, and I still cannot delete the disk. I tried to re-attach the disk (I believe I used a different name than the original) and it attached just fine. Now the output of consul kv get is:
root@bd-ceph-sd1:~# consul kv get -recurse PetaSAN/Disks
PetaSAN/Disks/00001:disk
PetaSAN/Disks/00001/1:bd-ceph-sd1
PetaSAN/Disks/00001/2:bd-ceph-sd2
PetaSAN/Disks/00001/3:bd-ceph-sd4
PetaSAN/Disks/00001/4:bd-ceph-sd1
PetaSAN/Disks/00001/5:bd-ceph-sd4
PetaSAN/Disks/00001/6:bd-ceph-sd3
PetaSAN/Disks/00001/7:bd-ceph-sd3
PetaSAN/Disks/00001/8:bd-ceph-sd2
I am now attempting to delete it again, but this time I'm going to wait an hour or so before touching anything again. Will update again then.
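In case it's useful for comparing against what the UI shows, here is one way to check from the CLI whether the backing RBD image still exists and whether anything still has it open (a hedged sketch: it assumes the image lives in the default rbd pool and that it is named image-00001, both of which may differ on a real PetaSAN cluster):
rbd ls rbd | grep 00001          # is the image still listed in the pool?
rbd info rbd/image-00001         # size and features (hypothetical image name)
rbd status rbd/image-00001       # lists watchers, i.e. clients that still have it open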
Glad I can be of some sort of help identifying issues. I really love the prospect of this software and am willing to help improve it in any way possible.
admin
2,930 Posts
April 2, 2018, 12:17 pm
Based on our testing, there are 2 issues here:
- The "Disk Already Exists" error, which prevents you from adding new disks: we cannot reproduce this. If you can reproduce it, please let us know. Have you run any manual cli commands that could affect this? This is a more serious issue than the second problem.
- The "Error Deleting disk" is a different issue. It happens if you delete a disk and close the ui before the operation returns. A second attempt at deleting the disk will show this error since the disk is already being deleted. For 100TB it took us about 1.5 hours, which is what the ceph command takes, but that is too long to wait from the ui. We will change the ui to make this an asynchronous job so it shows a "deleting" status; there could also be ways in ceph to speed up the delete process.
Last edited on April 2, 2018, 12:19 pm by admin · #13
protocol6v
85 Posts
April 3, 2018, 12:26 pm
I cannot seem to reproduce the "Disk already exists" error, no matter what I do. I did not run any manual CLI commands, but I cannot remember whether I rebooted any nodes without disabling the maintenance items. I'm also wondering if it could somehow have been a browser cache issue.
The error deleting disk seems to be just as you described; it felt like a web timeout issue. When I came back to the cluster after almost 48 hours, the disk had finally disappeared.
Will check back if I run into it again. Thanks!