iSCSI IPs distribution
maxthetor
24 Posts
June 10, 2017, 1:03 am
I went to check how the iSCSI disk paths were distributed across the cluster, and PetaSAN reports that both IPs for one disk are on the same host.
Active Paths
Disk 00002
IP            Assigned Node
10.0.2.101    san01
10.0.3.101    san01
How to get around this?
Last edited on June 10, 2017, 3:47 am · #1
admin
2,930 Posts
June 10, 2017, 10:15 am
In normal cases, when all nodes have the same number of paths and similar load, the IPs should be distributed evenly. If some nodes already have more paths than others, the nodes with fewer paths will be favored; this happens with newly added nodes, or with older nodes that were restarted and had their paths taken over. Note that assigning more than one path of a LUN to a node is perfectly acceptable; in fact, this is what traditional SANs do.
We run automated tests of IP distribution across 500 disks, and the balancing accuracy is about 98%. We could achieve better results at the expense of slower response times. The distribution logic is based on how even the existing assignment is, as well as on trying to place paths of the same LUN on different nodes.
Currently, if you do not like the distribution for a disk, you can stop it and restart it. In the future we plan to let the admin assign paths to specific nodes manually, and to have the system dynamically switch paths at runtime based on load statistics.
What I suggest for your case is to first check that all your nodes are loaded equally before adding a new disk. If that is the case and you still get an uneven distribution, then please let me know the output of:
ceph status --cluster CLUSTER_NAME
This can happen if one of the monitors has an issue, or sometimes if the nodes are out of clock sync.
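For example, assuming the cluster is named CLUSTER_NAME and the nodes are san01, san02 and san03 (san02 and san03 are placeholder hostnames here), a quick check could look like this:
# Check overall cluster and monitor health; if the monitors are out of clock
# sync, a "clock skew" health warning will normally appear in this output.
ceph status --cluster CLUSTER_NAME

# Rough clock comparison across the nodes (assumes SSH access between nodes).
for host in san01 san02 san03; do
    echo -n "$host: "
    ssh "$host" date +%s
done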
Last edited on June 10, 2017, 10:39 am · #2
maxthetor
24 Posts
June 10, 2017, 1:05 pm
>> In normal cases, when all nodes have the same number of paths and similar load, the IPs should be distributed evenly. If some nodes already have more paths than others, the nodes with fewer paths will be favored; this happens with newly added nodes, or with older nodes that were restarted and had their paths taken over. Note that assigning more than one path of a LUN to a node is perfectly acceptable; in fact, this is what traditional SANs do.
My setup has 3 nodes and 3 OSDs.
Only two paths are possible, because there are only two iSCSI subnets, right?
>> We run automated tests of IP distribution across 500 disks, and the balancing accuracy is about 98%. We could achieve better results at the expense of slower response times. The distribution logic is based on how even the existing assignment is, as well as on trying to place paths of the same LUN on different nodes.
I believe there should be a rule so that a node does not take two IPs of the same LUN. If that is not possible because no other node is online, or for any other reason, the system should show a warning on the dashboard / iSCSI disk list, plus a button to redistribute the addresses manually.
>> Currently, if you do not like the distribution for a disk, you can stop it and restart it. In the future we plan to let the admin assign paths to specific nodes manually, and to have the system dynamically switch paths at runtime based on load statistics.
I did not like the idea of stopping and restarting the iSCSI LUN, because I had problems with VMware marking the LUN as dead on some hosts, and the virtual machines crashed and corrupted their disks. I lost my vCenter server trying this.
Those new features would be great.
>> What I suggest for your case is to first check that all your nodes are loaded equally before adding a new disk. If that is the case and you still get an uneven distribution, then please let me know the output of: ceph status --cluster CLUSTER_NAME
I noticed this while doing some failover tests, restarting the nodes. It never happened at the time the disks were created.
>> This can happen if one of the monitors has an issue, or sometimes if the nodes are out of clock sync.
Exactly, this occurred after restarting nodes.
admin
2,930 Posts
June 10, 2017, 2:11 pm
Hi,
If I understand correctly, this happened because you had restarted some nodes, so when you added your disk some nodes already had more paths than others.
>> Only two paths are possible, because there are only two iSCSI subnets, right?
Actually we allow up to 8 paths per LUN; they are distributed as 4 paths on subnet 1 and 4 on subnet 2.
>> I believe there should be a rule so that a node does not take two IPs of the same LUN.
We try to avoid this with a weighted penalty, but we do not forbid it; there is also a weighted penalty to stop a node from taking too many paths compared to the other nodes.
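As an illustration only (a simplified sketch of this kind of weighted scoring, not PetaSAN's actual implementation), each candidate node could be given a score, with the lowest score winning the next path:
#!/bin/bash
# Illustrative sketch of weighted path assignment (not PetaSAN's real code).
# Holding a path of the same LUN adds a large penalty; every path a node
# already carries adds a smaller one, so both goals are weighed against each other.
SAME_LUN_PENALTY=10
PER_PATH_PENALTY=1

score_node() {
    local total_paths=$1      # paths currently assigned to this node
    local same_lun_paths=$2   # paths of this particular LUN already on this node
    echo $(( total_paths * PER_PATH_PENALTY + same_lun_paths * SAME_LUN_PENALTY ))
}

# Example: san01 already holds 1 path of this LUN and 3 paths in total,
# san02 holds 2 paths, san03 was just restarted and holds none.
echo "san01: $(score_node 3 1)"   # 13 -> least preferred
echo "san02: $(score_node 2 0)"   #  2
echo "san03: $(score_node 0 0)"   #  0 -> gets the next path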
Last edited on June 10, 2017, 2:21 pm · #4
maxthetor
24 Posts
June 10, 2017, 3:56 pm
>> Quote from admin on June 10, 2017, 2:11 pm: If I understand correctly, this happened because you had restarted some nodes, so when you added your disk some nodes already had more paths than others.
Yes.
>> We try to avoid this with a weighted penalty, but we do not forbid it; there is also a weighted penalty to stop a node from taking too many paths compared to the other nodes.
Do you have any idea how to get around this in small clusters (3 or 4 nodes)?
I believe this does not happen in large clusters (> 16 nodes).
admin
2,930 Posts
June 10, 2017, 5:48 pm
This balancing between not assigning IPs of the same LUN to one host and not letting a host take an above-average load should work well for both small and large clusters. Just to understand your case better, can you give me details on how many paths each node had before you added the new 2-path disk that got assigned to a single node? If there was a large difference in load compared to the other 2 nodes (caused by the restart you did), then the assignment may be correct; otherwise, please also send me the output of:
ceph status --cluster CLUSTER_NAME
One thing you can do until we look into this: create your new disk with 8 paths, which will almost certainly get distributed across several nodes. You do not have to use all of them; pick the 2 paths that correspond to the nodes you want and manually configure your ESXi adapter with those 2 addresses (do not use auto discovery).
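As a rough sketch of that manual setup (the adapter name vmhba33 and the IQN below are placeholders, the IPs are just example values standing in for the two paths you pick, and the exact esxcli syntax can vary between ESXi versions):
# On the ESXi host: add the two chosen paths as static targets instead of
# relying on dynamic (SendTargets) discovery. Adjust adapter, IPs and IQN.
esxcli iscsi adapter discovery statictarget add -A vmhba33 -a 10.0.2.101:3260 -n iqn.yyyy-mm.com.example:disk-00003
esxcli iscsi adapter discovery statictarget add -A vmhba33 -a 10.0.3.101:3260 -n iqn.yyyy-mm.com.example:disk-00003

# Rescan the adapter so the new paths are picked up.
esxcli storage core adapter rescan --adapter vmhba33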