Networking considerations
Ste
125 Posts
April 25, 2020, 6:31 pm
In my datacenter all connections between servers and storage are on a dedicated 10Gb VLAN, so the VMware interfaces that connect to the datastore SAN, and the SAN interfaces themselves, are all 10Gb SFP+. I adopted the same philosophy for the PetaSAN iSCSI interfaces, assigning my two 10Gb ports to iSCSI subnets 1 and 2, while the management and backend networks use the other two 1Gb ports.
After studying a bit more deeply how Ceph/PetaSAN works, I'm afraid this configuration is not the best choice. The standard CRUSH hierarchy for small clusters is root-host-osd; with 3 nodes and replica 3, every PG is replicated on all 3 hosts. The redundancy is also used for load balancing on reads, which means data is pulled from all 3 nodes.
When an iSCSI volume is attached to a client, its IP is mapped to one PetaSAN host only, and the client talks directly only with that node (say node1). But node1 also retrieves/sends data from/to node2 and node3, using the backend network.
More importantly, when recovering/rebalancing, a lot of data is transmitted from node to node across the backend network.
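To put rough numbers on it, here is my own back-of-envelope sketch, assuming replica 3 and that the gateway node keeps one of the replicas locally:

    # Backend bandwidth needed just to absorb client writes, before any
    # recovery/rebalance traffic. Assumptions (mine): replica 3, and the
    # iSCSI gateway node (node1) holds one of the three replicas itself.
    client_write_gbps = 10                       # iSCSI writes landing on node1
    replicas = 3
    backend_write_gbps = client_write_gbps * (replicas - 1)
    print(backend_write_gbps)                    # 20 Gb/s over the backend

So even in the friendliest case the backend carries twice the client write rate; if the primary OSD for a PG lives on another node it is worse still.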
All this preamble to say that, in my opinion, it is much more important for the backend network to be on a 10Gb interface. For this reason I'm thinking of giving up iSCSI redundancy and using one of the two 10Gb ports for the backend network. Yes, that single iSCSI port can fail and I would lose the connection, but an Ethernet port fails only occasionally, if ever, while the backend network is in use every single moment of the cluster's life.
What do you think about these considerations? Would this change improve cluster performance?
And if so, would it be possible to modify the network setup now that the cluster is up and running, or would this require a complete re-install?
Thanks for any suggestion. Bye, S.
admin
2,930 Posts
April 25, 2020, 7:41 pm
Yes, the backend network should be at least as fast as the iSCSI networks combined. Any write you do on the iSCSI networks will be written x number of replicas over the backend network, plus, as you mentioned, any recovery traffic.
One idea is to bond the two 10Gb ports and put all networks on that bond, or you can keep the management network on the 1Gb.
You can reconfigure the cluster networks; this is easily done by editing /opt/petasan/config/cluster_info.json and restarting. Naturally there will be downtime.
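If you want a quick look at what is currently in that file before touching it, something like this works (a sketch: it just dumps any key that looks network-related, since the exact field names vary by PetaSAN version and are not assumed here):

    # Dump network-looking keys from the PetaSAN cluster config.
    # Field names are not assumed; we just filter on common substrings.
    import json
    with open('/opt/petasan/config/cluster_info.json') as f:
        info = json.load(f)
    for key, value in info.items():
        lk = str(key).lower()
        if any(s in lk for s in ('eth', 'subnet', 'bond', 'vlan')):
            print(key, '=', value)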
Last edited on April 25, 2020, 7:43 pm by admin
Ste
125 Posts
April 25, 2020, 8:00 pm
Thank you Admin for your reply. I will certainly make this change, because one of the main purposes of this cluster, besides providing storage space, is to experiment and gain experience for future, bigger ones. 😉
Is there a particular benchmark I can run before and after the network change, to have some numbers to share and think about?
admin
2,930 Posts
April 25, 2020, 8:23 pm
From the benchmark page, you can run the throughput and IOPS tests.
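For an independent cross-check you can also drive a test from an iSCSI client with fio, run before and after the change (a sketch; fio must be installed, /dev/sdX is a placeholder for the attached LUN, and note that a write test will overwrite the device, so use a scratch LUN only):

    # Sequential-write throughput test against the iSCSI LUN via fio.
    # /dev/sdX is a placeholder -- point it at a scratch LUN only.
    import subprocess
    subprocess.run([
        'fio', '--name=seq-write', '--filename=/dev/sdX',
        '--rw=write', '--bs=4M', '--iodepth=16', '--direct=1',
        '--runtime=60', '--time_based=1',
    ], check=True)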
Ste
125 Posts
April 26, 2020, 5:39 pm
Quote from admin on April 25, 2020, 7:41 pm:
You can reconfigure the cluster networks; this is easily done by editing /opt/petasan/config/cluster_info.json and restarting. Naturally there will be downtime.
I guess it is better to set all maintenance flags to off in the web interface, to avoid recovery/rebalance during reconfiguration. Is that correct?
admin
2,930 Posts
April 26, 2020, 7:46 pm
I would say it is not needed. You need to copy the cluster_info.json file to all nodes, stop all nodes, adjust any switch settings if needed for bonds/VLANs, then restart all nodes. It is an all-down/all-up step; there will not be any rebalance happening.
Instead of editing the cluster_info.json file by hand, although you can, I recommend you have it generated: install PetaSAN on a single VM node, deploy a new temporary cluster specifying your network settings, then grab the generated config file.
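One way to push the generated file out to every node before the all-down/all-up restart (a sketch; the node hostnames are placeholders, and root SSH access with scp is assumed):

    # Copy the regenerated cluster_info.json to every node over SSH.
    # 'node1'..'node3' are placeholder hostnames for the three nodes.
    import subprocess
    for node in ('node1', 'node2', 'node3'):
        subprocess.run(['scp', 'cluster_info.json',
                        f'root@{node}:/opt/petasan/config/cluster_info.json'],
                       check=True)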
Ste
125 Posts
April 27, 2020, 11:56 am
Network config successfully changed 😉 Thanks.