size of iscsi disks
R3LZX
50 Posts
September 23, 2019, 11:29 am
Originally I had two iSCSI disks. This week I had a massive spike in latency on reads/writes; it was so bad that the servers running on those particular disks were inoperable and crashed. After restarting them and moving some of them to another storage node, all seems to be fine, with the servers still on the same disks. So the question is: what is a good size for these disks? Should I keep each iSCSI disk at or about 5TB? With EqualLogics and other appliances, smaller sizes are recommended to avoid corruption; does Ceph also get corrupted over a certain threshold? A Google search did not immediately point me anywhere.
The original iSCSI sizes were 20TB each.
We are a VMware 6.5 server environment, and PetaSAN is bare metal on all three servers.
Last edited on September 23, 2019, 11:30 am by R3LZX · #1
admin
2,930 Posts
September 23, 2019, 1:05 pm
Your workload is divided equally across the OSDs, so the size of the iSCSI disks has little effect.
The latency spike should be investigated: is it caused by hardware/network problems, or does the workload itself increase at certain points (for example due to backups), so that at those points your hardware is under-powered? If it is the latter, you should study your workload well and benchmark your cluster, so you know what performance it can give you and whether you need to add nodes/disks. Look at your % utilization charts for disk and CPU.
If it is network hardware, make sure you have bonded NICs and redundant, isolated networks.
ESXi will not tolerate very high latency and can drop the datastore in such a case. There is no corruption in Ceph; other SANs may do some caching, in which case size is an issue, but that is not the case here.
One more thing: if you are doing replication or taking snapshots, a large disk will take more resources to snapshot than a smaller disk (but the same resources as snapshotting many smaller disks of the same total size).
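As a rough sketch of the investigation above (assuming shell access to a cluster node with the Ceph CLI and the sysstat package installed), per-OSD latency and per-disk utilization can be checked with:

```shell
# Per-OSD commit/apply latency as reported by Ceph (milliseconds);
# one OSD with much higher values than its peers points at a bad disk
ceph osd perf

# Cluster health plus details of any slow-request or scrub warnings
ceph health detail

# Extended per-device stats, 3 samples at 5-second intervals;
# %util near 100 together with high await indicates a saturated disk
iostat -x 5 3
```

These are read-only diagnostics, so they are safe to run on a production node; they require a live cluster and therefore cannot be demonstrated standalone.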
Last edited on September 23, 2019, 1:11 pm by admin · #2
R3LZX
50 Posts
October 1, 2019, 11:32 am
I have not seen this happen again, and IOPS are much, much better when running the benchmark. Since the product had only been running for a month, I am wondering if the scrubbing issue was what caused it (now resolved, thank you).
I have not seen this happen again, and IOPS are much much better running benchmark, since the product was only running a month, I am wondering if maybe the scrubbing issue was what caused it (now resolved thank you)
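For reference, if scrubbing was indeed the cause, its client impact can be throttled. A minimal sketch, assuming a Ceph release with the centralized config store (`ceph config set`); on older releases the same option names go in the `[osd]` section of ceph.conf:

```shell
# Cap concurrent scrub operations per OSD
ceph config set osd osd_max_scrubs 1

# Restrict scrubbing to an off-peak window (22:00 to 06:00 here)
ceph config set osd osd_scrub_begin_hour 22
ceph config set osd osd_scrub_end_hour 6

# Sleep briefly between scrub chunks to yield to client I/O
ceph config set osd osd_scrub_sleep 0.1
```

The hours and sleep value are illustrative; tune them to the cluster's quiet period. These commands require a live cluster, so no standalone test applies.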