ESXi Server Freeze.
msalem
87 Posts
February 11, 2019, 7:14 pm
Hello Admin,
So I need the final recommendations from you.
I have six servers with the same specs as above, with 2 SSDs and 10x 8TB disks.
I need to tolerate a two-host failure with an EC setup.
Can this be done with the low-end hardware template you have in PetaSAN?
I will also need to forward SNMP traffic off the nodes for monitoring, plus syslogs.
Any concerns here?
Thanks again for your support and a great product.
admin
2,930 Posts
February 12, 2019, 10:01 am
For performance: EC will reduce your IOPS and increase latencies; you should also test with a replicated pool to see the speed drop. You may want to have a mix of pool types. As suggested, you need at least a controller with write-back cache for the spinning disks; also consider adding a couple of SSDs and creating a mix of fast and slow pools.
For the EC k=4, m=2 profile: you need at least k+m = 6 nodes. With m=2 it will tolerate 2 failures. If you set your min size to k+1 (the recommended default), your cluster will still serve client I/O when one node fails; if two nodes fail there is still no data loss, but client I/O stops until the cluster recovers. To recover, you need to bring one of the failed nodes back into operation or add a new one; the cluster will never re-create a lost chunk on a node that already holds a chunk of the same object, since that would defeat the purpose.
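PetaSAN creates pools from its UI, but in plain Ceph CLI terms the profile being discussed looks roughly like the following (the profile and pool names are just examples, and the pg_num is covered further down the thread):
ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
ceph osd pool create ec42-pool 1024 1024 erasure ec42
ceph osd pool set ec42-pool min_size 5   # k+1: keep serving client I/O with one host down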
For SNMP: yes, we know some users do this manually by downloading the necessary debs, since it is an Ubuntu OS. Just make sure you do not overwrite core packages, e.g. by upgrading the kernel or storage-related components.
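For illustration only, one common way to do this on a stock Ubuntu base (standard Ubuntu package names; the collector address is a placeholder, not something PetaSAN ships or documents):
apt-get update && apt-get install -y snmpd
echo '*.* @10.0.0.50:514' > /etc/rsyslog.d/90-remote.conf   # forward all syslog over UDP to a hypothetical collector
systemctl restart rsyslog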
admin
2,930 Posts
February 12, 2019, 10:23 am
To add to the above: were the tests you showed for the replicated pool or the EC pool?
msalem
87 Posts
February 12, 2019, 12:11 pm
The tests were for the EC pool.
In addition, we have SSDs as caching, so I don't really understand your comment on adding SSDs to the mix.
https://ibb.co/tcfvGr9
This is one node; all six have the same setup.
My question was on the PGs and the pool setup. We have 60 OSDs in total, 8TB each; what is the suitable setup?
https://ibb.co/jJRLMqP
When we installed PetaSAN we used the high-end hardware template. Would that make a difference, or should we be using something else?
Thanks
msalem
87 Posts
February 12, 2019, 12:49 pm
About the load average, this is what I meant, when I have VMs running with some activity:
top - 07:47:43 up 6 days, 3:32, 1 user, load average: 8.27, 5.37, 3.01
Tasks: 557 total, 3 running, 553 sleeping, 1 stopped, 0 zombie
%Cpu(s): 7.5 us, 5.5 sy, 0.0 ni, 83.2 id, 2.8 wa, 0.0 hi, 1.0 si, 0.0 st
KiB Mem : 13190262+total, 10228278+free, 27511200 used, 2108644 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 10334070+avail Mem
Is this load normal? Sometimes it spikes to 10 or 12 (load average: 8.27, 5.37, 3.01).
Thanks
admin
2,930 Posts
February 12, 2019, 3:06 pm
For EC 4+2 pools, the results you show are not bad for your hardware. For each read op a client/VM sees, you are doing 6 reads from different disks on different nodes. Assume your spinning disks can do 150 IOPS each and you have 60 of them: you can read 9,000 EC chunks per second, which gives a total of about 1,500 client IOPS for your cluster. In contrast, a replicated pool requires one chunk read per read op, so you would get 9k IOPS. If you use a controller with cache, you will get several times better performance (at least 2.5x and up to 5x); an SSD pool will give much better results still.
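As a back-of-the-envelope check of those numbers (shell arithmetic; the 150 IOPS per spinner is just the assumption used above):
echo $(( 60 * 150 / 6 ))   # ~1500 client read IOPS on the EC 4+2 pool
echo $(( 60 * 150 ))       # ~9000 client read IOPS on a replicated pool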
The 2 SSD journals are not caches. They have two functions: a write-ahead log (WAL) and a DB for the metadata. While these do improve performance (you get better write IOPS than read), they are not a substitute for all-flash, or even for a controller with cache in front of the spinners.
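Just to illustrate the WAL/DB roles in plain Ceph terms (PetaSAN sets this up from its own UI; the device paths here are placeholders), an OSD with its journal split onto SSD would be created roughly like:
ceph-volume lvm create --bluestore --data /dev/sdc --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2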
For the PG number per pool: the rule of thumb is to have about 100 PGs served per OSD. A size of 1024 for your pool gives 1024 x 6 / 60, or about 100 PGs per OSD.
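The same kind of quick check for the PG count (6 = k+m chunks per PG, 60 OSDs):
echo $(( 1024 * 6 / 60 ))   # ~102 PGs per OSD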
The load you see may be OK depending on your workload. I would recommend you look at the load from the PetaSAN dashboard charts under node stats; we track pretty much everything. Also, benchmarking your cluster before deployment should give you a good indication not just of how fast it is but also of how busy it gets under the various loads.
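A rough local check, independent of the dashboard, is to compare the 1-minute load average against the number of CPU threads on the node:
nproc    # CPU threads on this node
uptime   # a 1-minute load persistently well above the nproc value indicates CPU saturation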