Pre-Production Questions?

Hello,

As you know, I am running a three-node cluster, with each node holding a 3TB disk, for 9TB total on the cluster. Each host also has a standalone 800GB SSD that I was going to use for cache, even though each host has 128GB of RAM. I'm just about to put her into production and had a few questions:

  1. Is a journal recommended?
  2. Is journaling or caching preferred?
  3. What happens if a node fails or shuts down? Will it affect the other nodes?
  4. What happens if a data disk fails? Will it affect the other nodes?
  5. I have three 3TB disks for a 9TB cluster. Is it best to only use 3TB out of the 9, or is it safe to use more?
  6. What happens if the MGMT network is disconnected?
  7. What happens if the backend network is disconnected?
  8. Can you safely remove a node from a cluster?
  9. What is the process to replace a failed data disk on a node?

I appreciate the answers, as I'd like to know what I'm going to have to deal with some day 🙂

  1. Journals are recommended when using HDDs. Some users use NVMe devices as journals for SSDs.
  2. Journals speed up both reads and writes; a write cache helps writes more. Do test yourself, as it depends on your workload.
  3. When a node shuts down, I/O will stall for approximately 20 seconds and then resume as the cluster updates itself, so clients do not deal with the downed disks. After 10 minutes, a recovery will start to re-create the lost replicas on other nodes.
  4. A disk failing is similar to a node failing; you can think of a node as a container of disks.
  5. For replicated pools, 3 replicas are recommended; 2 is not safe. You can however use EC pools, which have much lower overhead (a 4+2 EC pool has a 1.5x storage overhead rather than the 3x of triple replication). You get the same redundancy, but at the cost of speed. See the sketch after this list.
  6. If the management network disconnects, you will not be able to reach the node on that subnet for UI and management purposes, but your storage and iSCSI keep working. You can use interface bonding to make it highly available.
  7. If the backend network goes down, you have a problem and your cluster will not work. You should use interface bonding to make it highly available; a bonding example follows below.
  8. You can remove non-management nodes; management nodes should not be removed. If you wish to change hardware, you can "replace" a management node with a new box, and it will still use the same hostname/IPs as the old node.
  9. If a disk fails, you will be able to delete it from the UI, then put in a new disk and add it as an OSD. The system has no concept of "replace", only adding and deleting; the usual CLI equivalent is sketched below.
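
On point 5, a minimal sketch of what a 4+2 EC pool looks like at the plain Ceph CLI level (the profile and pool names and the PG count are placeholders; in PetaSAN you would normally create pools from the UI):

    # profile with 4 data chunks + 2 coding chunks; the pool survives
    # any 2 lost chunks, the same failure tolerance as 3 replicas
    ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host

    # create a pool on that profile (64 PGs is only an example value)
    ceph osd pool create ecpool 64 64 erasure ec-4-2

The overhead math: an EC pool stores (k+m)/k = 6/4 = 1.5x the usable data, versus 3x for a 3-replica pool. Note that with crush-failure-domain=host the cluster needs k+m = 6 hosts to place all chunks, so a 3-node cluster would need a smaller profile or a different failure domain.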
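
On the bonding suggested in points 6 and 7, a minimal active-backup sketch with iproute2 (eth0, eth1 and bond0 are assumed interface names; in practice you would persist this in your distribution's network configuration rather than running it by hand):

    # bond two NICs so the subnet survives a single NIC or cable failure
    ip link add bond0 type bond mode active-backup
    ip link set eth0 down
    ip link set eth0 master bond0
    ip link set eth1 down
    ip link set eth1 master bond0
    ip link set bond0 up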
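
And on point 9, the UI's delete-then-add corresponds to the standard Ceph sequence below (osd.7 is a made-up ID; this is only a sketch of the manual equivalent, not the PetaSAN procedure):

    ceph osd out osd.7           # stop mapping new data to the failed OSD
    ceph osd crush remove osd.7  # remove it from the CRUSH map
    ceph auth del osd.7          # delete its cephx key
    ceph osd rm osd.7            # remove it from the cluster
    # then insert the new disk and add it as a fresh OSD
    # (the "add OSD" step in the UI)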

Perfect, thank you!! And great support!

Last questions:

  1. Under Pool > Add Pool, what are "Size" and "Min Size"? The replica counts?
  2. Under CRUSH > Rules, does the default replicated rule place replicas on separate nodes by default? If not, what do I need to add to achieve this?
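
For reference, assuming stock Ceph underneath: "Size" and "Min Size" map onto per-pool properties, and the CRUSH rules can be dumped from the CLI to check the failure domain ("mypool" is a placeholder pool name):

    # Size = replica count; Min Size = fewest replicas that may remain up
    # before the pool stops accepting I/O
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size

    # the stock replicated rule contains "chooseleaf firstn 0 type host",
    # i.e. each replica goes on a separate host
    ceph osd crush rule dump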

Thank You!