
New Lab Cluster issues

I've installed a standard 3-node lab cluster (cloud based); each node has one OSD and one journal, just for testing.

I've installed the cluster 10 times (not exaggerating) and followed the quick start guide to the letter.

No matter what I do, I always end up with this issue.

 

gluster> peer status
Number of Peers: 2

Hostname: 172.16.6.1
Uuid: 42a74f99-a842-419b-9552-d2d5386ce260
State: Peer in Cluster (Connected)

Hostname: 172.16.6.2
Uuid: a3ae1f16-c417-4be7-8ecf-8e6b60c52b9c
State: Peer in Cluster (Connected)
gluster>
root@ps-node-3:~# gluster vol create gfs-vol replica 3 172.16.6.1:/opt/petasan/config/gfs-brick 172.16.6.2:/opt/petasan/config/gfs-brick 172.16.6.3:/opt/petasan/config/gfs-brick

volume create: gfs-vol: failed: /opt/petasan/config/gfs-brick is already part of a volume

root@ps-node-3:~# gluster volume status
No volumes present

root@ps-node-3:~# gluster volume list
No volumes present in cluster

root@ps-node-3:~# ls /opt/petasan/config/

ls: cannot access '/opt/petasan/config/shared': Transport endpoint is not connected

certificates cluster_info.json crush etc flags gfs-brick lost+found node_info.json pages.json replication rolepages.json roles.json root services_interfaces.json shared stats tuning var
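For reference, both errors above have common Gluster-level causes: "already part of a volume" usually means the brick directory still carries volume metadata (extended attributes) from a previous install, and "Transport endpoint is not connected" usually means the FUSE mount of the shared volume has gone stale. A rough cleanup sketch, assuming the paths shown in this thread, and only sensible on a lab cluster you are about to rebuild:

# Clear leftover brick metadata on each node that reports "already part of a volume".
setfattr -x trusted.glusterfs.volume-id /opt/petasan/config/gfs-brick
setfattr -x trusted.gfid /opt/petasan/config/gfs-brick
rm -rf /opt/petasan/config/gfs-brick/.glusterfs

# Detach the stale mount behind "Transport endpoint is not connected";
# PetaSAN's services should remount it once the volume is up again.
umount -l /opt/petasan/config/shared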

------

21/11/2020 18:18:11 ERROR
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/PetaSAN/web/admin_controller/manage_cifs.py", line 365, in get_cifs_status
    cifs_status = manage_cifs.get_cifs_status()
  File "/usr/lib/python2.7/dist-packages/PetaSAN/backend/manage_cifs.py", line 214, in get_cifs_status
    raise CIFSException(CIFSException.CIFS_CLUSTER_NOT_UP, '')
CIFSException

 

As you will see, node 3 (172.16.6.3) is NOT in the gluster peer list, and I do not see how to get it to sync up.

I love the concept of this software, and it would fill a need for a project I have coming up, but if it can't deploy reliably I'm not sure.
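For what it's worth, gluster peer status never lists the node it is run on, so node 3 not appearing in its own output is expected; the outputs further down in this thread show ps-node-3 connected on the other nodes. If a node really were missing from the trusted pool, the usual fix is a peer probe from an existing member, roughly as follows (172.16.6.3 is node 3's backend IP from this thread):

# Run from a node that is already in the pool (node 1 or 2).
gluster peer probe 172.16.6.3
# Then confirm from each node that all peers show "State: Peer in Cluster (Connected)".
gluster peer status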

 

What do you mean by cloud based?

What is the output on all 3 nodes of:

systemctl status glusterd
gluster peer status
gluster vol status

 

I simply mean the nodes are all VMs; it's a lab and I'm not investing in hardware just to test.

--

root@ps-node-1:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; disabled; vendor preset: enabled)
Active: active (running) since Sun 2020-11-22 12:28:03 CST; 1min 51s ago
Process: 1365 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 1367 (glusterd)
Tasks: 8 (limit: 4666)
CGroup: /system.slice/glusterd.service
└─1367 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Nov 22 12:27:57 ps-node-1 systemd[1]: Starting GlusterFS, a clustered file-system server...
Nov 22 12:28:03 ps-node-1 systemd[1]: Started GlusterFS, a clustered file-system server.
root@ps-node-1:~# gluster peer status
Number of Peers: 2

Hostname: ps-node-3
Uuid: 1fec843b-85ab-44fd-a64a-ce9c292706be
State: Peer in Cluster (Connected)

Hostname: 172.16.6.2
Uuid: a3ae1f16-c417-4be7-8ecf-8e6b60c52b9c
State: Peer in Cluster (Connected)
root@ps-node-1:~# gluster vol status
No volumes present

--

root@ps-node-2:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; disabled; vendor preset: enabled)
Active: active (running) since Sun 2020-11-22 12:29:00 CST; 9s ago
Process: 1372 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 1373 (glusterd)
Tasks: 8 (limit: 4666)
CGroup: /system.slice/glusterd.service
└─1373 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Nov 22 12:28:55 ps-node-2 systemd[1]: Starting GlusterFS, a clustered file-system server...
Nov 22 12:29:00 ps-node-2 systemd[1]: Started GlusterFS, a clustered file-system server.
root@ps-node-2:~# gluster peer status
Number of Peers: 2

Hostname: 172.16.6.1
Uuid: 42a74f99-a842-419b-9552-d2d5386ce260
State: Peer in Cluster (Connected)

Hostname: ps-node-3
Uuid: 1fec843b-85ab-44fd-a64a-ce9c292706be
State: Peer in Cluster (Connected)
root@ps-node-2:~# gluster vol status
No volumes present

---

root@ps-node-3:~# systemctl status glusterd
● glusterd.service - GlusterFS, a clustered file-system server
Loaded: loaded (/lib/systemd/system/glusterd.service; disabled; vendor preset: enabled)
Active: active (running) since Sun 2020-11-22 12:28:40 CST; 1min 59s ago
Process: 1298 ExecStart=/usr/sbin/glusterd -p /var/run/glusterd.pid --log-level $LOG_LEVEL $GLUSTERD_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 1303 (glusterd)
Tasks: 8 (limit: 4666)
CGroup: /system.slice/glusterd.service
└─1303 /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level INFO

Nov 22 12:28:33 ps-node-3 systemd[1]: Starting GlusterFS, a clustered file-system server...
Nov 22 12:28:40 ps-node-3 systemd[1]: Started GlusterFS, a clustered file-system server.
root@ps-node-3:~# gluster peer status
Number of Peers: 2

Hostname: 172.16.6.2
Uuid: a3ae1f16-c417-4be7-8ecf-8e6b60c52b9c
State: Peer in Cluster (Connected)

Hostname: 172.16.6.1
Uuid: 42a74f99-a842-419b-9552-d2d5386ce260
State: Peer in Cluster (Connected)
root@ps-node-3:~# gluster vol status
No volumes present

1. Can you also post the contents of

/opt/petasan/config/cluster_info.json

2. Can you manually start the volume via

gluster vol start gfs-vol

3. On node 3, in the log file /opt/petasan/log/PetaSAN.log, do you see any errors for gfs-vol?

4. Can you double-check that your management and backend networks are two distinct, non-overlapping subnets?

5. Can you check whether you have an external DNS, and if so, whether it resolves node names to management IPs rather than backend IPs? (Commands for points 3 to 5 are sketched below.)
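Something like the following should cover points 3 to 5; the grep pattern is just a guess, adjust as needed:

# 3. Look for gfs-vol related errors in the PetaSAN log on node 3.
grep -i gfs-vol /opt/petasan/log/PetaSAN.log | tail -n 50

# 4. Show every interface's IPv4 address and mask so the subnets can be compared.
ip -o -4 addr show

# 5. See what the node names resolve to; they should resolve to the management IPs
#    (or not at all), never the backend IPs.
getent hosts ps-node-1 ps-node-2 ps-node-3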

cat /opt/petasan/config/cluster_info.json
{
    "backend_1_base_ip": "172.16.6.0",
    "backend_1_eth_name": "eth1",
    "backend_1_mask": "255.255.0.0",
    "backend_1_vlan_id": "",
    "backend_2_base_ip": "",
    "backend_2_eth_name": "",
    "backend_2_mask": "",
    "backend_2_vlan_id": "",
    "bonds": [],
    "default_pool": "both",
    "default_pool_pgs": "256",
    "default_pool_replicas": "3",
    "eth_count": 2,
    "jf_mtu_size": "",
    "jumbo_frames": [],
    "management_eth_name": "eth0",
    "management_nodes": [
        {
            "backend_1_ip": "172.16.6.1",
            "backend_2_ip": "",
            "is_backup": false,
            "is_cifs": true,
            "is_iscsi": true,
            "is_management": true,
            "is_nfs": true,
            "is_storage": true,
            "management_ip": "172.16.5.1",
            "name": "ps-node-1"
        },
        {
            "backend_1_ip": "172.16.6.2",
            "backend_2_ip": "",
            "is_backup": false,
            "is_cifs": true,
            "is_iscsi": true,
            "is_management": true,
            "is_nfs": true,
            "is_storage": true,
            "management_ip": "172.16.5.2",
            "name": "ps-node-2"
        },
        {
            "backend_1_ip": "172.16.6.3",
            "backend_2_ip": "",
            "is_backup": false,
            "is_cifs": true,
            "is_iscsi": true,
            "is_management": true,
            "is_nfs": true,
            "is_storage": true,
            "management_ip": "172.16.5.3",
            "name": "ps-node-3"
        }
    ],
    "name": "san",
    "storage_engine": "bluestore"
}

gluster vol start gfs-vol
volume start: gfs-vol: failed: Volume gfs-vol does not exist

I can see the backend subnet mask is 255.255.0.0, so the management and backend subnets overlap; with a /16 mask, the 172.16.5.x management addresses and the 172.16.6.x backend addresses all fall in 172.16.0.0/16. This will cause a lot of issues at the network layer.

Make sure your management and backend networks are distinct, non-overlapping subnets; check the mask on your management subnet as well.
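A quick way to see the overlap, assuming the Debian ipcalc utility is installed on the nodes (any subnet calculator will show the same thing):

# With the current 255.255.0.0 mask, both "networks" are really the same /16:
ipcalc 172.16.5.1/255.255.0.0 | grep Network    # -> network 172.16.0.0/16
ipcalc 172.16.6.1/255.255.0.0 | grep Network    # -> network 172.16.0.0/16

# With a 255.255.255.0 mask they become two distinct subnets:
ipcalc 172.16.5.1/255.255.255.0 | grep Network  # -> network 172.16.5.0/24
ipcalc 172.16.6.1/255.255.255.0 | grep Network  # -> network 172.16.6.0/24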

I'll reinstall it again tomorrow and let you know. It just seems odd to me when they have separate addresses.

Gotta love people who blame software because they don't understand simple networking.