Physical disk list not shown
afrima
17 Posts
January 22, 2018, 3:55 pm
Hey guys,
We are having a problem with the physical disk list not showing up. After removing a node and its OSDs, then rejoining it to the same cluster, the cluster recognizes the right number of disks (OSDs), but they are not shown under Manage Node > Node List > Physical Disk List.
It only shows the OSDs of the rejoined machine.
admin
2,930 Posts
January 22, 2018, 5:19 pm
Hi,
Can you please give more detail, as it is not clear to me. What do you mean by "disk list showing up": is it blank? "It only shows the OSDs of the rejoined machine": shows them where?
Can you please also include the output of:
ceph osd tree --cluster CLUSTER_NAME
ceph-disk list
/opt/petasan/scripts/detect-disks.sh
Last edited on January 22, 2018, 5:19 pm by admin · #2
afrima
17 Posts
January 22, 2018, 6:17 pm
In the web interface, under Manage Node > Node List > Physical Disk List: when you try to check the physical disk list of the management nodes, it keeps hanging. The "Physical Disk List" option only works for the rejoined node, not for the previously existing nodes. There is no error in ceph health and the entire cluster is working just fine; the only problem is that the disks are not shown in the web interface, as mentioned above.
ceph osd tree --cluster lab-cluster
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 3.45276 root default
-2 0.86319 host Nodemgmt3
8 0.43159 osd.8 up 1.00000 1.00000
9 0.43159 osd.9 up 1.00000 1.00000
-3 0.86319 host Nodemgmt
2 0.43159 osd.2 up 1.00000 1.00000
3 0.43159 osd.3 up 1.00000 1.00000
-4 0.86319 host Nodemgmt2
4 0.43159 osd.4 up 1.00000 1.00000
5 0.43159 osd.5 up 1.00000 1.00000
-5 0.86319 host Node4
0 0.43159 osd.0 up 1.00000 1.00000
1 0.43159 osd.1 up 1.00000 1.00000
ceph-disk list
/dev/sda :
/dev/sda2 other, ext4, mounted on /
/dev/sda1 other, ext4, mounted on /boot
/dev/sda4 other, ext4, mounted on /opt/petasan/config
/dev/sda3 other, ext4, mounted on /var/lib/ceph
/dev/sdb :
/dev/sdb2 ceph journal, for /dev/sdb1
/dev/sdb1 ceph data, active, cluster lab-cluster, osd.2, journal /dev/sdb2
/dev/sdc :
/dev/sdc2 ceph journal, for /dev/sdc1
/dev/sdc1 ceph data, active, cluster lab-cluster, osd.3, journal /dev/sdc2
/opt/petasan/scripts/detect-disks.sh
device=sda,size=937703088,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BB480G4,serial=PHWL652601G4480QGN
device=sdb,size=937703088,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BB480G4,serial=PHWL6526013J480QGN
device=sdc,size=937703088,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BB480G4,serial=PHWL652601G6480QGN
admin
2,930 Posts
January 22, 2018, 7:09 pm
Hmm, I understand what you are saying now, but it is strange. So when you log in to the admin web interface on a management node like Nodemgmt, you can only open the physical disk list of Node4, but not of Nodemgmt2, Nodemgmt3 or even Nodemgmt itself, correct?
If you ssh into Nodemgmt, can you ping Nodemgmt, Nodemgmt2 and Nodemgmt3? Can you ssh to them from Nodemgmt?
Can you please show the output on Nodemgmt of:
/etc/hosts
/opt/petasan/config/cluster_info.json
/etc/network/interfaces
/etc/hostname
Have you manually changed any IPs or hostnames, or made changes to the crush map?
Cheers..
Last edited on January 22, 2018, 7:17 pm by admin · #4
afrima
17 Posts
January 22, 2018, 8:07 pm
That's correct!
All other nodes are reachable from Nodemgmt.
I can ssh to Nodemgmt2 and Node4 from Nodemgmt (but not to Nodemgmt3, which is weird!).
I haven't made any changes to the crush map, but during the initial setup I did change the hostname of Nodemgmt once, which caused no problem and has been working just fine since then.
Here is the info you asked for:
/etc/hosts
172.19.0.49 Node4
/opt/petasan/config/cluster_info.json
{
"backend_1_base_ip": "192.168.1.0",
"backend_1_eth_name": "eth0",
"backend_1_mask": "255.255.255.0",
"backend_2_base_ip": "192.168.2.0",
"backend_2_eth_name": "eth1",
"backend_2_mask": "255.255.255.0",
"bonds": [],
"eth_count": 4,
"iscsi_1_eth_name": "eth3",
"iscsi_2_eth_name": "eth3",
"jumbo_frames": [],
"management_eth_name": "eth2",
"management_nodes": [
{
"backend_1_ip": "192.168.1.11",
"backend_2_ip": "192.168.2.11",
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "172.19.0.44",
"name": "Nodemgmt"
},
{
"backend_1_ip": "192.168.1.12",
"backend_2_ip": "192.168.2.12",
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "172.19.0.45",
"name": "Nodemgmt2"
},
{
"backend_1_ip": "192.168.1.13",
"backend_2_ip": "192.168.2.13",
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "172.19.0.46",
"name": "Nodemgmt3"
}
],
"name": "lab-cluster"
}
/etc/network/interfaces
auto eth2
iface eth2 inet static
address 172.19.0.44
netmask 255.255.255.0
gateway 172.19.0.2
dns-nameservers 172.19.0.10
/etc/hostname
Nodemgmt
admin
2,930 Posts
January 22, 2018, 9:04 pm
It looks like /etc/hosts is bad across all nodes. The joining node should append itself to the existing cluster /etc/hosts and then broadcast/sync the new file. If there was a temporary network issue, the joining node may not have gotten the current /etc/hosts, so it ended up creating a file with just itself and broadcasting that; if the latter worked, the whole cluster got an /etc/hosts with only one node. We have seen a case of this before when there was an issue during the joining of the last node, such as re-doing a deployment after network adjustments (switching cables), or maybe it was just a network glitch. It should not occur in the normal case, but we will look into it so it does not happen again.
To fix it, go to any node and stop the file sync service (otherwise the system will overwrite your changes):
systemctl stop petasan-file-sync
Edit /etc/hosts:
172.19.0.44 Nodemgmt
172.19.0.45 Nodemgmt2
172.19.0.46 Nodemgmt3
172.19.0.49 Node4
Sync the hosts file to all nodes:
/opt/petasan/scripts/util/sync_file.py /etc/hosts
Restart the sync service on the current node:
systemctl start petasan-file-sync
Note: the /opt/petasan/scripts/util/sync_file.py script was added in version 1.5; if you have an earlier version you can get it from:
https://drive.google.com/open?id=1FzvpVOnN96B2VN52o9lTwllrJ1lkyJRD
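The rebuild of /etc/hosts can also be scripted instead of edited by hand. A minimal Python sketch, assuming the cluster_info.json layout shown earlier in this thread; note that management_nodes only lists the three management nodes, so storage-only nodes like Node4 still have to be supplied separately:

```python
import json

def hosts_from_cluster_info(path, extra_nodes=None):
    """Build /etc/hosts content from a PetaSAN cluster_info.json.

    Only management nodes appear in the file, so storage-only nodes
    (e.g. Node4) must be passed in via extra_nodes as (ip, name) pairs.
    """
    with open(path) as f:
        info = json.load(f)
    lines = ["127.0.0.1 localhost"]  # must stay at the top, per post #8
    for node in info["management_nodes"]:
        lines.append(f'{node["management_ip"]} {node["name"]}')
    for ip, name in (extra_nodes or []):
        lines.append(f"{ip} {name}")
    return "\n".join(lines) + "\n"
```

Write the result through the existing /etc/hosts path (open it for writing rather than replacing the file, since it is a symlink), then run the sync_file.py step above.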
Last edited on January 22, 2018, 9:05 pm by admin · #6
afrima
17 Posts
January 22, 2018, 9:28 pm
Thanks! I'll give it a try and keep you informed 🙂
FYI, you guys are doing a great job
admin
2,930 Posts
January 23, 2018, 6:23 am
I forgot to include localhost in /etc/hosts, so at the top add:
127.0.0.1 localhost
Also, just edit the file rather than recreating it, since it is actually a link:
ls -l /etc/hosts
/etc/hosts -> /opt/petasan/config/etc/hosts
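The warning about the link matters because replacing the file (e.g. with mv, or an editor that writes a new inode) would break the redirection to /opt/petasan/config/etc/hosts, while opening the existing path for writing follows the link and updates the real file. A quick illustration; the paths here are throwaway stand-ins, not the real PetaSAN ones:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "config_hosts")  # stands in for /opt/petasan/config/etc/hosts
link = os.path.join(tmp, "hosts")           # stands in for /etc/hosts

with open(target, "w") as f:
    f.write("172.19.0.49 Node4\n")
os.symlink(target, link)

# Writing through the link follows it: the symlink survives and the
# real file behind it is the one that gets updated.
with open(link, "w") as f:
    f.write("127.0.0.1 localhost\n172.19.0.44 Nodemgmt\n")
```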
Last edited on January 23, 2018, 6:33 am by admin · #8