Physical disk list not shown
afrima
17 Posts
January 22, 2018, 3:55 pm
Hey guys,
We are having a problem with the physical disk list not showing up. After removing a node and its OSDs, then rejoining it to the same cluster, the cluster recognizes the right number of disks (OSDs), but they are not shown under Manage Node > Node List > Physical Disk List.
It only shows the OSDs of the rejoined machine.
admin
2,930 Posts
January 22, 2018, 5:19 pm
Hi,
Can you please give more detail, as it is not clear to me. What do you mean by "disk list showing up": is it blank? "It only shows the OSDs of the rejoined machine": shows them where?
Can you please also include the output of:
ceph osd tree --cluster CLUSTER_NAME
ceph-disk list
/opt/petasan/scripts/detect-disks.sh
Last edited on January 22, 2018, 5:19 pm by admin · #2
afrima
17 Posts
January 22, 2018, 6:17 pm
In the web interface, under Manage Node > Node List > Physical Disk List: when you try to check the physical disk list of the management nodes, it keeps hanging. The "Physical Disk List" option only works for the rejoined node, not for the previously existing nodes. There is no error in ceph health and the entire cluster is working just fine; the only problem is that the disks are not shown in the web interface, as mentioned above.
ceph osd tree --cluster lab-cluster
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 3.45276 root default
-2 0.86319 host Nodemgmt3
8 0.43159 osd.8 up 1.00000 1.00000
9 0.43159 osd.9 up 1.00000 1.00000
-3 0.86319 host Nodemgmt
2 0.43159 osd.2 up 1.00000 1.00000
3 0.43159 osd.3 up 1.00000 1.00000
-4 0.86319 host Nodemgmt2
4 0.43159 osd.4 up 1.00000 1.00000
5 0.43159 osd.5 up 1.00000 1.00000
-5 0.86319 host Node4
0 0.43159 osd.0 up 1.00000 1.00000
1 0.43159 osd.1 up 1.00000 1.00000
ceph-disk list
/dev/sda :
/dev/sda2 other, ext4, mounted on /
/dev/sda1 other, ext4, mounted on /boot
/dev/sda4 other, ext4, mounted on /opt/petasan/config
/dev/sda3 other, ext4, mounted on /var/lib/ceph
/dev/sdb :
/dev/sdb2 ceph journal, for /dev/sdb1
/dev/sdb1 ceph data, active, cluster lab-cluster, osd.2, journal /dev/sdb2
/dev/sdc :
/dev/sdc2 ceph journal, for /dev/sdc1
/dev/sdc1 ceph data, active, cluster lab-cluster, osd.3, journal /dev/sdc2
/opt/petasan/scripts/detect-disks.sh
device=sda,size=937703088,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BB480G4,serial=PHWL652601G4480QGN
device=sdb,size=937703088,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BB480G4,serial=PHWL6526013J480QGN
device=sdc,size=937703088,bus=SATA,fixed=Yes,ssd=Yes,vendor=,model=INTEL_SSDSC2BB480G4,serial=PHWL652601G6480QGN
admin
2,930 Posts
January 22, 2018, 7:09 pm
Hmm, I understand what you are saying now, but it is strange. So when you log in to the admin web interface on a management node like Nodemgmt, you can only open the physical disk list of Node4, but not of Nodemgmt2, Nodemgmt3 or even Nodemgmt itself, correct?
If you ssh into Nodemgmt, can you ping Nodemgmt, Nodemgmt2 and Nodemgmt3? Can you ssh to them from Nodemgmt?
Can you please show the output on Nodemgmt of:
/etc/hosts
/opt/petasan/config/cluster_info.json
/etc/network/interfaces
/etc/hostname
Have you manually changed any IPs or hostnames, or made changes to the crush map?
Cheers..
Last edited on January 22, 2018, 7:17 pm by admin · #4
afrima
17 Posts
January 22, 2018, 8:07 pm
That's correct!
All other nodes are reachable from Nodemgmt.
I can ssh to Nodemgmt2 and Node4 from Nodemgmt (but not to Nodemgmt3, which is weird!).
I haven't made any changes to the crush map, but during the initial setup I did change the hostname of Nodemgmt once, which caused no problem and has been working just fine since then.
Here is the info you asked for:
/etc/hosts
172.19.0.49 Node4
/opt/petasan/config/cluster_info.json
{
"backend_1_base_ip": "192.168.1.0",
"backend_1_eth_name": "eth0",
"backend_1_mask": "255.255.255.0",
"backend_2_base_ip": "192.168.2.0",
"backend_2_eth_name": "eth1",
"backend_2_mask": "255.255.255.0",
"bonds": [],
"eth_count": 4,
"iscsi_1_eth_name": "eth3",
"iscsi_2_eth_name": "eth3",
"jumbo_frames": [],
"management_eth_name": "eth2",
"management_nodes": [
{
"backend_1_ip": "192.168.1.11",
"backend_2_ip": "192.168.2.11",
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "172.19.0.44",
"name": "Nodemgmt"
},
{
"backend_1_ip": "192.168.1.12",
"backend_2_ip": "192.168.2.12",
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "172.19.0.45",
"name": "Nodemgmt2"
},
{
"backend_1_ip": "192.168.1.13",
"backend_2_ip": "192.168.2.13",
"is_iscsi": true,
"is_management": true,
"is_storage": true,
"management_ip": "172.19.0.46",
"name": "Nodemgmt3"
}
],
"name": "lab-cluster"
}
/etc/network/interfaces
auto eth2
iface eth2 inet static
address 172.19.0.44
netmask 255.255.255.0
gateway 172.19.0.2
dns-nameservers 172.19.0.10
/etc/hostname
Nodemgmt
admin
2,930 Posts
January 22, 2018, 9:04 pm
It looks like /etc/hosts is bad across all nodes. The joining node should append itself to the existing cluster /etc/hosts and then broadcast/sync the new file. If there was a temporary network issue, the joining node may not have gotten the current /etc/hosts, so it ended up creating a file with just itself and broadcasting that; if the latter worked, the whole cluster got an /etc/hosts with only one node. We have seen a case of this before when there was an issue during the joining of the last node, such as re-doing a deployment after network adjustments (switching cables), or maybe it was just a network glitch. It should not occur in the normal case, but we will look into it so it does not happen again.
To fix it, go to any node and stop the file sync service (otherwise the system will overwrite your changes):
systemctl stop petasan-file-sync
Edit /etc/hosts:
172.19.0.44 Nodemgmt
172.19.0.45 Nodemgmt2
172.19.0.46 Nodemgmt3
172.19.0.49 Node4
Sync the hosts file to all nodes:
/opt/petasan/scripts/util/sync_file.py /etc/hosts
Restart the sync service on the current node:
systemctl start petasan-file-sync
Note: the /opt/petasan/scripts/util/sync_file.py script was added in version 1.5; if you have an earlier version you can get it from:
https://drive.google.com/open?id=1FzvpVOnN96B2VN52o9lTwllrJ1lkyJRD
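The rebuild of /etc/hosts can also be scripted instead of edited by hand. A minimal Python sketch, assuming the cluster_info.json layout shown earlier in this thread; note that management_nodes only lists the three management nodes, so storage-only nodes like Node4 still have to be supplied separately:

```python
import json

def hosts_from_cluster_info(path, extra_nodes=None):
    """Build /etc/hosts content from a PetaSAN cluster_info.json.

    Only management nodes appear in the file, so storage-only nodes
    (e.g. Node4) must be passed in via extra_nodes as (ip, name) pairs.
    """
    with open(path) as f:
        info = json.load(f)
    lines = ["127.0.0.1 localhost"]  # must stay at the top, per post #8
    for node in info["management_nodes"]:
        lines.append(f'{node["management_ip"]} {node["name"]}')
    for ip, name in (extra_nodes or []):
        lines.append(f"{ip} {name}")
    return "\n".join(lines) + "\n"
```

Write the result through the existing /etc/hosts path (open it for writing rather than replacing the file, since it is a symlink), then run the sync_file.py step above.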
Last edited on January 22, 2018, 9:05 pm by admin · #6
afrima
17 Posts
January 22, 2018, 9:28 pm
Thanks! I'll give it a try and keep you informed 🙂
FYI, you guys are doing a great job
admin
2,930 Posts
January 23, 2018, 6:23 am
I forgot to include localhost in /etc/hosts, so at the top add:
127.0.0.1 localhost
Also, just edit the file rather than recreating it, since it is actually a link:
ls -l /etc/hosts
/etc/hosts -> /opt/petasan/config/etc/hosts
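The warning about the link matters because replacing the file (e.g. with mv, or an editor that writes a new inode) would break the redirection to /opt/petasan/config/etc/hosts, while opening the existing path for writing follows the link and updates the real file. A quick illustration; the paths here are throwaway stand-ins, not the real PetaSAN ones:

```python
import os
import tempfile

tmp = tempfile.mkdtemp()
target = os.path.join(tmp, "config_hosts")  # stands in for /opt/petasan/config/etc/hosts
link = os.path.join(tmp, "hosts")           # stands in for /etc/hosts

with open(target, "w") as f:
    f.write("172.19.0.49 Node4\n")
os.symlink(target, link)

# Writing through the link follows it: the symlink survives and the
# real file behind it is the one that gets updated.
with open(link, "w") as f:
    f.write("127.0.0.1 localhost\n172.19.0.44 Nodemgmt\n")
```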
Last edited on January 23, 2018, 6:33 am by admin · #8