
Monitoring.


Hello Admin,

 

I wanted to ask about monitoring: can PetaSAN monitor the size of the LUNs and report alerts? Or can we install an agent such as SolarWinds or a Nagios agent (NRPE) on the server?

 

Thanks

We do have alarm notifications if the available raw storage is low. These are sent by email to all users who register to receive notifications.

We do not have this on a per-LUN basis. Typically the % used should be tracked by the client OS / filesystem: the filesystem knows exactly how much free space is present, while the SAN does not know this at the block level.
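
For example (the mount point below is just an illustration), on a Linux client that has formatted and mounted the LUN, usage would be checked on the client side with something like:

df -h /mnt/petasan_lun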

Some users do install extra monitoring tools; we neither recommend nor advise against this. Just be aware that the upgrade installer wipes the main partition clean when upgrading, so any configuration you want to keep should be saved on the /opt/petasan/config mount partition, which we leave intact. You can create symlinks from there if needed.
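
As a rough sketch, assuming a Nagios NRPE agent with the usual Ubuntu config path (adjust for whatever agent you actually use), you could keep the agent config on the preserved partition and symlink to it:

mkdir -p /opt/petasan/config/custom/nagios
cp /etc/nagios/nrpe.cfg /opt/petasan/config/custom/nagios/nrpe.cfg
rm /etc/nagios/nrpe.cfg
ln -s /opt/petasan/config/custom/nagios/nrpe.cfg /etc/nagios/nrpe.cfg

Keep in mind the agent package itself would still need to be reinstalled after an upgrade; only what lives under /opt/petasan/config survives.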

Hey,

29/11/2018 10:14:24 INFO     Executing parted -s /dev/sdk mklabel gpt
29/11/2018 10:14:24 INFO     Executing partprobe /dev/sdk
29/11/2018 10:14:27 INFO     Starting ceph-disk zap /dev/sdk
29/11/2018 10:14:30 INFO     ceph-disk zap done
29/11/2018 10:14:30 INFO     Auto select journal for disk sdk.
29/11/2018 10:14:31 INFO     User selected Auto journal, selected device is sdi disk for disk sdk.
29/11/2018 10:14:31 INFO     Start prepare osd sdk
29/11/2018 10:14:31 INFO     Starting ceph-disk prepare --zap-disk  --bluestore --block-dev /dev/sdk --block.db /dev/sdi --cluster TME
29/11/2018 10:14:35 ERROR    Error executing ceph-disk prepare --zap-disk  --bluestore --block-dev /dev/sdk --block.db /dev/sdi --cluster TME

 

I keep getting this error when adding a disk. Do we have CLI commands to add the disks?
Thanks

The failing command, which you could also run yourself from the CLI, is shown in the log:

ceph-disk prepare --zap-disk  --bluestore --block-dev /dev/sdk --block.db /dev/sdi --cluster TME

The most likely cause is running out of space on the journal drive; can you check that drive /dev/sdi still has room for an empty 60 GB partition? The second most likely cause is too little RAM.
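
For example, assuming the journal disk really is /dev/sdi, something like the following should show whether there is still 60 GB of unallocated space on it and how much RAM the node has:

parted /dev/sdi unit GB print free
free -g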

You can also check the ceph-disk logs at

/opt/petasan/log/ceph-disk.log

 

Seems like sdi is still good:

/dev/sdi2              127M  946K  126M   1% /boot/efi
/dev/sdi4              9.8G   23M  9.7G   1% /var/lib/ceph
/dev/sdi5              196G   61M  196G   1% /opt/petasan/config
/dev/sdf1               94M  5.5M   89M   6% /var/lib/ceph/osd/TME-21

 

However, this is the ceph-disk log:

 

Running command: /sbin/sgdisk --new=4:0:+61440M --change-name=4:ceph block.db --partition-guid=4:c061777c-1511-4948-a5e6-d6fc95dc7e51 --typecode=4:30cd0809-c2b2-499c-8879-2d6b785292be --mbrtogpt -- /dev/sdj
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
Writing zeros to existing partitions on /dev/sdk
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
Zapping partition table on /dev/sdk
Running command: /sbin/sgdisk --zap-all -- /dev/sdk
Running command: /sbin/sgdisk --clear --mbrtogpt -- /dev/sdk
Calling partprobe on zapped device /dev/sdk
Running command: /sbin/udevadm settle --timeout=600
Running command: /usr/bin/flock -s /dev/sdk /sbin/partprobe /dev/sdk
Running command: /sbin/udevadm settle --timeout=600
Running command: /usr/bin/ceph-osd --cluster=TME --show-config-value=fsid
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
Will colocate block with data on /dev/sdk
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup bluestore_block_size
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup bluestore_block_db_size
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup bluestore_block_wal_size
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_mkfs_type
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_fs_type
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_mkfs_options_xfs
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_fs_mkfs_options_xfs
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_mount_options_xfs
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_fs_mount_options_xfs
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
Writing zeros to existing partitions on /dev/sdk
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
Zapping partition table on /dev/sdk
Running command: /sbin/sgdisk --zap-all -- /dev/sdk
Running command: /sbin/sgdisk --clear --mbrtogpt -- /dev/sdk
Calling partprobe on zapped device /dev/sdk
Running command: /sbin/udevadm settle --timeout=600
Running command: /usr/bin/flock -s /dev/sdk /sbin/partprobe /dev/sdk
Running command: /sbin/udevadm settle --timeout=600
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
Creating osd partition on /dev/sdk
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
name = data
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
Creating data partition num 1 size 100 on /dev/sdk
Running command: /sbin/sgdisk --new=1:0:+100M --change-name=1:ceph data --partition-guid=1:44943aa5-3516-4efa-928f-f12ed1e2b686 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdk
Calling partprobe on created device /dev/sdk
Running command: /sbin/udevadm settle --timeout=600
Running command: /usr/bin/flock -s /dev/sdk /sbin/partprobe /dev/sdk
Running command: /sbin/udevadm settle --timeout=600
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
get_dm_uuid /dev/sdk uuid path is /sys/dev/block/8:160/dm/uuid
get_dm_uuid /dev/sdk1 uuid path is /sys/dev/block/8:161/dm/uuid
get_dm_uuid /dev/sdj uuid path is /sys/dev/block/8:144/dm/uuid
OSD will not be hot-swappable if block.db is not the same device as the osd data
get_dm_uuid /dev/sdj uuid path is /sys/dev/block/8:144/dm/uuid
name = block.db
get_dm_uuid /dev/sdj uuid path is /sys/dev/block/8:144/dm/uuid
Running command: /sbin/parted --machine -- /dev/sdj print
get_free_partition_index: analyzing BYT;
/dev/sdj:240GB:scsi:512:4096:gpt:ATA INTEL SSDSC2KB24:;
1:1049kB:64.4GB:64.4GB::ceph block.db:;
2:64.4GB:129GB:64.4GB::ceph block.db:;
3:129GB:193GB:64.4GB::ceph block.db:;

Creating block.db partition num 4 size 61440 on /dev/sdj
Running command: /sbin/sgdisk --new=4:0:+61440M --change-name=4:ceph block.db --partition-guid=4:0c69de1c-ef3e-4886-9004-ffb2e976ec57 --typecode=4:30cd0809-c2b2-499c-8879-2d6b785292be --mbrtogpt -- /dev/sdj
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
Writing zeros to existing partitions on /dev/sdn
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
Zapping partition table on /dev/sdn
Running command: /sbin/sgdisk --zap-all -- /dev/sdn
Running command: /sbin/sgdisk --clear --mbrtogpt -- /dev/sdn
Calling partprobe on zapped device /dev/sdn
Running command: /sbin/udevadm settle --timeout=600
Running command: /usr/bin/flock -s /dev/sdn /sbin/partprobe /dev/sdn
Running command: /sbin/udevadm settle --timeout=600
Running command: /usr/bin/ceph-osd --cluster=TME --show-config-value=fsid
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
Will colocate block with data on /dev/sdn
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup bluestore_block_size
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup bluestore_block_db_size
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup bluestore_block_wal_size
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_mkfs_type
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_fs_type
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_mkfs_options_xfs
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_fs_mkfs_options_xfs
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_mount_options_xfs
Running command: /usr/bin/ceph-conf --cluster=TME --name=osd. --lookup osd_fs_mount_options_xfs
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
Writing zeros to existing partitions on /dev/sdn
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
Zapping partition table on /dev/sdn
Running command: /sbin/sgdisk --zap-all -- /dev/sdn
Running command: /sbin/sgdisk --clear --mbrtogpt -- /dev/sdn
Calling partprobe on zapped device /dev/sdn
Running command: /sbin/udevadm settle --timeout=600
Running command: /usr/bin/flock -s /dev/sdn /sbin/partprobe /dev/sdn
Running command: /sbin/udevadm settle --timeout=600
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
Creating osd partition on /dev/sdn
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
name = data
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
Creating data partition num 1 size 100 on /dev/sdn
Running command: /sbin/sgdisk --new=1:0:+100M --change-name=1:ceph data --partition-guid=1:23e1062f-4b26-40bd-80c6-fc87e2b2fc17 --typecode=1:89c57f98-2fe5-4dc0-89c1-f3ad0ceff2be --mbrtogpt -- /dev/sdn
Calling partprobe on created device /dev/sdn
Running command: /sbin/udevadm settle --timeout=600
Running command: /usr/bin/flock -s /dev/sdn /sbin/partprobe /dev/sdn
Running command: /sbin/udevadm settle --timeout=600
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
get_dm_uuid /dev/sdn uuid path is /sys/dev/block/8:208/dm/uuid
get_dm_uuid /dev/sdn1 uuid path is /sys/dev/block/8:209/dm/uuid
get_dm_uuid /dev/sdj uuid path is /sys/dev/block/8:144/dm/uuid
OSD will not be hot-swappable if block.db is not the same device as the osd data
get_dm_uuid /dev/sdj uuid path is /sys/dev/block/8:144/dm/uuid
name = block.db
get_dm_uuid /dev/sdj uuid path is /sys/dev/block/8:144/dm/uuid
Running command: /sbin/parted --machine -- /dev/sdj print
get_free_partition_index: analyzing BYT;
/dev/sdj:240GB:scsi:512:4096:gpt:ATA INTEL SSDSC2KB24:;
1:1049kB:64.4GB:64.4GB::ceph block.db:;
2:64.4GB:129GB:64.4GB::ceph block.db:;
3:129GB:193GB:64.4GB::ceph block.db:;

Creating block.db partition num 4 size 61440 on /dev/sdj
Running command: /sbin/sgdisk --new=4:0:+61440M --change-name=4:ceph block.db --partition-guid=4:6625abec-bdc4-49be-96c6-3479c929d194 --typecode=4:30cd0809-c2b2-499c-8879-2d6b785292be --mbrtogpt -- /dev/sdj

One thing to keep in mind,

 

These servers were running 2.1, but we did a fresh installation with 2.2. One thing we noticed: every time we reinstall, it does not come out clean; we always need to wipe the disks and redo the install.

Maybe there was an older PetaSAN OS boot disk left over from a previous install.

Can you show the output of

ceph-disk list

cat /etc/fstab
blkid -s UUID -o value /dev/sdi3
blkid -s UUID -o value /dev/sdi4
blkid -s UUID -o value /dev/sdi5
If your boot disk is /dev/sdx rather than sdi
blkid -s UUID -o value /dev/sdx3
blkid -s UUID -o value /dev/sdx4
blkid -s UUID -o value /dev/sdx5

From the physical disk list in the UI: which disk does PetaSAN show as the OS/system disk? Is sdi shown as journal?

Hey Admin,

I just wiped the disks again. Now there is an issue we sometimes face during installation that happens randomly: when creating a new cluster, other nodes fail to join.

Error List
Error connecting to first node on backend 1 interface
Error connecting to first node on backend 2 interface

However, I can get a shell on the host, and I am able to ping all the interfaces on all nodes using the backend IPs.

 

Anything we are missing here?

 

Thanks

We do ourselves wipe the disks that we use: the OS disk plus any disk you select during node deployment for use as journal or OSDs. We do not wipe disks that you did not select, so there could be potential for problems if a previous system disk was left unused in the system.
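
If you do find such a leftover disk, a rough way to clear it before deployment (destructive; /dev/sdX below is just a placeholder for the old unused disk) would be:

wipefs -a /dev/sdX
sgdisk --zap-all /dev/sdX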

The errors you see are networking issues. This is a ping test we run from the joining node, on all interfaces, to all other management nodes prior to joining. If it only fails sometimes, it is probably an intermittent network problem.
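
To roughly reproduce the test by hand, from the joining node you could ping the first node's backend addresses while forcing each backend interface; the interface names and IPs below are placeholders, substitute your own:

ping -c 3 -I eth2 10.0.2.1
ping -c 3 -I eth3 10.0.3.1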

We tried pinging from the backend and it works fine.

Something in the script is not getting through; there is no firewall or anything blocking it, same switch, same VLAN.

Is there anything we can do to bypass this?
