Additional system tools for monitoring performance of PetaSAN solution

Hello colleagues,
Please add additional system tools that can be used to monitor the performance of a PetaSAN solution and to troubleshoot issues.

With top/htop it is sometimes tricky to find an application or process that is consuming a lot of system resources,
because top/htop cannot highlight programs that are using too much CPU, RAM, or other resources.
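
To illustrate the kind of highlighting I mean, here is a minimal Python sketch that flags processes above a CPU or memory limit. It assumes input in the format printed by `ps -eo pid,pcpu,pmem,comm`. The limits, the function names and the sample rows are my own placeholders, not anything shipped with PetaSAN or Glances.

```python
# Hypothetical limits in percent; tune them for your own nodes.
CPU_LIMIT = 50.0
MEM_LIMIT = 20.0


def parse_ps(text):
    """Parse `ps -eo pid,pcpu,pmem,comm` output into (pid, cpu, mem, name) tuples."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 3)  # the command name may itself contain spaces
        if len(parts) < 4:
            continue  # ignore malformed lines
        pid, cpu, mem, name = parts
        rows.append((int(pid), float(cpu), float(mem), name.strip()))
    return rows


def flag_heavy(rows, cpu_limit=CPU_LIMIT, mem_limit=MEM_LIMIT):
    """Return only the rows at or over either limit, highest CPU usage first."""
    heavy = [r for r in rows if r[1] >= cpu_limit or r[2] >= mem_limit]
    return sorted(heavy, key=lambda r: r[1], reverse=True)


# Made-up sample data, for illustration only.
SAMPLE = """\
  PID %CPU %MEM COMMAND
    1  0.0  0.1 systemd
 2345 87.5  3.2 ceph-osd
 3456  1.0 25.0 java
"""

for pid, cpu, mem, name in flag_heavy(parse_ps(SAMPLE)):
    print(f"{pid:>7} {cpu:5.1f}% cpu {mem:5.1f}% mem  {name}")
```

On a real node you would feed it the live output of `ps -eo pid,pcpu,pmem,comm` instead of the sample text. A tool like Glances does this continuously and with colour coding, which is why I would prefer it over a script.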

1. Please add the following tool for SSH sessions (console access):
Glances is a cross-platform monitoring tool which aims to present a maximum of information in a minimum of space.
https://github.com/nicolargo/glances
https://github.com/nicolargo/glancesautoinstall

2. Please add the following tool for web sessions (browser access):
NetData is a system for distributed real-time performance and health monitoring.
https://github.com/firehol/netdata
https://github.com/firehol/netdata/wiki/Installation

3. Please add the following tool, the Check_MK monitoring agent (access from a standard corporate monitoring system):
The Check_MK Monitoring System is a comprehensive open-source solution for IT monitoring developed around the proven Nagios core.
http://mathias-kettner.com/check_mk_download.php?HTML=yes
http://mathias-kettner.com/check_mk_download_version.php?HTML=yes&version=1.2.8p20&edition=cre

All of these tools are very good, and I have used them in production environments for several years.
They help with troubleshooting.

Thank you for understanding.

 

Thank you for your feedback. We will certainly study them.

The current plan is to add node health stats as an extra page that you open from the node list (on a node-by-node basis).

Since we already use collectd for data collection and Grafana to display cluster stats, we were thinking of enabling more collectd agents/plugins to monitor node CPU/memory/disks/interfaces. We also plan to add SMART health checks for disks.

Do you think it is better to delegate node monitoring to an external system, or to keep it within our UI? I can see pros and cons for each approach.

My basic idea is to use two methods at once:
1. Local node web monitoring - NetData
2. External Monitoring System - Check_MK

For a detailed dashboard, perhaps the best option is this stack: collectd + telegraf + influxdb + grafana
https://www.influxdata.com/open-source/
https://www.influxdata.com/products/
InfluxData’s products are simple, without external dependencies, yet flexible enough for complex deployments. Get started collecting, storing, visualizing and alerting on time-series data in minutes – not days or weeks.