504 Gateway Time-out on every node, 100% disk usage, I believe on a journal drive

Pages: 1 2

Can you follow the suggestion from my earlier post and look at the monitors: are they started? Can you start them, or do they error out? Do they communicate with each other once they start?

Regarding there being nothing in the mon logs: are you sure they are even running?
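A rough sketch of how to check this from a shell on each node. On systemd-based Ceph installs a monitor normally runs as a ceph-mon@<id> unit. This assumes the monitor id is the node's short hostname (the log naming ceph-mon.HOSTNAME.log suggests so, but verify on your system). The helper names mon_unit and check_mon are made up for illustration.

```shell
# Build the systemd unit name for a monitor.
# ASSUMPTION: the mon id equals the short hostname unless one is passed in.
mon_unit() {
    echo "ceph-mon@${1:-$(hostname -s)}.service"
}

# Report whether a ceph-mon process exists, then show the unit state
# and its most recent journal lines.
check_mon() {
    unit=$(mon_unit "$1")
    if pgrep -x ceph-mon >/dev/null 2>&1; then
        echo "ceph-mon process found"
    else
        echo "no ceph-mon process found"
    fi
    systemctl status "$unit" --no-pager
    journalctl -u "$unit" -n 50 --no-pager
}
```

Run check_mon on each node. If the unit is stopped, systemctl start "$(mon_unit)" should try to start it, and the journal output should show why if it exits again.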

Please can you explain "look at the monitors" further?

I have been starting services manually - ceph is up now - but when I try and start smbd I get the following error:

Mar 14 09:47:13 san02 cifs_service.py[2001532]: mkdir: cannot create directory ‘/opt/petasan/config/shared’: Transport endpoint is not connected
Mar 14 09:47:13 san02 cifs_service.py[2001534]: touch: cannot touch '/opt/petasan/config/shared/ctdb/nodes': Transport endpoint is not connected
Mar 14 09:47:13 san02 cifs_service.py[2001536]: touch: cannot touch '/opt/petasan/config/shared/ctdb/public_addresses': Transport endpoint is not connected

When I try to cd to /opt/petasan/config/shared I get the same:

shared: Transport endpoint is not connected
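"Transport endpoint is not connected" usually means the path is a mount point (typically FUSE or a network filesystem) whose backing daemon or connection has gone away, rather than a missing directory. A minimal sketch to tell the two cases apart. The function name check_path is hypothetical.

```shell
# Classify why a path cannot be accessed.
# Prints "ok", "stale mount", or "error: <message>".
check_path() {
    # Capture only stderr from stat; LC_ALL=C keeps the message in English.
    if err=$(LC_ALL=C stat "$1" 2>&1 >/dev/null); then
        echo "ok"
    else
        case "$err" in
            *"Transport endpoint is not connected"*) echo "stale mount" ;;
            *) echo "error: $err" ;;
        esac
    fi
}
```

For example, check_path /opt/petasan/config/shared. To see what filesystem backs the path, grep /opt/petasan/config/shared /proc/mounts lists the mount entry, if there is one.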

Which of these services do I need to get started?

[ - ]  apache-htcacheclean
[ + ]  apache2
[ + ]  atop
[ + ]  atopacct
[ + ]  carbon-cache
[ + ]  ceph
[ + ]  collectd
[ + ]  collectl
[ - ]  console-setup.sh
[ + ]  cron
[ - ]  ctdb
[ + ]  dbus
[ - ]  fio
[ + ]  grafana-server
[ + ]  grub-common
[ - ]  hwclock.sh
[ - ]  iscsid
[ - ]  keyboard-setup.sh
[ + ]  kmod
[ - ]  lvm2
[ - ]  lvm2-lvmpolld
[ + ]  multipath-tools
[ + ]  networking
[ - ]  nfs-common
[ + ]  nginx
[ - ]  nmbd
[ - ]  ntp
[ - ]  open-iscsi
[ + ]  opensm
[ + ]  procps
[ + ]  radosgw
[ + ]  rpcbind
[ + ]  rsyslog
[ - ]  samba-ad-dc
[ - ]  smartmontools
[ - ]  smbd
[ + ]  ssh
[ - ]  sysstat
[ + ]  udev
[ - ]  uuidd
[ - ]  winbind
[ + ]  zabbix-agent

Many thanks for all your assistance!

Please can you explain "look at the monitors" further?

Basically what I posted in my previous replies.

I have been starting services manually - ceph is up now

Is it healthy? What is the output of
ceph status
Earlier you had this error:

root@san01:~# ceph status
2023-03-01T17:09:02.922+0000 7f87455c3700 0 monclient(hunting): authenticate timed out after 300

How did you fix it?

Ah, I have not fixed anything then!

The service says 'active' - service ceph status shows:

● ceph.service - LSB: Start Ceph distributed file system daemons at boot time
Loaded: loaded (/etc/init.d/ceph; generated)
Active: active (exited) since Mon 2023-03-06 20:16:08 GMT; 1 weeks 0 days ago
Docs: man:systemd-sysv-generator(8)
Process: 4164194 ExecStart=/etc/init.d/ceph start (code=exited, status=0/SUCCESS)

Mar 06 20:16:08 san03 systemd[1]: Starting LSB: Start Ceph distributed file system daemons at boot time...
Mar 06 20:16:08 san03 systemd[1]: Started LSB: Start Ceph distributed file system daemons at boot time.

However, ceph status still shows:

2023-03-14T14:19:22.108+0000 7fc2cdb5c700  0 monclient(hunting): authenticate timed out after 300

So I guess ceph is not up!
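For context, "active (exited)" on the LSB wrapper only says that the init script ran and exited successfully. It does not show that any Ceph daemon is currently running. A small sketch that pulls the state out of the status text so it can be compared across nodes. The function name active_state is made up for illustration.

```shell
# Read `service <name> status` or `systemctl status` text on stdin and
# print the first two words after "Active:", e.g. "active (exited)".
active_state() {
    awk '$1 == "Active:" { print $2, $3; exit }'
}
```

Usage: service ceph status | active_state. A unit with a live daemon would normally report "active (running)" instead.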

What is the command to start the ceph monitor?

The /var/log/ceph/ceph-mon.HOSTNAME.log files still do not show anything after the day the cluster was created in early January - the cluster ran well for over a month after that.

Many thanks!
