CPU Usage on Nodes


Please tell me: is a 20-30 percent CPU load on all nodes of an empty cluster (not a single pool created, no connected clients, only OSDs) normal behavior, or did I break something?

An additional question: is this normal in the ceph.log file on a mon node?

2020-04-18 05:36:58.806997 mgr.petasan-mon3 (mgr.64131) 51343 : cluster [DBG] pgmap v48231: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:00.807434 mgr.petasan-mon3 (mgr.64131) 51344 : cluster [DBG] pgmap v48232: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:02.807846 mgr.petasan-mon3 (mgr.64131) 51345 : cluster [DBG] pgmap v48233: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:04.808280 mgr.petasan-mon3 (mgr.64131) 51346 : cluster [DBG] pgmap v48234: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:06.808819 mgr.petasan-mon3 (mgr.64131) 51347 : cluster [DBG] pgmap v48235: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:08.809340 mgr.petasan-mon3 (mgr.64131) 51348 : cluster [DBG] pgmap v48236: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:10.809739 mgr.petasan-mon3 (mgr.64131) 51349 : cluster [DBG] pgmap v48237: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:12.810118 mgr.petasan-mon3 (mgr.64131) 51350 : cluster [DBG] pgmap v48238: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:14.810534 mgr.petasan-mon3 (mgr.64131) 51351 : cluster [DBG] pgmap v48239: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:16.810917 mgr.petasan-mon3 (mgr.64131) 51352 : cluster [DBG] pgmap v48240: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:18.811333 mgr.petasan-mon3 (mgr.64131) 51353 : cluster [DBG] pgmap v48241: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail
2020-04-18 05:37:20.811710 mgr.petasan-mon3 (mgr.64131) 51354 : cluster [DBG] pgmap v48242: 0 pgs: ; 0 B data, 7.8 GiB used, 20 TiB / 20 TiB avail

 

ceph.audit.log also grows very fast (current size is 36520615 bytes and increasing quickly):

2020-04-18 05:45:06.968275 mon.petasan-mon3 (mon.0) 78180 : audit [DBG] from='client.? 10.5.108.13:0/619389959' entity='client.admin' cmd=[{,",f,o,r,m,a,t,",:, ,",j,s,o,n,",,, ,",p,r,e,f,i,x,",:, ,",s,t,a,t,u,s,",}]: dispatch
2020-04-18 05:45:06.992723 mon.petasan-mon3 (mon.0) 78181 : audit [INF] from='client.? 10.5.108.51:0/2981322423' entity='client.admin' cmd=[{"prefix": "config assimilate-conf"}]: dispatch
2020-04-18 05:45:07.021436 mon.petasan-mon3 (mon.0) 78182 : audit [DBG] from='client.? 10.5.108.52:0/2954037331' entity='client.admin' cmd=[{"prefix": "config generate-minimal-conf"}]: dispatch
2020-04-18 05:45:07.034556 mon.petasan-mon2 (mon.2) 59685 : audit [DBG] from='client.? 10.5.108.48:0/1090715648' entity='client.admin' cmd=[{"prefix": "config generate-minimal-conf"}]: dispatch
2020-04-18 05:45:07.051300 mon.petasan-mon3 (mon.0) 78183 : audit [DBG] from='client.? 10.5.108.52:0/3913457327' entity='client.admin' cmd=[{,",f,o,r,m,a,t,",:, ,",j,s,o,n,",,, ,",p,r,e,f,i,x,",:, ,",s,t,a,t,u,s,",}]: dispatch
2020-04-18 05:45:07.068457 mon.petasan-mon2 (mon.2) 59686 : audit [DBG] from='client.? 10.5.108.48:0/3508208063' entity='client.admin' cmd=[{,",f,o,r,m,a,t,",:, ,",j,s,o,n,",,, ,",p,r,e,f,i,x,",:, ,",s,t,a,t,u,s,",}]: dispatch
2020-04-18 05:45:07.084533 mon.petasan-mon3 (mon.0) 78184 : audit [INF] from='client.? ' entity='client.admin' cmd=[{"prefix": "config assimilate-conf"}]: dispatch
2020-04-18 05:45:07.086991 mon.petasan-mon1 (mon.1) 52080 : audit [INF] from='client.? 10.5.108.47:0/2084591108' entity='client.admin' cmd=[{"prefix": "config assimilate-conf"}]: dispatch
2020-04-18 05:45:07.087473 mon.petasan-mon3 (mon.0) 78185 : audit [INF] from='client.? ' entity='client.admin' cmd=[{"prefix": "config assimilate-conf"}]: dispatch
2020-04-18 05:45:07.090076 mon.petasan-mon1 (mon.1) 52081 : audit [INF] from='client.? 10.5.108.49:0/2679780626' entity='client.admin' cmd=[{"prefix": "config assimilate-conf"}]: dispatch
2020-04-18 05:45:07.104355 mon.petasan-mon2 (mon.2) 59687 : audit [INF] from='client.? 10.5.108.12:0/338527332' entity='client.admin' cmd=[{"prefix": "config assimilate-conf"}]: dispatch
2020-04-18 05:45:07.104760 mon.petasan-mon3 (mon.0) 78186 : audit [INF] from='client.? ' entity='client.admin' cmd=[{"prefix": "config assimilate-conf"}]: dispatch
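
For reference, a quick way to watch how fast this file grows (a rough sketch, assuming the default /var/log/ceph/ location):

ls -lh /var/log/ceph/ceph.audit.log
watch -n 10 'stat -c %s /var/log/ceph/ceph.audit.log'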

No, it is not normal.

Are you using real hardware or a virtual lab setup? How much RAM and how many CPU cores? Even with the latter it should not be this high.

Can you close all management browser sessions and see if it makes a difference? If so, does it depend on which page you are accessing?

If you run:

ceph pg ls-by-pool POOL_NAME

(replace POOL_NAME with the name of a pool, such as rbd)

does it take a long time to complete?
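
For example, to time it against a hypothetical pool named rbd:

time ceph pg ls-by-pool rbd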

The mon servers are virtual: 4 cores and 32 GB RAM each.

The OSD nodes are physical servers: 4 SSDs for OSDs, 4 cores, 32 GB RAM, and 2x10G Ethernet.

After closing the management browser there is no difference.

ceph pg ls-by-pool iscsi-ssd takes less than a second to execute:

real    0m0.261s
user    0m0.186s
sys     0m0.005s

As far as I can see in atop on an OSD server, 5% is used by petasan_config_upload and 2% each by many ceph threads.

Does the cluster health show OK?

Do you see any errors in /opt/petasan/log/PetaSAN.log?
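
For example (the grep pattern here is only a suggestion):

grep -iE 'error|exception|traceback' /opt/petasan/log/PetaSAN.log | tail -n 20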

Is this a fresh install, or was it upgraded?

 

Yes, the health status is OK (both in the dashboard and in ceph -s):

root@petasan-mon1:~# ceph -s
  cluster:
    id:     982c2213-6936-4285-a641-56d1ab906e04
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum petasan-mon3,petasan-mon1,petasan-mon2 (age 7h)
    mgr: petasan-mon3(active, since 8h), standbys: petasan-mon2, petasan-mon1
    osd: 24 osds: 24 up (since 8h), 24 in (since 8h)

  data:
    pools:   1 pools, 1024 pgs
    objects: 0 objects, 0 B
    usage:   7.8 GiB used, 20 TiB / 20 TiB avail
    pgs:     1024 active+clean

There are currently no errors in PetaSAN.log.

It's a fresh install of 2.5.0 upgraded to 2.5.1 (all nodes upgraded; the storage nodes were upgraded before the OSDs were created).
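
If useful, the Ceph version each daemon is actually running after the upgrade can be double-checked with the standard command:

ceph versions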

 

The 20-30%: is this on all nodes, or only on the VMs?

Is petasan_config_upload at 5% on all nodes?

In atop, what are the 2% processes: OSDs?

At the moment the 20-30% is only on the storage (hardware) nodes. On the VMs (monitors) it is about 15%, used by ceph-mon.

The petasan_config process is running only on the storage nodes.

atop on a storage node:
ATOP - S-26-5-2-3 2020/04/18  15:16:24 - 10  2020/04/18  15:16:34                                      ----                            10
PRC | sys    0.67s  | user   8.10s |  #proc    496 | #trun      1  | #tslpi   535 |  #tslpu     0 | #zombie    0  | clones  1617 |  #exit    294 |
CPU | sys      12%  | user     84% |  irq       0% | idle    304%  | wait      0% |  steal     0% | guest     0%  | curf 3.30GHz |  curscal  94% |
cpu | sys       3%  | user     28% |  irq       0% | idle     69%  | cpu002 w  0% |  steal     0% | guest     0%  | curf 3.30GHz |  curscal  94% |
cpu | sys       3%  | user     21% |  irq       0% | idle     76%  | cpu000 w  0% |  steal     0% | guest     0%  | curf 3.30GHz |  curscal  94% |
cpu | sys       3%  | user     20% |  irq       0% | idle     77%  | cpu001 w  0% |  steal     0% | guest     0%  | curf 3.30GHz |  curscal  94% |
cpu | sys       3%  | user     15% |  irq       0% | idle     82%  | cpu003 w  0% |  steal     0% | guest     0%  | curf 3.30GHz |  curscal  94% |
CPL | avg1    0.92  | avg5    0.97 |  avg15   0.99 |               | csw    45254 |               | intr   23734  |              |  numcpu     4 |
MEM | tot    31.2G  | free   29.9G |  cache 245.5M | buff  123.0M  | slab  107.3M |  shmem   7.5M | vmbal   0.0M  | hptot   0.0M |  hpuse   0.0M |
SWP | tot     0.0M  | free    0.0M |               |               |              |               |               | vmcom   4.4G |  vmlim  15.6G |
DSK |          sda  | busy      0% |  read       0 | write     72  | KiB/r      0 |  KiB/w      5 | MBr/s    0.0  | MBw/s    0.0 |  avio 0.22 ms |
NET | transport     | tcpi    8497 |  tcpo   11006 | udpi      21  | udpo      21 |  tcpao    442 | tcppo      0  | tcprs      0 |  udpie      0 |
NET | network       | ipi     8518 |  ipo     8953 | ipfrw      0  | deliv   8518 |               |               | icmpi      0 |  icmpo      0 |
NET | eth3      0%  | pcki    8683 |  pcko    1645 | sp   10 Gbps  | si 6441 Kbps |  so  868 Kbps | erri       0  | erro       0 |  drpo       0 |
NET | bond0     0%  | pcki   15683 |  pcko   10665 | sp   20 Gbps  | si   11 Mbps |  so 2918 Kbps | erri       0  | erro       0 |  drpo       0 |
NET | bond0.7   0%  | pcki    8310 |  pcko    8711 | sp   20 Gbps  | si   10 Mbps |  so 2801 Kbps | erri       0  | erro       0 |  drpo       0 |
NET | eth2      0%  | pcki    7000 |  pcko    9020 | sp   10 Gbps  | si 4663 Kbps |  so 2050 Kbps | erri       0  | erro       0 |  drpo       0 |
NET | bond0.5   0%  | pcki      10 |  pcko      44 | sp   20 Gbps  | si    0 Kbps |  so   12 Kbps | erri       0  | erro       0 |  drpo       0 |
NET | lo      ----  | pcki     198 |  pcko     198 | sp    0 Mbps  | si  162 Kbps |  so  162 Kbps | erri       0  | erro       0 |  drpo       0 |

    PID   SYSCPU   USRCPU   CPU   CMD              1/14
   1205    0.22s    0.21s    4%   petasan_config
1869914    0.02s    0.19s    2%   <ceph>
1869940    0.02s    0.19s    2%   <ceph>
1870556    0.01s    0.20s    2%   <ceph>
1869401    0.02s    0.18s    2%   <ceph>
1869991    0.00s    0.20s    2%   <ceph>
1870171    0.00s    0.20s    2%   <ceph>
1870222    0.01s    0.19s    2%   <ceph>
1870248    0.02s    0.18s    2%   <ceph>
1870453    0.00s    0.20s    2%   <ceph>
1870530    0.01s    0.19s    2%   <ceph>
1870787    0.02s    0.18s    2%   <ceph>
1869375    0.01s    0.18s    2%   <ceph>
1869452    0.00s    0.19s    2%   <ceph>
1869529    0.00s    0.19s    2%   <ceph>
1869555    0.01s    0.18s    2%   <ceph>
1869632    0.00s    0.19s    2%   <ceph>
1869683    0.02s    0.17s    2%   <ceph>
1869709    0.00s    0.19s    2%   <ceph>
1869786    0.02s    0.17s    2%   <ceph>
1869837    0.02s    0.17s    2%   <ceph>
1869863    0.01s    0.18s    2%   <ceph>
1870017    0.01s    0.18s    2%   <ceph>

 

Hm. I rebooted one of the storage servers and the CPU load on it dropped to 0% (no more petasan_config_upload in the process list). To me this is not a good situation; I don't like strange noises under the hood...

P.S. I rebooted another node with the same result. Should I reboot all the nodes one by one, or is this situation interesting to you and should I leave one or two nodes unrebooted?

Can you show which ceph processes are running at 2%, via:

ps aux | grep ceph
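
Or, to list them sorted by CPU usage (one possible variant):

ps aux | grep [c]eph | sort -rnk3 | head -n 20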
