[ZBX-15036] system.cpu.util[,user] has wrong data on higher thread count systems Created: 2018 Oct 18  Updated: 2024 Apr 10  Resolved: 2019 Feb 18

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Templates (T)
Affects Version/s: 4.0.0
Fix Version/s: 4.0.2rc1, 4.2.0alpha1, 4.2 (plan)

Type: Problem report Priority: Major
Reporter: Stefan Kocis Assignee: Alex Kalimulin
Resolution: Fixed Votes: 0
Labels: cpu
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Zabbix 4.0.0 LTS
KVM/Qemu (qcow2) appliance from 2018-10-17

Client/target machines
CentOS Linux release 7.5.1804

cat /proc/cpuinfo (last "core"/thread).
-------------------------------------------------------------------------------------------
processor : 47
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
stepping : 2
microcode : 0x3d
cpu MHz : 2899.940
cache size : 30720 KB
physical id : 1
siblings : 24
core id : 13
cpu cores : 12
apicid : 59
initial apicid : 59
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
bogomips : 5004.49
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
-------------------------------------------------------------------------------------------
processor : 63
vendor_id : GenuineIntel
cpu family : 6
model : 79
model name : Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
stepping : 1
microcode : 0xb00002e
cpu MHz : 1200.265
cache size : 40960 KB
physical id : 1
siblings : 32
core id : 15
cpu cores : 16
apicid : 63
initial apicid : 63
fpu : yes
fpu_exception : yes
cpuid level : 20
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips : 4205.38
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:


Attachments: PNG File Screenshot from 2018-11-15 11-29-30.png     PNG File image-2018-10-18-17-40-15-580.png     PNG File image-2018-10-18-17-41-07-452.png     PNG File image-2018-10-18-17-46-52-230.png    
Issue Links:
Causes
causes ZBX-15667 Zabbix default database creates host ... Closed
Team: Team A
Sprint: Sprint 45, Sprint 46, Nov 2018
Story Points: 0.5

 Description   

Steps to reproduce:

  1. add a server with more than 40 threads (20 cores with hyperthreading)
  2. stress the CPU somewhat.
  3. see system.cpu.util[,user]
  4. compare to real stats on server
  5. see that system.cpu.util[,*] stats do not add up to 100% in total.

I tried the 3.0 and 4.0 versions of the agent, with the same result.

Other machines that have 40 or fewer threads work just fine with both CentOS 6.1 and 7.5 and Zabbix agent 2.2 and 3.0 (the latest for each distribution).

Result:
See screenshot...



 Comments   
Comment by Glebs Ivanovskis [ 2018 Oct 22 ]

Duplicates ZBXNEXT-391?

Comment by Stefan Kocis [ 2018 Oct 22 ]

It does not, as far as I can see. But maybe somebody more knowledgeable could provide a more certain answer.

Edit: on second look it could be connected somehow, but it is almost certainly not the same issue. The user load was somewhere between 30% and 50%, which in the worst case would have resulted in a ~20% difference. Instead, the difference was around 50%, more or less at the level of the real user load read from the server, as seen on the screenshot.

Comment by Stefan Kocis [ 2018 Oct 22 ]

It seems that this could be it:
https://support.zabbix.com/browse/ZBX-10710

Comment by Stefan Kocis [ 2018 Oct 22 ]

Just for the record: it seems that it is more related to the kernel version than to CPU cores/threads.

  • 4.17: new behavior, with system.cpu.util[,guest] and system.cpu.util[,guest_nice].
  • 3.10: old behavior, without them.

Comment by Andris Zeila [ 2018 Oct 26 ]

It seems that the graphs are missing guest, guest_nice times. It seems you are using virtualization, so that might account for the dips in cpu utilization graphs. Could you try adding them and check?
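To illustrate why graphs without guest and guest_nice fall short of 100%, here is a minimal Python sketch. It rests on two assumptions: the Linux kernel counts guest time inside the user counter of /proc/stat (and guest_nice inside nice), and the agent reports user and nice with the guest share subtracted, which is what this thread implies. The function names and the sample counter values are made up for illustration and are not taken from the Zabbix source or from the reporter's machine.

```python
# Field order of the aggregate "cpu" line in /proc/stat on recent Linux kernels.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels omit the trailing fields; treat them as zero.
    values += [0] * (len(FIELDS) - len(values))
    counters = dict(zip(FIELDS, values))
    # The kernel counts guest time inside user and guest_nice inside nice,
    # so separate them to avoid counting the same time twice.
    counters["user"] -= counters["guest"]
    counters["nice"] -= counters["guest_nice"]
    return counters


def utilization(before, after):
    """Return the percentage share of each mode between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Hypothetical samples from a hypervisor host running busy guests.
sample_before = parse_cpu_line("cpu 1000 0 500 8000 100 0 50 0 400 0")
sample_after = parse_cpu_line("cpu 1600 0 600 8300 100 0 50 0 900 0")
shares = utilization(sample_before, sample_after)

without_guest = sum(value for name, value in shares.items()
                    if name not in ("guest", "guest_nice"))
print("sum without guest modes:", without_guest)
print("sum with guest modes:", sum(shares.values()))
```

With these invented counters the modes other than guest and guest_nice add up to only half of the total, and the missing half is guest time. That is the kind of gap a graph shows when it stacks every mode except guest and guest_nice.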

Comment by Stefan Kocis [ 2018 Oct 26 ]

As I wrote earlier, it is exactly that. Thank you. You can probably close this issue if you do not wish to change the templates to prevent this kind of misunderstanding.

Comment by Andris Zeila [ 2018 Oct 29 ]

Yes, the templates must be updated.

Comment by Alex Kalimulin [ 2018 Nov 16 ]

Fixed in:

  • pre-4.0.2rc1 r87003
  • pre-4.2.0alpha1 r87007

Comment by Alexander Vladishev [ 2018 Nov 20 ]

(1) [D] Documentation needs to be updated

vso We have talked to vzhuravlev and it was not clear how it can be added to the official templates. It was suggested that we only change the XML templates in svn for now, so Zabbix Templates/Official Templates is Won't Fix.

martins-v This subissue is resolved in (2). WON'T FIX.

Generated at Fri Apr 26 10:51:51 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.