[ZBX-20260] system.cpu.util unable to get usage data of cores on second NUMA node Created: 2021 Nov 23  Updated: 2024 Dec 14  Resolved: 2022 Aug 16

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 5.4.7
Fix Version/s: 6.0.8rc1, 6.2.2rc1, 6.4.0alpha1, 6.4 (plan)

Type: Problem report Priority: Trivial
Reporter: Aleksejs Cankovs Assignee: Mihails Prihodko
Resolution: Fixed Votes: 2
Labels: agent, items, lld
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Microsoft Windows Server 2012 R2
HPE DL380
2 CPUs/2 Sockets (E5-2643 v4)


Attachments: Text File Processor2016.txt, Text File ProcessorInformation2016.txt, Text File coreinfo.txt, Zip Archive counters_at_qemu.zip, PNG File image-2021-11-23-14-50-03-336.png, Text File lld_discovery_result.txt, Text File typeperf2016.txt, Text File typeperf_processor_information.txt, Zip Archive zabbix_agent-6.0.5-windows-amd64-2c64a901cf8_v2.zip, Zip Archive zabbix_agent-6.0.5-windows-amd64-openssl-2c64a901cf8_v2.zip, Zip Archive zabbix_agent-6.0.6rc1-windows-amd64-openssl_2945440bba8_fix.zip, Text File zabbix_agent.log
Team: Team B
Sprint: Sprint 84 (Jan 2022), Sprint 85 (Feb 2022), Sprint 86 (Mar 2022), Sprint 87 (Apr 2022), Sprint 88 (May 2022), Sprint 89 (Jun 2022), Sprint 90 (Jul 2022), Sprint 91 (Aug 2022)
Story Points: 1

 Description   

Steps to reproduce:

  1. Deploy the agent on a 2-socket Windows system
  2. Use system.cpu.discovery to discover CPUs
  3. Use the discovery results to create items with the key system.cpu.util[{#CPU.NUMBER}]
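
The steps above can be sketched as follows. This is a minimal illustration only, assuming the discovery result is a JSON array of objects carrying {#CPU.NUMBER} (optionally wrapped in a "data" object, as older agents do) and assuming the usual bracketed key form system.cpu.util[<cpu>]. The helper name item_keys and the sample data are hypothetical.

```python
import json


def item_keys(discovery_json):
    """Build one system.cpu.util key per CPU found in a discovery result."""
    rows = json.loads(discovery_json)
    # Assumption: older agents wrap the array in {"data": [...]}; accept both.
    if isinstance(rows, dict):
        rows = rows.get("data", [])
    keys = []
    for row in rows:
        number = row["{#CPU.NUMBER}"]
        keys.append(f"system.cpu.util[{number}]")
    return keys


# Hypothetical discovery output, not taken from lld_discovery_result.txt.
sample = (
    '[{"{#CPU.NUMBER}": 0, "{#CPU.STATUS}": "online"},'
    ' {"{#CPU.NUMBER}": 1, "{#CPU.STATUS}": "online"}]'
)

if __name__ == "__main__":
    for key in item_keys(sample):
        print(key)
```

On an affected host, the keys generated for cores of the second NUMA node are the ones that return "Performance counter is not ready".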

Result:

CPU cores from the second NUMA node produce an error: Performance counter is not ready.

Expected:
Data for every core is properly collected



 Comments   
Comment by Mihails Prihodko [ 2022 Jun 03 ]

Aleksejs Cankovs, could you please provide the following information? This bug seems to be a rare one, and I hope your input will help us find the root cause.

1) What is the service pack version of the Microsoft Windows Server 2012 R2?

2) Please provide us the output of the following commands:

  1. typeperf -qx "Processor Information"
    (It looks like this one has already been provided.)
  2. typeperf -qx "Processor"
  3. typeperf "\Processor(_Total)\% Processor Time" "\Processor Information(_Total)\% Processor Time" "\Processor Information(0,_Total)\% Processor Time" "\Processor Information(1,_Total)\% Processor Time"
    The last command should be run for at least 10-20 seconds to collect some samples.
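
To compare the collected samples, the typeperf output can be averaged per counter. The sketch below is only an illustration: it assumes typeperf prints PDH-CSV text whose first row holds the counter paths and whose later rows hold a timestamp followed by one quoted value per counter. The function name average_counters, the host name HOST and the sample values are hypothetical.

```python
import csv
import io


def average_counters(csv_text):
    """Return the mean of every counter column in typeperf-style CSV text."""
    reader = csv.reader(io.StringIO(csv_text.strip()))
    header = next(reader)
    names = header[1:]  # column 0 is the timestamp
    sums = {name: 0.0 for name in names}
    counts = {name: 0 for name in names}
    for row in reader:
        for name, cell in zip(names, row[1:]):
            try:
                value = float(cell)
            except ValueError:
                continue  # blank or non-numeric sample, skip it
            sums[name] += value
            counts[name] += 1
    return {n: sums[n] / counts[n] for n in names if counts[n]}


# Hypothetical samples, not taken from the attached typeperf files.
sample_csv = r'''
"(PDH-CSV 4.0)","\\HOST\Processor(_Total)\% Processor Time","\\HOST\Processor Information(1,_Total)\% Processor Time"
"06/06/2022 10:00:01.000","10.0"," "
"06/06/2022 10:00:02.000","20.0","4.0"
'''

if __name__ == "__main__":
    for name, mean in average_counters(sample_csv).items():
        print(name, round(mean, 2))
```

A large gap between the "Processor" total and the per-group "Processor Information" totals would suggest that the two counter objects do not cover the same set of cores.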

3) If possible, please also send the output of the Coreinfo command, available here: https://docs.microsoft.com/en-us/sysinternals/downloads/coreinfo

4) Please run the debug version of the agent, which contains additional debug output and is available as zabbix_agent-6.0.5-windows-amd64-2c64a901cf8_v2.zip. Before starting the agent, set DebugLevel=4 in the Zabbix agent configuration file. It might also be helpful to remove the old log file before starting.

5) Have you tried using Zabbix agent 2? Is the reported issue observed with agent 2?

 

UPD 1:

The same issue on forum:

https://www.zabbix.com/forum/zabbix-troubleshooting-and-problems/416705-zabbix-agent-on-windows-processor-group-limits-view-on-cpu-usage-and-number-of-cpu-s

Comment by Aleksejs Cankovs [ 2022 Jun 06 ]

Hello,

The server on which we initially observed the problem has already been decommissioned, but I found another one where the problem persists; it runs Windows Server 2016. Indeed, not all servers are affected. I collected some information and will attach the output of the typeperf commands and Coreinfo, but for debugging I need a client with TLS support. And no, we have not tried agent 2 on Windows systems yet.

<mprihodko> Please find the version with TLS (OpenSSL) support here: zabbix_agent-6.0.5-windows-amd64-openssl-2c64a901cf8_v2.zip

Comment by Aleksejs Cankovs [ 2022 Jun 06 ]

zabbix_agent.log

Interesting: the server has counters for only 16 logical processors under "\Processor\". However, under "Processor Information" there are counters for both sockets.

<mprihodko> Yes, indeed. This is probably the direction for further investigation.

When the number of logical processors is less than or equal to 64, Zabbix agent 1 relies on the "old" "Processor" counter and not on the "new" "Processor Information" counter. This appears to be the cause of this bug.

The counters are initialized in init_cpu_collector() function around here https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/src/zabbix_agent/cpustat.c?at=refs%2Fheads%2Ffeature%2FZBX-20985-6.0#125
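
The selection rule described above can be illustrated with a short sketch. It is not the agent's code (init_cpu_collector() is written in C) and it does not show the fix that was eventually applied. It assumes that "Processor Information" instances are named "<group>,<index>", and the 16+16 layout is a hypothetical example. The function names are made up for illustration.

```python
LEGACY_COUNTER_LIMIT = 64  # threshold mentioned in the comment above


def uses_legacy_counter(cpu_count):
    """Rule from the comment: up to 64 logical CPUs use the "Processor" object."""
    return cpu_count <= LEGACY_COUNTER_LIMIT


def counter_path(cpu, group_sizes, legacy):
    """Map a flat CPU number to a performance counter path."""
    if legacy:
        # The "Processor" object uses a single flat instance number.
        return rf"\Processor({cpu})\% Processor Time"
    remaining = cpu
    for group, size in enumerate(group_sizes):
        if remaining < size:
            # Assumed instance naming: "<group>,<index within group>".
            return rf"\Processor Information({group},{remaining})\% Processor Time"
        remaining -= size
    raise ValueError(f"CPU {cpu} is outside the given groups")


if __name__ == "__main__":
    groups = [16, 16]  # hypothetical: two groups of 16 logical CPUs
    print(uses_legacy_counter(sum(groups)))
    print(counter_path(20, groups, legacy=True))
    print(counter_path(20, groups, legacy=False))
```

With this layout the legacy rule asks for "\Processor(20)", an instance that does not exist when only 16 logical processors are listed under "Processor", while the same core is reachable as "\Processor Information(1,4)".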

Comment by Mihails Prihodko [ 2022 Jun 07 ]

This problem seems to be similar to the one we have: https://stackoverflow.com/questions/28098082/unable-to-use-more-than-one-processor-group-for-my-threads-in-a-c-sharp-app

Comment by Mihails Prihodko [ 2022 Jun 08 ]

A possible fix is available for testing: zabbix_agent-6.0.6rc1-windows-amd64-openssl_2945440bba8_fix.zip. We do not have such a setup at Zabbix, so we cannot test it ourselves.

Aleksejs Cankovs, could you please test it when you have time?

Everybody with the same problem is also welcome to test it.

Comment by Aleksejs Cankovs [ 2022 Jun 08 ]

I just tested it on two identical servers, and it seems to work. At least I'm getting some values from the agent.

Comment by Mihails Prihodko [ 2022 Aug 16 ]

Available in:

Generated at Sun Apr 27 06:06:51 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.