[ZBX-21641] CPU utilization in negative numbers... Created: 2022 Sep 13 Updated: 2025 Jun 16 Resolved: 2025 Jun 12 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Templates (T) |
Affects Version/s: | 6.2.2 |
Fix Version/s: | None |
Type: | Problem report | Priority: | Trivial |
Reporter: | Peter Danko | Assignee: | Denis Rasikhov |
Resolution: | Cannot Reproduce | Votes: | 3 |
Labels: | linux, preprocessing, snmp, template | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | 9h | ||
Original Estimate: | Not Specified | ||
Environment: |
Zabbix 6.2.2 |
Attachments: |
Description |
Steps to reproduce:
Result: Expected:
I have found the problem. The template currently uses this JavaScript preprocessing: I tried to use the Abs() function, but that does not give a correct result; it only makes the value always positive. |
Comments |
Comment by Peter Danko [ 2022 Sep 29 ] |
Hi, I have found that the number obtained from Linux devices is higher because it counts the utilization of all CPUs together. I think the template will need to be updated. Thanks. |
Comment by MArk [ 2024 Feb 23 ] |
It seems that this problem was first reported here. |
Comment by Peter Danko [ 2024 Feb 23 ] |
Hi, it's good to know that I'm not the only one who has this issue. Is there any way to resolve it? For now I have had to disable this sensor for the storage, because we were getting many problem messages even though the cluster was OK. Thanks. |
Comment by MArk [ 2024 Feb 23 ] |
I haven't experienced this issue with any host, but, as far as I've read, it can be an issue when calculating the idle time from multiple CPUs. If you happen to know the number of CPU cores, you can create items to read each core's idle time. |
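The effect described above can be illustrated with a minimal sketch. It assumes that utilization is derived as 100 minus the per-second change of an idle tick counter, that the counter is summed over all CPUs, and that each CPU produces 100 ticks per second. The function names and sample numbers are hypothetical and are not taken from the template.

```python
TICKS_PER_SECOND = 100  # assumed clock tick rate per CPU


def utilization_unnormalized(idle_ticks_per_sec):
    """Treat the idle tick rate as if it were already a percentage of one CPU."""
    return 100.0 - idle_ticks_per_sec


def utilization_normalized(idle_ticks_per_sec, cpu_count,
                           ticks_per_sec=TICKS_PER_SECOND):
    """Scale the summed idle tick rate by the number of CPUs before subtracting."""
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    idle_pct = 100.0 * idle_ticks_per_sec / (cpu_count * ticks_per_sec)
    return 100.0 - idle_pct


# Hypothetical sample: 4 CPUs, each about 90% idle, idle ticks summed over all CPUs.
idle_rate = 360.0
print(utilization_unnormalized(idle_rate))       # -260.0 (negative utilization)
print(abs(utilization_unnormalized(idle_rate)))  # 260.0 (positive, still wrong)
print(utilization_normalized(idle_rate, 4))      # 10.0
```

Under these assumptions, taking the absolute value only hides the sign; the result stays wrong until the idle rate is scaled by the CPU count.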
Comment by Peter Danko [ 2024 Feb 26 ] |
OK, I understand what you mean, but I don't know the core count right now. Even if I knew it, the Dell EMC Isilon is a storage cluster built from multiple nodes, and the system sends data for the whole cluster as if it were one device. A second problem is that every time the system is extended (new nodes are added), the calculation would need to be changed. Isn't there any other way to resolve it? Thanks. |
Comment by Denis Rasikhov [ 2025 Jun 11 ] |
Hello Alfista! Could you please provide more details about the architecture of your environment? Is it multiple nodes combined in some sort of a cluster? Do you monitor each of those nodes via SNMP and the issue reproduces on each one of these? Also please provide the following info: 1. Info about CPU (physical CPUs count, cores, threads, etc.) |
Comment by Peter Danko [ 2025 Jun 11 ] |
Hi Denis, it's a Dell EMC Isilon consisting of 6 nodes. Each node has 2 Intel Xeon CPUs with multiple cores and HT, but I don't know exactly how many. The nodes are A200 and F200. When it is monitored as a cluster, I get these wrong numbers. The templates I used were downloaded directly from your site. Sorry, I can't give you much more information for now, because I no longer work there and don't have access to it. Thanks. |
Comment by Peter Danko [ 2025 Jun 11 ] |
Hi Denis, if it helps I can send you the used template. I should have it backed up.
|
Comment by Denis Rasikhov [ 2025 Jun 12 ] |
Yes, if you could send it, it would be useful to check, thank you! |
Comment by Denis Rasikhov [ 2025 Jun 12 ] |
So, I've run multiple CPU stress tests on two different systems with 2-socket server motherboards and 2 physical multicore CPUs on each, comparing CPU metrics gathered via SNMP and Zabbix agent 7.0.13, and I also compared these with the outputs of top/htop. I was not able to reproduce the issue. In all of these, the CPU idle time and utilization percentages were the same, as they should be. So I'd say it works correctly on multi-socket/CPU/core/thread systems.
For now I have just a couple of ideas about what could possibly affect the values:
1. The clock tick rate (CLK_TCK, sysconf(_SC_CLK_TCK)) could be different from the 100 used on the majority of Linux systems, which could affect the tick counters; in that case the calculations would need to be adjusted. But I've checked it across about 5 different distributions and it was the same on all of them.
2. The architectural specifics of Dell EMC Isilon somehow affect the data in /proc/stat, which leads to incorrect values. But in this case the values received by Zabbix agent would most likely also be affected, because Zabbix agent and the Net-SNMP agent rely on the same data source for these metrics, the /proc/stat file.
It may also have been a bug in a particular version of the Net-SNMP agent that was fixed later; it's hard to tell. In any case, unfortunately there's not really much we can do without access to a system where it can be reproduced. |
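For anyone who wants to repeat this kind of comparison, here is a minimal sketch of reading the aggregate `cpu` line of /proc/stat twice and deriving idle and utilization percentages from the deltas. It is an illustration only, not how Zabbix agent or Net-SNMP compute these metrics internally; all function names are hypothetical.

```python
import os
import time


def clock_ticks_per_second():
    """Return the clock tick rate (CLK_TCK) reported by the system."""
    return os.sysconf("SC_CLK_TCK")


def read_cpu_ticks(stat_text):
    """Return the tick counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user, nice, system, idle, iowait, irq, softirq, steal.
            # Guest time is already included in user/nice, so it is skipped.
            return [int(x) for x in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def idle_and_utilization(before, after):
    """Return (idle %, utilization %) between two samples of tick counters."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    idle_pct = 100.0 * deltas[3] / total  # index 3 is the idle counter
    return idle_pct, 100.0 - idle_pct


def sample_proc_stat(path="/proc/stat", interval=1.0):
    """Take two samples of the given file and return (idle %, utilization %)."""
    with open(path) as f:
        before = read_cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_cpu_ticks(f.read())
    return idle_and_utilization(before, after)
```

Because the percentage here is the ratio of the idle delta to the total delta, it does not depend on the tick rate or on the number of CPUs. A calculation that instead treats the raw per-second idle rate as a percentage is sensitive to both.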
Comment by Peter Danko [ 2025 Jun 14 ] |
Hi, here are the templates I used. I hope this helps to find the problem. Thanks. |
Comment by Denis Rasikhov [ 2025 Jun 16 ] |
Thank you for sharing these, Alfista! If these are the actual templates you had the original problem with, then these are not official Zabbix templates and of course we can't guarantee these would work correctly. Most likely it's these community templates. |
Comment by Peter Danko [ 2025 Jun 16 ] |
Hi Denis, OK, understood, even though I downloaded them from your site some years ago. Thanks for your help. |