[ZBX-3547] agent returns wrong value for perf_counter(p, X), X>60 after host restart Created: 2011 Feb 20 Updated: 2017 May 30 Resolved: 2011 May 03 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 1.8.3 |
Fix Version/s: | 1.8.6, 1.9.4 (alpha) |
Type: | Incident report | Priority: | Major |
Reporter: | Alexandru Nica | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Windows 2003 R2 x64 |
Attachments: | bug_4.log bug_5.png compare-load.png compare-util.png outlook_closed.txt zabbix_agentd_5sec_pollig.log |
Description |
It seems that when using an active check of type perf_counter(whatever, X) with a large X value, after a host restart, the agent sometimes gets stuck sending trash information. I have seen this happen twice with these counters, monitoring different servers. As far as I can tell, it happens only when X is larger than 60 (I only use 30, 60 or 300 in my templates), and it goes back to normal after an agent restart. It also seems to happen just after a server restart, so it may be that the Windows perfcounter API simply does not have enough data to generate an average for 300 seconds. |
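To illustrate the hypothesis above, here is a minimal sketch of an averaging window that copes with having fewer samples than the requested interval. It is not the agent's actual code: the class `CounterAverager` and its methods are hypothetical names, and it assumes one sample is collected per second and that a failed collection is reported as `None`.

```python
from collections import deque


class CounterAverager:
    """Keeps up to `interval` samples and averages whatever is available."""

    def __init__(self, interval):
        self.interval = interval
        # Old samples fall off automatically once the window is full
        self.samples = deque(maxlen=interval)

    def add_sample(self, value):
        # None marks a failed collection (e.g. the counter did not exist yet);
        # it is skipped so it cannot pollute the average
        if value is not None:
            self.samples.append(value)

    def average(self):
        # With no valid samples there is nothing meaningful to report
        if not self.samples:
            return None
        return sum(self.samples) / len(self.samples)
```

With this approach a check made shortly after a restart returns either an average over the samples gathered so far or no value at all, rather than a number computed from uninitialized data.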
Comments |
Comment by Alexandru Nica [ 2011 Mar 10 ] |
Thank you, I will test this version and post back. I only had the issues with the Exchange template I was building. I have used "normal" perfcounters in a bunch of other templates and have never had problems. |
Comment by Alexandru Nica [ 2011 Mar 24 ] |
I have 4 Exchange servers that I have put the test version on. However, I can only restart/stop Exchange and test them during weekends. Should I be running the agents at DebugLevel=4 to get the messages you mentioned? |
Comment by Alexandru Nica [ 2011 Mar 26 ] |
OK, I have restarted all 4 servers and the issue manifested itself on only one of them.
>> 2132:20110326:142112.671 In add_perf_counter() [counter:\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length] [interval:300]
At this point (14:21:12) the MSExchange Transport service had not started yet.
>> 2132:20110326:143557.828 For key [perf_counter[\MSExchangeTransport Queues(_total)\Retry Mailbox Delivery Queue Length, 300]] received value [9153761201933518800.000000]
And long after the MSExchange Transport service started, the Zabbix agent does another check that returns 9153761201933518800.000000. When this reaches the server it is actually recorded as value 16439680, even though the item is stored as Numeric (float).
>> 4232:20110326:150833.892 In add_perf_counter() [counter:\MSExchangeTransport Queues(_total)\Retry Remote Delivery Queue Length] [interval:300]
Let me try manually stopping/restarting Exchange transport and see what happens, and I will post back. |
Comment by Alexandru Nica [ 2011 Mar 26 ] |
These are the steps I followed: >>exchange transport started From what I can tell, the Zabbix agent does not check whether the instance of the perfcounter is actually valid. I noticed this when monitoring per-CPU time. Perfmon won't let me monitor an absurd perfcounter like \\Processor(99)% Idle time because the 99th processor/core on the system does not exist. Yet the Zabbix agent happily returns 0 instead of ZBX_NOTSUPPORTED. Hope this helps |
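The behaviour asked for above can be sketched as follows. This is only an illustration under stated assumptions, not the agent's implementation: `read_counter`, the `available` mapping of object names to instance names, and the `read_raw` callback are all hypothetical, and the counter path is assumed to have the form `\Object(instance)\Counter`.

```python
import re

ZBX_NOTSUPPORTED = "ZBX_NOTSUPPORTED"

# Matches \Object(instance)\Counter; the "(instance)" part is optional
COUNTER_PATH = re.compile(r"^\\([^\\(]+)(?:\(([^)]*)\))?\\(.+)$")


def read_counter(path, available, read_raw):
    """Return the counter value, or ZBX_NOTSUPPORTED if the instance is unknown.

    available: dict mapping object name -> set of existing instance names
               (hypothetical; a real agent would ask the OS for this).
    read_raw:  callable that actually reads the counter for a valid path.
    """
    match = COUNTER_PATH.match(path)
    if match is None:
        return ZBX_NOTSUPPORTED
    obj, instance, _counter = match.groups()
    if obj not in available:
        return ZBX_NOTSUPPORTED
    # Reject instances that do not exist instead of silently reporting 0
    if instance is not None and instance not in available[obj]:
        return ZBX_NOTSUPPORTED
    return read_raw(path)
```

The point of the design is that a nonexistent instance is reported as unsupported up front, so a genuine reading of 0 stays distinguishable from a counter that cannot be read.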
Comment by Alexandru Nica [ 2011 Mar 30 ] |
Thank you very much for fixing it. How was the behaviour changed? Does it return no data (and disable the item with ZBX_NOTSUPPORTED) when the instance no longer exists, or does it return 0s? |
Comment by Rudolfs Kreicbergs [ 2011 Mar 30 ] |
See |
Comment by Oleksii Zagorskyi [ 2011 Apr 04 ] |
Especially for Rudolfs: "outlook_closed.txt", attached as is, without comment. |
Comment by Rudolfs Kreicbergs [ 2011 Apr 08 ] |
Due to the many bugs that would require dirty fixes, performance counter code is being rewritten completely, |
Comment by Rudolfs Kreicbergs [ 2011 Apr 08 ] |
Performance counters and CPU stat collection for Windows have been rewritten in dev branch: |
Comment by Oleksii Zagorskyi [ 2011 Apr 12 ] |
Tested the latest binary, r18973. Agent debug log: and 2788:20110412:010833.359 Requested [perf_counter[\Процесс(OUTLOOK)\Прошло времени (сек),300]] Closing, running and closing again (Outlook, I mean), etc., while the agent is running: the counter perf_counter[\Процесс(OUTLOOK)\Прошло времени (сек),300] works well. Also, in its first second of operation, the agent added something on its own: |
Comment by Alexei Vladishev [ 2011 Apr 12 ] |
Perhaps we should not return NOT SUPPORTED on the first run, and instead do two sequential calls with a sleep(1) in between? |
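A minimal sketch of that suggestion, assuming a rate-type counter whose value is derived from two raw samples. The function `first_rate_value` and the `read_raw` callback are hypothetical names, and the sleep function is injectable only so the sketch can be exercised without waiting.

```python
import time


def first_rate_value(read_raw, sleep=time.sleep, pause=1.0):
    """Take two raw samples `pause` seconds apart and return the rate per second.

    read_raw: callable returning the current raw (cumulative) counter value.
    """
    first = read_raw()
    sleep(pause)  # give the counter time to advance between the two samples
    second = read_raw()
    return (second - first) / pause
```

The trade-off is that the very first request for a counter blocks for about one second, in exchange for returning a real value instead of NOT SUPPORTED.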
Comment by Oleksii Zagorskyi [ 2011 Apr 12 ] |
Tested the latest binary, r19006. In tests with the key "perf_counter[\Процессор(_Total)\% загруженности процессора,5]" I periodically (about every 5 seconds) get a 1-second delay in the answer, and the next value requested quickly afterwards is 0.0000. Different tests (even those described above) with the keys: Heh, I'm tired of these deep tests of one ZBX. |
Comment by Rudolfs Kreicbergs [ 2011 May 03 ] |
Fixed/available in pre-1.8.6 r19321 and pre-1.9.4 r19322 |
Comment by Oleksii Zagorskyi [ 2012 Feb 02 ] |
Note: these changes caused a memory leak in the agent in some cases. See |