[ZBX-22438] Zabbix agent2 is not retrieving IIS performance counters on multiple servers after a certain period of time Created: 2023 Mar 01 Updated: 2023 Dec 22 Resolved: 2023 Dec 22 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | None |
Affects Version/s: | 6.2.6, 6.2.7 |
Fix Version/s: | None |
Type: | Problem report | Priority: | Trivial |
Reporter: | Stijn De Doncker | Assignee: | Unassigned |
Resolution: | Duplicate | Votes: | 3 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Windows Server 2012/2016/2019 |
Attachments: |
Issue Links: |
|
Description |
Zabbix agent2 stops retrieving IIS performance counters on multiple servers after a certain period of time. We have observed this behavior on Zabbix agent2 v6.2.5, v6.2.6 and v6.2.7 (which contains a fix for OS performance counters ( )).
Behavior 1: On a server where the W3SVC service is disabled at boot, the Zabbix agent starts at boot, and the W3SVC service is enabled and started later:
Behavior 2: On a server where both the W3SVC service and the Zabbix agent start at boot:
The root cause appears to be that the Zabbix agent only retrieves IIS performance counter data for the application pools (counter instances) that exist when it starts. As a result, data for application pools that are started later (behavior 1) or added later (behavior 2) is never collected, and the only remedy is restarting the Zabbix agent. An important side effect is that the "first network errors" cause data gaps in the collection of other items for that host. These gaps are visible in the graphs of other metrics (e.g. CPU usage of that host), not only in the IIS performance counter graphs, and they can also trigger false positives for triggers that alert on "no data".
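To make the suspected mechanism concrete, here is a minimal Python sketch. It is not Zabbix agent2's code (agent2 is written in Go, and its internals are not shown in this report), and every class and function name in it is hypothetical. It contrasts a cache of counter instances enumerated once at startup, which would keep missing application pools started or added afterwards until a restart, with a cache that re-enumerates instances when a lookup misses.

```python
from typing import Callable, List


class StartupInstanceCache:
    """Hypothetical: instance list captured once, at agent start (the suspected behavior)."""

    def __init__(self, enumerate_instances: Callable[[], List[str]]):
        # Enumerated exactly once; never refreshed until the object is recreated
        # (i.e. until the agent is restarted).
        self._instances = set(enumerate_instances())

    def has(self, name: str) -> bool:
        return name in self._instances


class RefreshingInstanceCache:
    """Hypothetical alternative: re-enumerate instances when a name is not found."""

    def __init__(self, enumerate_instances: Callable[[], List[str]]):
        self._enumerate = enumerate_instances
        self._instances = set(enumerate_instances())

    def has(self, name: str) -> bool:
        if name not in self._instances:
            # A miss may mean the instance appeared after startup, so refresh once.
            self._instances = set(self._enumerate())
        return name in self._instances


if __name__ == "__main__":
    # Simulated list of running application pools (hypothetical data).
    running_pools = ["DefaultAppPool"]
    stale = StartupInstanceCache(lambda: list(running_pools))
    fresh = RefreshingInstanceCache(lambda: list(running_pools))
    running_pools.append("NewAppPool")  # pool started or added after agent start
    print(stale.has("NewAppPool"))  # False: the startup snapshot never sees it
    print(fresh.has("NewAppPool"))  # True: the miss triggers a re-enumeration
```

Re-enumerating on a miss is only one possible remedy; a periodic refresh would also pick up new instances, at the cost of extra enumeration calls.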
Steps to Reproduce: We cannot reproduce this issue on an idle test system. It only occurs on production systems under real production load.
Behavior 1:
Behavior 2:
We suspect that this issue is similar to the OS performance counters issue ( ). We have also noticed a similarity with ZBX-21661: we have observed the "Detected performance counter with negative denominator the second time after retry, giving up…" messages in the Zabbix agent log. However, these messages sometimes start a couple of hours or days after the "first network error" alerts begin, and we have also seen them while the IIS performance counters were being collected successfully, so there is no direct correlation. Nonetheless, we believe this information is relevant to the investigation and may help identify the root cause.
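The "negative denominator" message concerns counters whose value is typically calculated from two raw samples rather than read directly. The Python sketch below is a simplified, hypothetical model, not the agent's or the Windows PDH library's actual code: the value is the change in a raw numerator divided by the change in a raw denominator, and a non-positive denominator change makes the sample unusable. The "retry once, then give up" flow is inferred only from the wording of the log message; the function names are illustrative.

```python
from typing import Callable, Optional, Tuple

# (raw numerator, raw denominator), e.g. an event count and a time base.
Sample = Tuple[int, int]


def rate(prev: Sample, curr: Sample) -> Optional[float]:
    """Simplified rate: delta(numerator) / delta(denominator).

    Returns None when the denominator did not increase, the situation a
    "negative denominator" error describes.
    """
    d_num = curr[0] - prev[0]
    d_den = curr[1] - prev[1]
    if d_den <= 0:
        return None
    return d_num / d_den


def collect_with_retry(read_sample: Callable[[], Sample], prev: Sample) -> Optional[float]:
    """Hypothetical collection step: try once, retry once, then give up (None)."""
    curr = read_sample()
    value = rate(prev, curr)
    if value is not None:
        return value
    # First failure: use the latest sample as a new baseline and retry once.
    prev, curr = curr, read_sample()
    # A second failure is returned as None ("giving up").
    return rate(prev, curr)
```

In this model, a single bad sample is absorbed by the retry, while two consecutive bad samples produce no value for that collection cycle; it does not explain by itself why collection would stop permanently.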
Here's a timeline example for behavior 1:
See screenshot behavior1-screenshot1
See screenshot behavior1-screenshot2
See screenshot behavior1-screenshot3
Here's a timeline example for behavior 2:
See screenshot behavior2-screenshot1
See screenshot behavior2-screenshot2
See screenshot behavior2-screenshot3
|
Comments |
Comment by Gregg Cranshaw [ 2023 Jul 11 ] |
I am also seeing the same issue with Zabbix agent 2 v6.0.18. |
Comment by Vladislavs Sokurenko [ 2023 Sep 18 ] |
Unreachable error could be due to |