Problem report
Resolution: Duplicate
Trivial
6.2.6, 6.2.7
Windows Server 2012/2016/2019
Zabbix agent2 stops retrieving IIS performance counters on multiple servers after a certain period of time. We have observed this on Zabbix agent2 v6.2.5, v6.2.6 and v6.2.7; the last of these contains a fix for OS performance counters (ZBX-20356). We have identified two distinct behaviors related to this issue.
Behavior 1:
On a server where the W3SVC service is disabled at startup, Zabbix agent2 is started at startup, and the W3SVC service is enabled and started later on:
- IIS performance counters are not collected: no data is sent to Zabbix and nothing appears under Latest data.
- In the proxy logs, we see "perf_counter_en["\APP_POOL_WAS(DefaultAppPool)\Current Application Pool Uptime"]" on host "host.example.com" failed: first network error, wait for 45 seconds" when the proxy attempts to get the metric.
- The zabbix_get command on the proxy times out.
- The zabbix_agent2 -t command on the host itself fails with the message "ZBX_NOTSUPPORTED".
Behavior 2:
On a server where both the W3SVC service and Zabbix agent2 are started at startup:
- IIS performance counters are not collected: no data is sent to Zabbix and nothing appears under Latest data.
- In the proxy logs, we see e.g. "perf_counter_en["\APP_POOL_WAS(DefaultAppPool)\Current Application Pool Uptime"]" on host "host.example.com" failed: first network error, wait for 45 seconds" when the proxy attempts to get the metric.
- The zabbix_get command on the proxy times out.
- The zabbix_agent2 -t command on the host itself works and returns a value.
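To tell a timeout apart from a "ZBX_NOTSUPPORTED" reply when checking from the proxy side, the passive check can also be issued by hand. The following is only a rough sketch of what zabbix_get does, assuming the plain-text passive check framing of the 6.x series ("ZBXD" signature, one flags byte, a 4-byte little-endian length and a 4-byte reserved field). The function names, the default port 10050 and the 5-second timeout are our own choices for illustration, not taken from the agent.

```python
import socket
import struct

HEADER = b"ZBXD"
FLAGS = 0x01  # plain Zabbix protocol flag (no compression), assumed for 6.x


def pack_request(key: str) -> bytes:
    """Frame an item key as a passive check request."""
    payload = key.encode("utf-8")
    # "<BII": 1-byte flags, 4-byte little-endian length, 4-byte reserved field
    return HEADER + struct.pack("<BII", FLAGS, len(payload), 0) + payload


def unpack_response(packet: bytes) -> str:
    """Strip the 13-byte header and return the payload as text."""
    if packet[:4] != HEADER:
        raise ValueError("not a Zabbix protocol packet")
    _flags, length, _reserved = struct.unpack("<BII", packet[4:13])
    return packet[13:13 + length].decode("utf-8", errors="replace")


def query_agent(host: str, key: str, port: int = 10050, timeout: float = 5.0) -> str:
    """Send one passive check; raises a timeout error if the agent does not answer."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(pack_request(key))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # agent closed the connection after replying
                break
            chunks.append(chunk)
    return unpack_response(b"".join(chunks))
```

Calling query_agent("host.example.com", r'perf_counter_en["\APP_POOL_WAS(DefaultAppPool)\Current Application Pool Uptime"]') from the proxy would be expected to raise a timeout in both behaviors described above, whereas a reply starting with ZBX_NOTSUPPORTED would indicate that the agent answered but could not resolve the counter.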
The root cause seems to be that the Zabbix agent only picks up the IIS performance counter data that is available at the moment the agent starts. As a result, data for apppools that are started later (behavior 1) or added later (behavior 2) is never captured. So far the only way to recover is to restart the Zabbix agent.
An important side effect is that the "first network error" events cause data gaps in the collection of other items for that host. These gaps are visible in the graphs of other metrics as well (e.g. CPU usage of that host), not only in the IIS performance counter graphs. The gaps can also cause false positives for triggers that alert on "no data".
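To quantify these gaps for an unrelated item (e.g. CPU usage), the collected timestamps can be compared against the item's update interval. This is a minimal sketch: find_gaps and the tolerance factor of 1.5 are our own illustrative choices, and the timestamps are assumed to be Unix epoch seconds exported from the item history.

```python
from typing import Iterable, List, Tuple


def find_gaps(timestamps: Iterable[int], interval: int,
              tolerance: float = 1.5) -> List[Tuple[int, int]]:
    """Return (last_seen, next_seen) pairs where values arrived too far apart.

    A gap is reported when two consecutive values are more than
    interval * tolerance seconds apart.
    """
    ordered = sorted(timestamps)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev > interval * tolerance:
            gaps.append((prev, cur))
    return gaps
```

For a 60-second item, find_gaps(history_clocks, 60) lists every stretch in which at least one value was missed; these stretches can then be matched against the "first network error" timestamps in the proxy log.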
Steps to Reproduce:
We cannot reproduce this issue on an empty test system. It only occurs on a production system with real production load.
Behavior 1:
- Disable W3SVC service, which will stop the apppool(s).
- Reboot the server.
- When the Zabbix agent2 service is started, wait 10 minutes before enabling/starting W3SVC and apppools.
- After another 10 minutes, "first network errors" start appearing in the Zabbix proxy log.
Behavior 2:
- Make sure Zabbix agent2 is started.
- Add a new IIS apppool and start that apppool.
- Wait 10 minutes, and "first network errors" start appearing in the Zabbix proxy log.
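When repeating these steps, the moment the errors begin can be pulled from the proxy log instead of being watched for by hand. The sketch below assumes the usual Zabbix log line prefix (PID:YYYYMMDD:HHMMSS.mmm) followed by the message quoted above; first_network_errors and both regular expressions are our own illustrative names and should be adjusted if the local log format differs.

```python
import re
from datetime import datetime
from typing import Dict, Iterable

# Assumed log prefix: "<pid>:<YYYYMMDD>:<HHMMSS>.<mmm> <message>"
LINE_RE = re.compile(r'^\s*\d+:(\d{8}):(\d{6})\.\d{3}\s+(.*)$')
# The host name is the quoted string right before "failed: first network error"
HOST_RE = re.compile(r'on host "([^"]+)" failed: first network error')


def first_network_errors(lines: Iterable[str]) -> Dict[str, datetime]:
    """Map each host to the timestamp of its earliest "first network error"."""
    first_seen: Dict[str, datetime] = {}
    for line in lines:
        prefix = LINE_RE.match(line)
        if not prefix:
            continue  # continuation line or unexpected format
        date, time, message = prefix.groups()
        host_match = HOST_RE.search(message)
        if not host_match:
            continue
        host = host_match.group(1)
        if host not in first_seen:
            first_seen[host] = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    return first_seen
```

Passing an open proxy log file to first_network_errors gives one timestamp per host, which can be compared with the time the apppool was added or W3SVC was started.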
We suspect that this issue is similar to the OS performance counters issue (ZBX-20356), which appears to have been fixed in version 6.2.7. We request that you investigate whether a similar fix can be applied to IIS (and all other) performance counters.
We have also noticed a similarity with ZBX-21661: we see the "Detected performance counter with negative denominator the second time after retry, giving up…" messages in the Zabbix agent log. However, these messages sometimes start hours or days after the "first network error" entries begin, and we have also seen them while the IIS performance counters were being collected successfully, so there is no direct correlation. We mention this because it may still help in finding the root cause.
Here's a timeline example for behavior 1:
- W3SVC service is disabled
- Server is rebooted
- At 8:43am, Zabbix Agent 2 is started on the system
See screenshot behavior1-screenshot1
- At 8:45am, the first "first network error" occurs
See screenshot behavior1-screenshot2
- At 8:59am, WAS and W3SVC services are enabled and started
See screenshot behavior1-screenshot3
- At <zabbix_agent_restart>, the "first network error" entries disappear from the proxy log and IIS performance counters are retrieved again.
Here's a timeline example for behavior 2:
- Server is rebooted
- At 1:14am on Feb 14, Zabbix Agent 2 is started on the system
See screenshot behavior2-screenshot1
- At 8:58am on Feb 24, WAS was started and new apppools are enabled and started
See screenshot behavior2-screenshot2
- At 9:09am on Feb 24, the first "first network error" starts for the newly added apppools
See screenshot behavior2-screenshot3
- At <zabbix_agent_restart>, the "first network error" entries disappear from the proxy log and IIS performance counters are retrieved again.
duplicates:
- ZBX-21703 Zabbix Agent2 is no longer retrieving Windows perfmon counters after a period of time (Reopened)