Problem report
Resolution: Cannot Reproduce
Trivial
4.0.16
Debian
I'm investigating regular gaps (5 to 10 minutes long) in all graphs for a particular VM (passive agent, standard "Template OS Linux" template plus a few custom checks).
For example, "CPU iowait time" receives data for a moment, then nothing for about 5 minutes, then some data again, then another 5 to 10 minute gap, and so on.
Here is what I found in the Zabbix server log, logged repeatedly:
```
Zabbix agent item "net.tcp.service[tcp,b2btest.internal,8880]" on host "web31.vm" failed: first network error, wait for 15 seconds
resuming Zabbix agent checks on host "web31.vm": connection restored
```
The interval for the check is 300 s, and the cause appears to be a firewall dropping connections, so the connection attempt hangs until it times out:
```
$ time telnet b2btest.internal 8880
Trying xx.xx.xxx.xx...
telnet: Unable to connect to remote host: Connection timed out
real 2m11.103s
user 0m0.003s
sys 0m0.000s
```
The problem is that a timeout in "net.tcp.service" makes the Zabbix agent unresponsive, which affects all other checks. It gets worse when more than one "net.tcp.service" check is timing out.
I isolated the problem by disabling the problematic "net.tcp.service" item, which instantly restored the graphs to normal.
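For anyone trying to reproduce this, a probe with an explicit connect timeout shows the dropped connection without the 2-minute wait seen with telnet. This is a minimal sketch in Python and not how the Zabbix agent implements "net.tcp.service". The function name `tcp_service_up` and the 3-second default are my own assumptions for illustration.

```python
import socket
import time


def tcp_service_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    `timeout` bounds each connection attempt (per resolved address);
    DNS resolution itself is not covered by it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, resolution failures, and timeouts all end up here.
        return False


if __name__ == "__main__":
    start = time.monotonic()
    up = tcp_service_up("b2btest.internal", 8880)
    print(f"up={up} elapsed={time.monotonic() - start:.1f}s")
```

When the firewall silently drops packets, this should return False after roughly the chosen timeout instead of blocking for minutes.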