[ZBX-2091] Zabbix server network error, says it will retry in 15 seconds, but 15 seconds never comes
Created: 2010 Mar 02  Updated: 2017 May 30  Resolved: 2012 Feb 03

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.8.1
Fix Version/s: None

Type: Incident report
Priority: Major
Reporter: Brent Jones
Assignee: Unassigned
Resolution: Duplicate
Votes: 5
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Opensolaris snv_133
Sun C compiler


Issue Links:
Duplicate
duplicates ZBX-4232 Unclear log message "first network er... Closed

 Description   

Every once in a while, a host will build up a large number of items in the queue. Investigating the issue, I found that there is a network error for the host in zabbix_server.log:

3800:20100302:021113.207 Item [prod-app.local:perf_counter[\System\File Write Bytes/sec]] error: Get value from agent failed: Cannot connect to [10.10.0.56:10050] [Interrupted system call]
3800:20100302:021113.208 ZABBIX Host [prod-app.local]: first network error, wait for 15 seconds

That is the only entry for that host, even with high error logging enabled. It says it will retry in 15 seconds, but it never does, and the queue time for all the items just grows.
Using "zabbix_get" manually, I can retrieve data just fine:

  /usr/zabbix/bin/zabbix_get -s prod-app.local -k agent.ping
    1
  /usr/zabbix/bin/zabbix_get -s prod-app.local -k "perf_counter[\System\File Write Bytes/sec]"
    7928.549888

I have to disable the host and then re-enable it to get the items working again. After that, it can be hours, days, or weeks before I see the issue again, usually on a different host. The retry doesn't appear to happen.



 Comments   
Comment by richlv [ 2010 Mar 31 ]

In the host configuration, does the "Z" icon turn red or does it stay green?

Comment by Brent Jones [ 2010 Apr 09 ]

It stays green, from what I've seen so far.

Comment by richlv [ 2011 Aug 30 ]

Since then, several improvements have been made regarding hanging pollers, the configuration cache, and elsewhere. Does this still happen with the latest version (1.8.6)?
If yes, what are the cache usage levels (config, history, trends, historytext) and what are the poller busy rates?

Comment by Michal Paal [ 2011 Sep 08 ]

Hi, I have this problem too. I'm using version 1.8.6 release 1.el6 x86_64.

My agent is connecting through zabbix_proxy (1.8.6) to our zabbix_server (1.8.6). This problem is reported by zabbix_proxy, not zabbix_server.

A few of my hosts behind this zabbix_proxy are experiencing this problem.

Comment by richlv [ 2011 Oct 13 ]

Michal, is the problem you see just the network error, or the network error followed by the host never being checked again?

Comment by Peter Baumann [ 2011 Oct 15 ]

Hi,
I have the same problem. I use 1.9.6 with a server and a proxy. The proxy logged this:
27618:20111015:110553.879 Zabbix host [fw3]: first network error, wait for 15 seconds
27629:20111015:110608.917 Zabbix host [fw3]: another network error, wait for 15 seconds
27629:20111015:110623.921 Zabbix host [fw3]: another network error, wait for 15 seconds
27629:20111015:110638.928 Zabbix host [fw3]: another network error, wait for 15 seconds
27619:20111015:110654.650 Zabbix host [fw3]: first network error, wait for 15 seconds
27629:20111015:110709.228 Zabbix host [fw3]: another network error, wait for 15 seconds
27629:20111015:110724.232 Zabbix host [fw3]: another network error, wait for 15 seconds
27627:20111015:110740.106 Zabbix host [fw3]: first network error, wait for 15 seconds
27629:20111015:110755.353 Zabbix host [fw3]: another network error, wait for 15 seconds

What is very strange is that this problem seems to happen ONLY when the agent is running on a FreeBSD host.
The agent on FreeBSD (pfSense) is 1.8.5.

Comment by Aleksandrs Saveljevs [ 2011 Oct 17 ]

Now that ZBX-4232 has been merged, Zabbix server's log messages should be clearer. In particular, Zabbix server will now log whenever a connection to the agent is restored. Previously, it did not do that, so it was hard to understand what was going on with those network errors and why "first" comes after "another".
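A minimal Python sketch for reading such logs follows. It assumes the `<pid>:<YYYYMMDD>:<HHMMSS.mmm>` line prefix and the exact "first/another network error, wait for N seconds" wording seen in the excerpts in this issue; other versions may word the messages differently. The function name `network_error_timeline` is hypothetical, not part of Zabbix. It groups the network-error messages by host and reports the seconds since that host's previous message. Since builds before ZBX-4232 did not log a restored connection, a "first" following an "another" hints that the host recovered in between; a host whose last line is "first" with nothing after it is worth checking, but that alone does not prove the retry never ran.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the log excerpts in this issue:
#   <pid>:<YYYYMMDD>:<HHMMSS.mmm> Zabbix host [<host>]: <first|another> network error, wait for <N> seconds
# "Zabbix Host" / "ZABBIX Host" casing differs between versions, so matching ignores case.
LINE_RE = re.compile(
    r"^\s*\d+:(\d{8}):(\d{6}\.\d{3}) zabbix host \[([^\]]+)\]: "
    r"(first|another) network error, wait for (\d+) seconds",
    re.IGNORECASE,
)


def network_error_timeline(lines):
    """Group network-error messages by host (hypothetical helper).

    Returns {host: [(timestamp, kind, seconds_since_previous_or_None), ...]},
    where kind is "first" or "another". Non-matching lines are ignored.
    """
    timeline = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        date, clock, host, kind, _wait = m.groups()
        ts = datetime.strptime(date + clock, "%Y%m%d%H%M%S.%f")
        events = timeline.setdefault(host, [])
        # Gap to the previous network-error message for the same host, if any.
        gap = (ts - events[-1][0]).total_seconds() if events else None
        events.append((ts, kind.lower(), gap))
    return timeline


if __name__ == "__main__":
    sample = [
        "27618:20111015:110553.879 Zabbix host [fw3]: first network error, wait for 15 seconds",
        "27629:20111015:110608.917 Zabbix host [fw3]: another network error, wait for 15 seconds",
        "27619:20111015:110654.650 Zabbix host [fw3]: first network error, wait for 15 seconds",
    ]
    for host, events in network_error_timeline(sample).items():
        for ts, kind, gap in events:
            print(host, ts.time(), kind, gap)
```

Gaps close to the advertised 15 seconds between consecutive messages are consistent with the retry actually running; long gaps before a new "first" suggest an unlogged recovery followed by a fresh failure.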

Comment by Alexei Vladishev [ 2012 Feb 03 ]

I believe it can be closed, since the issue has been addressed under ZBX-4232.

Generated at Fri Apr 26 21:55:12 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.