[ZBX-7881] Host not becoming available after UnavailablePeriod has passed and host is back up
Created: 2014 Feb 27 | Updated: 2017 May 30 | Resolved: 2014 Apr 15 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 2.2.2 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Critical |
Reporter: | Adrian Pinzari | Assignee: | Unassigned |
Resolution: | Cannot Reproduce | Votes: | 0 |
Labels: | agent, snmp, unavailable | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
CentOS release 6.5 (Final) x64 |
Issue Links: |
|
Description |
In our case, a host was marked as unavailable because its SNMP items failed to report data. However, when the host was restarted and the checks resumed, the host was not marked as available again, and as a result no data was collected until I restarted the zabbix_server process. Here is the relevant output from zabbix_server.log:
3778:20140227:011532.839 SNMP agent item "ifInErrors[port4]" on host "XXXX" failed: first network error, wait for 15 seconds
UnavailablePeriod is at its default of 60 seconds, and my impression is that the UnavailablePoller was not even attempting to check the host. |
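To find which hosts and items hit a network error, and when, log lines like the one quoted above can be parsed mechanically. The following is a minimal Python sketch. The `PID:YYYYMMDD:HHMMSS.mmm message` layout is inferred only from the quoted line, and the function name `parse_network_error` is hypothetical rather than part of Zabbix.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the quoted log line:
#   PID:YYYYMMDD:HHMMSS.mmm message
LOG_LINE = re.compile(
    r'^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) (?P<msg>.*)$'
)
# Matches the 'item "<key>" on host "<name>" failed: <reason>' part of the message.
# Simplification: item keys that themselves contain double quotes are not handled.
NETWORK_ERROR = re.compile(
    r'item "(?P<item>[^"]+)" on host "(?P<host>[^"]+)" failed: (?P<reason>.+)$'
)


def parse_network_error(line):
    """Return a dict describing a failed-item log line, or None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    err = NETWORK_ERROR.search(m.group("msg"))
    if err is None:
        return None
    stamp = datetime.strptime(m.group("date") + m.group("time"), "%Y%m%d%H%M%S.%f")
    return {
        "pid": int(m.group("pid")),
        "time": stamp,
        "item": err.group("item"),
        "host": err.group("host"),
        "reason": err.group("reason"),
    }
```

Running this over zabbix_server.log and grouping the results by host would show whether any later check was attempted for the affected host after the first network error.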
Comments |
Comment by Adrian Pinzari [ 2014 Feb 27 ] |
I will add this as a follow-up, and this may turn out to be an RTFM case rather than a bug. In the config file, here is the section in question:
### Option: UnavailableDelay
# How often host is checked for availability during the unavailability period, in seconds.
#
# Mandatory: no
# Range: 1-3600
# Default:
# UnavailableDelay=60
Does this mean that if UnavailableDelay is not explicitly set, it will not take effect? That is, do I need to have the following:
### Option: UnavailableDelay
# How often host is checked for availability during the unavailability period, in seconds.
#
# Mandatory: no
# Range: 1-3600
# Default:
# UnavailableDelay=60
UnavailableDelay=60
|
Comment by Oleksii Zagorskyi [ 2014 Mar 05 ] |
Could be related to |
Comment by richlv [ 2014 Mar 16 ] |
No, the default is applied even if the corresponding line is commented out in the config file. What is the busy rate for the unreachable pollers? How many of them do you have? |
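The behaviour described in this comment, where a commented-out option still gets its default, can be illustrated with a small Python sketch. This is not Zabbix's actual parser. The `DEFAULTS` table and the `load_config` function are hypothetical, and the value 60 is taken from the sample config quoted earlier.

```python
# Hypothetical defaults table; 60 comes from the commented sample line above.
DEFAULTS = {"UnavailableDelay": "60"}


def load_config(text, defaults=DEFAULTS):
    """Start from the defaults, then override with every uncommented key=value line."""
    settings = dict(defaults)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments never change a setting
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings
```

With this model, a file containing only `# UnavailableDelay=60` and a file containing `UnavailableDelay=60` yield the same effective setting, so uncommenting the line would not be expected to change anything.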
Comment by Adrian Pinzari [ 2014 Mar 25 ] |
Hello,
The issue seems to have recurred:
1585:20140325:104311.079 SNMP agent item "system.cpu.load" on host "XXX" failed: first network error, wait for 15 seconds
My StartPollersUnreachable is set to 1 (the default), and here is the busy-rate data from around the time the host was disabled: |
Comment by richlv [ 2014 Mar 25 ] |
For those who see this problem: without restarting the server, can you please run strace on the unreachable poller and see what it is doing, if anything? Since Zabbix 2.2, you can see the process type (and PID) in the server startup messages, as well as in ps/top output. |
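A rough Python sketch of the lookup requested here: it scans `ps -eo pid,args` output for the unreachable poller and prints a matching strace command. It assumes the process title contains the text `unreachable poller #N`, which may differ between versions. The helper names are hypothetical.

```python
import re
import subprocess

# Assumption: the process title contains "unreachable poller #N".
TITLE = re.compile(r'unreachable poller #\d+')


def find_unreachable_pollers(ps_output):
    """Return (pid, title) pairs from `ps -eo pid,args` style output."""
    found = []
    for line in ps_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skips the header row and malformed lines
        pid, title = parts
        if TITLE.search(title):
            found.append((int(pid), title))
    return found


def print_strace_commands():
    """Print one strace command per unreachable poller found on this machine."""
    out = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    ).stdout
    for pid, title in find_unreachable_pollers(out):
        print(f"strace -tt -p {pid}  # {title}")
```

Calling `print_strace_commands()` on the server host prints the commands to run. Attaching strace typically requires root or the same user as the Zabbix server process.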
Comment by Juris Miščenko (Inactive) [ 2014 Apr 08 ] |
Unfortunately, I couldn't reproduce this. Hosts become available very soon after connectivity is re-established. If you notice any special conditions in your setups that might influence this, please report them to us. |