[ZBX-6544] Not resuming host checking after temporary disabling Created: 2013 Apr 29  Updated: 2017 May 30  Resolved: 2013 Aug 08

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 2.0.6
Fix Version/s: None

Type: Incident report Priority: Critical
Reporter: Anton Samets Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 1
Labels: unavailable
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

centos 5.9, x64



 Description   

After upgrading to 2.0.6, checks are not resumed after they have been temporarily disabled. It seems that the temporary disabling has become permanent.

Logs:

5729:20130426:112247.920 Zabbix agent item [proc.num[,,run]] on host [eu2-db-201] failed: first network error, wait for 15 seconds
  5732:20130426:112249.594 Zabbix agent item [proc.num[,,run]] on host [eu2-s-55] failed: first network error, wait for 15 seconds
  5730:20130426:112253.870 Zabbix agent item [vfs.dev.read[/dev/sdb1,ops,avg1]] on host [eu2-db-205] failed: first network error, wait for 15 seconds
  5729:20130426:112253.871 Zabbix agent item [vfs.fs.size[/mnt/mysql,pfree]] on host [eu2-db-205] failed: first network error, wait for 15 seconds
  5732:20130426:112257.396 SNMP item [BW_baseapps_count_1_percent] on host [eu2-s-33] failed: first network error, wait for 15 seconds
  5728:20130426:112257.410 SNMP item [MYSQL_status_Opened_tables] on host [eu2-s-33] failed: first network error, wait for 15 seconds
  5729:20130426:112328.316 Zabbix agent item [net.if.total[eth3, bytes]] on host [eu2-s-20] failed: first network error, wait for 15 seconds
  5730:20130426:112328.316 Zabbix agent item [fs.readonly] on host [eu2-s-20] failed: first network error, wait for 15 seconds
  5728:20130426:112328.316 Zabbix agent item [system.cpu.util[,system,avg1]] on host [eu2-s-20] failed: first network error, wait for 15 seconds
  5729:20130426:112330.814 Zabbix agent item [proc.num[,,run]] on host [wowpeu2-st1-4] failed: first network error, wait for 15 seconds
  5731:20130426:112334.967 Zabbix agent item [net.if.total[eth0, bytes]] on host [eu2-st6-5] failed: first network error, wait for 15 seconds
  5729:20130426:112334.968 Zabbix agent item [agent.ping] on host [eu2-st6-5] failed: first network error, wait for 15 seconds
  5730:20130426:112334.968 Zabbix agent item [system.cpu.util[,idle,avg1]] on host [eu2-st6-5] failed: first network error, wait for 15 seconds
  5728:20130426:112335.261 Zabbix agent item [system.cpu.util[,system,avg1]] on host [eu2-gc2012-1] failed: first network error, wait for 15 seconds
  5729:20130426:112351.068 SNMP item [BW_baseapps_count_1_percent] on host [eu2-s-4] failed: first network error, wait for 15 seconds
  5728:20130426:112351.114 SNMP item [MYSQL_status_Innodb_row_lock_time] on host [eu2-s-4] failed: first network error, wait for 15 seconds
  5731:20130426:112400.377 Zabbix agent item [vfs.dev.read[/dev/mapper/vg00-root,ops,avg1]] on host [eu2-db-stagings-1] failed: first network error, wait for 15 seconds
  5730:20130426:112400.378 Zabbix agent item [vfs.dev.read[/dev/sda2,ops,avg1]] on host [eu2-db-stagings-1] failed: first network error, wait for 15 seconds

After restart:

9067:20130426:130605.918 Starting Zabbix Proxy (active) [eu2-mgmt-1]. Zabbix 2.0.6 (revision 35158).
  9067:20130426:130605.918 **** Enabled features ****
  9067:20130426:130605.918 SNMP monitoring:       YES
  9067:20130426:130605.918 IPMI monitoring:       YES
  9067:20130426:130605.918 WEB monitoring:        YES
  9067:20130426:130605.918 ODBC:                   NO
  9067:20130426:130605.918 SSH2 support:          YES
  9067:20130426:130605.918 IPv6 support:           NO
  9067:20130426:130605.918 **************************
  9069:20130426:130606.307 proxy #1 started [configuration syncer #1]
  9070:20130426:130606.307 proxy #2 started [heartbeat sender #1]
  9071:20130426:130606.308 proxy #3 started [data sender #1]
  9078:20130426:130606.312 proxy #10 started [trapper #1]
  9079:20130426:130606.312 proxy #11 started [trapper #2]
  9080:20130426:130606.312 proxy #12 started [trapper #3]
  9081:20130426:130606.313 proxy #13 started [trapper #4]
  9082:20130426:130606.313 proxy #14 started [trapper #5]
  9083:20130426:130606.313 proxy #15 started [icmp pinger #1]
  9084:20130426:130606.314 proxy #16 started [housekeeper #1]
  9084:20130426:130606.314 executing housekeeper
  9085:20130426:130606.314 proxy #17 started [http poller #1]
  9087:20130426:130606.315 proxy #19 started [history syncer #1]
  9088:20130426:130606.315 proxy #20 started [history syncer #2]
  9089:20130426:130606.315 proxy #21 started [history syncer #3]
  9090:20130426:130606.316 proxy #22 started [history syncer #4]
  9091:20130426:130606.316 proxy #23 started [ipmi poller #1]
  9092:20130426:130606.316 proxy #24 started [ipmi poller #2]
  9093:20130426:130606.316 proxy #25 started [ipmi poller #3]
  9067:20130426:130606.316 proxy #0 started [main process]
  9074:20130426:130606.339 proxy #6 started [poller #3]
  9072:20130426:130606.340 proxy #4 started [poller #1]
  9077:20130426:130606.340 proxy #9 started [unreachable poller #1]
  9076:20130426:130606.340 proxy #8 started [poller #5]
  9075:20130426:130606.341 proxy #7 started [poller #4]
  9073:20130426:130606.342 proxy #5 started [poller #2]
  9086:20130426:130606.343 proxy #18 started [discoverer #1]
  9069:20130426:130607.966 Received configuration data from server. Datalen 6755893
  9077:20130426:130611.347 resuming Zabbix agent checks on host [eu2-bt]: connection restored
  9084:20130426:130612.291 housekeeper deleted 670557 records from history (spent 5.975718 seconds)
  9077:20130426:130615.177 resuming Zabbix agent checks on host [eu2-s-96]: connection restored
  9077:20130426:130615.178 resuming Zabbix agent checks on host [eu2-s-47]: connection restored
  9077:20130426:130615.179 resuming Zabbix agent checks on host [eu2-s-90]: connection restored
 9077:20130426:130615.180 resuming Zabbix agent checks on host [eu2-s-57]: connection restored
  9077:20130426:130615.181 resuming Zabbix agent checks on host [eu2-s-72]: connection restored
  9077:20130426:130615.185 resuming Zabbix agent checks on host [eu2-jabber]: connection restored
  9077:20130426:130615.190 resuming Zabbix agent checks on host [eu2-wgniru]: connection restored
  9077:20130426:130615.194 resuming Zabbix agent checks on host [eu2-gc2012-5]: connection restored
  9077:20130426:130615.209 resuming Zabbix agent checks on host [eu2-backyard-ct]: connection restored
  9077:20130426:130615.211 resuming Zabbix agent checks on host [wowpeu2-st1-2]: connection restored
  9077:20130426:130615.212 resuming Zabbix agent checks on host [eu2-s-20]: connection restored
  9077:20130426:130615.218 resuming Zabbix agent checks on host [wowpeu2-st1-4]: connection restored
  9077:20130426:130615.225 resuming Zabbix agent checks on host [eu2-st1-3]: connection restored
  9077:20130426:130615.226 resuming Zabbix agent checks on host [eu2-s-36]: connection restored
  9077:20130426:130615.238 resuming Zabbix agent checks on host [eu2-s-58]: connection restored
  9077:20130426:130615.239 resuming Zabbix agent checks on host [eu2-db-207]: connection restored
  9077:20130426:130615.240 resuming Zabbix agent checks on host [eu2-s-37]: connection restored
  9077:20130426:130615.242 resuming Zabbix agent checks on host [eu2-st1-9]: connection restored
  9077:20130426:130615.246 resuming Zabbix agent checks on host [eu2-wgq-1]: connection restored
  9077:20130426:130621.255 resuming Zabbix agent checks on host [eu2-st2-4]: connection restored
  9077:20130426:130621.256 resuming Zabbix agent checks on host [eu2-db-209]: connection restored
  9077:20130426:130621.258 resuming Zabbix agent checks on host [eu2-blitz-1]: connection restored
  9077:20130426:130621.266 resuming Zabbix agent checks on host [eu2-s-25]: connection restored
  9077:20130426:130621.267 resuming Zabbix agent checks on host [eu2-s-104]: connection restored
  9077:20130426:130621.268 resuming Zabbix agent checks on host [eu2-s-40]: connection restored
  9077:20130426:130621.272 resuming Zabbix agent checks on host [eu2-s-86]: connection restored
  9077:20130426:130621.304 resuming Zabbix agent checks on host [eu2-st3-7]: connection restored
  9077:20130426:130621.306 resuming Zabbix agent checks on host [eu2-db-205]: connection restored
  9077:20130426:130621.323 resuming SNMP checks on host [wowseu2-frm-ru]: connection restored
  9077:20130426:130621.328 resuming Zabbix agent checks on host [wowpeu2-st1-8]: connection restored
  9077:20130426:130621.335 resuming SNMP checks on host [wowpeu2-st3-1]: connection restored
  9077:20130426:130621.440 resuming SNMP checks on host [eu2-knl-ru]: connection restored
  9077:20130426:130621.443 resuming Zabbix agent checks on host [wowpeu2-st2-8]: connection restored
  9077:20130426:130621.444 resuming Zabbix agent checks on host [eu2-s-26]: connection restored
  9077:20130426:130621.446 resuming Zabbix agent checks on host [eu2-gnls-balancer]: connection restored
  9077:20130426:130621.448 resuming Zabbix agent checks on host [eu2-gnls-node1]: connection restored
  9077:20130426:130621.456 resuming Zabbix agent checks on host [eu2-s-4]: connection restored
  9077:20130426:130621.458 resuming Zabbix agent checks on host [eu2-s-61]: connection restored
  9077:20130426:130621.459 resuming Zabbix agent checks on host [eu2-s-94]: connection restored
  9077:20130426:130621.462 resuming Zabbix agent checks on host [eu2-gnls-ptl]: connection restored
  9077:20130426:130621.575 resuming SNMP checks on host [wowpeu2-st1-7]: connection restored
  9077:20130426:130621.584 resuming Zabbix agent checks on host [eu2-s-105]: connection restored
  9077:20130426:130621.619 resuming Zabbix agent checks on host [wowpeu2-st1-6]: connection restored
  9077:20130426:130621.736 resuming SNMP checks on host [eu2-st6-5]: connection restored
  9077:20130426:130621.880 resuming SNMP checks on host [eu2-st4-6]: connection restored
  9077:20130426:130622.109 resuming Zabbix agent checks on host [eu2-knl-eu]: connection restored

and so on...

So the main problem is that hosts do not resume being checked.
For comparison, after downgrading to 2.0.5:

 30752:20130429:112039.864 resuming SNMP checks on host [woteu2-s-4]: connection restored
 30752:20130429:112054.144 resuming SNMP checks on host [woteu2-s-33]: connection restored
 30749:20130429:112138.664 SNMP item [BW_resend_max_percent] on host [woteu2-s-33] failed: first network error, wait for 15 seconds
 30748:20130429:112138.764 SNMP item [MYSQL_status_Sort_rows] on host [woteu2-s-33] failed: another network error, wait for 15 seconds
 30752:20130429:112154.610 resuming SNMP checks on host [woteu2-s-33]: connection restored
 30757:20130429:112208.488 cannot send list of active checks to [127.0.0.1]: host [Zabbix server] not found
 30750:20130429:112227.532 SNMP item [BW_cluster_onlinePlayers] on host [woteu2-s-4] failed: first network error, wait for 15 seconds
 30749:20130429:112228.372 SNMP item [MYSQL_status_Qcache_hits] on host [woteu2-s-4] failed: another network error, wait for 15 seconds
 30752:20130429:112250.075 resuming SNMP checks on host [woteu2-s-4]: connection restored
 30749:20130429:112322.889 SNMP item [BW_cluster_onlinePlayers] on host [woteu2-s-33] failed: first network error, wait for 15 seconds
 30752:20130429:112343.378 resuming SNMP checks on host [woteu2-s-33]: connection restored
 30756:20130429:112408.613 cannot send list of active checks to [127.0.0.1]: host [Zabbix server] not found
 30750:20130429:112517.554 SNMP item [BW_cluster_onlinePlayers] on host [woteu2-s-4] failed: first network error, wait for 15 seconds
 30751:20130429:112517.991 SNMP item [MYSQL_disk_rootdir_bytes_read] on host [woteu2-s-4] failed: another network error, wait for 15 seconds
 30748:20130429:112519.138 SNMP item [MYSQL_disk_rootdir_bytes_write] on host [woteu2-s-4] failed: another network error, wait for 15 seconds
 30751:20130429:112533.688 SNMP item [BW_baseapps_count_1_percent] on host [woteu2-s-33] failed: first network error, wait for 15 seconds
 30750:20130429:112534.971 SNMP item [BW_baseapps_count_2_percents] on host [woteu2-s-33] failed: another network error, wait for 15 seconds
 30749:20130429:112534.973 SNMP item [MYSQL_status_Open_tables] on host [woteu2-s-33] failed: another network error, wait for 15 seconds
 30752:20130429:112539.399 resuming SNMP checks on host [woteu2-s-4]: connection restored
 30752:20130429:112549.631 resuming SNMP checks on host [woteu2-s-33]: connection restored

Please, this is very critical for us. Right now we are downgrading to version 2.0.5.
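
To see which hosts are affected, the log can be scanned for "network error" lines that are never followed by a matching "resuming ... connection restored" line. The following is only a minimal sketch: it assumes the log line layout shown in the excerpts above, the function name `unresumed_checks` is made up for illustration, and other messages (for example "temporarily disabling ... host unavailable") are ignored.

```python
import re

# "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>", as in the excerpts above.
LINE = re.compile(r"^\s*\d+:(?P<ts>\d{8}:\d{6}\.\d{3})\s+(?P<msg>.*)$")
FAILED = re.compile(
    r"^(?P<kind>Zabbix agent|SNMP) item \[.*\] on host \[(?P<host>[^\]]+)\] "
    r"failed: (?:first|another) network error"
)
RESUMED = re.compile(
    r"^resuming (?P<kind>Zabbix agent|SNMP) checks on host "
    r"\[(?P<host>[^\]]+)\]: connection restored"
)

def unresumed_checks(lines):
    """Map (host, check type) to the timestamp of the first network error
    that has no later 'connection restored' line. Hypothetical helper."""
    pending = {}
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue
        msg = m.group("msg")
        failed = FAILED.match(msg)
        if failed:
            # Remember only the first unanswered failure per host and type.
            pending.setdefault((failed["host"], failed["kind"]), m["ts"])
            continue
        resumed = RESUMED.match(msg)
        if resumed:
            pending.pop((resumed["host"], resumed["kind"]), None)
    return pending

sample = [
    "30749:20130429:112138.664 SNMP item [BW_resend_max_percent] on host [woteu2-s-33] failed: first network error, wait for 15 seconds",
    "30752:20130429:112154.610 resuming SNMP checks on host [woteu2-s-33]: connection restored",
    "5729:20130426:112247.920 Zabbix agent item [proc.num[,,run]] on host [eu2-db-201] failed: first network error, wait for 15 seconds",
]
result = unresumed_checks(sample)
print(result)
```

With the three sample lines, only eu2-db-201 stays in the result, because its failure has no later "connection restored" line.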



 Comments   
Comment by Oleksii Zagorskyi [ 2013 Apr 29 ]

Initially discussed on forum (in Russian) https://www.zabbix.com/forum/showthread.php?p=131173

Comment by richlv [ 2013 Apr 30 ]

one thing that comes to mind - unreachable poller gets stuck for some reason.
when this happens, you could grab the unreachable poller pid (the first number in those log lines, see the startup messages) and strace it. does it do anything?
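
A rough sketch of pulling that pid out of the log, assuming the startup line format quoted in the description ("proxy #9 started [unreachable poller #1]"); the function name `unreachable_poller_pids` is made up for illustration:

```python
import re

# Matches startup lines such as:
#   9077:20130426:130606.340 proxy #9 started [unreachable poller #1]
STARTED = re.compile(
    r"^\s*(?P<pid>\d+):\d{8}:\d{6}\.\d{3}\s+\w+ #\d+ started "
    r"\[unreachable poller #\d+\]"
)

def unreachable_poller_pids(lines):
    """Return the unreachable poller PIDs from the most recent startup
    found in the given log lines. Hypothetical helper."""
    pids = []
    for line in lines:
        if "Starting Zabbix" in line:
            pids = []  # a restart invalidates PIDs seen earlier in the log
        m = STARTED.match(line)
        if m:
            pids.append(int(m.group("pid")))
    return pids

sample = [
    "  9076:20130426:130606.340 proxy #8 started [poller #5]",
    "  9077:20130426:130606.340 proxy #9 started [unreachable poller #1]",
]
pids = unreachable_poller_pids(sample)
print(pids)
```

The resulting pid can then be passed to `strace -p <pid>` to see whether the process is doing anything.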

Comment by Oleksii Zagorskyi [ 2013 Apr 30 ]

Note that the forum thread started from a completely different point, so I am confused by the current issue description ...

Comment by Anton Samets [ 2013 Apr 30 ]

Sorry about the forum topic; it was just my guess as to why zabbix-proxy became unstable for us after the update.

Unfortunately, we can't upgrade to 2.0.6 in production to reproduce this issue, but I will try in a few days in a testing environment.

Comment by Dimitri Bellini [ 2013 May 07 ]

Hi, I'm joining this bug report: I have found the same problem with my installation after upgrading to 2.0.6.


31250:20130507:142433.835 SNMP item [bbFcPortOperState-[122]] on host [switch2] failed: first network error, wait for 15 seconds
31350:20130507:142436.081 resuming SNMP checks on host [switch1]: connection restored
31250:20130507:142439.882 SNMP item [fcLossofSignal-[40]] on host [switch3] failed: first network error, wait for 15 seconds
31376:20130507:142442.116 resuming SNMP checks on host [switch4]: connection restored

Comment by Dimitri Bellini [ 2013 May 08 ]

Very strange... after a zabbix_server restart the problem disappeared.
I'm available for any further checks.

Comment by Oleksii Zagorskyi [ 2013 May 23 ]

Another forum thread with complaints about unreachable hosts in 2.0.6.
https://www.zabbix.com/forum/showthread.php?t=41043

Comment by Alexei Vladishev [ 2013 Jul 22 ]

Is it a duplicate of ZBX-6801?

Comment by Anton Samets [ 2013 Aug 08 ]

After upgrading to 2.0.7 the issue is gone.
You can close this issue.
Thank you.

Comment by richlv [ 2013 Aug 08 ]

closing as per the previous comment

Comment by Jasper [ 2014 Apr 28 ]

We encountered the same problem:

After heavy packet loss and high ping, our server disabled the host that was causing trouble, but it didn't resume checks after the problems were resolved:

temporarily disabling Zabbix agent checks on host [KVM01]: host unavailable

When we restarted the zabbix server service it worked again:

enabling Zabbix agent checks on host [KVM01]: host became available

Zabbix version is 2.0.11-1 on Fedora 19
