  1. ZABBIX BUGS AND ISSUES
  2. ZBX-6544

Not resuming host checking after temporary disabling

Details

    • Type: Incident report
    • Status: Closed
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version/s: 2.0.6
    • Fix Version/s: None
    • Component/s: Proxy (P), Server (S)
    • Environment: CentOS 5.9, x64

    Description

      After upgrading to 2.0.6, we see that something is wrong with restoring connections after checks are temporarily disabled. It seems that the temporary disabling has become permanent.

      Logs:

      5729:20130426:112247.920 Zabbix agent item [proc.num[,,run]] on host [eu2-db-201] failed: first network error, wait for 15 seconds
        5732:20130426:112249.594 Zabbix agent item [proc.num[,,run]] on host [eu2-s-55] failed: first network error, wait for 15 seconds
        5730:20130426:112253.870 Zabbix agent item [vfs.dev.read[/dev/sdb1,ops,avg1]] on host [eu2-db-205] failed: first network error, wait for 15 seconds
        5729:20130426:112253.871 Zabbix agent item [vfs.fs.size[/mnt/mysql,pfree]] on host [eu2-db-205] failed: first network error, wait for 15 seconds
        5732:20130426:112257.396 SNMP item [BW_baseapps_count_1_percent] on host [eu2-s-33] failed: first network error, wait for 15 seconds
        5728:20130426:112257.410 SNMP item [MYSQL_status_Opened_tables] on host [eu2-s-33] failed: first network error, wait for 15 seconds
        5729:20130426:112328.316 Zabbix agent item [net.if.total[eth3, bytes]] on host [eu2-s-20] failed: first network error, wait for 15 seconds
        5730:20130426:112328.316 Zabbix agent item [fs.readonly] on host [eu2-s-20] failed: first network error, wait for 15 seconds
        5728:20130426:112328.316 Zabbix agent item [system.cpu.util[,system,avg1]] on host [eu2-s-20] failed: first network error, wait for 15 seconds
        5729:20130426:112330.814 Zabbix agent item [proc.num[,,run]] on host [wowpeu2-st1-4] failed: first network error, wait for 15 seconds
        5731:20130426:112334.967 Zabbix agent item [net.if.total[eth0, bytes]] on host [eu2-st6-5] failed: first network error, wait for 15 seconds
        5729:20130426:112334.968 Zabbix agent item [agent.ping] on host [eu2-st6-5] failed: first network error, wait for 15 seconds
        5730:20130426:112334.968 Zabbix agent item [system.cpu.util[,idle,avg1]] on host [eu2-st6-5] failed: first network error, wait for 15 seconds
        5728:20130426:112335.261 Zabbix agent item [system.cpu.util[,system,avg1]] on host [eu2-gc2012-1] failed: first network error, wait for 15 seconds
        5729:20130426:112351.068 SNMP item [BW_baseapps_count_1_percent] on host [eu2-s-4] failed: first network error, wait for 15 seconds
        5728:20130426:112351.114 SNMP item [MYSQL_status_Innodb_row_lock_time] on host [eu2-s-4] failed: first network error, wait for 15 seconds
        5731:20130426:112400.377 Zabbix agent item [vfs.dev.read[/dev/mapper/vg00-root,ops,avg1]] on host [eu2-db-stagings-1] failed: first network error, wait for 15 seconds
        5730:20130426:112400.378 Zabbix agent item [vfs.dev.read[/dev/sda2,ops,avg1]] on host [eu2-db-stagings-1] failed: first network error, wait for 15 seconds
      

      After restart:

      9067:20130426:130605.918 Starting Zabbix Proxy (active) [eu2-mgmt-1]. Zabbix 2.0.6 (revision 35158).
        9067:20130426:130605.918 **** Enabled features ****
        9067:20130426:130605.918 SNMP monitoring:       YES
        9067:20130426:130605.918 IPMI monitoring:       YES
        9067:20130426:130605.918 WEB monitoring:        YES
        9067:20130426:130605.918 ODBC:                   NO
        9067:20130426:130605.918 SSH2 support:          YES
        9067:20130426:130605.918 IPv6 support:           NO
        9067:20130426:130605.918 **************************
        9069:20130426:130606.307 proxy #1 started [configuration syncer #1]
        9070:20130426:130606.307 proxy #2 started [heartbeat sender #1]
        9071:20130426:130606.308 proxy #3 started [data sender #1]
        9078:20130426:130606.312 proxy #10 started [trapper #1]
        9079:20130426:130606.312 proxy #11 started [trapper #2]
        9080:20130426:130606.312 proxy #12 started [trapper #3]
        9081:20130426:130606.313 proxy #13 started [trapper #4]
        9082:20130426:130606.313 proxy #14 started [trapper #5]
        9083:20130426:130606.313 proxy #15 started [icmp pinger #1]
        9084:20130426:130606.314 proxy #16 started [housekeeper #1]
        9084:20130426:130606.314 executing housekeeper
        9085:20130426:130606.314 proxy #17 started [http poller #1]
        9087:20130426:130606.315 proxy #19 started [history syncer #1]
        9088:20130426:130606.315 proxy #20 started [history syncer #2]
        9089:20130426:130606.315 proxy #21 started [history syncer #3]
        9090:20130426:130606.316 proxy #22 started [history syncer #4]
        9091:20130426:130606.316 proxy #23 started [ipmi poller #1]
        9092:20130426:130606.316 proxy #24 started [ipmi poller #2]
        9093:20130426:130606.316 proxy #25 started [ipmi poller #3]
        9067:20130426:130606.316 proxy #0 started [main process]
        9074:20130426:130606.339 proxy #6 started [poller #3]
        9072:20130426:130606.340 proxy #4 started [poller #1]
        9077:20130426:130606.340 proxy #9 started [unreachable poller #1]
        9076:20130426:130606.340 proxy #8 started [poller #5]
        9075:20130426:130606.341 proxy #7 started [poller #4]
        9073:20130426:130606.342 proxy #5 started [poller #2]
        9086:20130426:130606.343 proxy #18 started [discoverer #1]
        9069:20130426:130607.966 Received configuration data from server. Datalen 6755893
        9077:20130426:130611.347 resuming Zabbix agent checks on host [eu2-bt]: connection restored
        9084:20130426:130612.291 housekeeper deleted 670557 records from history (spent 5.975718 seconds)
        9077:20130426:130615.177 resuming Zabbix agent checks on host [eu2-s-96]: connection restored
        9077:20130426:130615.178 resuming Zabbix agent checks on host [eu2-s-47]: connection restored
        9077:20130426:130615.179 resuming Zabbix agent checks on host [eu2-s-90]: connection restored
       9077:20130426:130615.180 resuming Zabbix agent checks on host [eu2-s-57]: connection restored
        9077:20130426:130615.181 resuming Zabbix agent checks on host [eu2-s-72]: connection restored
        9077:20130426:130615.185 resuming Zabbix agent checks on host [eu2-jabber]: connection restored
        9077:20130426:130615.190 resuming Zabbix agent checks on host [eu2-wgniru]: connection restored
        9077:20130426:130615.194 resuming Zabbix agent checks on host [eu2-gc2012-5]: connection restored
        9077:20130426:130615.209 resuming Zabbix agent checks on host [eu2-backyard-ct]: connection restored
        9077:20130426:130615.211 resuming Zabbix agent checks on host [wowpeu2-st1-2]: connection restored
        9077:20130426:130615.212 resuming Zabbix agent checks on host [eu2-s-20]: connection restored
        9077:20130426:130615.218 resuming Zabbix agent checks on host [wowpeu2-st1-4]: connection restored
        9077:20130426:130615.225 resuming Zabbix agent checks on host [eu2-st1-3]: connection restored
        9077:20130426:130615.226 resuming Zabbix agent checks on host [eu2-s-36]: connection restored
        9077:20130426:130615.238 resuming Zabbix agent checks on host [eu2-s-58]: connection restored
        9077:20130426:130615.239 resuming Zabbix agent checks on host [eu2-db-207]: connection restored
        9077:20130426:130615.240 resuming Zabbix agent checks on host [eu2-s-37]: connection restored
        9077:20130426:130615.242 resuming Zabbix agent checks on host [eu2-st1-9]: connection restored
        9077:20130426:130615.246 resuming Zabbix agent checks on host [eu2-wgq-1]: connection restored
        9077:20130426:130621.255 resuming Zabbix agent checks on host [eu2-st2-4]: connection restored
        9077:20130426:130621.256 resuming Zabbix agent checks on host [eu2-db-209]: connection restored
        9077:20130426:130621.258 resuming Zabbix agent checks on host [eu2-blitz-1]: connection restored
        9077:20130426:130621.266 resuming Zabbix agent checks on host [eu2-s-25]: connection restored
        9077:20130426:130621.267 resuming Zabbix agent checks on host [eu2-s-104]: connection restored
        9077:20130426:130621.268 resuming Zabbix agent checks on host [eu2-s-40]: connection restored
        9077:20130426:130621.272 resuming Zabbix agent checks on host [eu2-s-86]: connection restored
        9077:20130426:130621.304 resuming Zabbix agent checks on host [eu2-st3-7]: connection restored
        9077:20130426:130621.306 resuming Zabbix agent checks on host [eu2-db-205]: connection restored
        9077:20130426:130621.323 resuming SNMP checks on host [wowseu2-frm-ru]: connection restored
        9077:20130426:130621.328 resuming Zabbix agent checks on host [wowpeu2-st1-8]: connection restored
        9077:20130426:130621.335 resuming SNMP checks on host [wowpeu2-st3-1]: connection restored
        9077:20130426:130621.440 resuming SNMP checks on host [eu2-knl-ru]: connection restored
        9077:20130426:130621.443 resuming Zabbix agent checks on host [wowpeu2-st2-8]: connection restored
        9077:20130426:130621.444 resuming Zabbix agent checks on host [eu2-s-26]: connection restored
        9077:20130426:130621.446 resuming Zabbix agent checks on host [eu2-gnls-balancer]: connection restored
        9077:20130426:130621.448 resuming Zabbix agent checks on host [eu2-gnls-node1]: connection restored
        9077:20130426:130621.456 resuming Zabbix agent checks on host [eu2-s-4]: connection restored
        9077:20130426:130621.458 resuming Zabbix agent checks on host [eu2-s-61]: connection restored
        9077:20130426:130621.459 resuming Zabbix agent checks on host [eu2-s-94]: connection restored
        9077:20130426:130621.462 resuming Zabbix agent checks on host [eu2-gnls-ptl]: connection restored
        9077:20130426:130621.575 resuming SNMP checks on host [wowpeu2-st1-7]: connection restored
        9077:20130426:130621.584 resuming Zabbix agent checks on host [eu2-s-105]: connection restored
        9077:20130426:130621.619 resuming Zabbix agent checks on host [wowpeu2-st1-6]: connection restored
        9077:20130426:130621.736 resuming SNMP checks on host [eu2-st6-5]: connection restored
        9077:20130426:130621.880 resuming SNMP checks on host [eu2-st4-6]: connection restored
        9077:20130426:130622.109 resuming Zabbix agent checks on host [eu2-knl-eu]: connection restored
      

      and so on...

      So the main problem is that checks on a host are not resumed after a network error; they only resumed after the restart.
      For comparison, after downgrading to 2.0.5, checks resume on their own:

       30752:20130429:112039.864 resuming SNMP checks on host [woteu2-s-4]: connection restored
       30752:20130429:112054.144 resuming SNMP checks on host [woteu2-s-33]: connection restored
       30749:20130429:112138.664 SNMP item [BW_resend_max_percent] on host [woteu2-s-33] failed: first network error, wait for 15 seconds
       30748:20130429:112138.764 SNMP item [MYSQL_status_Sort_rows] on host [woteu2-s-33] failed: another network error, wait for 15 seconds
       30752:20130429:112154.610 resuming SNMP checks on host [woteu2-s-33]: connection restored
       30757:20130429:112208.488 cannot send list of active checks to [127.0.0.1]: host [Zabbix server] not found
       30750:20130429:112227.532 SNMP item [BW_cluster_onlinePlayers] on host [woteu2-s-4] failed: first network error, wait for 15 seconds
       30749:20130429:112228.372 SNMP item [MYSQL_status_Qcache_hits] on host [woteu2-s-4] failed: another network error, wait for 15 seconds
       30752:20130429:112250.075 resuming SNMP checks on host [woteu2-s-4]: connection restored
       30749:20130429:112322.889 SNMP item [BW_cluster_onlinePlayers] on host [woteu2-s-33] failed: first network error, wait for 15 seconds
       30752:20130429:112343.378 resuming SNMP checks on host [woteu2-s-33]: connection restored
       30756:20130429:112408.613 cannot send list of active checks to [127.0.0.1]: host [Zabbix server] not found
       30750:20130429:112517.554 SNMP item [BW_cluster_onlinePlayers] on host [woteu2-s-4] failed: first network error, wait for 15 seconds
       30751:20130429:112517.991 SNMP item [MYSQL_disk_rootdir_bytes_read] on host [woteu2-s-4] failed: another network error, wait for 15 seconds
       30748:20130429:112519.138 SNMP item [MYSQL_disk_rootdir_bytes_write] on host [woteu2-s-4] failed: another network error, wait for 15 seconds
       30751:20130429:112533.688 SNMP item [BW_baseapps_count_1_percent] on host [woteu2-s-33] failed: first network error, wait for 15 seconds
       30750:20130429:112534.971 SNMP item [BW_baseapps_count_2_percents] on host [woteu2-s-33] failed: another network error, wait for 15 seconds
       30749:20130429:112534.973 SNMP item [MYSQL_status_Open_tables] on host [woteu2-s-33] failed: another network error, wait for 15 seconds
       30752:20130429:112539.399 resuming SNMP checks on host [woteu2-s-4]: connection restored
       30752:20130429:112549.631 resuming SNMP checks on host [woteu2-s-33]: connection restored
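
      To check a larger log for the same symptom, a small script along these lines could list the hosts that logged a network error and never logged a matching "connection restored" line. This is only a sketch: the function name `unresumed_hosts` and the sample data are made up for illustration, and the regular expressions cover only the message formats visible in the excerpts above, so other message variants would be ignored.

```python
import re
from datetime import datetime

# Formats below are inferred from the log excerpts in this report (assumption):
#   <pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>
LINE_RE = re.compile(r"^\s*\d+:(\d{8}):(\d{6})\.(\d{3})\s+(.*)$")
FAIL_RE = re.compile(
    r"^(.+?) item \[.*\] on host \[([^\]]+)\] failed: "
    r"(?:first|another) network error"
)
RESUME_RE = re.compile(
    r"^resuming (.+?) checks on host \[([^\]]+)\]: connection restored"
)


def unresumed_hosts(lines):
    """Return {(host, check_type): time_of_first_error} for hosts whose
    checks failed with a network error and were not resumed afterwards."""
    pending = {}
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a log line in the expected format
        date, time, millis, message = m.groups()
        stamp = datetime.strptime(f"{date} {time}.{millis}", "%Y%m%d %H%M%S.%f")

        failed = FAIL_RE.match(message)
        if failed:
            check_type, host = failed.groups()
            # Keep the time of the earliest unresolved error only.
            pending.setdefault((host, check_type), stamp)
            continue

        resumed = RESUME_RE.match(message)
        if resumed:
            check_type, host = resumed.groups()
            pending.pop((host, check_type), None)
    return pending


# A few lines copied from the excerpts above, used as sample input.
SAMPLE = """\
5729:20130426:112247.920 Zabbix agent item [proc.num[,,run]] on host [eu2-db-201] failed: first network error, wait for 15 seconds
30749:20130429:112138.664 SNMP item [BW_resend_max_percent] on host [woteu2-s-33] failed: first network error, wait for 15 seconds
30748:20130429:112138.764 SNMP item [MYSQL_status_Sort_rows] on host [woteu2-s-33] failed: another network error, wait for 15 seconds
30752:20130429:112154.610 resuming SNMP checks on host [woteu2-s-33]: connection restored
"""

if __name__ == "__main__":
    for (host, check_type), since in sorted(unresumed_hosts(SAMPLE.splitlines()).items()):
        print(f"{host} ({check_type}): no resume since {since}")
```

      Run against the proxy log from 2.0.6 and from 2.0.5, the difference between the two versions should show up as a long list of pending hosts in the first case and a short or empty one in the second.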
      

      Please, this is very critical for us. Right now we are downgrading to version 2.0.5.

          People

            Assignee: Unassigned
            Reporter: Anton Samets (sharewax)
            Votes: 1
            Watchers: 6
