ZABBIX BUGS AND ISSUES / ZBX-2091

Zabbix server network error, says it will retry in 15 seconds, but 15 seconds never comes


    • Type: Incident report
    • Resolution: Duplicate
    • Priority: Major
    • None
    • 1.8.1
    • Server (S)
    • None
    • Opensolaris snv_133
      Sun C compiler

      Every once in a while, a host builds up a large number of items in the queue. Investigating the issue, I found a network error for the host in zabbix_server.log:

      3800:20100302:021113.207 Item [prod-app.local:perf_counter[\System\File Write Bytes/sec]] error: Get value from agent failed: Cannot connect to [10.10.0.56:10050] [Interrupted system call]
      3800:20100302:021113.208 ZABBIX Host [prod-app.local]: first network error, wait for 15 seconds

      That will be the only entry for the server, with high error logging enabled. It says it will retry in 15 seconds, but it never does, and the queue time for all the items just grows.
      Using "zabbix_get" manually, I can retrieve data just fine:

      # /usr/zabbix/bin/zabbix_get -s prod-app.local -k agent.ping
      1
      # /usr/zabbix/bin/zabbix_get -s prod-app.local -k "perf_counter[\System\File Write Bytes/sec]"
      7928.549888

      I have to disable the host and then re-enable it to get the items working again. After that, it can be hours, days, or weeks before I see the issue again, usually on a different host. The retry doesn't appear to happen.
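
      To spot affected hosts before the queue grows, something like the following could scan zabbix_server.log for "first network error" entries that are older than the promised retry window. This is only a rough sketch: the regular expression is inferred from the two log lines quoted above, the timestamp is assumed to be YYYYMMDD:HHMMSS.mmm, and the function names and the 60-second threshold are hypothetical. It does not check whether the server logged anything later for the host, so treat a hit as a hint to look at the queue, not as proof.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the log excerpt in this report:
# 3800:20100302:021113.208 ZABBIX Host [prod-app.local]: first network error, wait for 15 seconds
FIRST_ERROR = re.compile(
    r"^\s*\d+:(\d{8}):(\d{6})\.\d+ .*Host \[([^\]]+)\]: first network error"
)


def first_network_errors(lines):
    """Map each host to the time of its most recent 'first network error' line."""
    seen = {}
    for line in lines:
        match = FIRST_ERROR.match(line)
        if match:
            date_part, time_part, host = match.groups()
            seen[host] = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return seen


def stale_hosts(lines, now, max_age_seconds=60):
    """Return hosts whose last 'first network error' is older than max_age_seconds.

    The threshold is a hypothetical value, chosen to be well past the
    15 seconds mentioned in the log message.
    """
    errors = first_network_errors(lines)
    return sorted(
        host
        for host, when in errors.items()
        if (now - when).total_seconds() > max_age_seconds
    )
```

      It could be fed the log with `stale_hosts(open(path), datetime.now())`, where `path` points at wherever zabbix_server.log lives on the server.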

            Assignee: Unassigned
            Reporter: Brent Jones (bjones)
            Votes: 5
            Watchers: 5
