  1. ZABBIX BUGS AND ISSUES
  2. ZBX-4284

Possible wrong host (agent!,snmp?) disabling and bad handling of Unreachable items (hosts)

    Details

      Description

      This issue report is the result of deep debugging.

      Configuration:
      Single enabled host with three agent items. The keys are:
      agent.version (itemid=22535)
      agent.version[] (itemid=22536) <---- DISABLED during stage 1 of the experiment (a similar key was chosen to make searching the log file easier)
      sleep5 (itemid=22537)
      The update interval for all of them is 30 seconds.

      Zabbix agent config (added UserParameter):
      UserParameter=sleep5,sleep 5

      Zabbix server config is default.

      Stage 1:
      In the zabbix_server.log I see:
      30817:20111027:133610.349 Zabbix agent item [sleep5] on host [it0] failed: first network error, wait for 15 seconds
      30820:20111027:133628.418 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
      30820:20111027:133646.421 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
      30820:20111027:133704.426 temporarily disabling Zabbix agent checks on host [it0]: host unavailable

      As a result, the item "agent.version" is never checked, because the key "sleep5" times out.

      This is very bad.
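
      The timing of the Stage 1 messages can be checked with a short Python sketch. Only the log lines are taken from this report; the helper names (parse_timestamp, gaps_in_seconds) are made up for the illustration, and the timestamp layout pid:YYYYMMDD:HHMMSS.mmm is assumed from the excerpt itself.

```python
from datetime import datetime

# Stage 1 lines copied from the zabbix_server.log excerpt in this report.
STAGE1_LOG = """\
30817:20111027:133610.349 Zabbix agent item [sleep5] on host [it0] failed: first network error, wait for 15 seconds
30820:20111027:133628.418 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
30820:20111027:133646.421 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
30820:20111027:133704.426 temporarily disabling Zabbix agent checks on host [it0]: host unavailable
"""


def parse_timestamp(line):
    """Return the datetime encoded in a 'pid:YYYYMMDD:HHMMSS.mmm message' line."""
    prefix = line.split(" ", 1)[0]  # e.g. '30817:20111027:133610.349'
    _pid, date_part, time_part = prefix.split(":")
    return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S.%f")


def gaps_in_seconds(log_text):
    """Return the gaps (in seconds) between consecutive log lines."""
    stamps = [parse_timestamp(l) for l in log_text.splitlines() if l.strip()]
    return [round((b - a).total_seconds(), 3) for a, b in zip(stamps, stamps[1:])]


gaps = gaps_in_seconds(STAGE1_LOG)
total = round(sum(gaps), 3)
print("gaps between messages:", gaps)
print("first error to host disabled:", total)
```

      Each gap comes out at about 18 seconds, that is, the announced 15-second wait plus roughly 3 seconds for the failed check itself. The host is disabled about 54 seconds after the first error, which is the first retry that falls beyond UnreachablePeriod=45 seconds.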

      Stage 2:
      At this point I enabled the item with the key "agent.version[]",
      and the server behavior changed to this:

      30820:20111027:134834.519 enabling Zabbix agent checks on host [it0]: host became available
      30815:20111027:134840.435 Zabbix agent item [sleep5] on host [it0] failed: first network error, wait for 15 seconds
      30820:20111027:134858.525 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
      30820:20111027:134916.528 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
      30820:20111027:134931.530 resuming Zabbix agent checks on host [it0]: connection restored
      30819:20111027:134940.443 Zabbix agent item [sleep5] on host [it0] failed: first network error, wait for 15 seconds
      30820:20111027:134958.538 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
      30820:20111027:135016.541 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
      30820:20111027:135031.544 resuming Zabbix agent checks on host [it0]: connection restored
      30815:20111027:135040.450 Zabbix agent item [sleep5] on host [it0] failed: first network error, wait for 15 seconds
      30820:20111027:135058.550 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
      30820:20111027:135116.553 Zabbix agent item [sleep5] on host [it0] failed: another network error, wait for 15 seconds
      30820:20111027:135131.556 resuming Zabbix agent checks on host [it0]: connection restored
      .... etc, etc, etc

      With the key "agent.version[]" enabled, the host no longer becomes unavailable after three network errors (UnreachablePeriod=45 seconds).
      The keys "agent.version" and "agent.version[]" are now checked, but at incorrect and unstable intervals:
      -> "agent.version"
      2011.Oct.27 13:51:36 1.8.9rc1
      2011.Oct.27 13:50:36 1.8.9rc1
      2011.Oct.27 13:49:36 1.8.9rc1
      2011.Oct.27 13:48:36 1.8.9rc1

      -> "agent.version[]"
      2011.Oct.27 13:51:36 1.8.9rc1
      2011.Oct.27 13:51:31 1.8.9rc1
      2011.Oct.27 13:50:36 1.8.9rc1
      2011.Oct.27 13:50:31 1.8.9rc1
      2011.Oct.27 13:49:36 1.8.9rc1
      2011.Oct.27 13:49:31 1.8.9rc1
      2011.Oct.27 13:48:36 1.8.9rc1
      2011.Oct.27 13:48:34 1.8.9rc1
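
      The intervals implied by the two history listings above can be computed with a small Python sketch. The check times are copied from this report; the names HISTORY and check_intervals are made up for the example.

```python
from datetime import datetime

# Check times copied from the history listings (all on 2011.Oct.27).
HISTORY = {
    "agent.version": ["13:51:36", "13:50:36", "13:49:36", "13:48:36"],
    "agent.version[]": [
        "13:51:36", "13:51:31", "13:50:36", "13:50:31",
        "13:49:36", "13:49:31", "13:48:36", "13:48:34",
    ],
}


def check_intervals(times):
    """Return the gaps in seconds between consecutive checks, oldest first."""
    stamps = sorted(datetime.strptime(t, "%H:%M:%S") for t in times)
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


intervals = {key: check_intervals(times) for key, times in HISTORY.items()}
for key, gaps in intervals.items():
    print(key, gaps)
```

      For "agent.version" every gap is 60 seconds, and for "agent.version[]" the gaps alternate between roughly 5 and 55 seconds, although both items are configured with a 30-second update interval.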

      See also the "items_interval.png" screenshot.

      However, the item with the key "sleep5" is not marked with any error state in the GUI.

      The only places where the reason can be seen are zabbix_server.log and the queue in the GUI.

      That is not good either.

      The server was restarted, and the filtered log (debuglevel=4) is attached.
      Filter is: grep -E " started|agent.version|network error|connection restored|became available|sleep5|agent result" zabbix_server.log > demo.log
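
      For anyone without grep at hand, roughly the same filter can be sketched in Python. The alternation is copied from the grep -E expression above; the names PATTERN and filter_log are made up for the example, and the three sample lines are taken from the log excerpts in this report.

```python
import re

# Same alternation as the grep -E expression in this report.
# As in grep, the unescaped "." in "agent.version" matches any character.
PATTERN = re.compile(
    r" started|agent.version|network error|connection restored|"
    r"became available|sleep5|agent result"
)


def filter_log(lines):
    """Keep only the lines that grep -E with the same pattern would print."""
    return [line for line in lines if PATTERN.search(line)]


sample = [
    "30817:20111027:133610.349 Zabbix agent item [sleep5] on host [it0] failed: first network error, wait for 15 seconds",
    "30820:20111027:133704.426 temporarily disabling Zabbix agent checks on host [it0]: host unavailable",
    "30820:20111027:134834.519 enabling Zabbix agent checks on host [it0]: host became available",
]

kept = filter_log(sample)
for line in kept:
    print(line)

# To filter a real file instead of the sample:
# with open("zabbix_server.log") as src, open("demo.log", "w") as dst:
#     dst.writelines(filter_log(src))
```

      Note that a "temporarily disabling ... host unavailable" line matches none of the alternatives, so such lines are dropped by this filter.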

      I recommend redesigning this behavior.

      Specification: https://www.zabbix.org/wiki/Docs/specs/ZBX-4284

        Attachments

        1. 60_items_update_interval_10sec.png
          26 kB
          Oleksiy Zagorskyi
        2. 60_items_update_interval_60sec.png
          24 kB
          Oleksiy Zagorskyi
        3. demo.log
          58 kB
          Oleksiy Zagorskyi
        4. items_interval.png
          63 kB
          Oleksiy Zagorskyi

              People

              • Assignee:
                Unassigned
                Reporter:
                zalex_ua Oleksiy Zagorskyi
              • Votes:
                9
                Watchers:
                13