ZABBIX BUGS AND ISSUES / ZBX-8095

Gaps in agent.ping items cause random triggers to be activated

    Details

    • Type: Incident report
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.3
    • Fix Version/s: 2.2.4rc2, 2.3.2
    • Component/s: Server (S)
    • Labels:
      None
    • Environment:
      CentOS 6.5 x86_64

      Description

      Ever since we upgraded from 2.0 to 2.2.x, random 'server is unreachable' triggers fire and then resolve a minute or so later. This happens once or twice every day (and night), with no apparent cause.

      Here are the agent.ping items from the time the alert is triggered:

      2014.Apr.16 03:36:58 Up (1)
      2014.Apr.16 03:35:28 Up (1)
      2014.Apr.16 03:33:58 Up (1)
      2014.Apr.16 03:32:28 Up (1)
      2014.Apr.16 03:32:16 Up (1)
      2014.Apr.16 03:21:58 Up (1)
      2014.Apr.16 03:20:29 Up (1)
      2014.Apr.16 03:18:58 Up (1)
      2014.Apr.16 03:17:28 Up (1)
      2014.Apr.16 03:15:58 Up (1)

      As you can see, there is a 10-minute gap in the values (03:21:58 to 03:32:16). We set DebugLevel to 4 on both server and agent, and the logs showed that the server never requests the item during the gap, which makes this look like a server issue rather than an agent one.
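
To make the gap explicit, here is a minimal Python sketch that scans the timestamps listed above and reports any pair of consecutive values further apart than a threshold. The function name `find_gaps` and the 5-minute threshold (chosen to match the nodata(5m) trigger) are illustrative assumptions, not part of Zabbix.

```python
from datetime import datetime, timedelta

# Timestamps copied from the agent.ping history in this report (newest first).
HISTORY = [
    "2014.Apr.16 03:36:58",
    "2014.Apr.16 03:35:28",
    "2014.Apr.16 03:33:58",
    "2014.Apr.16 03:32:28",
    "2014.Apr.16 03:32:16",
    "2014.Apr.16 03:21:58",
    "2014.Apr.16 03:20:29",
    "2014.Apr.16 03:18:58",
    "2014.Apr.16 03:17:28",
    "2014.Apr.16 03:15:58",
]

# "%b" parses "Apr" only under an English/C locale.
FMT = "%Y.%b.%d %H:%M:%S"


def find_gaps(stamps, threshold):
    """Return (start, end, delta) for consecutive values further apart than threshold."""
    times = sorted(datetime.strptime(s, FMT) for s in stamps)
    return [(a, b, b - a) for a, b in zip(times, times[1:]) if b - a > threshold]


gaps = find_gaps(HISTORY, timedelta(minutes=5))
for start, end, delta in gaps:
    print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: {delta}")
```

Outside the gap, the values arrive roughly every 90 seconds, apart from one extra value 12 seconds before 03:32:28.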

      A bit more info about our environment:

      Parameter                                              Value    Details
      Number of hosts (monitored/not monitored/templates)    444      384 / 7 / 53
      Number of items (monitored/disabled/not supported)     26142    24525 / 513 / 1104
      Number of triggers (enabled/disabled) [problem/ok]     5406     5401 / 5 [51 / 5350]
      Required server performance, new values per second     144.63   -

      Item: Agent ping | Triggers (2) | Key: agent.ping | Interval: 60 | History: 7 | Trends: 7 | Type: Zabbix agent | Applications: Zabbix agent
      Trigger: {Template Zabbix Agent:agent.ping.nodata(5m)}=1
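
As a rough illustration of why a 10-minute gap trips this trigger, the sketch below models nodata(5m) as "1 if no value arrived in the last 5 minutes" and evaluates it on a 30-second tick over the history from this report. This is a simplified model, not the server's implementation: the `nodata` and `problem_ticks` helpers, the tick alignment, and the 30-second step are all assumptions made for the example.

```python
from datetime import datetime, timedelta

FMT = "%Y.%b.%d %H:%M:%S"  # "%b" parses "Apr" only under an English/C locale

# agent.ping timestamps copied from this report.
SAMPLES = sorted(
    datetime.strptime(s, FMT)
    for s in [
        "2014.Apr.16 03:36:58", "2014.Apr.16 03:35:28",
        "2014.Apr.16 03:33:58", "2014.Apr.16 03:32:28",
        "2014.Apr.16 03:32:16", "2014.Apr.16 03:21:58",
        "2014.Apr.16 03:20:29", "2014.Apr.16 03:18:58",
        "2014.Apr.16 03:17:28", "2014.Apr.16 03:15:58",
    ]
)


def nodata(samples, now, period):
    """Simplified model: 1 if no value arrived in (now - period, now], else 0."""
    return 0 if any(now - period < t <= now for t in samples) else 1


def problem_ticks(samples, start, end, period, step=timedelta(seconds=30)):
    """Evaluate nodata() every `step` (assumed tick) and collect the ticks where it is 1."""
    ticks = []
    now = start
    while now <= end:
        if nodata(samples, now, period):
            ticks.append(now)
        now += step
    return ticks


ticks = problem_ticks(
    SAMPLES,
    start=datetime(2014, 4, 16, 3, 16, 0),
    end=datetime(2014, 4, 16, 3, 36, 30),
    period=timedelta(minutes=5),
)
if ticks:
    print(f"nodata(5m)=1 from {ticks[0]:%H:%M:%S} to {ticks[-1]:%H:%M:%S}")
```

Under these assumptions the expression is true only inside the gap and clears as soon as the 03:32:16 value arrives; the regular 90-second values never trip it.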

      PS: Our template sets an update interval of 60 seconds, while all hosts show 90 seconds. I guess that deserves a separate bug report.

      The internal Zabbix server items show a mostly idle server; busy poller %, for example, is around 10-15%. It was 20-25%, but we increased the number of pollers from 5 to 12 in the hope of alleviating this problem. It didn't help.

      Anything else we can provide?

              People

              • Assignee:
                Unassigned
                Reporter:
                verwilst Bart Verwilst
              • Votes:
                2
                Watchers:
                4
