ZABBIX BUGS AND ISSUES / ZBX-4268

Timer process gets stuck with 100% utilization

    • Type: Incident report
    • Resolution: Won't fix
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: 1.8.8
    • Component/s: Server (S)
    • Environment: RHEL 5.5, x86-64 Linux

      I graph the Zabbix sub-processes, thankfully, or I would never have known what was stuck. I noticed that items which should have been in maintenance were paging and never even showed the orange link color on the dashboard. Careful inspection, and even re-creating the maintenance period, showed that the problem persisted.

      I looked at the subprocess graph and discovered that the timer process had gone into some kind of loop and was pegged at 0% idle (or 100% usage, depending on how you look at it). I tried killing the timer process in the hope that Zabbix would spawn a new one, but of course that shut down the whole operation, so I did a restart.

      I have included the graph so you can see what happened. The closest events of any interest around 9:17 AM are these:

      31516:20111024:091729.430 Item [REDACTED.com:vip.bytesIn_perConn.443] became not supported: Division by zero. Cannot evaluate expression [464/0]
      31505:20111024:091733.207 Item [REDACTED.com:vip.bytesOut_perConn.443] became not supported: Division by zero. Cannot evaluate expression [5353/0]

      These items are based on SNMP queries against an F5 big iron, but each does a calculation against another number also harvested from it. It seems that Zabbix correctly caught these and marked the items as not supported, so this may be mere coincidence. There are no triggers associated with the above items.

      We have never seen this bug before, so I presume it may have something to do with the optimizations included in 1.8.8. We noticed that our timer process averaged around 40% idle with 1.8.5 and now it averages 60% idle. We've definitely seen improvement, but we need our maintenance windows to work.

      I'll be adding a trigger to catch this 0% idle condition on the timer process for the time being.
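
      As a rough sketch of the check such a trigger would perform, assuming the timer's idle percentage is sampled at a fixed interval: the function name, the 1% threshold, and the five-sample window below are all hypothetical and are not Zabbix settings or trigger syntax. The idea is to alert only when the process stays at (or near) 0% idle for several consecutive samples, so a single busy sample does not page anyone.

```python
from typing import Iterable


def timer_stuck(idle_samples: Iterable[float],
                threshold: float = 1.0,
                min_consecutive: int = 5) -> bool:
    """Return True if idle % stays at or below `threshold`
    for at least `min_consecutive` samples in a row.

    All parameters are illustrative assumptions, not Zabbix defaults.
    """
    run = 0  # length of the current run of "pegged" samples
    for idle in idle_samples:
        if idle <= threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # the process recovered, so start counting again
    return False


# A process that drops to 0% idle and stays there is flagged;
# one that only dips briefly is not.
stuck = timer_stuck([60.0, 58.0, 0.0, 0.0, 0.0, 0.0, 0.0])
brief_dip = timer_stuck([60.0, 0.0, 0.0, 59.0, 0.0, 61.0])
```

      Requiring a run of samples rather than a single reading trades a few minutes of detection delay for fewer false alarms.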

        1. ora-01002.diff (0.4 kB)
        2. strace.out.gz (7.53 MB)
        3. ZabbixProcesses.png (72 kB)

            Assignee: Unassigned
            Reporter: Aaron Mildenstein (untergeek)