ZABBIX BUGS AND ISSUES / ZBX-10767

rework logic of trigger-based internal events generation when host becomes unavailable


      If you try to use actions for internal events, you will find them annoying, especially for triggers, and will most likely stop using them.
      After using them for some time I have a few comments to share, see below.

      Currently, if some agent (zabbix/snmp/etc) is stopped, the host becomes unavailable after ~54 seconds ((15+timeout)*3), and all its triggers (except those with time-based functions) are marked as Unknown with the error message "Agent is unavailable."
      The function "update_triggers_status_to_unknown" is responsible for that. It also generates events and, if a corresponding action exists, alerts like "Unknown <Trigger name>".
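
The ~54 second figure can be reproduced from the formula quoted above. The sketch below only evaluates that formula; the function name is hypothetical and is not part of the Zabbix sources, and the value 3 for the timeout is an assumption inferred from the ~54 s estimate.

```c
#include <assert.h>

/* Hypothetical helper, not Zabbix code: evaluates the estimate
 * (15 + timeout) * 3 from this report, i.e. the number of seconds
 * after which a stopped agent is considered unavailable. */
int seconds_until_unavailable(int timeout_sec)
{
    assert(timeout_sec >= 0);   /* a negative timeout makes no sense here */
    return (15 + timeout_sec) * 3;
}
```

With a timeout of 3 seconds this gives (15 + 3) * 3 = 54 seconds. A larger timeout delays the mass switch to Unknown, but does not prevent it.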

      Suppose I'm a zabbix admin who wants to control everything, so I've started to use internal events to send alerts.
      I have zabbix agent host(s) with 5000 triggers (I have seen that on some installations) for items with update intervals of 60, 600, 3600 and 86400 seconds.
      Someone stops the agent for 1 minute and I immediately get 5000 "Unknown <Trigger name>" alerts, which is not funny at all.

      I don't know why I got these 5000 alerts, so after a minute I go to the frontend and see that those triggers have the "Agent is unavailable." error.
      I had to visit the frontend because the alert itself cannot tell why the trigger is Unknown (ZBXNEXT-3140).
      At this point I think: hmm, I already control how I judge the availability of that particular host with a dedicated trigger "agent.ping.nodata(5m)=1"
      And at this point I start to get alerts like "Normal <Trigger name>", and I continue to get them for the next 24 hours, until all 5000 items have been polled again.
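
To illustrate why the "Normal" alerts keep trickling in for up to 24 hours, here is a rough sketch. It assumes the worst case, where a trigger returns to Normal only when its item is polled again, up to one full update interval after the agent is back. The function name and data layout are hypothetical and do not come from the Zabbix sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not Zabbix code: counts triggers that may still be
 * Unknown elapsed_sec seconds after the agent came back, assuming each
 * trigger recovers only at its item's next poll (worst case: one full
 * update interval after recovery). */
size_t still_unknown_after(const int *intervals_sec, size_t n, int elapsed_sec)
{
    size_t count = 0;

    assert(intervals_sec != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* the item has not necessarily been polled yet */
        if (intervals_sec[i] > elapsed_sec)
            count++;
    }

    return count;
}
```

With the intervals from the example (60, 600, 3600 and 86400 seconds), triggers on the daily items can remain Unknown, and keep producing recovery alerts, until a full day has passed.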

      Another aspect: what if another zabbix admin/user (who doesn't get the internal alerts) configures triggers for this host a few hours later?
      Yes, he/she will notice that some triggers (for items with a big update interval) have a red X icon with the "Agent is unavailable." error but, at the same time, the host's ZBX icon is green!
      That is misleading!

      Also, what if I have other similar agents and receive Unknown alerts for them mixed with Normal alerts from the previous host?
      Such a huge and mixed flow of alerts is impossible to track effectively, so I disable the action for internal trigger events altogether.
      As a result, most people do not use the internal monitoring and implement their own solutions to monitor internal items and especially triggers.

      One more detail if you use zabbix proxies:
      If an agent is monitored by a proxy, only the host status is updated while its triggers stay Normal, which may be misleading. Reported as ZBX-10766.

      While I can see some small sense in switching triggers to Unknown (a gray ? icon is visible in the Monitoring menu for Unknown triggers), it causes more problems than it adds usability, so I suggest reworking it.

      Possible solutions, best first:

      1. After long doubts, I suggest not switching triggers to Unknown (based on host availability) at all, because host availability is not an internal thing; it should be monitored by a regular item/trigger by regular zabbix users!

      2. All of the below:

      • fix ZBX-10766 for consistency;
      • add a new condition for internal actions with the ability to filter out those "host availability" events (Unknown and Normal), for example as requested in ZBXNEXT-3273;
      • automatically switch Unknown triggers (with the error "Agent is unavailable.") back to Normal when the host becomes available, without waiting for the next item update interval.
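
The last point could look roughly like the sketch below. It is only an illustration on a simplified in-memory model: the struct, enum and function names are hypothetical, and the real server would have to do this against its trigger cache and database. Only triggers whose error is exactly "Agent is unavailable." are touched, so triggers that are Unknown for other reasons keep their state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Error text taken from this report. */
#define AGENT_UNAVAILABLE_ERROR "Agent is unavailable."

/* Hypothetical, simplified trigger model; not the Zabbix data structures. */
enum demo_trigger_state { DEMO_TRIGGER_NORMAL, DEMO_TRIGGER_UNKNOWN };

struct demo_trigger {
    enum demo_trigger_state state;
    const char *error;              /* "" when there is no error */
};

/* Called when the host becomes available again: switches back to Normal
 * only the triggers that became Unknown because of host unavailability.
 * Returns the number of triggers restored. */
size_t demo_restore_triggers(struct demo_trigger *triggers, size_t n)
{
    size_t restored = 0;

    assert(triggers != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        struct demo_trigger *t = &triggers[i];

        if (t->state == DEMO_TRIGGER_UNKNOWN && t->error != NULL &&
            strcmp(t->error, AGENT_UNAVAILABLE_ERROR) == 0) {
            t->state = DEMO_TRIGGER_NORMAL;
            t->error = "";
            restored++;
        }
    }

    return restored;
}
```

Whether the restore should also generate "Normal" internal events is a separate decision. Doing it silently would avoid the second wave of alerts described above.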

      For now I simply comment out:

      DBbegin();
      process_events();
      DBcommit();
      

      in the "update_triggers_status_to_unknown" function.

            asincovs Antons Sincovs
            zalex_ua Oleksii Zagorskyi