[ZBX-3163] dependencies of triggers and evaluation order Created: 2010 Nov 01 Updated: 2017 May 30 Resolved: 2014 Jun 17 |
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 1.8.4rc1 |
Fix Version/s: | 2.2.4rc1, 2.3.2 |
Type: | Incident report | Priority: | Major |
Reporter: | Aleksandrs Saveljevs | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 29 |
Labels: | dependencies, patch, triggerdependencies, unsquashable |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Attachments: | zabbix-1.8.16_warrant_triggerid_dependencies.patch, zabbix-2.0.4_warrant_triggerid_dependencies.patch |
Issue Links: |
Description |
Suppose we have two triggers:
T1: item>100
T2: item>200
T1 depends on T2.
Value 150 comes in, and T1 becomes PROBLEM. Value 250 comes in, and T2 becomes PROBLEM as well. Now, if value 50 comes in, two things can happen: (1) if T1 is evaluated before T2, only T2 goes into the OK state and T1 remains in PROBLEM; (2) if T1 is evaluated after T2, both go into the OK state. Similarly, if both triggers are in the OK state and value 250 comes in, then depending on the evaluation order either both triggers become PROBLEM, or only T2 does. |
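The following Python sketch simulates the scenario above. It assumes, as the description implies, that a trigger whose dependency is currently in PROBLEM keeps its old state instead of being re-evaluated. The threshold of 200 for T2 is an assumption (any value between 150 and 250 reproduces the same behaviour), and the names `Trigger`, `evaluate` and `run` are hypothetical, not Zabbix server code.

```python
from dataclasses import dataclass, field


@dataclass
class Trigger:
    name: str
    threshold: float
    depends_on: list = field(default_factory=list)
    state: str = "OK"


def evaluate(trigger, value):
    # Assumed rule: while any dependency is in PROBLEM, the trigger keeps its current state.
    if any(dep.state == "PROBLEM" for dep in trigger.depends_on):
        return
    trigger.state = "PROBLEM" if value > trigger.threshold else "OK"


def run(order, values):
    """Feed each value to both triggers, evaluating them in the given name order."""
    t2 = Trigger("T2", 200)            # hypothetical threshold for T2
    t1 = Trigger("T1", 100, depends_on=[t2])
    triggers = {"T1": t1, "T2": t2}
    for value in values:
        for name in order:
            evaluate(triggers[name], value)
    return t1.state, t2.state


print(run(["T1", "T2"], [150, 250, 50]))  # ('PROBLEM', 'OK'): case (1)
print(run(["T2", "T1"], [150, 250, 50]))  # ('OK', 'OK'): case (2)
```

The same input sequence ends in two different states, which is the whole point of the report: the result depends on evaluation order, not only on the data.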
Comments |
Comment by Aleksandrs Saveljevs [ 2011 Apr 12 ] |
By the way, in Zabbix 1.8 we have a "dep_level" field in the "triggers" table, which is not used and which we want to drop in Zabbix 2.0. Since there cannot be circular dependencies, the trigger dependency graph is a DAG and can therefore be topologically sorted. So maybe we could use "dep_level" to fix the evaluation order problem by evaluating triggers with a greater (or lower) "dep_level" first? In that case, the GUI and the server would have to keep the "dep_level" field valid when creating and changing triggers. |
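One way to compute such a level is sketched below in Python. It assumes level 0 for triggers without dependencies and, otherwise, 1 plus the maximum level among the triggers it depends on; evaluating in ascending level then guarantees that every trigger is evaluated after all triggers it depends on. The functions `dep_levels` and `evaluation_order` are hypothetical, and the input is assumed to be acyclic, as the comment states.

```python
def dep_levels(deps):
    """deps maps trigger id -> list of trigger ids it depends on (assumed acyclic)."""
    levels = {}

    def level(tid):
        if tid not in levels:
            # No dependencies -> 0; otherwise one more than the deepest dependency.
            levels[tid] = 1 + max((level(d) for d in deps.get(tid, [])), default=-1)
        return levels[tid]

    for tid in deps:
        level(tid)
    return levels


def evaluation_order(deps):
    """Triggers sorted by ascending level (ties broken by id), dependencies first."""
    levels = dep_levels(deps)
    return sorted(levels, key=lambda tid: (levels[tid], tid))


print(evaluation_order({"T1": ["T2"], "T2": []}))  # ['T2', 'T1']
```

With this convention T2 is always evaluated before T1, so only case (2) of the description can occur.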
Comment by Michael Maymann [ 2012 Mar 03 ] |
I also ran into the above problem. Will this be fixed in 2.0? |
Comment by Scott Duensing [ 2012 Oct 15 ] |
This is still an issue in 2.0.3. Before sending an alert for a child, all parent states need to be checked. The suggested "solution" of altering the timing of polling child objects does not scale and is hackish at best. |
Comment by Marc [ 2013 Feb 19 ] |
This is really confusing for alert recipients. Does anybody have a reliable work-around for 2.0.x? |
Comment by MATSUDA Daiki [ 2013 Apr 08 ] |
I attached two patches, for 1.8.16 and for 2.0.4 (the latter also applies to 2.0.5). They are meant to guarantee trigger dependencies for triggers created in the web frontend. For 1.8.16, with |
Comment by MATSUDA Daiki [ 2013 Apr 12 ] |
fix for events.objectid |
Comment by Dave Allaby [ 2014 Mar 19 ] |
I am running version 2.0.9 and am experiencing the same issue. I am surprised this does not get more votes, as it is a huge deal for me. I monitor more than 60 firewalls, some of which are cascaded, so when a top-level firewall misses pings I get notified about everything below it, which is very confusing. Since they are all the same devices, I really don't want to have to use multiple templates or settings such as timings to sort this out. The frontend seems to work as expected, but the actions/notifications do not. |
Comment by richlv [ 2014 Mar 19 ] |
Dave, this issue is about two triggers on the same item. In your case, trigger sensitivity should simply be tuned so that the more important one fires first. |
Comment by Radu Molnar [ 2014 Apr 14 ] |
Rich, tuning the timing does not fix the problem; it only lowers the chance of hitting it without eliminating it altogether. |
Comment by richlv [ 2014 Apr 15 ] |
Item ia is checked every 30 seconds, and trigger ta checks for 90 seconds of problem state. I don't see a way for tb to fire before ta if both are caused by the same reason... |
Comment by Ghozlane TOUMI [ 2014 Apr 16 ] |
Just as a heads-up, @richlv: the problem with your approach is that in a real network with a dependency chain you *cannot* have different item timings for the different levels of dependencies. Of course you can check hosts behind 5 routers/switches only every hour to be sure you won't get wrong alerts when the router dies, but guess what? In real-life deployments we set up monitoring to be alerted about problems on the hosts too. Every host behind 1 or 10 routers comes from the same template with the same timings (if you want to keep your sanity) and has the same triggers. And even if you check routers frequently, as Radu and others pointed out, that only lessens the chance of hitting the alert storm. To repeat, the problem is not with dependencies; it is only that alerting and escalation do not check those dependencies. |
Comment by richlv [ 2014 Apr 16 ] |
You can use user macros/variables to customise this even with a single template for network devices. If anybody has an idea how to make this really safe, we can discuss it on IRC, but it seems impossible to me. (Note that dependencies on the same item are a different question.) |
Comment by Ghozlane TOUMI [ 2014 Apr 16 ] |
User macros are best used for customising triggers for very specific cases; you can't really use them for a wide deployment, as they are per template or per host. Anyway, regarding alerting and dependencies, the problem is that dependencies are checked at the beginning of an alert escalation and not at every stage of the escalation. I understand triggers may fire in any order, and I'm OK with that as long as, once the trigger that others depend on fires, the dependent ones no longer bug me. This is true in the web interface, but unfortunately not for alerting. For instance, at the beginning I thought a good solution to avoid an alarm storm would be to define a simple escalation scheme: do nothing for the first 5 minutes, to be sure the other related triggers have time to fire, and then send a mail. |
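A toy Python model of the escalation scheme described above, contrasting a dependency check done only when escalation starts with a re-check before every step. It is a hypothetical illustration of the proposal, not Zabbix escalator code; `run_escalation`, the `(minute, action)` step format and `parent_problem_at` are all invented names.

```python
def run_escalation(steps, parent_problem_at, recheck_each_step):
    """Return the actions a dependent trigger's escalation would perform.

    steps: list of (minute, action) pairs; action None means "do nothing".
    parent_problem_at: minute when the trigger it depends on enters PROBLEM, or None.
    recheck_each_step: False models checking dependencies only at escalation start.
    """
    def parent_in_problem(minute):
        return parent_problem_at is not None and minute >= parent_problem_at

    # Dependency check when the escalation starts (minute 0).
    if parent_in_problem(0):
        return []
    performed = []
    for minute, action in steps:
        # Proposed behaviour: re-check the dependency before every step.
        if recheck_each_step and parent_in_problem(minute):
            break
        if action is not None:
            performed.append((minute, action))
    return performed


steps = [(0, None), (5, "send mail")]  # wait 5 minutes, then send a mail
print(run_escalation(steps, parent_problem_at=2, recheck_each_step=False))  # [(5, 'send mail')]
print(run_escalation(steps, parent_problem_at=2, recheck_each_step=True))   # []
```

When the parent fires at minute 2, only the per-step re-check suppresses the mail at minute 5; checking once at the start lets it through, which is the behaviour complained about above.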
Comment by richlv [ 2014 Apr 16 ] |
Checking dependencies during escalation sounds potentially doable, but we'd need a developer's comment on that. It is a different problem from the one reported in this issue, though. |
Comment by Ghozlane TOUMI [ 2014 Apr 16 ] |
Agreed, this is |
Comment by Volker Fröhlich [ 2014 Apr 16 ] |
ZBXNEXT-1461 also touches the topic of topology and dependency, but this ticket is about something different, as was already stated. It's about triggers using the very same item(s) but different thresholds or functions. |
Comment by Aleksandrs Saveljevs [ 2014 Jun 04 ] |
Development branch svn://svn.zabbix.com/branches/dev/ZBX-3163 adds topological sorting of triggers and processes them in that order. For review purposes, a couple of points are worth mentioning. Functions DCconfig_sort_triggers_topologically() and DCconfig_sort_triggers_topologically_rec() contain checks for "trigdep->trigger" being NULL. This makes the code less pretty, but it has to be done, because in DCsync_configuration() the SQL query for "tdep_result" might yield a slightly different set of triggers than the one for "trig_result". Function process_triggers() first sorts triggers by topological index and processes them. After that, it sorts the triggers by trigger ID and executes the SQL statements in the database in that order, so that we do not get any deadlocks. |
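The two-phase ordering described above can be sketched in Python as follows. This is an illustration of the idea, not a port of the C functions: `topological_order` does a depth-first post-order traversal and skips dependencies that point to triggers missing from the cache (the analogue of the NULL check on "trigdep->trigger"), and `process_triggers` here evaluates in topological order, then returns the updates sorted by trigger ID. The function names, arguments and the `evaluate` callback are assumptions. Writing rows in one global order is the usual way to avoid lock-ordering deadlocks between concurrent writers.

```python
def topological_order(trigger_ids, deps):
    """Map each trigger id to a topological index; dependencies get lower indices."""
    order, seen = [], set()

    def visit(tid):
        if tid in seen:
            return
        seen.add(tid)
        for dep in deps.get(tid, []):
            if dep not in trigger_ids:
                continue  # dependency refers to a trigger not loaded into the cache
            visit(dep)
        order.append(tid)  # post-order: all dependencies are already placed

    for tid in sorted(trigger_ids):
        visit(tid)
    return {tid: i for i, tid in enumerate(order)}


def process_triggers(trigger_ids, deps, evaluate):
    """Phase 1: evaluate in topological order. Phase 2: return updates by trigger id."""
    topo = topological_order(trigger_ids, deps)
    updates = []
    for tid in sorted(trigger_ids, key=topo.__getitem__):
        new_value = evaluate(tid)
        if new_value is not None:
            updates.append((tid, new_value))
    # Database writes would be issued in ascending trigger id order.
    return sorted(updates)
```

For example, with triggers `{10, 20, 30}` and `deps = {10: [20, 99], 20: [30]}`, the evaluation order is 30, 20, 10, and the unknown dependency 99 is ignored.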
Comment by Aleksandrs Saveljevs [ 2014 Jun 06 ] |
Fixed in pre-2.2.4 r46275 and pre-2.3.2 (trunk) r46274. |
Comment by Martins Valkovskis [ 2014 Jun 09 ] |
Documented in: |
Comment by Alexander Vladishev [ 2014 Jun 17 ] |
(1) triggers.lastchange should be loaded into a configuration cache only for new triggers. sasha RESOLVED in @2.2 r46625 and @trunk r46626. asaveljevs CLOSED |
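A minimal Python sketch of what note (1) asks for, assuming the configuration cache is a dictionary keyed by trigger ID: on each sync, `lastchange` is taken from the database only for triggers that are new to the cache, while triggers already cached keep their in-memory value. `sync_triggers` and the row layout are hypothetical, and removal of deleted triggers is omitted for brevity.

```python
def sync_triggers(cache, db_rows):
    """cache: triggerid -> dict; db_rows: dicts with triggerid, expression, lastchange."""
    for row in db_rows:
        cached = cache.get(row["triggerid"])
        if cached is None:
            # New trigger: load everything, including lastchange, from the database.
            cache[row["triggerid"]] = dict(row)
        else:
            # Existing trigger: refresh other fields but keep the cached lastchange.
            lastchange = cached["lastchange"]
            cached.update(row)
            cached["lastchange"] = lastchange
    return cache
```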
Comment by Alexander Vladishev [ 2014 Nov 26 ] |
This fix solved the problem described in |
Comment by Oleksii Zagorskyi [ 2016 Mar 06 ] |
ZBXNEXT-3177 asks for the ability to sort triggers by ID, to see the evaluation order. |