[ZBX-3163] dependencies of triggers and evaluation order Created: 2010 Nov 01  Updated: 2017 May 30  Resolved: 2014 Jun 17

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.8.4rc1
Fix Version/s: 2.2.4rc1, 2.3.2

Type: Incident report Priority: Major
Reporter: Aleksandrs Saveljevs Assignee: Unassigned
Resolution: Fixed Votes: 29
Labels: dependencies, patch, triggerdependencies, unsquashable
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File zabbix-1.8.16_warrant_triggerid_dependencies.patch     Text File zabbix-2.0.4_warrant_triggerid_dependencies.patch    
Issue Links:
Duplicate
is duplicated by ZBX-3164 dependencies fail with zabbix_sender Closed
is duplicated by ZBX-5520 Bug in dependecies? Closed
is duplicated by ZBX-5864 Triggers calculation should not depen... Closed

 Description   

Suppose we have two triggers:

T1: item>100
T2: item>200

T1 depends on T2.

Value 150 comes in, T1 becomes PROBLEM. Value 250 comes in, T2 becomes PROBLEM as well.

Now, if value 50 comes in, two things can happen: (1) if T1 is evaluated before T2, T2 is still in PROBLEM state at that moment, so T1 is not updated and remains a PROBLEM while only T2 goes into OK state; (2) if T1 is evaluated after T2, then both go into OK state.

Similarly, if both triggers are in OK state and value 250 comes in, then depending on the evaluation order either both triggers become PROBLEM (T1 is evaluated first, while T2 is still OK), or only T2 becomes PROBLEM (T2 is evaluated first, so T1 is held back by the dependency).
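The order dependence can be reproduced with a small simulation. This is a hypothetical Python model, not Zabbix code: `process_value`, `THRESHOLDS` and `DEPENDS_ON` are made-up names, and the model assumes, as the description implies, that a trigger keeps its previous value while a trigger it depends on is in PROBLEM state.

```python
# Hypothetical model of the triggers in the description, not Zabbix code:
#   T1: item > 100, T2: item > 200, and T1 depends on T2.
# Assumption: while a trigger's parent is in PROBLEM state, the dependent
# trigger is not re-evaluated and keeps its previous value.
OK, PROBLEM = "OK", "PROBLEM"

THRESHOLDS = {"T1": 100, "T2": 200}
DEPENDS_ON = {"T1": "T2", "T2": None}


def process_value(states, item, order):
    """Evaluate all triggers for one new item value in the given order."""
    states = dict(states)  # work on a copy, leave the caller's dict intact
    for name in order:
        parent = DEPENDS_ON[name]
        if parent is not None and states[parent] == PROBLEM:
            continue  # held back by the dependency: value unchanged
        states[name] = PROBLEM if item > THRESHOLDS[name] else OK
    return states


both_problem = {"T1": PROBLEM, "T2": PROBLEM}
both_ok = {"T1": OK, "T2": OK}

# Same input, two evaluation orders, different final states.
for start, item in ((both_problem, 50), (both_ok, 250)):
    for order in (("T1", "T2"), ("T2", "T1")):
        print(item, order, process_value(start, item, order))
```

For value 50 the T1-first order leaves T1 stuck in PROBLEM, and for value 250 the T2-first order leaves T1 in OK, matching the two cases above.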



 Comments   
Comment by Aleksandrs Saveljevs [ 2011 Apr 12 ]

By the way, in Zabbix 1.8 we have a "dep_level" field in the "triggers" table, which is not used and which we want to drop in Zabbix 2.0. Since there cannot be circular dependencies, the trigger dependency graph is a DAG and can thus be topologically sorted. So maybe we could use "dep_level" to fix the evaluation order problem by evaluating triggers with a greater (or lower) "dep_level" first? In that case, the GUI and server would have to keep the "dep_level" field valid whenever triggers are created or changed.
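A rough sketch of the idea, assuming "dep_level" would mean "0 for a trigger without dependencies, otherwise one more than the highest level among the triggers it depends on" (the intended semantics of the unused column are not stated here). It uses Kahn's algorithm; `dependency_levels` is a hypothetical helper, not part of Zabbix.

```python
from collections import defaultdict, deque


def dependency_levels(triggers, depends_on):
    """Assign a level to every trigger using Kahn's algorithm.

    depends_on maps a trigger to the triggers it depends on (all of which
    must also appear in `triggers`). A trigger with no dependencies gets
    level 0; otherwise its level is one more than the highest level among
    the triggers it depends on. Raises ValueError on a cycle, which a
    valid dependency graph cannot contain.
    """
    dependents = defaultdict(list)  # trigger -> triggers that depend on it
    pending = {}                    # trigger -> unresolved dependencies
    for t in triggers:
        deps = depends_on.get(t, ())
        pending[t] = len(deps)
        for d in deps:
            dependents[d].append(t)

    level = {t: 0 for t in triggers}
    queue = deque(t for t in triggers if pending[t] == 0)
    done = 0
    while queue:
        t = queue.popleft()
        done += 1
        for child in dependents[t]:
            level[child] = max(level[child], level[t] + 1)
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    if done != len(triggers):
        raise ValueError("circular trigger dependency")
    return level


# T1 depends on T2, as in the description.
levels = dependency_levels(["T1", "T2"], {"T1": ["T2"]})
order = sorted(levels, key=levels.get)  # T2 is evaluated before T1
print(levels, order)
```

With this definition, evaluating in ascending level guarantees that every trigger sees the fresh values of the triggers it depends on; a level counted from the other end of the graph would simply call for descending order.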

Comment by Michael Maymann [ 2012 Mar 03 ]

I also ran into the above problem...
Just to be precise: wouldn't it be enough to check the parent dependencies only after discovering that a child has gone into PROBLEM state? There is no need to check all parents every time you check a child, as the parent(s) should always be OK if the child is OK.

Will this be fixed in 2.0 ?

Comment by Scott Duensing [ 2012 Oct 15 ]

This is still an issue in 2.0.3. Before sending an alert for a child, all parent states need to be checked. The suggested "solution" of altering the timing of polling child objects does not scale and is hackish at best.

Comment by Marc [ 2013 Feb 19 ]

This is really confusing for alert recipients. Does anybody have a reliable work-around for 2.0.x?

Comment by MATSUDA Daiki [ 2013 Apr 08 ]

I attached two patches, for 1.8.16 and 2.0.4 (the 2.0.4 patch also works with 2.0.5). They are meant to guarantee trigger dependencies when triggers are created in the web frontend.
At first I planned to sort triggers at evaluation time, but that always adds overhead, so I decided to handle it at creation time instead.

For 1.8.16, it works well together with the ZBX-6016 patch.

Comment by MATSUDA Daiki [ 2013 Apr 12 ]

fix for events.objectid

Comment by Dave Allaby [ 2014 Mar 19 ]

I am running version 2.0.9 and am experiencing the same issue. I am surprised this does not get more votes, as it is a huge deal for me.

I monitor more than 60 firewalls, which in some cases are cascaded, so when a top-level firewall misses pings, I get notified about everything below it, which is very confusing. Since they are all the same devices, I really don't want to have to use multiple templates or settings such as timings to sort this out.

The frontend seems to work as expected, but the actions/notifications do not.
Is there a chance anyone has a workaround/patch for 2.0.9 or newer? I don't mind upgrading if that will fix it.

Comment by richlv [ 2014 Mar 19 ]

Dave, this issue is about two triggers on the same item. In your case, trigger sensitivity should simply be tuned so that the more important one fires first.

Comment by Radu Molnar [ 2014 Apr 14 ]

Rich, tuning the timing does not fix the problem, it just lowers the chances of hitting it. But it does not eliminate it altogether.

Comment by richlv [ 2014 Apr 15 ]

Item ia is checked every 30 seconds; trigger ta checks for 90 seconds of problem state.
Item ib is checked every 60 seconds; trigger tb checks for 180 seconds of problem state.
Trigger tb depends on trigger ta.

I don't see a way for tb to fire before ta if both are caused by the same underlying problem...

Comment by Ghozlane TOUMI [ 2014 Apr 16 ]

Just as a heads-up, ZBX-4344 is about the problem with dependencies and alerting. Its title is misleading, but I think it's the one everyone is bitching about in the comments here...

@richlv, the problem with your approach is that when you have a real network with a dependency chain, you *cannot* have different item timings for the different levels of dependencies...

Of course, you can check hosts behind 5 routers/switches every hour to be sure you won't get wrong alerts when the router dies, but hey, guess what? We set up the monitoring to be alerted about problems on the hosts too.

In real-life deployments, every host behind 1 or 10 routers comes from the same template with the same timings (if you want to keep your sanity) and has the same triggers... And even if you check the routers frequently, as Radu and others pointed out, it only lessens the chance of hitting the alert storm.

To repeat, the problem is not with dependencies; it's only that alerting and escalation don't check those dependencies...

Comment by richlv [ 2014 Apr 16 ]

You can use user macros/variables to customise this even when using a single template for network devices.
As for checking dependencies, alerting does check them, but the problem is that one of the problems might not be detected yet if you do not give the triggers different sensitivity.
You could argue that evaluating any trigger that depends on some other trigger should cause all of the items involved to be polled and the triggers then recalculated, but that would be a disaster performance-wise, and it still would not solve all cases: some of the triggers being depended on might not fire soon enough even in such a scenario.

If anybody has an idea how to make this really safe, we can discuss it on IRC, but it seems impossible to me.

(Note that dependencies on the same item are a different question.)

Comment by Ghozlane TOUMI [ 2014 Apr 16 ]

User macros are best used for customising triggers for very specific cases; you can't really use them for a wide deployment, as they are defined per template or per host...

Anyway, regarding alerting and dependencies, the problem is that dependencies are checked at the beginning of an alert escalation, and not at every stage of the escalation.

I understand triggers may fire in any order, and I'm OK with that, as long as the dependent trigger stops bugging me once the trigger it depends on fires. This is true in the web interface, but unfortunately not for alerting.

For instance, at the beginning I thought a good way to avoid an alarm storm would be to define a simple escalation scheme: do nothing for the first 5 minutes, so that the other triggers in the dependency chain would have time to fire, and then send a mail.
Unfortunately, as the dependencies are not checked at each stage, the bogus mails are sent, and the admins are woken up at night for nothing...
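The difference between checking dependencies only when an escalation starts and re-checking them before every stage can be sketched as follows. It is a hypothetical model of the requested behaviour (the subject of ZBX-4344), not Zabbix's escalation code; `escalate`, `state_at` and the trigger names are illustrative.

```python
# Hypothetical model of the requested escalation behaviour, not Zabbix code.

def escalate(trigger, steps, depends_on, state_at, recheck_each_step):
    """Return the escalation steps that would actually be executed.

    steps:      list of (minute, action) pairs in ascending minute order
    depends_on: trigger name -> names of the triggers it depends on
    state_at:   callable(trigger name, minute) -> "OK" or "PROBLEM"
    """
    def suppressed(minute):
        # Suppressed if any trigger this one depends on is in PROBLEM.
        return any(state_at(p, minute) == "PROBLEM"
                   for p in depends_on.get(trigger, ()))

    if suppressed(0):  # dependencies checked when the escalation starts
        return []
    executed = []
    for minute, action in steps:
        if recheck_each_step and suppressed(minute):
            break  # the parent problem appeared in the meantime
        executed.append((minute, action))
    return executed


# A host behind a router; the router trigger fires 2 minutes after the host
# trigger. The only step is "send mail after 5 minutes".
deps = {"host unreachable": ["router down"]}
steps = [(5, "send mail")]


def state_at(name, minute):
    return "PROBLEM" if name == "router down" and minute >= 2 else "OK"


print(escalate("host unreachable", steps, deps, state_at, False))  # [(5, 'send mail')]
print(escalate("host unreachable", steps, deps, state_at, True))   # []
```

Only the variant that re-checks before each stage suppresses the mail in the 5-minute delay scheme described above.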

Comment by richlv [ 2014 Apr 16 ]

Checking dependencies during escalation sounds potentially doable, but we'd need some developer comment on that. It's a different problem from the one reported in this issue, though.

Comment by Ghozlane TOUMI [ 2014 Apr 16 ]

Agreed, this is ZBX-4344, but most of the comments on ZBX-3163 could be solved by this kind of solution...

Comment by Volker Fröhlich [ 2014 Apr 16 ]

ZBXNEXT-1461 also touches the topic of topology and dependency, but this ticket is about something different, as was already stated. It's about triggers using the very same item(s) but different thresholds or functions.

Comment by Aleksandrs Saveljevs [ 2014 Jun 04 ]

Development branch svn://svn.zabbix.com/branches/dev/ZBX-3163 adds topological sorting to triggers and processes them according to that order.

For review purposes, there are a couple of points worth mentioning.

Functions DCconfig_sort_triggers_topologically() and DCconfig_sort_triggers_topologically_rec() contain checks for "trigdep->trigger" being NULL. This makes the code less pretty, but it has to be done, because in DCsync_configuration() the SQL query for "tdep_result" might yield a slightly different set of triggers than the one for "trig_result".

Function process_triggers() first sorts triggers according to topological index and processes them. After that, it sorts the triggers based on trigger ID and executes SQL statements in the database in that order, so that we do not have any deadlocks.
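The two-phase processing described for process_triggers() can be sketched as follows. This is a hypothetical Python model, not the C code from the branch: the function only mirrors the described steps, the `evaluate` callback and trigger IDs are made up, and the dependency rule is the one from the issue description.

```python
# Hypothetical model of the two-phase processing; not the C code from the
# ZBX-3163 branch. Trigger IDs, thresholds and helpers are illustrative.

def process_triggers(triggerids, topo_index, evaluate):
    """Evaluate triggers in topological order, then return the writes
    sorted by trigger ID.

    topo_index: trigger ID -> topological index (lower is evaluated first)
    evaluate:   callable(triggerid, new_values) -> new value; new_values
                holds the values already computed during this pass
    """
    # Phase 1: dependency order, so every trigger sees the fresh values
    # of the triggers it depends on.
    evaluation_order = sorted(triggerids, key=lambda tid: topo_index[tid])
    new_values = {}
    for tid in evaluation_order:
        new_values[tid] = evaluate(tid, new_values)

    # Phase 2: database writes in ascending trigger ID order.
    write_order = sorted(new_values.items())
    return evaluation_order, write_order


# T1 (id 1001, item > 100) depends on T2 (id 1002, item > 200); both are
# in PROBLEM state and value 50 arrives.
item = 50
previous = {1001: "PROBLEM", 1002: "PROBLEM"}
threshold = {1001: 100, 1002: 200}
parent = {1001: 1002}


def evaluate(tid, new_values):
    p = parent.get(tid)
    if p is not None and new_values.get(p, previous[p]) == "PROBLEM":
        return previous[tid]  # held back by the dependency
    return "PROBLEM" if item > threshold[tid] else "OK"


eval_order, writes = process_triggers([1001, 1002], {1002: 0, 1001: 1}, evaluate)
print(eval_order)  # [1002, 1001]
print(writes)      # [(1001, 'OK'), (1002, 'OK')]
```

Writing rows in one global order (ascending trigger ID) is the usual way to avoid deadlocks between concurrent processes: no two writers can each hold a row lock that the other one is waiting for.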

Comment by Aleksandrs Saveljevs [ 2014 Jun 06 ]

Fixed in pre-2.2.4 r46275 and pre-2.3.2 (trunk) r46274.

Comment by Martins Valkovskis [ 2014 Jun 09 ]

Documented in:

Comment by Alexander Vladishev [ 2014 Jun 17 ]

(1) triggers.lastchange should be loaded into a configuration cache only for new triggers.

sasha RESOLVED in @2.2 r46625 and @trunk r46626.

asaveljevs CLOSED

Comment by Alexander Vladishev [ 2014 Nov 26 ]

This fix solved the problem described in ZBX-9079.

Comment by Oleksii Zagorskyi [ 2016 Mar 06 ]

ZBXNEXT-3177 asks for the ability to sort triggers by ID, to see the evaluation order.

Generated at Fri Mar 29 12:44:53 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.