[ZBX-11738] No actions are taken on problem if it gets resolved while escalation is in progress Created: 2017 Jan 25  Updated: 2024 Apr 10  Resolved: 2017 Mar 16

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.2.2, 3.2.3, 3.4.0alpha1
Fix Version/s: 3.2.5rc1, 3.4.0alpha1

Type: Incident report Priority: Major
Reporter: Vjaceslavs Bogdanovs Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: action, actionoperations, escalations, problems
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: no_actions_taken.png (PNG), nope.png (PNG)
Issue Links:
Duplicate
Team: Team A
Sprint: Sprint 1, Sprint 2, Sprint 3
Story Points: 2.5

 Description   

As the summary says, no actions are taken on a problem if it is resolved while an escalation is in progress.

Consider the following scenario:
1. There are two trapper items (itemA and itemB) on a host.
2. Each item has a trigger (triggerA and triggerB) with the expressions {host:itemA.last()}<>0 and {host:itemB.last()}<>0, respectively.
3. Both triggers have actions (with multiple operations) attached to them.
4. The first trigger (triggerA) goes into the problem state and the escalation process starts (action operations are executed).
5. While the escalation is in progress (it may take some time, as there are multiple commands), the second trigger (triggerB) goes into the problem state.
6. While the escalation is still in progress, triggerB goes back into the normal state (the problem on triggerB is resolved).

None of the action operations assigned to triggerB are executed (see attachment no_actions_taken.png):

The highlighted problem (09:48:40) has no actions executed because it changed its state while the actions for the triggerA problem were being executed. The last problem (09:52:35) shows that actions are executed without any issues when no escalation is in progress.



 Comments   
Comment by Glebs Ivanovskis (Inactive) [ 2017 Jan 25 ]

As I understand it, this is related to very short problem durations, shorter than

#define CONFIG_ESCALATOR_FREQUENCY	3

So I guess it's either "by design" or "too hard to fix" without a better inter-process communication mechanism.

Comment by Vjaceslavs Bogdanovs [ 2017 Jan 25 ]

Nope, just add more operations to triggerA's action to make the escalation longer (see attachment nope.png):

This time the trigger was in the problem state for 10 s. Still, I agree that there is no easy fix that covers all trigger/action cases.

I suggest adding an option to process trigger actions even if the trigger is already resolved when the escalator gets to it. With this option, the user could tell Zabbix that the problem is important and that actions should be taken even if it has already been resolved.

Comment by Oleksii Zagorskyi [ 2017 Jan 25 ]

You say:

4. First trigger (triggerA) goes into problem state and escalation process starts (action operations are executed)
5. While escalation is in progress (it may take some time as there are multiple commands) second trigger (triggerB) goes into problem state

Something is wrong here, IMO, or it is described in an unusual way that may mislead.
Regarding 4: when a trigger goes into the problem state, the DB syncer inserts a record into the escalations table, but the escalation itself is started later, within the following 3 seconds (at most), by the escalator process.

If such an obvious flaw had existed in Zabbix all along, we would already know about it.
I cannot believe that one independent item+trigger pair affects another one, even if they belong to the same host.

3. Both triggers have actions (with multiple operations) attached to them

Do you mean you use separate actionS with conditions for each trigger?

Comment by Glebs Ivanovskis (Inactive) [ 2017 Jan 25 ]

I guess it is crucial to mention that there was only one escalator process. While it was busy with the "heavy" action of one trigger, another trigger was able to fire and resolve, and once the escalator had finished with the first trigger's action, it performed no actions for the second one.

For anyone who runs into such a misconfiguration, here are a few suggestions/workarounds:

  • monitor the escalator busy level and set the number of escalator processes accordingly;
  • prevent trigger flapping by using more robust trigger expressions (e.g. check problem resolution conditions with time shifts to give escalators time to execute the first step of the action);
  • if operations are absolutely crucial for every problem occurrence, automatic problem resolution can be disabled: set "Generate OK events" to "None" and close all problems manually.

Comment by Vjaceslavs Bogdanovs [ 2017 Jan 25 ]

Something is wrong here, IMO, or it is described in an unusual way that may mislead.
Regarding 4: when a trigger goes into the problem state, the DB syncer inserts a record into the escalations table, but the escalation itself is started later, within the following 3 seconds (at most), by the escalator process.

I am not sure what is misleading here. I am skipping some steps (item value changes, etc.) because, although they happen, they do not affect the result. And yes, the DB syncer does some work first, but I am describing the state after the escalator has started (we can assume that step 5 starts when the first operation is executed in reaction to triggerA).

Do you mean you use separate actionS with conditions for each trigger?

Yes, two items, two triggers and two actions.

I guess it is crucial to mention that there was only one escalator process.

Yes, but it is the default value, so I think this is a common case among Zabbix users.

Comment by Vjaceslavs Bogdanovs [ 2017 Jan 25 ]

And yes, I can't reproduce this bug in Zabbix 3.0, but it has been present since 3.2. I just tested with a clean install.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Jan 25 ]

Probably related to ZBXNEXT-3195.

Comment by Vjaceslavs Bogdanovs [ 2017 Jan 25 ]

Probably related to ZBXNEXT-3195.

Checked on a clean 3.2.0 installation: the bug is present in 3.2.0.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Jan 25 ]

Or ZBX-11454. Nothing else in the ChangeLog seems related.

Comment by Viktors Tjarve [ 2017 Feb 02 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-11738
Additionally, improved how the esc_step and actionid parameters are passed for escalation recovery operations.

Comment by Sergejs Paskevics [ 2017 Feb 06 ]

(1) Wrong calculation of nextcheck. The check if (escalation.nextcheck <= now) has been lost.

viktors.tjarve RESOLVED in r65652.

sasha CLOSED

Comment by Sergejs Paskevics [ 2017 Feb 06 ]

(2) Unused parameter escalation in escalation_execute_recovery_operations.

viktors.tjarve RESOLVED in r65654.

sasha Great! CLOSED

Comment by Viktors Tjarve [ 2017 Feb 13 ]

(3) Only the first escalation step of the second trigger (triggerB) will be executed. This happens when the escalation related to triggerA is still in progress and triggerB fires, goes into the PROBLEM state, reaches the time of the second (or any later) step, and goes back into the OK state (the problem on triggerB is resolved) before triggerA's escalation has been processed.

viktors.tjarve RESOLVED in r65744.

s.paskevics I cannot reproduce such a situation. I added minor changes in r65875. The changes should be checked by someone else.

sasha I undid these changes. WON'T FIX

Comment by Alexander Vladishev [ 2017 Mar 10 ]

Successfully tested! Have a look at my changes in r66301, r66302.

Comment by Sergejs Paskevics [ 2017 Mar 13 ]

Looks good! Thank you!

Comment by Viktors Tjarve [ 2017 Mar 13 ]

Released in:

  • 3.2.5rc1 r66325
  • 3.4.0alpha1 r66332

Comment by Sergejs Paskevics [ 2017 Mar 14 ]

This change doesn't need to be documented

Comment by richlv [ 2017 Apr 12 ]

(4) The changelog entry currently says:

fixed performing actions for problemas that have started and resolved while another problems escalation is executed

  • there's a typo "problemas"
  • "another problems escalation" should be "another problem escalation"

viktors.tjarve RESOLVED in r67537.

sasha Thanks! CLOSED

Generated at Wed Jan 22 22:55:17 EET 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.