Incident report
Resolution: Fixed
Critical
2.1.9
None
The problem is in process_actions() in src/zabbix_server/actions.c, which works in two phases. In the first phase, we fetch all actions for the given events and build a list of the actions for which the given events might be recovery events. In the second phase, we go through all such actions and all given events again, and check whether a given event is a recovery event for a particular action.
Unfortunately, the checks in the second phase are not sufficiently strict. It is therefore possible for an action to be added to the list in the first phase because of event E1, but then to be recovered in the second phase by a different event E2.
An example of this problem, as observed in our environment, is as follows. A trigger went into PROBLEM state and an escalation started as a consequence. Then a host became unavailable and an internal event was generated for this trigger. We received a recovery message from Zabbix that was sent in response to this internal event.
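
The two-phase logic described above can be illustrated with a minimal, self-contained sketch. This is not the actual code from actions.c and not the actual fix: the types event_t and action_t, the helper may_recover(), the function find_recovery_event() and the exact conditions checked in each phase are all hypothetical, chosen only to show how a loose second-phase check can let E2 recover an action that was listed because of E1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; NOT the real Zabbix structures. */
typedef enum { SRC_TRIGGERS, SRC_INTERNAL } event_source_t;
typedef enum { VALUE_OK, VALUE_PROBLEM } event_value_t;

typedef struct {
	event_source_t	source;		/* where the event came from */
	event_value_t	value;		/* OK or PROBLEM */
	int		trigger_id;	/* trigger the event refers to */
} event_t;

typedef struct {
	event_source_t	source;			/* event source the action handles */
	int		escalated_trigger_id;	/* trigger with a running escalation */
} action_t;

/* Assumed phase 1 condition: the event has the action's source and an OK value. */
static int	may_recover(const action_t *action, const event_t *event)
{
	return event->source == action->source && VALUE_OK == event->value;
}

/*
 * Returns the index of the event that recovers the action, or -1 if none does.
 * strict == 0 models the loose phase 2 check, strict != 0 a stricter variant
 * that re-applies the phase 1 condition to the very event being examined.
 */
int	find_recovery_event(const action_t *action, const event_t *events, size_t n, int strict)
{
	size_t	i;
	int	candidate = 0;

	assert(NULL != action && (NULL != events || 0 == n));

	/* Phase 1: is there ANY event for which this action might be recovered? */
	for (i = 0; i < n; i++)
	{
		if (may_recover(action, &events[i]))
		{
			candidate = 1;
			break;
		}
	}

	if (0 == candidate)
		return -1;

	/* Phase 2: look for the event that recovers the action. */
	for (i = 0; i < n; i++)
	{
		if (events[i].trigger_id != action->escalated_trigger_id)
			continue;

		/* The loose variant skips this re-check, so any event for the trigger matches. */
		if (0 != strict && !may_recover(action, &events[i]))
			continue;

		return (int)i;
	}

	return -1;
}
```

In this sketch, take E1 to be an OK trigger event for an unrelated trigger 2 and E2 to be an internal event for trigger 1, which has a running escalation. With strict set to 0, the action becomes a candidate because of E1 and is then recovered by E2. With strict set to 1, no event recovers it.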