[ZBXNEXT-3196] New option for actions: delay escalation while in maintenance Created: 2016 Mar 17  Updated: 2019 Feb 21  Resolved: 2016 Jun 06

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: API (A), Frontend (F), Server (S)
Affects Version/s: None
Fix Version/s: 3.2.0alpha1

Type: New Feature Request Priority: Major
Reporter: Alexander Vladishev Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: actionoperations, actions, maintenance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates ZBX-8558 Wrong event is generated at an end of... Closed
is duplicated by ZBX-10265 Generating duplicated events after ma... Closed

 Description   

The current processing of maintenance periods produces new events at the end of maintenance, which confuses Zabbix users. It is hard to understand who generated these events and why, and the acknowledgment information of the original event is lost as well.

We should pause escalation during maintenance periods instead.



 Comments   
Comment by richlv [ 2016 Apr 15 ]

similar - ZBXNEXT-128 asks for a way to add an operation condition that would take into account the maintenance status

Comment by richlv [ 2016 Apr 15 ]

similar - ZBXNEXT-2355 mentions extra events, created after maintenance ends

Comment by richlv [ 2016 Apr 15 ]

it could also be very confusing when the escalation gets paused - consider logging "escalation paused/resumed" messages at log level 4, and including that information in the ESC.HISTORY

Comment by richlv [ 2016 Apr 15 ]

we might want to set ZBXNEXT-894 as a duplicate of this issue

Comment by Andris Zeila [ 2016 Apr 25 ]

Development started in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-3196

Comment by Andris Zeila [ 2016 Apr 25 ]

(1) [I] database patch added in r59672

sasha Database patch (r59672) and changes in data.tmpl (r59727) were successfully tested! CLOSED

Comment by richlv [ 2016 Apr 26 ]

my comment on logging was edited - but it would still be appreciated to have at least some comment on whether any logging will help users here

wiper currently there are no plans to log paused escalations. Actually we won't have paused escalations as such - they will be simply ignored during maintenance period.

<richlv> thank you for the answer

Comment by Ivo Kurzemnieks [ 2016 Apr 27 ]

(2) [F] Translation strings added:

  • Cannot set "%1$s" for action "%2$s".
  • Pause operations while in maintenance

sasha CLOSED

Comment by Andris Zeila [ 2016 Apr 27 ]

Server ready for testing: r59760

Comment by Sandis Neilands (Inactive) [ 2016 May 09 ]

(3) In check_escalation() maintenance is unconditionally set to HOST_MAINTENANCE_STATUS_ON. It must be set to item.host.maintenance_status.

sandis.neilands RESOLVED in r59973.

wiper CLOSED

Comment by Sandis Neilands (Inactive) [ 2016 May 09 ]

(4) check_escalation() never FAILs.

sandis.neilands RESOLVED in r59983.

wiper CLOSED

Comment by Sandis Neilands (Inactive) [ 2016 May 09 ]

(5) execute_action() can use uninitialized action. Scenario:

  • check_escalation() fails before setting the action;
  • process_escalation() calls execute_action() with the uninitialized action.

The head comment of check_escalation() is wrong. action->actionid is not set to 0.

 * Comments: 'action' is filled with information about action. If information *
 *           could not be gathered, 'action->actionid' is set to 0.           *

However in execute_action() we check for just that.

BTW, ignore can also be left uninitialized but it is not used when check_escalation() fails.

wiper RESOLVED in r60014

sandis.neilands CLOSED.

Comment by Sandis Neilands (Inactive) [ 2016 May 09 ]

(6) Problem in the escalator's main loop, not related to this development. Since ZBXNEXT-2844, process_escalations() is called three times, each time for a different escalation_source. The problem is that process_escalations() resets nextcheck to now + CONFIG_ESCALATOR_FREQUENCY, thus overriding whatever nextcheck was found in the previous invocation.

This is not a huge problem though - CONFIG_ESCALATOR_FREQUENCY is currently just 3 seconds. It might be better to at least change the order of process_escalations() so that for ZBX_ESCALATION_SOURCE_TRIGGER it is called last. Even better would be to set nextcheck in the mainloop and not increase it in process_escalations() (it could reduce it though).

wiper Moved to ZBXNEXT-3195. It's the last escalator related ZBXNEXT - it's better to have finishing touches (including variable/function renaming if necessary) there.
CLOSED
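
The "set nextcheck in the main loop, only reduce it in process_escalations()" idea from (6) could look roughly like this. It is a sketch under assumptions, not the actual escalator code: process_source() is a hypothetical stand-in for process_escalations(), ESCALATOR_FREQUENCY stands in for CONFIG_ESCALATOR_FREQUENCY, and the three entries of found[] simulate the earliest nextcheck each escalation source discovered (0 meaning none).

```c
#include <assert.h>
#include <time.h>

#define ESCALATOR_FREQUENCY     3       /* assumed stand-in for CONFIG_ESCALATOR_FREQUENCY */
#define SOURCE_COUNT            3       /* one call per escalation source */

/* hypothetical stand-in for process_escalations(): may lower *nextcheck, */
/* but never raises it                                                   */
static void process_source(time_t found, time_t *nextcheck)
{
        assert(NULL != nextcheck);

        if (0 != found && found < *nextcheck)
                *nextcheck = found;
}

/* one main loop iteration: the default is set once here, not in each call */
static time_t next_wakeup(time_t now, const time_t found[SOURCE_COUNT])
{
        time_t nextcheck = now + ESCALATOR_FREQUENCY;
        int i;

        for (i = 0; i < SOURCE_COUNT; i++)
                process_source(found[i], &nextcheck);

        return nextcheck;
}
```

In this form the order of the three calls no longer matters: an early nextcheck found by the first source survives the later invocations.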

Comment by Sandis Neilands (Inactive) [ 2016 May 10 ]

(7) Currently escalation pausing in server is implemented for all four event sources: internal, discovery, auto registration, triggers. By default, the escalations will be paused during maintenance. In front-end and API there is no option to disable this.

Options:
1. Provide the same option also in internal, discovery, auto-reg. action configuration as in trigger action configuration, lift the restriction in API.
2. Change the default from 1 to 0 in DB, front-end and API.
3. Change server to consider maintenance only for trigger based escalations.

wiper Third it is,
RESOLVED in r59996

sandis.neilands CLOSED.
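
For reference, option 3 boils down to a predicate along these lines. This is a sketch only: the constant names mirror those quoted in (9), but the enum values and the non-trigger constants (EVENT_SOURCE_DISCOVERY, EVENT_SOURCE_AUTO_REGISTRATION, EVENT_SOURCE_INTERNAL, ACTION_MAINTENANCE_MODE_NORMAL, HOST_MAINTENANCE_STATUS_OFF) are assumptions made for illustration, and escalation_is_paused() is a hypothetical helper.

```c
#include <assert.h>

/* illustrative values only; the real definitions live in the Zabbix headers */
enum
{
        EVENT_SOURCE_TRIGGERS,
        EVENT_SOURCE_DISCOVERY,
        EVENT_SOURCE_AUTO_REGISTRATION,
        EVENT_SOURCE_INTERNAL
};

enum { ACTION_MAINTENANCE_MODE_NORMAL, ACTION_MAINTENANCE_MODE_PAUSE };
enum { HOST_MAINTENANCE_STATUS_OFF, HOST_MAINTENANCE_STATUS_ON };

/* hypothetical helper: maintenance is considered for trigger based */
/* escalations only, and only when the action asks for pausing      */
static int escalation_is_paused(int eventsource, int maintenance_mode, int maintenance)
{
        return EVENT_SOURCE_TRIGGERS == eventsource &&
                        ACTION_MAINTENANCE_MODE_PAUSE == maintenance_mode &&
                        HOST_MAINTENANCE_STATUS_ON == maintenance;
}
```

Internal, discovery and auto registration escalations fall through the first comparison, so the missing front-end/API option for them no longer matters.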

Comment by Alexander Vladishev [ 2016 May 17 ]

(8) Redundant check for escalation_rc

if (0 == cur_esc.r_eventid && FAIL != escalation_rc)

wiper RESOLVED in r60139

sasha CLOSED

Comment by Alexander Vladishev [ 2016 May 17 ]

(9) Possible memory leak

escalation_rc = check_escalation(&cur_esc, &skip, &maintenance, &error);
if (FAIL == check_db_action(cur_esc.actionid, &action, &error))
        escalation_rc = FAIL;

if (FAIL != escalation_rc)
{
        if (EVENT_SOURCE_TRIGGERS == action.eventsource &&
                        ACTION_MAINTENANCE_MODE_PAUSE == action.maintenance_mode &&
                        HOST_MAINTENANCE_STATUS_ON == maintenance)
        {
                /* remove paused escalations that were created and recovered/cancelled */
                /* during maintenance period                                           */
                if (0 == cur_esc.esc_step && 0 != cur_esc.r_eventid)
                {
                        zbx_vector_uint64_append(&escalations_to_be_deleted,
                                        cur_esc.escalationid);
                        free_db_action(&action);        /* <<- this call is missing here */
                        goto next;
                }

wiper RESOLVED in r60140

sasha CLOSED
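
A general way to avoid this class of leak is to route every early exit through a single cleanup point. The sketch below is not what r60140 does (the review asked for a free_db_action() call before the goto); it only illustrates the alternative pattern. demo_action_t, demo_action_init(), demo_action_free(), process_one() and the live_actions counter are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* hypothetical action holding one heap allocation */
typedef struct
{
        char *name;
}
demo_action_t;

static int live_actions = 0;    /* demo counter of actions not yet freed */

static void demo_action_init(demo_action_t *action)
{
        action->name = (char *)malloc(8);
        live_actions++;
}

static void demo_action_free(demo_action_t *action)
{
        free(action->name);
        action->name = NULL;
        live_actions--;
}

/* returns 1 if the escalation was queued for deletion, 0 otherwise */
static int process_one(int delete_early)
{
        demo_action_t action;
        int deleted = 0;

        demo_action_init(&action);

        if (0 != delete_early)
        {
                deleted = 1;
                goto next;      /* the early exit still reaches the cleanup */
        }

        /* ... normal escalation processing would happen here ... */
next:
        demo_action_free(&action);

        return deleted;
}
```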

Comment by Alexander Vladishev [ 2016 May 17 ]

(10) src/zabbix_server/escalator/escalator.c:1837 action can be uninitialized here (for example, when action is disabled)

wiper RESOLVED in r60142

sasha I am undoing the changes. action will be initialized in any case.

WON'T FIX

Comment by Alexander Vladishev [ 2016 May 17 ]

(11) src/zabbix_server/escalator/escalator.c:1838 Memory leak: error may not be freed when goto next; is taken

wiper RESOLVED in r60143

sasha CLOSED

Comment by Alexander Vladishev [ 2016 May 17 ]

(12) src/zabbix_server/escalator/escalator.c:1727 result of check_escalation() can be overwritten.

escalation_rc = check_escalation(&cur_esc, &skip, &maintenance, &error);
if (FAIL == check_db_action(cur_esc.actionid, &action, &error))
        escalation_rc = FAIL;

wiper RESOLVED in r60138

sasha escalation_rc will be uninitialized when check_db_action() returns FAIL

REOPENED

wiper RESOLVED in r60145

sasha CLOSED
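
One ordering that avoids both problems raised in (12), the overwritten result and the uninitialized escalation_rc, is sketched below. Whether r60145 takes this exact approach is not stated here, so treat it as an assumption: stub_check_action() and stub_check_escalation() are hypothetical stand-ins that simply return the code they are given, and the SUCCEED/FAIL values are assumed.

```c
#include <assert.h>

#define SUCCEED 0       /* assumed return codes for this sketch */
#define FAIL    -1

/* hypothetical stand-ins for check_db_action() and check_escalation() */
static int stub_check_action(int rc)
{
        return rc;
}

static int stub_check_escalation(int rc)
{
        return rc;
}

static int combined_check(int action_rc, int esc_rc)
{
        int escalation_rc = FAIL;       /* defined even if the action check fails */

        /* the escalation is checked only for a valid action, so its */
        /* result is never overwritten afterwards                    */
        if (FAIL != stub_check_action(action_rc))
                escalation_rc = stub_check_escalation(esc_rc);

        return escalation_rc;
}
```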

Comment by Alexander Vladishev [ 2016 May 17 ]

(13) Incorrect naming:

  1. escalations_to_be_deleted is an array of IDs. Maybe del_escalationids?
  2. a better name for the variable maintenance would be maintenance_status
  3. deleted_escalation_count => ret ?

wiper Reverted most of the renaming done in r59971. The escalator will be changed in ZBXNEXT-3195 and possibly ZBXNEXT-3193. Any renaming should be done in the last escalator related issue.
RESOLVED in r60150

sasha CLOSED

Comment by Alexander Vladishev [ 2016 May 18 ]

(14) PHP errors while updating actions:

Undefined index: eventsource [ in actionconf.php:192]

gunarspujats RESOLVED in r60202

sasha CLOSED

Comment by Alexander Vladishev [ 2016 May 18 ]

(15) Cannot update non-trigger actions:

Cannot update "maintenance_mode" for action "Auto discovery. Linux servers.".

gunarspujats RESOLVED in r60202

sasha CLOSED with minor fix in r60209

Comment by Andris Zeila [ 2016 May 20 ]

Released in:

  • pre-3.1.0 r60213

Comment by Martins Valkovskis [ 2016 May 26 ]

(16) Updated documentation:

sandis.neilands CLOSED.
