[ZBXNEXT-9802] Missing STEPS for the RECOVERY and UPDATE Trigger-Action-Operations Created: 2025 Feb 03  Updated: 2025 Feb 03

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Epic Priority: Trivial
Reporter: Simon Jackson Assignee: Andris Zeila
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: 24h
Time Spent: Not Specified
Original Estimate: 24h
Environment:

All


Attachments: PNG File image-2025-02-03-09-49-38-778.png     PNG File image-2025-02-03-09-49-48-651.png    
Epic Name: Trigger Operations

 Description   

In an IT support role, it's not usual to want an email/push notification about every single problem detected by Zabbix, e.g. a PC in marketing is rebooting and its switch port is detected as down.

For trigger problem operations, the STEPS feature with custom or default step delays is really useful (top red area in the attached screenshot):

For the other two operation types (Recovery and Update), Steps are NOT available. Edit:

 

Scenario:

 - 100 servers are manually patched as a scheduled business change, carried out outside of business hours. All the devices receive approved firmware and OS patches, and in some cases application patches. After all patching we perform a full reboot, to be sure a cold boot works as well. During the reboot, the network switch ports are detected as operationState=down. Zabbix polls every 60 seconds, detects the down state, and raises a `problem record` in Zabbix.

The triggers fire; operation step 1 does nothing.

If the port is still down after 2 minutes, we send emails; it's useful to know which servers are stuck. Sometimes servers prompt for human input ("Keyboard not detected: press F1 to continue", anyone remember that?).

If the trigger is still active after 5 minutes, we send push notifications via a third party (in this case OpsGenie). REALLY useful.

 

What doesn't make sense is the quantities. Last month:

  • 10 DOWN events (problems still open after 2 minutes)
  • 100 UP events (problems resolved)

 

We should be able to configure operation steps for recovery and update operations as well, giving an effective means of communication, much like the steps available for the operations that run when a problem is first raised.
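The mismatch in the figures above can be made concrete with a small Python simulation of the escalation plan from the scenario. Everything in it is hypothetical: the `Step` class, the channel labels, the sample durations, and especially the "stepped recovery" rule, which models the behaviour requested in this ticket rather than anything Zabbix currently does or exposes through its API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One escalation step: run `action` if the problem is still open `delay_s` seconds after it started."""
    delay_s: int
    action: str  # hypothetical channel label; "none" means the step does nothing


# Hypothetical plan mirroring the scenario:
# step 1 does nothing, email at 2 minutes, push (OpsGenie) at 5 minutes.
PROBLEM_STEPS = [Step(0, "none"), Step(120, "email"), Step(300, "push")]


def problem_actions(duration_s, steps=PROBLEM_STEPS):
    """Channels notified before a problem that lasted `duration_s` seconds was resolved."""
    return [s.action for s in steps if duration_s > s.delay_s and s.action != "none"]


def recovery_messages(durations, stepped):
    """Count recovery messages for a batch of problems.

    stepped=False: a recovery operation without steps fires for every resolved problem.
    stepped=True:  hypothetical stepped recovery, sent only if a problem step actually notified someone.
    """
    if not stepped:
        return len(durations)
    return sum(1 for d in durations if problem_actions(d))


# Hypothetical batch: 90 switch ports back within 90 s, 10 stuck for 200 s.
durations = [90] * 90 + [200] * 10
print(sum(1 for d in durations if problem_actions(d)))  # 10 problems reached the email step
print(recovery_messages(durations, stepped=False))      # 100
print(recovery_messages(durations, stepped=True))       # 10
```

With this made-up batch, only the 10 long outages reach the email step, yet a recovery operation without steps sends 100 messages; tying recovery to the steps that actually fired would send 10.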



 Comments   
Comment by Simon Jackson [ 2025 Feb 03 ]

Note that the recovery operation "Notify all involved" does not suit our requirements; having separate 'steps' for each type of operation would be amazing.

Generated at Sat Apr 05 13:02:34 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.