- Epic
- Resolution: Unresolved
- Trivial
Trigger Operations
In an IT support role, it's not common to receive an email or push notification about every single problem detected by Zabbix, e.g. a PC in marketing rebooting, or a switch port reporting an interface down.
For the operational alert on triggers, the STEPS feature with custom or default delays is really useful (top red area).
For the other two operations (Recovery and Update), Steps are NOT available. Edit:
Scenario:
- 100x servers are manually patched under a scheduled business change, as an out-of-hours activity. All the devices receive approved firmware, OS patches and, in some cases, application patches. After all patching we perform a full reboot, to be sure a cold boot works as well. During the reboot process, network switch ports are detected as operationState=down. Zabbix polls every 60 seconds and detects the down state, raising a `problem record` in Zabbix.
Triggers fire; operation step 1 does nothing.
If the port is still down after 2 minutes, we send emails. It's useful to know which servers are stuck. Sometimes servers prompt for human input (keyboard not detected: press F1 to continue, anyone remember that?).
If the trigger is 5 minutes old, we send push notifications via a third party (in this case OpsGenie). REALLY useful.
What doesn't make sense are the quantities. Last month:
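The escalation described above can be sketched as a small model. This is only an illustration of the step timing in this scenario, not Zabbix code: the `Step` class, the action labels and `actions_fired` are hypothetical names, and the delays (0 s, 120 s, 300 s) are taken from the steps described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    delay_s: int  # seconds after the problem record was raised
    action: str   # hypothetical label, not a Zabbix identifier


# Delays taken from the scenario: step 1 does nothing,
# email after 2 minutes, push notification after 5 minutes.
STEPS = [
    Step(0, "none"),
    Step(120, "email"),
    Step(300, "push"),
]


def actions_fired(problem_duration_s: int) -> list[str]:
    """Return the notifications sent before the problem was resolved."""
    return [
        step.action
        for step in STEPS
        if step.action != "none" and step.delay_s <= problem_duration_s
    ]


print(actions_fired(90))   # port back up before the 2-minute step
print(actions_fired(180))  # still down after 2 minutes
print(actions_fired(600))  # still down after 5 minutes
```

A server that comes back within 90 seconds produces no problem notification at all in this model, which is exactly why the delayed steps are useful during a mass reboot.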
- 10x DOWN events (problems raised >2m)
- 100x UP events (problems resolved)
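To make the mismatch concrete, here is a rough sketch of the arithmetic. The individual durations are assumed for illustration (90 ports back within a minute, 10 servers stuck for ten minutes); only the totals of 10 and 100 come from the figures above. The "stepped recovery" rule shown is one possible reading of the request, not existing Zabbix behaviour, and all function names are hypothetical.

```python
EMAIL_DELAY_S = 120  # the 2-minute email step from the scenario

# Assumed durations, chosen only to match the totals reported above.
durations_s = [60] * 90 + [600] * 10


def problem_notifications(durations: list[int]) -> int:
    """DOWN messages: sent only when the problem outlives the email step."""
    return sum(1 for d in durations if d >= EMAIL_DELAY_S)


def recovery_notifications_without_steps(durations: list[int]) -> int:
    """UP messages when recovery has no steps: one per resolved problem."""
    return len(durations)


def recovery_notifications_with_steps(durations: list[int]) -> int:
    """UP messages if recovery honoured the same delay (hypothetical)."""
    return sum(1 for d in durations if d >= EMAIL_DELAY_S)


print(problem_notifications(durations_s))
print(recovery_notifications_without_steps(durations_s))
print(recovery_notifications_with_steps(durations_s))
```

Under these assumptions, a stepped recovery operation would send an UP message only for the problems that had already produced a DOWN message.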
We should be able to configure steps for the Recovery and Update operations to allow an effective means of communication, much like the steps available for the initial trigger of a problem record.