[ZBXNEXT-1150] Event should store and show the maintenance status for an event Created: 2012 Mar 14  Updated: 2024 Feb 22

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Frontend (F), Server (S)
Affects Version/s: 1.8.10
Fix Version/s: None

Type: New Feature Request Priority: Major
Reporter: James Sperry Assignee: Alexei Vladishev
Resolution: Unresolved Votes: 17
Labels: events, maintenance, troubleshooting, usability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

all


Issue Links:
Duplicate
is duplicated by ZBXNEXT-2355 provide an ability to later understan... Closed
is duplicated by ZBXNEXT-3621 filter by maintenance in event.get Closed

 Description   

It would be very helpful if in the Event Details area of an event, there was a row on the left that included the Maintenance status during the event.

When troubleshooting Actions and Triggers, it's very difficult to know if the maintenance was adhered to. Having this information would take a lot of guesswork out of it.



 Comments   
Comment by Volker Fröhlich [ 2012 Mar 14 ]

From my point of view, actual maintenance periods should be stored along with the host, as long as maintenance is only host based.

This way you could also visualize it in graphs.

Comment by Volker Fröhlich [ 2012 Dec 03 ]

Connected to ZBXNEXT-1084

Comment by Pavel Timofeev [ 2013 Jan 28 ]

Not only in Event detail, but in Events list too.
I'd like to see something like this
http://img-fotki.yandex.ru/get/6430/16519813.0/0_9ff52_182007c4_orig
and this
http://img-fotki.yandex.ru/get/6431/16519813.0/0_9ff53_eb508574_orig
That would be very useful!

Comment by Oleksii Zagorskyi [ 2016 Dec 20 ]

In duplicated ZBXNEXT-2355 I shared an idea where the information could be stored:

Where it could be stored, I don't know. The "events" table doesn't have suitable columns.
I guess the numbers currently shown in the "Actions" column (if there were alerts) are taken from the "alerts" table.
Maybe we could store something in the "alerts" table and then display it in a special way?

We also need to remember that, starting from 3.2, events may be found in the Monitoring -> Problems menu (Show History switch).

Comment by Oleksii Zagorskyi [ 2016 Dec 20 ]

I've updated the issue summary so that it does not exclude any of the use cases linked here as duplicates.

Comment by richlv [ 2016 Dec 28 ]

here's a hackish idea that might or might not work.

my usecase is grabbing events with event.get to produce a report on which events and how many times have fired during a specific period of time.
now, event.get is completely useless for this purpose if you want to exclude the events that were generated during a maintenance.

the hackish idea is to have an action that sends an alert with a "not in maintenance" condition (adjusting other conditions as needed to minimise the amount of these "extra" alerts).
there might already be an action like that in many setups - maybe one that sends emails to the team alias, for example.
then one could grab alerts with alert.get and filter down to event IDs that are relevant (as in, happened only outside the maintenance).
one cannot just count the alerts as any repeated alerts will completely skew the statistics about events.

will have to play with this to see whether it works as hoped.

Comment by James Sperry [ 2016 Dec 28 ]

just to be clear, my use case is not to exclude events which were created during maintenance.
My use case is to show if an action was not triggered due to the fact that the host was in maintenance.

We automate the adding and removing of maintenance in our environment. So when we remove a host or group from maintenance, the maintenance item is deleted.

So if someone comes to me and says "Hey, why didn't we see this alert last week", I would like to look into the event history and say "it was in maintenance at the time"

thanks.

Comment by richlv [ 2016 Dec 29 ]

ejames, yes, that's harder. the best i can think of - having two nearly identical actions. one with, one without "not in maintenance" condition. then an api-crawling script that compares alerts from both. any difference must be caused by an active maintenance at the time...
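
A minimal sketch of the comparison step described above, assuming the alerts of both actions have already been fetched (for example with alert.get) as lists of dictionaries that carry an "eventid" field. The function name and the sample data are hypothetical and are not part of Zabbix.

```python
from typing import Iterable, Mapping, Set


def suppressed_event_ids(
    alerts_unconditional: Iterable[Mapping[str, str]],
    alerts_not_in_maintenance: Iterable[Mapping[str, str]],
) -> Set[str]:
    """Return event IDs that only the unconditional action alerted on.

    Assumption: the two actions differ only in the "not in maintenance"
    condition, so any event missing from the second set was skipped
    because of an active maintenance.
    """
    all_events = {alert["eventid"] for alert in alerts_unconditional}
    outside_maintenance = {alert["eventid"] for alert in alerts_not_in_maintenance}
    return all_events - outside_maintenance


# Made-up sample data: event 102 alerted only via the unconditional action.
unconditional = [{"eventid": "101"}, {"eventid": "102"}, {"eventid": "102"}]
conditional = [{"eventid": "101"}]
print(suppressed_event_ids(unconditional, conditional))
```

Working on sets of event IDs rather than alert counts keeps repeated alerts for the same event from skewing the result.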

Comment by richlv [ 2016 Dec 31 ]

as for the event.get problem, looks like alert.get can be used to work around the lack of the maintenance status like this :

  • alert.get all alerts (filter by a specific action, time period etc)
  • grab unique eventids from the output
  • query event.get by those eventids

of course, that will scale much worse - more alerts than events, might have to filter by a large number of events. other than that, seems to work for the intended purpose - finding out what alerted over some period, while ignoring things that happened during an active maintenance.
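
The three steps above could be sketched roughly as follows. This is an illustration under assumptions, not a tested integration: `call` is a hypothetical helper that sends one JSON-RPC request to the Zabbix API (authentication, URL and error handling are left to it) and returns the "result" list. Only the method names alert.get and event.get and their parameters actionids, time_from, time_till, eventids and output come from the Zabbix API.

```python
from typing import Any, Callable, Dict, List

# Hypothetical signature: call(method, params) -> list of result objects.
ApiCall = Callable[[str, Dict[str, Any]], List[Dict[str, Any]]]


def events_with_alerts(
    call: ApiCall, action_id: str, time_from: int, time_till: int
) -> List[Dict[str, Any]]:
    """Return the events that produced alerts for one action in a time window."""
    # Step 1: fetch all alerts of the chosen action in the period.
    alerts = call("alert.get", {
        "output": ["eventid"],
        "actionids": [action_id],
        "time_from": time_from,
        "time_till": time_till,
    })
    # Step 2: reduce to unique event IDs, so repeated alerts count once.
    event_ids = sorted({alert["eventid"] for alert in alerts}, key=int)
    if not event_ids:
        return []
    # Step 3: query the events themselves by those IDs.
    return call("event.get", {"output": "extend", "eventids": event_ids})
```

If the action has a "not in maintenance" condition, the returned events are the ones that alerted outside an active maintenance. Passing `call` in as a parameter keeps the sketch independent of any particular HTTP or API client library.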

Comment by Oleksii Zagorskyi [ 2019 Sep 30 ]

Maybe it could use data from the "event_suppress" table, which was added in v4.0?
But I checked it. Unfortunately, Zabbix server deletes the entries for the events a host generated once maintenance is finished for that host.

 20583:20190930:234928.509 taking host (10271) out of maintenance
 20583:20190930:234928.509 query [txnlev:1] [update hosts set maintenanceid=null,maintenance_type=0,maintenance_status=0,maintenance_from=0 where hostid=10271;]
 20583:20190930:234928.510 query [txnlev:1] [commit;]
 20583:20190930:234928.510 query [txnlev:1] [begin;]
 20583:20190930:234928.511 query [txnlev:1] [delete from event_suppress where suppress_until<1569876568]
 20583:20190930:234928.511 query [txnlev:1] [commit;]
 20583:20190930:234928.511 query [txnlev:0] [select eventid,objectid,r_eventid from problem where source=0 and object=0 and mod(eventid,1)=0 order by eventid]
 20583:20190930:234928.512 query [txnlev:0] [select eventid,maintenanceid,suppress_until from event_suppress where mod(eventid,1)=0 order by eventid]
 20583:20190930:234928.513 query [txnlev:0] [select functionid,triggerid from functions where (triggerid between 16019 and 16023 or triggerid in (13491,13496,13501,13502,15917,15924,15944,16012,16033,16055)) order by triggerid]
 20583:20190930:234928.515 query [txnlev:0] [select eventid,tag,value from problem_tag where (eventid between 1069 and 1073 or eventid in (15,353,2238,2239,3395,3397,4061,4083,4084,4086,4089,4093,4095)) order by eventid]
 20583:20190930:234928.516 query [txnlev:1] [begin;]
 20583:20190930:234928.516 query [txnlev:1] [select maintenanceid from maintenances where maintenanceid=1 order by maintenanceid lock in share mode]
 20583:20190930:234928.517 In zbx_dc_get_event_maintenances()
 20583:20190930:234928.517 End of zbx_dc_get_event_maintenances()
 20583:20190930:234928.517 query [txnlev:1] [delete from event_suppress where eventid=4095 and maintenanceid=1;

But I think it would be possible to rework the current approach so that the corresponding useful entries in the table are not deleted, and to show them in the event details, at least.
Such records could later be deleted by constraints, together with their parent events.
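
To illustrate the "deleted by constraints" idea, here is a small self-contained sketch using an in-memory SQLite database. The schema is a simplified assumption and not the real Zabbix schema; only the table name event_suppress and the columns eventid, maintenanceid and suppress_until are taken from the log above. The function name is hypothetical.

```python
import sqlite3
from typing import Tuple


def demo_cascade() -> Tuple[int, int]:
    """Count suppression rows before and after deleting the parent event."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE events (eventid INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE event_suppress ("
        " eventid INTEGER NOT NULL"
        "   REFERENCES events(eventid) ON DELETE CASCADE,"
        " maintenanceid INTEGER,"
        " suppress_until INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO events (eventid) VALUES (4095)")
    conn.execute("INSERT INTO event_suppress VALUES (4095, 1, 1569876568)")

    # Maintenance is over, but the row is kept so it can be shown later.
    before = conn.execute("SELECT COUNT(*) FROM event_suppress").fetchone()[0]

    # Removing the event (e.g. by housekeeping) removes the row via the constraint.
    conn.execute("DELETE FROM events WHERE eventid = 4095")
    after = conn.execute("SELECT COUNT(*) FROM event_suppress").fetchone()[0]
    conn.close()
    return before, after


print(demo_cascade())  # (1, 0)
```

With a cascade like this, the retained rows would need no separate cleanup logic; they would disappear whenever the parent event is removed.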

Comment by Dimitri Bellini [ 2023 Oct 25 ]

Hi DevTeam,
I would like to re-open this discussion because I think it's very important in enterprise environments.
It's very useful to track the events that did not fire an action because there was a "maintenance".
Please check what can be done about it for the upcoming 7.0.

Thanks so much

Comment by Dimitri Bellini [ 2023 Dec 14 ]

Hi Guys,
Are there any updates on this problem? As a first step to work around the issue, I would like to suggest a "simple event tag" to mark the events fired during a maintenance.
It's not the final answer, but it can be very useful in the meantime, while we do not have a "real" solution.
Thanks

Comment by Alexei Vladishev [ 2023 Dec 18 ]

We plan to discuss possible solutions later this week and will keep everyone updated.

Comment by Dimitri Bellini [ 2023 Dec 18 ]

Hi Alexei, perfect! I will wait for further news from you.

Generated at Sat Apr 20 09:22:26 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.