[ZBX-10265] Generating duplicated events after maintenance even when trigger didn't change state during real maintenance period Created: 2016 Jan 14 Updated: 2017 Oct 11 Resolved: 2017 Oct 11 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Documentation (D), Server (S) |
Affects Version/s: | 2.2.11, 2.4.7, 3.0.0alpha6 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Major |
Reporter: | Andrei Gushchin (Inactive) | Assignee: | Unassigned |
Resolution: | Duplicate | Votes: | 4 |
Labels: | duplicates, events, maintenance | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
Description |
Steps for reproduce:
You see that was generated new event and trigger become unacknowledged, it is unexpected behavior for setups where acknowledging used as indication that problem is known and somebody work on it. |
Comments |
Comment by Oleksii Zagorskyi [ 2016 Jan 18 ] |
It's known and already documented (in |
Comment by Andrei Gushchin (Inactive) [ 2016 Jan 18 ] |
Hmm In For example if we have 10000+ servers few was down or broke, NOC team is ACK their trigger then create maintenance, and put bunch of hosts(with hostgroup) included that two to maintenance, after awhile get them out from maintenance manually, It will generate new events -> notifications,escalations etc. , at least NOC team should ACK them again. |
Comment by Oleksii Zagorskyi [ 2016 Jan 18 ] |
Hard to believe ... but I tested and it appeared to be a true !!! A function mentioned in the /****************************************************************************** * * * Function: generate_events * * * * Purpose: generate events for triggers after maintenance period * * The events will be generated if trigger changed its state during * * the maintenance * But for PROBLEM/OK event duplicated created even if original event was created before host was switched "in maintenance" state. And it's a roulette what trigger status was before the already configured active maintenance. Problem is that we do not store real timestamps when hosts status being changed to maintenance instead we store maximal clock from maintenance ("Active since" from global or "Data", "At (hour:minute)" from periods) in hosts.maintenance_from. We could use and store timestamp of current time, i.e. now(). It would be much more correct, IMO. This issue is critical enough as we are aware of many zabbix users/etc who, using zabbix API, puts many hosts/groups to already existing very "wide time" maintenance. Even more, this bug may create multiple (more than 2) consecutive same type events, which drives us crazy long time already in ZBX-9432. Debug log (2 parts - for PROBLEM and OK events): 13687:20160118:190738.147 Starting Zabbix Server. Zabbix 2.4.7rc1 (revision 54982). 13701:20160118:190738.178 server #13 started [timer #1] ... --FOR DUPLICATED OK: 13701:20160118:191900.351 In process_maintenance() 13701:20160118:191900.351 query [txnlev:0] [select m.maintenanceid,m.maintenance_type,m.active_since,tp.timeperiod_type,tp.every,tp.month,tp.dayofweek,tp.day,tp.start_time,tp.period,tp.start_date from maintenances m,maintenances_windows mw,timeperiods tp where m.maintenanceid=mw.maintenanceid and mw.timeperiodid=tp.timeperiodid and m.active_since<=1453137540 and m.active_till>1453137540] 13701:20160118:191900.352 In process_maintenance_hosts() 13701:20160118:191900.352 query [txnlev:0] [select h.hostid,h.host,h.maintenanceid,h.maintenance_status,h.maintenance_type,h.maintenance_from from maintenances_hosts mh,hosts h where mh.hostid=h.hostid and h.status=0 and mh.maintenanceid=1] 13701:20160118:191900.353 query [txnlev:0] [select h.hostid,h.host,h.maintenanceid,h.maintenance_status,h.maintenance_type,h.maintenance_from from maintenances_groups mg,hosts_groups hg,hosts h where mg.groupid=hg.groupid and hg.hostid=h.hostid and h.status=0 and mg.maintenanceid=1] 13701:20160118:191900.354 End of process_maintenance_hosts() 13701:20160118:191900.354 In update_maintenance_hosts() 13701:20160118:191900.354 query [txnlev:1] [begin;] 13701:20160118:191900.354 query [txnlev:1] [select hostid,host,maintenance_type,maintenance_from from hosts where status=0 and flags<>2 and maintenance_status=1 and not hostid=10124] 13701:20160118:191900.355 taking host 'it0' out of maintenance 13701:20160118:191900.355 query [txnlev:1] [update hosts set maintenanceid=null,maintenance_status=0,maintenance_type=0,maintenance_from=0 where hostid=10105] 13701:20160118:191900.355 query [txnlev:1] [commit;] 13701:20160118:191900.361 In generate_events() 13701:20160118:191900.361 query [txnlev:0] [select distinct t.triggerid,t.description,t.expression,t.priority,t.type,t.lastchange,t.value from triggers t,functions f,items i where t.triggerid=f.triggerid and f.itemid=i.itemid and t.status=0 and i.status=0 and i.state=0 and i.hostid=10105] 13701:20160118:191900.363 In get_trigger_values() 13701:20160118:191900.363 get_trigger_values() from:'2016.01.01 00:00:00' 13701:20160118:191900.363 query [txnlev:0] [select value from events where source=0 and object=0 and objectid=14239 and clock<1451599200 order by eventid desc limit 1] 13701:20160118:191900.364 query [txnlev:0] [select value from events where source=0 and object=0 and objectid=14239 and clock>=1451599200 and value=1 limit 1] 13701:20160118:191900.364 End of get_trigger_values() before:0 inside:1 after:0 13701:20160118:191900.364 In process_events() events_num:1 13701:20160118:191900.364 In DCget_nextid() table:'events' num:1 13701:20160118:191900.364 End of DCget_nextid() table:'events' [246:246] 13701:20160118:191900.365 query without transaction detected 13701:20160118:191900.365 query [txnlev:0] [insert into events (eventid,source,object,objectid,clock,ns,value) values (246,0,0,14239,1453137540,0,0); ] 13701:20160118:191900.370 In process_actions() events_num:1 13701:20160118:191900.370 query [txnlev:0] [select actionid,evaltype,formula from actions where status=0 and eventsource=0] 13701:20160118:191900.371 In check_action_conditions() actionid:15 13701:20160118:191900.371 query [txnlev:0] [select conditionid,conditiontype,operator,value from conditions where actionid=15 order by conditiontype] 13701:20160118:191900.371 In check_action_condition() actionid:15 conditionid:33 cond.value:'' 13701:20160118:191900.371 In check_trigger_condition() 13701:20160118:191900.371 query [txnlev:0] [select count(*) from hosts h,items i,functions f,triggers t where h.hostid=i.hostid and h.maintenance_status=0 and i.itemid=f.itemid and f.triggerid=t.triggerid and t.triggerid=14239] 13701:20160118:191900.372 End of check_trigger_condition():SUCCEED 13701:20160118:191900.372 End of check_action_condition():SUCCEED 13701:20160118:191900.372 End of check_action_conditions():SUCCEED 13701:20160118:191900.372 In DCget_nextid() table:'escalations' num:1 13701:20160118:191900.372 End of DCget_nextid() table:'escalations' [6:6] 13701:20160118:191900.372 query without transaction detected 13701:20160118:191900.372 query [txnlev:0] [insert into escalations (escalationid,actionid,status,triggerid,itemid,eventid,r_eventid) values (6,15,0,14239,null,246,null); ] 13701:20160118:191900.377 End of process_actions() 13701:20160118:191900.377 In DBupdate_itservices() 13701:20160118:191900.377 In its_flush_updates() 13701:20160118:191900.377 In its_load_services_by_triggerids() 13701:20160118:191900.377 query [txnlev:0] [select serviceid,triggerid,status,algorithm from services where triggerid=14239] 13701:20160118:191900.378 End of its_load_services_by_triggerids() 13701:20160118:191900.378 End of its_flush_updates():SUCCEED 13701:20160118:191900.378 End of DBupdate_itservices():SUCCEED 13701:20160118:191900.378 End of process_events() 13701:20160118:191900.378 End of generate_events() 13701:20160118:191900.378 End of update_maintenance_hosts() 13701:20160118:191900.378 End of process_maintenance() 13701:20160118:191900.378 timer #1 [processed 1 triggers, 0 events in 0.002521 sec, 1 maintenances in 0.026423 sec, idle 30 sec] ... --FOR DUPLICATED PROBLEM: 13701:20160118:192200.413 In process_maintenance() 13701:20160118:192200.413 query [txnlev:0] [select m.maintenanceid,m.maintenance_type,m.active_since,tp.timeperiod_type,tp.every,tp.month,tp.dayofweek,tp.day,tp.start_time,tp.period,tp.start_date from maintenances m,maintenances_windows mw,timeperiods tp where m.maintenanceid=mw.maintenanceid and mw.timeperiodid=tp.timeperiodid and m.active_since<=1453137720 and m.active_till>1453137720] 13701:20160118:192200.414 In process_maintenance_hosts() 13701:20160118:192200.414 query [txnlev:0] [select h.hostid,h.host,h.maintenanceid,h.maintenance_status,h.maintenance_type,h.maintenance_from from maintenances_hosts mh,hosts h where mh.hostid=h.hostid and h.status=0 and mh.maintenanceid=1] 13701:20160118:192200.415 query [txnlev:0] [select h.hostid,h.host,h.maintenanceid,h.maintenance_status,h.maintenance_type,h.maintenance_from from maintenances_groups mg,hosts_groups hg,hosts h where mg.groupid=hg.groupid and hg.hostid=h.hostid and h.status=0 and mg.maintenanceid=1] 13701:20160118:192200.415 End of process_maintenance_hosts() 13701:20160118:192200.415 In update_maintenance_hosts() 13701:20160118:192200.415 query [txnlev:1] [begin;] 13701:20160118:192200.416 query [txnlev:1] [select hostid,host,maintenance_type,maintenance_from from hosts where status=0 and flags<>2 and maintenance_status=1 and not hostid=10124] 13701:20160118:192200.416 taking host 'it0' out of maintenance 13701:20160118:192200.416 query [txnlev:1] [update hosts set maintenanceid=null,maintenance_status=0,maintenance_type=0,maintenance_from=0 where hostid=10105] 13701:20160118:192200.417 query [txnlev:1] [commit;] 13701:20160118:192200.422 In generate_events() 13701:20160118:192200.422 query [txnlev:0] [select distinct t.triggerid,t.description,t.expression,t.priority,t.type,t.lastchange,t.value from triggers t,functions f,items i where t.triggerid=f.triggerid and f.itemid=i.itemid and t.status=0 and i.status=0 and i.state=0 and i.hostid=10105] 13701:20160118:192200.423 In get_trigger_values() 13701:20160118:192200.423 get_trigger_values() from:'2016.01.01 00:00:00' 13701:20160118:192200.423 query [txnlev:0] [select value from events where source=0 and object=0 and objectid=14239 and clock<1451599200 order by eventid desc limit 1] 13701:20160118:192200.424 End of get_trigger_values() before:0 inside:2 after:1 13701:20160118:192200.424 In process_events() events_num:1 13701:20160118:192200.424 In DCget_nextid() table:'events' num:1 13701:20160118:192200.424 End of DCget_nextid() table:'events' [248:248] 13701:20160118:192200.424 query without transaction detected 13701:20160118:192200.424 query [txnlev:0] [insert into events (eventid,source,object,objectid,clock,ns,value) values (248,0,0,14239,1453137720,0,1); ] 13701:20160118:192200.440 In process_actions() events_num:1 13701:20160118:192200.440 query [txnlev:0] [select actionid,evaltype,formula from actions where status=0 and eventsource=0] 13701:20160118:192200.441 In check_action_conditions() actionid:15 13701:20160118:192200.441 query [txnlev:0] [select conditionid,conditiontype,operator,value from conditions where actionid=15 order by conditiontype] 13701:20160118:192200.441 In check_action_condition() actionid:15 conditionid:33 cond.value:'' 13701:20160118:192200.441 In check_trigger_condition() 13701:20160118:192200.441 query [txnlev:0] [select count(*) from hosts h,items i,functions f,triggers t where h.hostid=i.hostid and h.maintenance_status=0 and i.itemid=f.itemid and f.triggerid=t.triggerid and t.triggerid=14239] 13701:20160118:192200.442 End of check_trigger_condition():SUCCEED 13701:20160118:192200.442 End of check_action_condition():SUCCEED 13701:20160118:192200.442 End of check_action_conditions():SUCCEED 13701:20160118:192200.442 In DCget_nextid() table:'escalations' num:1 13701:20160118:192200.442 End of DCget_nextid() table:'escalations' [8:8] 13701:20160118:192200.442 query without transaction detected 13701:20160118:192200.442 query [txnlev:0] [insert into escalations (escalationid,actionid,status,triggerid,itemid,eventid,r_eventid) values (8,15,0,14239,null,248,null); ] 13701:20160118:192200.454 End of process_actions() 13701:20160118:192200.454 In DBupdate_itservices() 13701:20160118:192200.454 In its_flush_updates() 13701:20160118:192200.454 In its_load_services_by_triggerids() 13701:20160118:192200.454 query [txnlev:0] [select serviceid,triggerid,status,algorithm from services where triggerid=14239] 13701:20160118:192200.454 End of its_load_services_by_triggerids() 13701:20160118:192200.455 End of its_flush_updates():SUCCEED 13701:20160118:192200.455 End of DBupdate_itservices():SUCCEED 13701:20160118:192200.455 End of process_events() 13701:20160118:192200.455 End of generate_events() 13701:20160118:192200.455 End of update_maintenance_hosts() 13701:20160118:192200.455 End of process_maintenance() 13701:20160118:192200.455 timer #1 [processed 1 triggers, 0 events in 0.001975 sec, 1 maintenances in 0.041823 sec, idle 30 sec] CONFIRMED |
Comment by Oleksii Zagorskyi [ 2016 Jan 20 ] |
(1) Documentation. martins-v Thanks, I slightly changed the added text, trying to keep the same meaning, but perhaps more precisely. Please have a look. zalex_ua very good, thanks! Replicated to 2.4 and 2.2. 2.0 is skipped as the page is not maintained anymore. |
Comment by Pascal Uhlmann [ 2016 Sep 30 ] |
This issue also exists in 2.0.12. I'd say this behaviour should be configurable as a global setting. |
Comment by Vladislavs Sokurenko [ 2017 Oct 11 ] |
This should have been fixed in |