[ZBX-11426] Events removed by housekeeper can cause trigger to be stuck in problem state
Created: | 2016 Nov 04 | Updated: | 2018 Dec 14 | Resolved: | 2017 Nov 28 |
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 3.2.1 |
Fix Version/s: | 3.2.9rc1, 3.2.11rc1, 3.4.3rc1, 3.4.5rc1, 4.0.0alpha1, 4.0 (plan) |
Type: | Problem report | Priority: | Critical |
Reporter: | Andris Zeila | Assignee: | Andrea Biscuola (Inactive) |
Resolution: | Fixed | Votes: | 6 |
Labels: | events, housekeeper |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Issue Links: |
Team: | Team A |
Sprint: | Sprint 14, Sprint 15, Sprint 16, Sprint 17, Sprint 19, Sprint 21, Sprint 22 |
Story Points: | 3.5 |
Description |
When the housekeeper removes an open problem event, the trigger value/problem count is not updated. If this was the last open problem event, the trigger will be stuck in the problem state and keep generating recovery events. To fix the current situation, recovery events must update the trigger value/problem count even if there were no open problems. To prevent this from happening in the future, the housekeeper must not remove open problem events. |
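The first half of the proposed fix can be illustrated with a minimal sketch. This is not the Zabbix server code: the `Trigger` class and `process_recovery()` function are hypothetical names, and the sketch assumes a recovery closes every open problem of the trigger (no tag-based correlation). The point is only that the trigger value and problem count are reset unconditionally, even when the list of open problems is already empty because the housekeeper removed the events.

```python
from dataclasses import dataclass, field

# Values mirror the usual OK/PROBLEM trigger states.
TRIGGER_VALUE_OK = 0
TRIGGER_VALUE_PROBLEM = 1


@dataclass
class Trigger:
    """Hypothetical in-memory model of a trigger, for illustration only."""
    value: int = TRIGGER_VALUE_OK
    problem_count: int = 0
    open_problem_eventids: list = field(default_factory=list)


def process_recovery(trigger: Trigger) -> int:
    """Apply a recovery event and return how many open problems it closed.

    The trigger state is reset even if no open problems were found,
    which is what un-sticks a trigger whose last open problem event
    was removed by the housekeeper.
    """
    closed = len(trigger.open_problem_eventids)
    trigger.open_problem_eventids.clear()
    # Unconditional reset: do not depend on 'closed' being non-zero.
    trigger.value = TRIGGER_VALUE_OK
    trigger.problem_count = 0
    return closed
```

With a trigger in problem state whose open problem events are already gone, `process_recovery()` closes nothing but still returns the trigger to the OK state.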
Comments |
Comment by Oleksii Zagorskyi [ 2016 Nov 08 ] |
|
Comment by Aleksandrs Saveljevs [ 2016 Nov 08 ] |
|
Comment by Andris Zeila [ 2016 Nov 10 ] |
|
Comment by Alexander Vladishev [ 2017 Aug 10 ] |
|
Comment by Andrea Biscuola (Inactive) [ 2017 Sep 20 ] |
Fixed in svn://svn.zabbix.com/branches/dev/ZBX-11426. Modified the filters in the housekeeping_events() function to check, through a subquery, whether an event has an associated record in the problem table. Only events without a corresponding problem record (open or closed) are removed. |
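The subquery filter described in this comment could look roughly like the following. This is a sketch under stated assumptions, not the query from the actual commit: the two-table schema is reduced to the columns needed here, SQLite stands in for the real database backend, and `make_db()` and `delete_old_events()` are hypothetical helper names.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a simplified, assumed subset of the events/problem tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (eventid INTEGER PRIMARY KEY, clock INTEGER NOT NULL);
        CREATE TABLE problem (eventid INTEGER PRIMARY KEY, r_eventid INTEGER);
        """
    )
    return conn


def delete_old_events(conn: sqlite3.Connection, min_clock: int) -> int:
    """Delete events older than min_clock that have no record in problem.

    Returns the number of deleted rows.
    """
    cur = conn.execute(
        """
        DELETE FROM events
         WHERE clock < ?
           AND NOT EXISTS (
               SELECT 1 FROM problem p WHERE p.eventid = events.eventid
           )
        """,
        (min_clock,),
    )
    conn.commit()
    return cur.rowcount
```

An old event that still has a row in `problem` survives the cleanup, while an equally old event without one is deleted.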
Comment by Andris Zeila [ 2017 Sep 20 ] |
Successfully tested, please review minor changes in r72783.
abs: Looks good. CLOSED |
Comment by Andrea Biscuola (Inactive) [ 2017 Sep 26 ] |
Released in:
|
Comment by richlv [ 2017 Sep 28 ] |
this might be worth documenting in the housekeeper section (and maybe also in the upgrade notes for 3.2.9 and 3.4.3) |
Comment by Andrea Biscuola (Inactive) [ 2017 Sep 29 ] |
Maybe a good idea, as now the housekeeper behaviour is explicit regarding how some types of events are kept or deleted. The issue itself was already mitigated in the past through another task and this is just the completion of that work. |
Comment by richlv [ 2017 Oct 06 ] |
indeed, currently the behaviour seems to be completely undocumented |
Comment by Andris Zeila [ 2017 Oct 13 ] |
With the event housekeeping period set to 1d (or close to it) there is a danger of recovery events being removed while the recovered problem events are still in the problem table. I'm not sure it's worth adding more complexity to the event deleting queries (although it would be the safest way). I think an acceptable workaround would be to call housekeeping_problems() before the housekeeping_events() function. As the event housekeeping period cannot be less than the problem cleanup period (24h), this would ensure that problems are removed from the problem table before the corresponding events are removed from the events table.
wiper: So it was decided to implement a proper fix. To do that we need to add a problem.r_eventid index and also check r_eventid when removing events. |
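Both measures discussed in this comment (cleaning problems before events, and also protecting events referenced through r_eventid) can be sketched together. The schema, the index name, and the functions `cleanup_problems()`, `cleanup_events()` and `run_housekeeper()` are assumptions made for illustration, with SQLite standing in for the real backend. They are not the actual housekeeping_problems() and housekeeping_events() implementations.

```python
import sqlite3

SEC_PER_DAY = 86400  # problem cleanup period of 24h, as stated in the comment


def make_db() -> sqlite3.Connection:
    """Create a simplified, assumed subset of the events/problem tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE events (eventid INTEGER PRIMARY KEY, clock INTEGER NOT NULL);
        CREATE TABLE problem (
            eventid   INTEGER PRIMARY KEY,
            r_eventid INTEGER,
            r_clock   INTEGER NOT NULL DEFAULT 0
        );
        -- Index so the r_eventid lookup in the delete query stays cheap.
        CREATE INDEX problem_r_eventid ON problem (r_eventid);
        """
    )
    return conn


def cleanup_problems(conn: sqlite3.Connection, now: int) -> int:
    """Remove problems that were resolved more than 24h ago."""
    cur = conn.execute(
        "DELETE FROM problem WHERE r_eventid IS NOT NULL AND r_clock < ?",
        (now - SEC_PER_DAY,),
    )
    return cur.rowcount


def cleanup_events(conn: sqlite3.Connection, now: int, keep_seconds: int) -> int:
    """Remove old events not referenced by problem.eventid or problem.r_eventid."""
    cur = conn.execute(
        """
        DELETE FROM events
         WHERE clock < ?
           AND NOT EXISTS (SELECT 1 FROM problem p WHERE p.eventid = events.eventid)
           AND NOT EXISTS (SELECT 1 FROM problem p WHERE p.r_eventid = events.eventid)
        """,
        (now - keep_seconds,),
    )
    return cur.rowcount


def run_housekeeper(conn: sqlite3.Connection, now: int, keep_seconds: int):
    """Run problem cleanup first, then event cleanup."""
    removed_problems = cleanup_problems(conn, now)
    removed_events = cleanup_events(conn, now, keep_seconds)
    conn.commit()
    return removed_problems, removed_events
```

Running `cleanup_events()` on its own leaves both the problem event and its recovery event in place as long as the problem row exists. Running the full sequence removes the resolved problem first, after which both events become eligible for deletion, while events of still-open problems are kept.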
Comment by Andrea Biscuola (Inactive) [ 2017 Nov 14 ] |
Fixed in svn://svn.zabbix.com/branches/dev/ZBX-11426. Swapped the calls to housekeeping_problems() and housekeeping_events(). Also added a filter to the events delete queries for checking the |
Comment by Andris Zeila [ 2017 Nov 16 ] |
Successfully tested, please review coding style fixes in r74679.
abs: style fix ok. CLOSED |
Comment by Andrea Biscuola (Inactive) [ 2017 Nov 17 ] |
Released in:
|
Comment by Andrea Biscuola (Inactive) [ 2017 Nov 17 ] |
The housekeeper final behaviour after this change is that an event
Also, now the housekeeper will delete problems first and events |