[ZBXNEXT-4271] Delay escalator by a huge escalations table with Recovery operations Created: 2017 Dec 11  Updated: 2018 Oct 31  Resolved: 2018 Oct 10

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: 3.2.10, 3.4.4, 4.0 (plan)
Fix Version/s: 4.0.1rc1, 4.2.0alpha1, 4.2 (plan)

Type: Change Request Priority: Trivial
Reporter: Kim Jongkwon Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File 4.0.1rc1_test_result.png, PNG File million escalations new.png, PNG File million escalations old.png
Issue Links:
Causes
Duplicate
Team: Team A
Sprint: Sprint 42, Sprint 43, Sprint 44
Story Points: 1

 Description   

Originally raised in ZBX-13137.

In some cases, escalations are stored in the escalations table without ever receiving an OK event.

Examples:

zabbix=> select count(*) from escalations where status=2;
  count  
---------
 1180124

The real problem in this situation is that the escalator becomes 100% busy.
The escalator's busy rate grows gradually, so the problem is hard to notice until it actually occurs.

Solution 1 (no development needed)
Deleting the actions that have recovery operations, or manually closing all events, solves this problem (the escalation table data is deleted as well).

I also think escalations should be removed from the table when, given the trigger settings, there is no real need to handle "RECOVERY". The "escalator busy" problem needs improvement, so additional improvements that require development are proposed below.

Solution 2
Escalations for triggers with "OK event generation: None" and "Don't allow manual close" should be removed.

Solution 3
For triggers with "OK event generation: None" and "Allow manual close", we need, if possible, a solution that does not increase the escalator busy rate. (The best solution would be not to store this data in escalations at all.)

Solution 4 (new feature)
An additional feature that allows automatic closing (removal of old escalation data).



 Comments   
Comment by Vladislavs Sokurenko [ 2018 Oct 03 ]

Fixed in:

  • pre-4.0.1rc1 r85403
  • pre-4.2.0alpha1 (trunk) r85404
Comment by Kim Jongkwon [ 2018 Oct 24 ]

Just FYI, to clarify:
The "Solutions" ideas written above are about removing escalation data, but the actual fix, for now, improves escalator performance.

  • improved escalator performance by using nextcheck index instead of reading whole table (vso)
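As a rough illustration of why an index on nextcheck helps, here is a minimal conceptual sketch in Python, not Zabbix's actual C implementation or schema. It assumes that each escalation row carries a `nextcheck` timestamp and that rows waiting for a recovery event are scheduled in the future; the class `NextcheckIndex` and the functions `due_by_full_scan` and `due` are hypothetical names. The "before" approach reads every row each cycle, while the indexed approach jumps straight to the rows that are due.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Escalation:
    escalationid: int
    nextcheck: int  # assumption: Unix time at which the row should next be processed


def due_by_full_scan(rows, now):
    """Conceptual 'before': read every row, then filter in memory (O(n) per cycle)."""
    return [r for r in rows if r.nextcheck <= now]


class NextcheckIndex:
    """Hypothetical in-memory stand-in for a database index ordered by nextcheck."""

    def __init__(self, rows):
        # Keep rows ordered by (nextcheck, escalationid), as a B-tree index would.
        self._rows = sorted(rows, key=lambda r: (r.nextcheck, r.escalationid))
        self._keys = [r.nextcheck for r in self._rows]

    def due(self, now):
        # bisect_right returns the first position whose nextcheck > now, so only
        # the due prefix is returned; rows scheduled later are never examined.
        end = bisect.bisect_right(self._keys, now)
        return self._rows[:end]


if __name__ == "__main__":
    now = 1_000
    # 100,000 rows; only every 1000th row is due, the rest are scheduled far ahead.
    rows = [Escalation(i, 500 if i % 1000 == 0 else 10**9) for i in range(100_000)]
    index = NextcheckIndex(rows)
    print(len(due_by_full_scan(rows, now)), len(index.due(now)))
```

Both calls return the same rows, but the indexed lookup costs a binary search plus the due rows, rather than a pass over the whole table. The one-off sort here only simulates an index that a database would maintain incrementally on insert and update.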

I've checked this fix with Zabbix 4.0.1rc1 (tested with 360,000 escalation records).

I think the results are pretty good. Thanks for the fix.

vso Thanks for the feedback. Was it test data or a real Zabbix server? Under which conditions would you like escalation data to be removed?

JKKim That was test data. Please double-check: I noticed the behavior differs from 3.2. In 4.0.1rc1, escalation data does not disappear even if any option is changed (as in Solution 1, deleting the actions that have recovery operations).
If this is true, there is no simple way to erase the huge amount of stored data.

Going back to ZBX-13137: users will expect that this type of data is not created when the trigger has the options below.

  • OK event generation : None
  • Allow manual close : No (unchecked)

In my opinion, when these options are selected, that is the best condition for removing the data.
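The proposed removal condition can be written as a small predicate. This is a hypothetical sketch of the rule suggested in this thread, not code from Zabbix: the names `OkEventGeneration`, `TriggerOptions` and `escalation_can_be_removed` are invented for illustration, and only the three "OK event generation" choices (Expression, Recovery expression, None) and the "Allow manual close" checkbox come from the trigger configuration. The reasoning is that with no OK events and no manual close, the problem can never be resolved, so recovery operations can never run. As noted later in the thread, this assumes the manual close setting is not changed afterwards.

```python
from dataclasses import dataclass
from enum import Enum


class OkEventGeneration(Enum):
    # The three "OK event generation" choices in the trigger form.
    EXPRESSION = "Expression"
    RECOVERY_EXPRESSION = "Recovery expression"
    NONE = "None"


@dataclass(frozen=True)
class TriggerOptions:
    ok_event_generation: OkEventGeneration
    allow_manual_close: bool


def escalation_can_be_removed(opts: TriggerOptions) -> bool:
    """Hypothetical rule from this discussion: with no OK events and no manual
    close, the problem can never be resolved, so recovery operations never run."""
    no_ok_events = opts.ok_event_generation is OkEventGeneration.NONE
    return no_ok_events and not opts.allow_manual_close


if __name__ == "__main__":
    for gen in OkEventGeneration:
        for manual in (False, True):
            opts = TriggerOptions(gen, manual)
            print(f"{gen.value:<20} manual close={manual!s:<5} -> remove={escalation_can_be_removed(opts)}")
```

Only one of the six combinations qualifies; "OK event generation: None" with "Allow manual close" (Solution 3) is deliberately excluded, because a user could still close the problem and trigger recovery operations.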

vso Does decreasing the delay help?

JKKim As performance improved so much in 4.0.1, the delay is resolved by this performance fix. I understand this is the best approach for now, so we can CLOSE this issue.

With the current design, if escalation data were removed:

  • It would be impossible to recover events if the manual close setting were changed later.
  • If the escalator process removes data, delays may (possibly) increase due to the removal or the extra checks.

Unfortunately, "unused data" still remains in the escalations table. I just think the conditions and processing under which that data can be removed need to be discussed. (This could be another ZBXNEXT in the future.)

Generated at Fri Mar 29 01:43:45 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.