[ZBX-9278] Better housekeeper logic for deleting history of deleted/changed items Created: 2015 Feb 03 Updated: 2019 Dec 10 |
|
Status: | Open |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Frontend (F), Server (S) |
Affects Version/s: | 2.4.3 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Trivial |
Reporter: | Oleksii Zagorskyi | Assignee: | Unassigned |
Resolution: | Unresolved | Votes: | 5 |
Labels: | housekeeper, performance | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
Description |
2 sub issues described in comments below. |
Comments |
Comment by Oleksii Zagorskyi [ 2015 Feb 03 ] |
(1) [server] First HK invocation looks like this: 24358:20150121:134433.198 housekeeper [removing old history and trends] 24358:20150121:134433.198 In housekeeping_history_and_trends() now:1421840673 24358:20150121:134433.198 In hk_history_delete_queue_prepare_all() 24358:20150121:134433.198 query [txnlev:0] [select itemid,min(clock) from history group by itemid] 24358:20150121:134433.201 query [txnlev:0] [select itemid,min(clock) from history_str group by itemid] 24358:20150121:134433.201 query [txnlev:0] [select itemid,min(clock) from history_log group by itemid] 24358:20150121:134433.201 query [txnlev:0] [select itemid,min(clock) from history_uint group by itemid] 24358:20150121:134433.206 query [txnlev:0] [select itemid,min(clock) from history_text group by itemid] 24358:20150121:134433.207 query [txnlev:0] [select itemid,min(clock) from trends group by itemid] 24358:20150121:134433.207 query [txnlev:0] [select itemid,min(clock) from trends_uint group by itemid] 24358:20150121:134433.220 query [txnlev:0] [select i.itemid,i.value_type,i.history,i.trends from items i,hosts h where i.hostid=h.hostid and h.status in (0,1)] 24358:20150121:134433.221 End of hk_history_delete_queue_prepare_all() ... 24358:20150121:134433.871 housekeeper [removing deleted items data] 24358:20150121:134433.871 In housekeeping_cleanup() 24358:20150121:134433.871 query [txnlev:0] [select housekeeperid,tablename,field,value from housekeeper where tablename in ('history','history_log','history_str','history_text','history_uint','trends','trends_uint') order by tablename] 24358:20150121:134433.871 query without transaction detected 24358:20150121:134433.871 query [txnlev:0] [delete from history where itemid=27853 limit 5000] ... next invocations look like this: 24358:20150121:134557.509 housekeeper [removing old history and trends] 24358:20150121:134557.509 In housekeeping_history_and_trends() now:1421840757 24358:20150121:134557.509 In hk_history_delete_queue_prepare_all() 24358:20150121:134557.509 query [txnlev:0] [select i.itemid,i.value_type,i.history,i.trends from items i,hosts h where i.hostid=h.hostid and h.status in (0,1)] 24358:20150121:134557.511 End of hk_history_delete_queue_prepare_all() ... 24358:20150121:134558.129 housekeeper [removing deleted items data] 24358:20150121:134558.129 In housekeeping_cleanup() 24358:20150121:134558.129 query [txnlev:0] [select housekeeperid,tablename,field,value from housekeeper where tablename in ('history','history_log','history_str','history_text','history_uint','trends','trends_uint') order by tablename] 24358:20150121:134558.130 query without transaction detected 24358:20150121:134558.130 query [txnlev:0] [delete from history_uint where itemid=27721 limit 5000] 24358:20150121:134558.199 End of housekeeping_cleanup():5000 ... We see that only for first invocation the HK scans all history tables (using indexes) to know which itemid has history and which doesn't - let's call it as "delete queues cache". I want to pay your attention to [removing deleted items data] stage. For example when I upgraded to 2.4 I forgot to enable HK in Administration menu When I enabled HK - it took ~10 hours for first activity and deleted 99,99% rows of the HK table. According to the debuglog above the HK did 95M SQLs like this (for different history tables of course): delete from history where itemid=27853 limit 5000 Most big my history tables are 300G, 200G, 150G, so you can imagine. Question is - why HK tries to delete history for items while it is practically aware that these items don't have history at all ("delete queues cache"). Of course there is a question - is it possible to determine had an item any history before it was deleted when zabbix server was already running - I don't know. Feel free to ask me if something is not clear. |
Comment by Oleksii Zagorskyi [ 2015 Feb 03 ] |
(2) [frontend] Such a logic leads to bad performance - described in (1) Even if (1) will be fixed as suggested - here is next problem which will not be resolved. Suppose I had an item in template (linked to hundreds of hosts) with "integer" type. After one year I changed history type for the item to "float". What I suggest is to change logic when we change item type or delete it. When change item type frontend should do: 1. delete from HK where tablename="new_item_type" and value="itemid"; <- sanity check for cases if item type was changed front and back 1.1. the same for trends if item is numeric <- sanity check for cases if item type was changed front and back 2. INSERT INTO housekeeper (tablename,field,value,housekeeperid) VALUES ('previous_item_type','itemid','29848','323') 2.1. the same for trends if item is numeric It will cover the problem I described in this comment above. Then in case if item is deleted, we need to just insert 1 (or 2 if item is numeric) record to the HK table - for current item type only. |
Comment by Oleksii Zagorskyi [ 2015 Nov 06 ] |
Most likely all idea I posted here will be taken into account in ZBXNEXT-2860, but let's wait to make sure and then will close current report. |