[ZBX-9278] Better housekeeper logic for deleting history of deleted/changed items Created: 2015 Feb 03  Updated: 2019 Dec 10

Status: Open
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F), Server (S)
Affects Version/s: 2.4.3
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Unresolved Votes: 5
Labels: housekeeper, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   

2 sub issues described in comments below.



 Comments   
Comment by Oleksii Zagorskyi [ 2015 Feb 03 ]

(1) [server]
Next I'll use just "HK" shortening for "housekeeper" word.

First HK invocation looks like this:

 24358:20150121:134433.198 housekeeper [removing old history and trends] 
 24358:20150121:134433.198 In housekeeping_history_and_trends() now:1421840673
 24358:20150121:134433.198 In hk_history_delete_queue_prepare_all()
 24358:20150121:134433.198 query [txnlev:0] [select itemid,min(clock) from history group by itemid]
 24358:20150121:134433.201 query [txnlev:0] [select itemid,min(clock) from history_str group by itemid]
 24358:20150121:134433.201 query [txnlev:0] [select itemid,min(clock) from history_log group by itemid]
 24358:20150121:134433.201 query [txnlev:0] [select itemid,min(clock) from history_uint group by itemid]
 24358:20150121:134433.206 query [txnlev:0] [select itemid,min(clock) from history_text group by itemid]
 24358:20150121:134433.207 query [txnlev:0] [select itemid,min(clock) from trends group by itemid]
 24358:20150121:134433.207 query [txnlev:0] [select itemid,min(clock) from trends_uint group by itemid]
 24358:20150121:134433.220 query [txnlev:0] [select i.itemid,i.value_type,i.history,i.trends from items i,hosts h where i.hostid=h.hostid and h.status in (0,1)]
 24358:20150121:134433.221 End of hk_history_delete_queue_prepare_all()
...
 24358:20150121:134433.871 housekeeper [removing deleted items data]
 24358:20150121:134433.871 In housekeeping_cleanup()
 24358:20150121:134433.871 query [txnlev:0] [select housekeeperid,tablename,field,value from housekeeper where tablename in ('history','history_log','history_str','history_text','history_uint','trends','trends_uint') order by tablename]
 24358:20150121:134433.871 query without transaction detected
 24358:20150121:134433.871 query [txnlev:0] [delete from history where itemid=27853 limit 5000]
...

next invocations look like this:

 24358:20150121:134557.509 housekeeper [removing old history and trends]
 24358:20150121:134557.509 In housekeeping_history_and_trends() now:1421840757
 24358:20150121:134557.509 In hk_history_delete_queue_prepare_all()
 24358:20150121:134557.509 query [txnlev:0] [select i.itemid,i.value_type,i.history,i.trends from items i,hosts h where i.hostid=h.hostid and h.status in (0,1)]
 24358:20150121:134557.511 End of hk_history_delete_queue_prepare_all()
...
 24358:20150121:134558.129 housekeeper [removing deleted items data]
 24358:20150121:134558.129 In housekeeping_cleanup()
 24358:20150121:134558.129 query [txnlev:0] [select housekeeperid,tablename,field,value from housekeeper where tablename in ('history','history_log','history_str','history_text','history_uint','trends','trends_uint') order by tablename]
 24358:20150121:134558.130 query without transaction detected
 24358:20150121:134558.130 query [txnlev:0] [delete from history_uint where itemid=27721 limit 5000]
 24358:20150121:134558.199 End of housekeeping_cleanup():5000
...

We see that only for first invocation the HK scans all history tables (using indexes) to know which itemid has history and which doesn't - let's call it as "delete queues cache".
For next invocations HS uses that internal cache to not scan the tables again.

I want to pay your attention to [removing deleted items data] stage.

For example when I upgraded to 2.4 I forgot to enable HK in Administration menu
At some point I noticed that I have HK table with 96M rows and it's ~5GB size.
Yes, it's unusual case - caused by wrong LLD rule with full OID tree scan and creating hundreds of items and deleting them afterward.
No history were collected at all for that items.

When I enabled HK - it took ~10 hours for first activity and deleted 99,99% rows of the HK table.
Of course it's not so bad, but still zabbix could avoid to waste that DB resources.

According to the debuglog above the HK did 95M SQLs like this (for different history tables of course):

delete from history where itemid=27853 limit 5000

Most big my history tables are 300G, 200G, 150G, so you can imagine.
To perform such each query HK needs to scan the table every time. Innodb pool doesn't help too much in such case.

Question is - why HK tries to delete history for items while it is practically aware that these items don't have history at all ("delete queues cache").
It would be logical to just delete corresponding records from the HK table.

Of course there is a question - is it possible to determine had an item any history before it was deleted when zabbix server was already running - I don't know.
I don't know does zabbix server keep the "delete queues cache" fresh for every received value or not - need help from devs.
If it is - then my suggestion could be implemented without significant code changes.

Feel free to ask me if something is not clear.

Comment by Oleksii Zagorskyi [ 2015 Feb 03 ]

(2) [frontend]
It's addition to (1) but should be taken into account in any case
When we delete an item in zabbix frontend - 7 records inserted to the HK table (5 for histories, 2 for trends).
We do that disregarding on current item type - to be sure that all possible history, which supposedly could be collected in the past when item supposedly was different type, will be deleted from database.

Such a logic leads to bad performance - described in (1)

Even if (1) will be fixed as suggested - here is next problem which will not be resolved.

Suppose I had an item in template (linked to hundreds of hosts) with "integer" type. After one year I changed history type for the item to "float".
I'm not absolutely sure, but looks like In this case existing history/trends for that hundreds integer items will stay in database FOREVER, which of course is very bad.

What I suggest is to change logic when we change item type or delete it.

When change item type frontend should do:

1.   delete from HK where tablename="new_item_type" and value="itemid"; <- sanity check for cases if item type was changed front and back
1.1. the same for trends if item is numeric                             <- sanity check for cases if item type was changed front and back
2.   INSERT INTO housekeeper (tablename,field,value,housekeeperid) VALUES ('previous_item_type','itemid','29848','323')
2.1. the same for trends if item is numeric

It will cover the problem I described in this comment above.

Then in case if item is deleted, we need to just insert 1 (or 2 if item is numeric) record to the HK table - for current item type only.

Comment by Oleksii Zagorskyi [ 2015 Nov 06 ]

Most likely all idea I posted here will be taken into account in ZBXNEXT-2860, but let's wait to make sure and then will close current report.

Generated at Sat Jun 14 17:44:10 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.