[ZBXNEXT-8651] add possibility to adjust alerts storage period Created: 2023 Aug 21  Updated: 2023 Aug 21

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Frontend (F), Server (S)
Affects Version/s: 6.0.20, 6.4.5
Fix Version/s: None

Type: New Feature Request Priority: Trivial
Reporter: Oleksii Zagorskyi Assignee: Valdis Murzins
Resolution: Unresolved Votes: 1
Labels: alerts, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   

Alerts are removed from the DB by a foreign key referencing the event that this alert was sent about.

That's good, but there are installation scenarios where Zabbix needs to send many alerts per event. In this case the alerts table may contain a lot of rows, which makes it hard to manage.

It would be good if Zabbix could keep alerts for a shorter period than events. Zabbix could have a separate setting for alerts housekeeping.
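
A minimal sketch of what a separate alerts retention period could look like, using an in-memory SQLite database. The reduced two-table schema, the function name purge_old_alerts and the alerts_period parameter are assumptions made for illustration only; this is not how the Zabbix housekeeper is implemented.

```python
import sqlite3


def purge_old_alerts(conn, now, alerts_period):
    """Delete alerts older than alerts_period seconds; events are left untouched."""
    cur = conn.execute("delete from alerts where clock < ?", (now - alerts_period,))
    conn.commit()
    return cur.rowcount


def demo_alerts():
    # Simplified stand-ins for the real tables (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute("create table events (eventid integer primary key, clock integer)")
    conn.execute(
        "create table alerts (alertid integer primary key, eventid integer, clock integer)"
    )
    conn.execute("insert into events values (1, 100)")
    conn.executemany(
        "insert into alerts values (?, 1, ?)", [(1, 100), (2, 150), (3, 900)]
    )
    deleted = purge_old_alerts(conn, now=1000, alerts_period=500)
    remaining_alerts = conn.execute("select count(*) from alerts").fetchone()[0]
    remaining_events = conn.execute("select count(*) from events").fetchone()[0]
    return deleted, remaining_alerts, remaining_events
```

The event row survives while its older alerts are removed, which is the behaviour the request asks for.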

 






[ZBXNEXT-8483] Improve housekeeping of events from deleted triggers. Created: 2023 Jun 07  Updated: 2023 Nov 01

Status: Confirmed
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: 5.0.35, 6.0.18, 6.4.3
Fix Version/s: None

Type: Change Request Priority: Major
Reporter: Edgars Melveris Assignee: Alex Kalimulin
Resolution: Unresolved Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   

Currently, when a trigger is deleted, information about it is written into the "housekeeper" table.
Later, the housekeeper uses that data to remove data related to this trigger from the "problems" table.
But the "events" table is not cleaned up, and events in it remain until the global housekeeping setting in Administration -> (General) -> Housekeeping -> Trigger data storage period is reached. By default this is one year.
An installation with many changes (hosts removed, etc.) could end up with a huge number of entries in the events table that cannot be used in any way.
This is caused by ZBX-12975: if the housekeeper deletes the events before the configuration has been reloaded (the trigger still exists in the configuration cache), this can cause problems.

I believe this process could be improved by removing such orphaned events, for example 24h after the trigger has been deleted. The 24h delay is to be on the safe side that the configuration has been reloaded.
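
The idea can be sketched roughly as follows. The deleted_triggers table, its deleted_at column and the reduced events schema are hypothetical and exist only for this example; the real housekeeper table has a different layout.

```python
import sqlite3

GRACE_PERIOD = 24 * 3600  # 24h safety margin proposed in the description


def purge_orphaned_events(conn, now, grace=GRACE_PERIOD):
    """Delete events whose trigger was deleted at least `grace` seconds ago."""
    cur = conn.execute(
        "delete from events where objectid in "
        "(select triggerid from deleted_triggers where deleted_at <= ?)",
        (now - grace,),
    )
    conn.commit()
    return cur.rowcount


def demo_orphaned_events():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table events (eventid integer primary key, objectid integer)")
    conn.execute("create table deleted_triggers (triggerid integer, deleted_at integer)")
    # Trigger 10 was deleted long ago, trigger 20 only recently, trigger 30 still exists.
    conn.executemany("insert into deleted_triggers values (?, ?)", [(10, 0), (20, 90000)])
    conn.executemany(
        "insert into events values (?, ?)", [(1, 10), (2, 10), (3, 20), (4, 30)]
    )
    deleted = purge_orphaned_events(conn, now=100000)
    remaining = [r[0] for r in conn.execute("select eventid from events order by eventid")]
    return deleted, remaining
```

Events of the recently deleted trigger are kept until the grace period has passed.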






[ZBXNEXT-8371] Add availability to keep events even after parent items\triggers are removed Created: 2023 Mar 29  Updated: 2023 Mar 29

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: 6.4.1rc1
Fix Version/s: None

Type: New Feature Request Priority: Minor
Reporter: Elina Kuzyutkina (Inactive) Assignee: Zabbix Development Team
Resolution: Unresolved Votes: 1
Labels: events, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   

In the context of plans for the possibility to import (and export) events from (to) third-party applications:
Currently, the housekeeper clears events if their associated objects (hosts\items\triggers) are deleted.
But this makes it impossible to see the history of events for dynamic entities, for example those discovered by LLD. As a workaround, we can store discovered entities much longer after they are no longer discovered. But this is a bad option: it can lead to difficulties in managing monitoring settings, and is not far from causing performance problems.
It would be nice to be able to enable storing only events for a custom (or housekeeper default) time after the objects themselves are deleted.

Regards, Elina






[ZBXNEXT-8357] Proxy housekeeping Created: 2023 Mar 22  Updated: 2024 Apr 10  Resolved: 2023 May 21

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Proxy (P)
Affects Version/s: 6.0.14, 6.4.0
Fix Version/s: 6.0.18rc1, 6.4.3rc1, 7.0.0alpha1, 7.0 (plan)

Type: Change Request Priority: Medium
Reporter: Elina Kuzyutkina (Inactive) Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 2
Labels: housekeeper, proxy
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Causes
Duplicate
Sub-task
Team: Team A
Sprint: Sprint 98 (Mar 2023), Sprint 99 (Apr 2023), Sprint 100 (May 2023)
Story Points: 0.5

 Description   

Current proxy housekeeper logic:
Check if we have min(clock) older than now-OfflineBuffer*1h
If we have -> delete all such data and finish the housekeeping cycle
If we have min(clock) newer than now-OfflineBuffer*1h, we delete 4 housekeeper periods at a time
* provided that this data was transferred to the server (<id from ids table for the proxy_history)

This logic creates a problem when the proxy for some reason receives data from the past. In the worst case, the database can "grow" to an "OfflineBuffer" period of data and never get smaller (until the proxy_history and ids tables are truncated)

We can delete data older than the offline buffer first, and then check min(clock) and delete 4*Housekeeper period of data. That should be done in one query or at least in one transaction (in case the proxy receives more data from the past, it should be deleted after it is sent to Zabbix server)
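
A rough sketch of the proposed order of operations, run as a single transaction against an in-memory SQLite table. The function name, its parameters and the simplified proxy_history(id, clock) schema are assumptions for illustration; last_sent_id stands for the highest id already transferred to the server. This is not the code that was eventually committed.

```python
import sqlite3

SEC_PER_HOUR = 3600


def proxy_housekeep(conn, now, last_sent_id, offline_buffer_h, hk_frequency_h):
    """Step 1: drop everything older than the offline buffer.
    Step 2: drop already-sent rows within 4 housekeeper periods of min(clock)."""
    with conn:  # both deletes are committed together
        expired = conn.execute(
            "delete from proxy_history where clock < ?",
            (now - offline_buffer_h * SEC_PER_HOUR,),
        ).rowcount
        min_clock = conn.execute("select min(clock) from proxy_history").fetchone()[0]
        sent = 0
        if min_clock is not None:  # table may be empty after step 1
            sent = conn.execute(
                "delete from proxy_history where id <= ? and clock < ?",
                (last_sent_id, min_clock + 4 * hk_frequency_h * SEC_PER_HOUR),
            ).rowcount
    return expired, sent


def demo_proxy_housekeep():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table proxy_history (id integer primary key, clock integer)")
    # ids 1 and 4 carry timestamps from the past; id 2 is sent, id 3 is not yet sent.
    conn.executemany(
        "insert into proxy_history values (?, ?)",
        [(1, 1000), (2, 357000), (3, 358000), (4, 2000)],
    )
    expired, sent = proxy_housekeep(
        conn, now=360000, last_sent_id=2, offline_buffer_h=1, hk_frequency_h=1
    )
    remaining = [r[0] for r in conn.execute("select id from proxy_history order by id")]
    return expired, sent, remaining
```

Because the expired rows are removed first, late data from the past no longer pins min(clock) and blocks the regular cleanup.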



 Comments   
Comment by Vladislavs Sokurenko [ 2023 May 15 ]

Implemented in 

Comment by Arturs Dancis [ 2023 May 19 ]

Documentation updated:

  • Introduction > What's new in Zabbix (6.0.18, 6.4.3)
  • Installation > Upgrade notes (6.0.18, 6.4.3)
  • Appendixes > Process configuration > Zabbix proxy (6.0, 6.4, 7.0)




[ZBXNEXT-7827] Housekeeper gets SIGSEGV Created: 2022 Jun 28  Updated: 2022 Jul 01  Resolved: 2022 Jul 01

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: 4.0.41
Fix Version/s: None

Type: Change Request Priority: Trivial
Reporter: IBM iX DevOps Assignee: Andris Zeila
Resolution: Done Votes: 0
Labels: housekeeper, segfault
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:
  • CentOS Linux release 7.9.2009 (Core)
  • HyperV hypervisor
  • SSD Disk
  • Postgresql 9

Attachments: File zabbix-housekeeper-sigsegv.log     File zabbix-housekeeper-strace.log     File zabbix-postgresql.log     File zabbix_server-housekeeper-debug4.log     File zabbix_server-housekeeper-debug5.log     File zabbix_server_objdump.tar.gz    

 Description   

After a standard OS security update the housekeeper process suddenly gets SIGSEGVed, see zabbix-housekeeper-sigsegv.log

Postgresql itself only sees a vanishing connection: zabbix-postgresql.log
It seems to happen right after the select itemid,min(clock) from history_uint group by itemid query finishes.

I attached via strace, but the log is not very helpful: zabbix-housekeeper-strace.log

So far, a package downgrade, OS restart etc. have not helped. Before the OS level updates postgresql was presenting an error to the housekeeper that did not seem to cause any SIGSEGV:

2022-06-28 07:12:10 CEST STATEMENT:  delete from history_uint where itemid=74542 and clock<1655788142
2022-06-28 07:12:10 CEST ERROR:  could not access status of transaction 138739714
2022-06-28 07:12:10 CEST DETAIL:  Could not open file "pg_subtrans/0845": No such file or directory.

This postgresql error is no longer present after OS level updates and a reboot. But now housekeeper no longer works.



 Comments   
Comment by Andris Zeila [ 2022 Jun 29 ]

Could you check if 'select itemid,min(clock) from history_uint group by itemid' returns NULLs? That would crash the housekeeper, however for that to happen something must be wrong with the history_uint table.

If it crashes later, could you please get the crash with DebugLevel 4? That might help to localize the crash. Also attaching objdump (objdump -DSswx <server binary path>) might help with it.

Comment by IBM iX DevOps [ 2022 Jun 29 ]

The query does not return NULL. We'll keep you posted about debug level and objdump.

Comment by IBM iX DevOps [ 2022 Jun 29 ]

I've logged the housekeeper with 4 zabbix_server-housekeeper-debug4.log and 5 zabbix_server-housekeeper-debug5.log . Also I've attached the objdump zabbix_server_objdump.tar.gz .

Comment by IBM iX DevOps [ 2022 Jun 29 ]

An update to 4.0.42 also did not help.

Comment by IBM iX DevOps [ 2022 Jun 29 ]

The query mentioned above returns this:

select itemid,min(clock) from history_uint group by itemid;

       itemid       |    min     
--------------------+------------
             307879 | 1655794653
             341398 | 1655794198
              55290 | 1655794593
...
             271657 | 1648622773
             239576 | 1655795928
(33327 rows)
Comment by IBM iX DevOps [ 2022 Jun 29 ]

The other queries are:

select itemid,min(clock) from history group by itemid;

 itemid |    min     
--------+------------
 328954 | 1653806554
 210645 | 1653806652
 278938 | 1653979329
 239961 | 1655793752
...
 281661 | 1653979541
 365947 | 1656419792
(16674 rows)
select itemid,min(clock) from history_str group by itemid;

 itemid |    min     
--------+------------
 258851 | 1655794340
 102834 | 1655796579
 303319 | 1656226519
...
 283074 | 1653979404
  83203 | 1655795654
 119252 | 1655796110
 277617 | 1653979618
(14938 rows)
Comment by Andris Zeila [ 2022 Jun 29 ]

Could you please add sorting to history_uint (I'm interested if there are any abnormal values in the lowest/highest range), and maybe explicitly check for null clock ?

select itemid,min(clock) from history_uint group by itemid order by min(clock);
select itemid from history_uint where clock is null;

Everything (backtrace, objectdump and logs) tells that it crashed when trying to convert a null clock to integer.
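
For illustration only, a defensive way to consume the result of the min(clock) query is to separate rows with a NULL clock before converting anything to an integer. The function name collect_min_clocks is hypothetical and this is not the server's actual code path.

```python
def collect_min_clocks(rows):
    """Split (itemid, min_clock) rows into usable values and suspect item ids."""
    valid = {}
    suspect = []
    for itemid, min_clock in rows:
        if min_clock is None:
            # A NULL here means the table holds damaged rows for this itemid.
            suspect.append(itemid)
            continue
        valid[itemid] = int(min_clock)
    return valid, suspect
```

The suspect ids can then be reported instead of being passed on to the delete logic.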

Comment by IBM iX DevOps [ 2022 Jun 29 ]
select itemid,min(clock) from history_uint group by itemid order by min(clock);

       itemid       |    min     
--------------------+------------
             209544 | 1550225701
             209541 | 1550225701
             243194 | 1571881325
...
             366315 | 1656504283
             366306 | 1656504583
             366311 | 1656504584
             366319 | 1656506719
 595882534286393346 |           
 595882534286393345 |           
(33333 rows)       

Looks like those last two values could be the culprit.

Comment by IBM iX DevOps [ 2022 Jun 29 ]
select itemid from history_uint where clock is null;

 itemid 
--------
(0 rows)
Comment by Andris Zeila [ 2022 Jun 29 ]

Right, that's where the crashes are coming from. Are those real items? How many null clock records are there? Do 'value' contents in those records look normal? The clock field should have a 'not null' setting, so it's quite weird. At any rate you should be safe removing the offending records from history_uint.

Comment by IBM iX DevOps [ 2022 Jun 30 ]

Thanks. That fixed it!

As the data was "corrupt" in Postgres and a DELETE did not work due to the pg_subtrans errors, we had to do the following workaround:

  1. Stop Zabbix and webserver processes
  2. Dump the offending table
    pg_dump -cCxO -Fc --no-security-labels -t history_uint zabbix > history_uint.pg_dump
    
  3. Convert the dump to text (better for restore as it uses COPY instead of INSERT)
    pg_restore history_uint.pg_dump > history_uint.sql
    
  4. Remove empty/wrong entries
    sed -i '/595882534286393346/d' history_uint.sql
    sed -i '/595882534286393345/d' history_uint.sql
    
  5. Restore the table
    pv /data/history_uint.sql | psql -U zabbix zabbix
    
  6. Start Zabbix and webserver processes

Again, thanks a lot for the help! It is very much appreciated!





[ZBXNEXT-6131] Performance improvement of "delete sql" of problem table in PostgreSQL. Created: 2020 Aug 13  Updated: 2020 Aug 13

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: 4.0.23
Fix Version/s: None

Type: Change Request Priority: Trivial
Reporter: Kazuo Ito Assignee: Andris Zeila
Resolution: Unresolved Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   
rc = DBexecute("delete from problem where r_clock<>0 and r_clock<%d", now - SEC_PER_DAY);

"r_clock<>0" does not use the index.
I think it's better to use "between".

I changed "delete" to "select" and got "sql explain".

zabbix=> explain analyze select * from problem where r_clock<>0 and r_clock < 1595774180;
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Seq Scan on problem  (cost=0.00..29.63 rows=5 width=78) (actual time=0.871..0.871 rows=0 loops=1)
   Filter: ((r_clock <> 0) AND (r_clock < 1595774180))
   Rows Removed by Filter: 1042
 Total runtime: 0.947 ms
(4 rows)

zabbix=> explain analyze select * from problem where r_clock between 1 and (1595774180 - 1);
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Index Scan using problem_2 on problem  (cost=0.00..9.17 rows=5 width=78) (actual time=0.031..0.031 rows=0 loops=1)
   Index Cond: ((r_clock >= 1) AND (r_clock <= 1595774179))
 Total runtime: 0.058 ms
(3 rows)

actual time 0.871 -> 0.031
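
As a sanity check on the rewrite, the two predicates select the same rows as long as r_clock is a non-negative integer (an assumption; a negative r_clock would match only the original form). The sketch below compares them on a reduced copy of the table in SQLite; the helper name compare_predicates is made up for this example, and query plans on other database engines will differ.

```python
import sqlite3


def compare_predicates(r_clocks, threshold):
    """Return the eventids matched by the original and the rewritten predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table problem (eventid integer primary key, r_clock integer not null)"
    )
    conn.execute("create index problem_2 on problem (r_clock)")
    conn.executemany(
        "insert into problem (r_clock) values (?)", [(c,) for c in r_clocks]
    )
    original = conn.execute(
        "select eventid from problem where r_clock<>0 and r_clock<? order by eventid",
        (threshold,),
    ).fetchall()
    rewritten = conn.execute(
        "select eventid from problem where r_clock between 1 and ? order by eventid",
        (threshold - 1,),
    ).fetchall()
    return original, rewritten
```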






Take in consideration history storage period with 0 on housekeep override (ZBXNEXT-5176)

[ZBXNEXT-5205] Frontend part: do not override item's history/trends storage period by global housekeeping options if it's disabled in host level. Created: 2019 May 02  Updated: 2024 Apr 10  Resolved: 2019 Jul 04

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Frontend (F)
Affects Version/s: 4.0.6, 4.2.0, 4.4.0alpha1
Fix Version/s: 4.0.11rc1, 4.2.5rc1, 4.4.0alpha1, 4.4 (plan)

Type: Change Request (Sub-task) Priority: Trivial
Reporter: Miks Kronkalns Assignee: Miks Kronkalns
Resolution: Fixed Votes: 0
Labels: history, housekeeper, items, trends
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File item_list_no_override_is_shown.png     PNG File macro_override.png     PNG File misaligned_buttons.png     PNG File no_override_and_no_do_no_keep.png    
Team: Team A
Sprint: Sprint 51 (Apr 2019), Sprint 52 (May 2019), Sprint 53 (Jun 2019), Sprint 54 (Jul 2019)
Story Points: 1

 Description   

As proposed in the master issue, there should be a way to disable history/trend storage for items at host/template level by selecting an option in a radio button, instead of typing "0" by hand.



 Comments   
Comment by Miks Kronkalns [ 2019 May 02 ]

Frontend part implemented in
 - 4.0 feature/ZBXNEXT-5176
 - 4.2 feature/ZBXNEXT-5176_4.2

Comment by Miks Kronkalns [ 2019 Jul 02 ]

Fixed in:

  • 4.0.11rc1 f2770b316ef
  • 4.2.5rc1 c9fc882527c
  • 4.4.0alpha1 (master) 5f0957e8233, 97259c1bda8
Comment by Miks Kronkalns [ 2019 Jul 03 ]

Documentation changes:





[ZBXNEXT-5176] Take in consideration history storage period with 0 on housekeep override Created: 2019 Apr 15  Updated: 2024 Apr 10  Resolved: 2019 Jul 04

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Frontend (F), Server (S)
Affects Version/s: 4.0.6, 4.2.0, 4.4.0alpha1
Fix Version/s: 4.0.11rc1, 4.2.5rc1, 4.4.0alpha1, 4.4 (plan)

Type: Change Request Priority: Major
Reporter: Markus Fischbacher Assignee: Andris Zeila
Resolution: Fixed Votes: 2
Labels: history, housekeeper
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Attachments: PNG File screenshot-1.png    
Issue Links:
Sub-task
part of ZBX-15222 Housekeeping not working Closed
Sub-Tasks:
Key | Summary | Type | Status | Assignee
ZBXNEXT-5205 | Frontend part: do not override item's... | Change Request (Sub-task) | Closed | Miks Kronkalns
Team: Team A
Sprint: Sprint 51 (Apr 2019), Sprint 52 (May 2019), Sprint 53 (Jun 2019), Sprint 54 (Jul 2019)
Story Points: 1

 Description   

Steps to reproduce:

  1. Housekeeping override of History to 7d (because of TimescaleDB)
  2. Create an item with History storage period of 0 (zero)

Result:

The item's configured history storage period of 0 is overridden by the global 7d.

Expected:

Even if this is plausible for most cases, IMO for the 0 value this should be considered as non-overrideable.

For example, if you configure the master element for a Prometheus scrape and set it to 0 because there is a lot of data, you really don't want to store it for 7d!

I think, this even was as expected in previous versions?



 Comments   
Comment by Markus Fischbacher [ 2019 Apr 15 ]

Currently my only workaround is to deactivate global override and individually set to 7d and master items to 0.

Comment by Dmitrijs Lamberts [ 2019 Apr 15 ]

Hello Markus,
Your statement is fully correct. But this is more a feature request rather than a bug, so I moved it here.

Additionally, I would say that this becomes a problem not only with Timescale support but also without it, since Zabbix is more and more about preprocessing and dependent items. Quite often users have a master item that gathers a big amount of data which is extracted by dependent items, and the general idea is to drop that huge data chunk at the preprocessing stage without storing it in the database.

In large environments Override housekeeper is used quite often, and that again will negatively affect such master items.

Comment by Glebs Ivanovskis [ 2019 Apr 15 ]

There is an alternative to global overrides — user macros.

Comment by Markus Fischbacher [ 2019 Apr 16 ]

@Glebs - that is my current workaround. I define a global macro {$DEFAULT_HISTORYPERIOD} = 7d and use that in the items. Actually it's one of the first things I change in all my installations and on all Templates/Items. TBT I have often thought about bringing that in as a change request.

But I still see how it is better to provide the global override - it just should honour the 0 setting!

@Dmitrijs - I thought it would fit better as a bug as I really thought it worked in previous versions. But I'm happy you see that as a problem too.

Comment by Glebs Ivanovskis [ 2019 Apr 16 ]

Official templates don't always follow the best practices of template building. As far as I recall discussions during ZBXNEXT-1675 implementation, it was decided not to use macros in update intervals and history/trend storage settings in official templates in order to keep them "humane" and easier to understand for Zabbix beginners. But you can still try to convince Zabbix to do this by filing a feature request and gathering few votes. Sometimes people do change their minds.

Comment by Alexei Vladishev [ 2019 Apr 26 ]

I think that Zabbix must not overwrite history storage period for items configured to have no history (i.e. item history period set to 0).

Comment by Edgars Melveris [ 2019 Apr 26 ]

Another option could be to override only if the item period is larger, e.g. change the terminology there to "Maximum history period" or similar. This might be useful in some use cases where no partitioning is used but the user still wants a smaller history period for some items.
Example:
Override max history to 30 days, but the user has some items that update really often but do not require that much history. For such items, 1 day might be enough history, and the setting might be honored in this case.
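
A small sketch combining the two rules discussed in this thread: a period of 0 is never overridden, and otherwise the override acts as a maximum. The second rule is only the proposal from this comment, not confirmed behaviour, and the function name effective_history is hypothetical. Periods are in seconds.

```python
SEC_PER_DAY = 86400


def effective_history(item_period, override_period, override_enabled=True):
    """Return the history period that would apply to an item."""
    if not override_enabled:
        return item_period
    if item_period == 0:
        # "Do not keep history" must survive the global override.
        return 0
    # Proposed "maximum history period" semantics: shorter item periods win.
    return min(item_period, override_period)
```

With a 30 day override, an item set to 1 day keeps 1 day, an item set to 90 days is capped at 30 days, and an item set to 0 stores nothing.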

Comment by Miks Kronkalns [ 2019 Jul 02 ]

Fixed in:

  • 4.0.11rc1 f2770b316ef
  • 4.2.5rc1 c9fc882527c
  • 4.4.0alpha1 (master) 5f0957e8233, 97259c1bda8
Comment by Miks Kronkalns [ 2019 Jul 03 ]

Documentation changes:





[ZBXNEXT-4949] More granular housekeeper control Created: 2019 Jan 09  Updated: 2019 Jan 09

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Change Request Priority: Major
Reporter: Sean Nienaber Assignee: Andris Zeila
Resolution: Unresolved Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

We're in the process of preparing for an upgrade from v3 to v4 and have an 80GB DB; upgrading the DB is going to take about a day of downtime.

When digging into the data, we've found a massive amount of data in the history and events tables, which we can remove by reducing the overrides in the web interface.

The housekeeper however runs hourly at a minimum and clears out only a 4 hour window at a time.

I would like to see the following added to either the web interface or server conf:

  • HousekeepingDeleteWindow - Sets the size of the delete window, useful values between 1h and 24h
  • HousekeepingFrequency - Change this from hours to minutes, allowing values between 0 for running continually and 1440 for daily.
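
To give a feeling for why the window size matters, here is a simplified back-of-the-envelope model. It assumes each run removes exactly window_h hours of the oldest data and that the retention boundary advances by interval_h hours between runs; the function name and the 6600 hour example backlog (retention reduced from 365 to 90 days) are assumptions, not figures from this report.

```python
def runs_to_catch_up(backlog_h, window_h=4, interval_h=1):
    """Number of housekeeper runs needed to clear backlog_h hours of excess data."""
    if window_h <= interval_h:
        raise ValueError("window must exceed interval or the backlog never shrinks")
    runs = 0
    while backlog_h > 0:
        backlog_h -= window_h  # one run deletes one window of old data
        runs += 1
        if backlog_h > 0:
            backlog_h += interval_h  # more data expires before the next run
    return runs
```

Under this model a 6600 hour backlog needs 2200 hourly runs with a 4 hour window (roughly three months) but only 287 runs with a 24 hour window.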

Thanks,

S






[ZBXNEXT-3526] allow keeping "n" last item values Created: 2016 Nov 08  Updated: 2016 Nov 09

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: 3.2.1
Fix Version/s: None

Type: New Feature Request Priority: Minor
Reporter: richlv Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: history, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

currently we can set time period in days for which to keep item values - anything older gets nuked by the housekeeper.

there could be items that get values infrequently and at nearly unpredictable times - maybe a trapper item that we are sending software version to only during the upgrade process.

for such items, we might only want to keep last n values - one, two, three etc. we would have no idea how they would end up being distributed in time - could be hours, could be years.
gathering the values all the time would be useless, increase the db size and make the data much less clear. we would have to find the value change among a huge amount of data instead of seeing a few useful values with proper timestamps.

it would be useful to set the retention for such items not to a time period, but to the number of values to be kept.
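
a minimal sketch of count-based retention, assuming a reduced history(itemid, clock, value) table in SQLite and at most one value per item per clock. the function name keep_last_n is hypothetical, and the subquery form used here is not portable to every database engine.

```python
import sqlite3


def keep_last_n(conn, itemid, n):
    """Delete all but the n newest values of one item; returns rows deleted."""
    cur = conn.execute(
        "delete from history where itemid = ? and clock not in ("
        "select clock from history where itemid = ? order by clock desc limit ?)",
        (itemid, itemid, n),
    )
    conn.commit()
    return cur.rowcount


def demo_keep_last_n():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table history (itemid integer, clock integer, value text)")
    conn.executemany(
        "insert into history values (?, ?, ?)",
        [(1, 10, "1.0"), (1, 20, "1.1"), (1, 30, "1.2"), (1, 40, "2.0"), (1, 50, "2.1"),
         (2, 5, "other item")],
    )
    deleted = keep_last_n(conn, itemid=1, n=2)
    kept = [r[0] for r in conn.execute(
        "select clock from history where itemid = 1 order by clock")]
    other = conn.execute("select count(*) from history where itemid = 2").fetchone()[0]
    return deleted, kept, other
```

the per-item subquery is also where the performance concern would come from on large history tables.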



 Comments   
Comment by Alexei Vladishev [ 2016 Nov 09 ]

I doubt it will ever be implemented due to performance considerations.

Comment by richlv [ 2016 Nov 09 ]

thanks for the response - performance is indeed the biggest concern here
could other backends for history in the future help somewhat ?
assuming only the zabbix server updates the data, it could also keep a record to make housekeeper's job easier, but that might be too complicated.





[ZBXNEXT-3469] clean up network discovery information Created: 2016 Oct 03  Updated: 2016 Oct 04

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Frontend (F), Server (S)
Affects Version/s: 3.2.1
Fix Version/s: None

Type: Change Request Priority: Minor
Reporter: richlv Assignee: Unassigned
Resolution: Unresolved Votes: 4
Labels: housekeeper, networkdiscovery
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

network discovery information about discovered hosts/services is never cleaned up automatically, and there is no way to clean it manually.

while manual clearing would not be consistent with other entities (events, sessions etc - everything housekeeper deals with), automatic clearing could be performed by the housekeeper.

it could be just a single extra option, taking care of both hosts (dhosts table) and services (dservices table). by default set to 1 year, it would clean out information about hosts that are down for > 1 year - if a host has been up during that period, it would not be removed. service information would only be removed together with hosts, never alone.

housekeeper should probably also clean out any dhost/dservice information when a dhosts entry exists in a network range, not covered by any discovery rule (for example, if the network range of a rule was changed to exclude some addresses).

discovery rule status (enabled/disabled) probably should not affect this - although some users might prefer disable/enable cycle to clear out that information manually.

this issue is a followup of ZBX-10480.



 Comments   
Comment by richlv [ 2016 Oct 04 ]

having a way to manually clean things up would still be desired - for example, one might test discovery actions, then start fresh with an action that only reacts to "discovered" events - that's not possible without direct db changes.

as an example of that, see https://www.reddit.com/r/zabbix/comments/53syqy/discovery_rule_that_respects_host_groups/





[ZBXNEXT-2935] Housekeeper option to remove orphan items from history Created: 2015 Aug 31  Updated: 2015 Sep 01

Status: Reopened
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Change Request Priority: Minor
Reporter: Harry Coin Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: database, housekeeper, orphaned
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

All


Attachments: JPEG File Screenshot_2015-08-31_10-49-57.jpg    

 Description   

I've found over a period of years a zabbix-server + mysql combination will cause this query, which should be empty, to nevertheless find a dozen or so items corresponding to millions of orphaned history rows. Housekeeping runs complete without removing these.

select distinct(itemid) FROM `history` where itemid not in (SELECT itemid FROM `items`);

I could be missing some zabbix use case, zabbix is not my primary focus. Maybe there's a 'retired item' table somewhere. But if I'm not, for those of us who would rather have the database space than retain a history of deleted items, consider adding a housekeeping flag to delete such orphans automatically. Perhaps a database rule to cause item deletes to cascade delete related history entries?
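
For illustration, the cleanup being asked for could be expressed as a single delete, shown here against reduced items and history tables in SQLite. The function name purge_orphaned_history is an assumption, and on a production-sized history table such a statement would need batching.

```python
import sqlite3


def purge_orphaned_history(conn):
    """Delete history rows whose itemid no longer exists in items."""
    cur = conn.execute(
        "delete from history where not exists "
        "(select 1 from items where items.itemid = history.itemid)"
    )
    conn.commit()
    return cur.rowcount


def demo_orphaned_history():
    conn = sqlite3.connect(":memory:")
    conn.execute("create table items (itemid integer primary key)")
    conn.execute("create table history (itemid integer, clock integer)")
    conn.execute("insert into items values (1)")
    conn.executemany(
        "insert into history values (?, ?)", [(1, 10), (2, 20), (2, 30), (3, 40)]
    )
    deleted = purge_orphaned_history(conn)
    remaining = conn.execute("select itemid, clock from history").fetchall()
    return deleted, remaining
```

If the delete reports zero rows, the same statement doubles as the integrity check mentioned later in this thread.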

Finally, my appreciation for creating and supporting such a useful tool!



 Comments   
Comment by richlv [ 2015 Aug 31 ]

see housekeeper table, it holds a list of deleted items and housekeeper deletes all data for those items following different rules. i guess we can close this issue then - please reopen if there's still a need to do something

Comment by Harry Coin [ 2015 Aug 31 ]

Kindly note the housekeeper has been running for the same years these orphan items exist. Before posting I double checked to make sure the housekeeper completed its run, changing the number of items to process per run to unlimited. Even when that had completed, these history entries describing events that happened to itemids not in the items table remained. I rebooted a few times, let the housekeeper complete, and yet millions of records referring to item record ids not in the items table remain.

So, closing this item on the basis the housekeeper would take care of it needs further explanation.

Comment by Harry Coin [ 2015 Aug 31 ]

See reply above.

Comment by richlv [ 2015 Aug 31 ]

in general, it would be best to discuss things like this in one of the options at https://www.zabbix.org/wiki/Getting_help - irc would be a great place - but in any case, you did not specify zabbix version. if it's a recent one and you have upgraded, check that housekeeper is enabled at all in administration -> general -> housekeeping

Comment by Harry Coin [ 2015 Aug 31 ]

I do think the housekeeper has been properly enabled and configured. Note the version is also in the shot, 2.2.

Comment by Harry Coin [ 2015 Aug 31 ]

Really, just a feature request for an option to delete all references everywhere to items that no longer exist.

Comment by richlv [ 2015 Sep 01 ]

as noted, that is supposed to happen already. please use https://www.zabbix.org/wiki/Getting_help to discuss the details, until a specific bug or feature request can be reported.

Comment by Harry Coin [ 2015 Sep 01 ]

Also as noted, though it was supposed to happen, it was not happening. So here we are on this page where I think it's created to ask for features.

To be clear: The feature requested is a server command line one-off recovery option to delete every orphan item in the database and generally not proceed until the database is in a known consistent state.

Having done this manually, finding a few thousand orphan records in trends* and several million orphan records in history* not mentioned in housekeeper and also not in items, I think I've presented a reasonable use case. Look at it this way: if the server flag finds nothing to do, it can act as an integrity check and so give confidence the server is about to begin operations on a known-good platform.

It's a reasonable enough feature request IMHO.

Comment by Oleksii Zagorskyi [ 2015 Sep 01 ]

Not absolutely sure, but there can be other use cases.
For example we delete a child node (which passed its history to master) on master.
As I recall it was discussed already in other issues.





[ZBXNEXT-2860] Improve housekeeper algorithm Created: 2015 Jun 29  Updated: 2015 Nov 05

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: API (A), Frontend (F), Server (S)
Affects Version/s: 2.4.5
Fix Version/s: None

Type: Change Request Priority: Major
Reporter: Alexey Pustovalov Assignee: Unassigned
Resolution: Unresolved Votes: 3
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   

Currently Zabbix server retrieves all itemid's min clock from history* and trends* table. Also it uses housekeeper table to remove non-existing items. We need to optimize the process and drop housekeeper table at all and use only history and trends tables data.



 Comments   
Comment by Oleksii Zagorskyi [ 2015 Jun 30 ]

In ZBX-9278 (which I posted earlier ) I suggested some exact solutions for almost the same aspects as Alexey mentioned here.

I cannot imagine how we could drop the housekeeper table at all.

I'd suggest to close current report as duplicate and move unique details to the ZBX-9278

Comment by Oleksii Zagorskyi [ 2015 Nov 03 ]

Now I understand how Alexey's scenario would work and I agree that it would be nice for the following scenario:
Existing zabbix installation, where long time ago type of information has been changed for many items (itemIDs), and these itemIDs are still in zabbix configuration.

How to "clean" currently existing installations? - yes, remove values for every history/trends table for items which do not exist already for this table type.
My suggestion in ZBX-9278 would not help for this scenario.

Comment by Oleksii Zagorskyi [ 2015 Nov 05 ]

Before working on this issue, take a look please to ZBX-10012 discussion.





[ZBXNEXT-2661] Please add warning that housekeeper is disabled to the upgrade procedures. Created: 2015 Jan 06  Updated: 2015 Jan 09  Resolved: 2015 Jan 09

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Documentation (D)
Affects Version/s: None
Fix Version/s: None

Type: Change Request Priority: Trivial
Reporter: Oleg Ivanivskyi Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Zabbix 2.2 and 2.4


Issue Links:
Duplicate

 Description   

Not everyone reads the upgrade notes. There is a problem of lack of free disk space/big database size due to the disabled housekeeper after the upgrade (especially from 2.0 to 2.4).

I believe, we should add a warning like "Housekeeper is disabled by default after upgrading to 2.2. The desired housekeeper functionality should be checked and enabled manually if it is necessary." to the upgrade procedures also:
https://www.zabbix.com/documentation/2.2/manual/installation/upgrade
https://www.zabbix.com/documentation/2.4/manual/installation/upgrade



 Comments   
Comment by Oleksii Zagorskyi [ 2015 Jan 07 ]

I'd support it.
I know several Zabbix users for whom such a message would have a positive effect.

Comment by Martins Valkovskis [ 2015 Jan 07 ]

Added to:

https://www.zabbix.com/documentation/2.2/manual/installation/upgrade
https://www.zabbix.com/documentation/2.4/manual/installation/upgrade

RESOLVED.

sasha Housekeeper is enabled by default, but disabled after upgrade procedure.

REOPENED

martins-v Revised, please review.

RESOLVED.

sasha CLOSED





[ZBXNEXT-2572] Item Delete - Long periods of high internal housekeeper process usage Created: 2014 Nov 06  Updated: 2014 Nov 06

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: 2.2.6
Fix Version/s: None

Type: New Feature Request Priority: Major
Reporter: Dimitri Bellini Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Redhat Enterprise 6.x x64 - Percona MySQL 5.6


Attachments: PNG File zabbix_housekeeping_trouble.png    
Issue Links:
Duplicate
is duplicated by ZBXNEXT-2571 Item Delete - Long periods of high in... Closed

 Description   

The Zabbix housekeeper works very well during normal operation when cleaning up item values, but I can't say the same when we delete item prototypes or plain items.

While working on an official Zabbix issue we learned the actual meaning of the MaxHousekeeperDelete parameter: it only limits the number of values removed per deleted item in a single housekeeper run.

For example, if I have a template for Cisco switches with item prototypes for port metrics, I will probably face numbers like these:
1 item prototype expanded over 100 switch ports x 10 monitored switches => 1000 discovered items x update interval
In this case the Zabbix housekeeper will try to remove MaxHousekeeperDelete (e.g. 500) x total discovered items (1000) => 500k history values in a single run.

So MaxHousekeeperDelete doesn't limit the total number of rows deleted in a single housekeeper run, and in some cases this logic can dramatically increase the workload of the Zabbix server.
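
The arithmetic in the example above can be sketched as follows. This is only an illustration of the reporter's numbers, assuming the limit applies to each deleted item separately as described; the function name is hypothetical and not part of Zabbix.

```python
def rows_deleted_per_run(max_housekeeper_delete: int, deleted_items: int) -> int:
    """Upper bound on history rows removed in one housekeeper run,
    assuming the limit is applied to every deleted item separately."""
    return max_housekeeper_delete * deleted_items


# Reporter's example: 100 switch ports x 10 switches = 1000 discovered items.
discovered_items = 100 * 10
print(rows_deleted_per_run(500, discovered_items))  # 500000
```

With these assumptions, deleting a single prototype can schedule half a million row deletions in one run, even though the configured limit is only 500.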

Please provide better documentation of MaxHousekeeperDelete or a much more predictable logic for the housekeeper process.



 Comments   
Comment by richlv [ 2014 Nov 06 ]

i'm inclined to close this as a duplicate of ZBXNEXT-2570 - would you agree ?





[ZBXNEXT-2195] housekeeping for maintenance periods Created: 2014 Mar 11  Updated: 2014 Mar 11

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Frontend (F), Server (S)
Affects Version/s: None
Fix Version/s: None

Type: New Feature Request Priority: Minor
Reporter: richlv Assignee: Unassigned
Resolution: Unresolved Votes: 5
Labels: housekeeper, maintenance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

maintenance periods currently pile up indefinitely. in larger installations over time this can lead to very bad page performance or even the maintenance config page not opening at all (see ZBX-7930).

it would be useful to add an option in administration -> general -> housekeeping to remove maintenance periods that have their global expiry date n days in the past (no need to compute all the subperiods, limiting by the main period should be sufficient).
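
a minimal sketch of the selection rule proposed above, assuming each maintenance record carries a single overall end time. the dictionary keys (maintenanceid, active_till) and the function name are illustrative assumptions, not an existing housekeeper API.

```python
from datetime import datetime, timedelta


def expired_maintenances(maintenances, now, keep_days):
    """Return ids of maintenance periods whose overall end time lies
    more than keep_days in the past. Subperiods are not inspected."""
    cutoff = now - timedelta(days=keep_days)
    return [m["maintenanceid"] for m in maintenances if m["active_till"] < cutoff]


now = datetime(2014, 3, 11)
maintenances = [
    {"maintenanceid": 1, "active_till": datetime(2013, 1, 1)},   # long expired
    {"maintenanceid": 2, "active_till": datetime(2014, 3, 1)},   # expired 10 days ago
    {"maintenanceid": 3, "active_till": datetime(2014, 12, 31)}, # still active
]
print(expired_maintenances(maintenances, now, keep_days=30))  # [1]
```

only the main period is compared against the cutoff, which matches the suggestion that computing subperiods is unnecessary.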






[ZBXNEXT-2016] Remove the ZBX_HISTORY_DATA_UPKEEP constant Created: 2013 Nov 14  Updated: 2014 Nov 06  Resolved: 2013 Nov 27

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Frontend (F)
Affects Version/s: 2.2.1
Fix Version/s: 2.2.1, 2.3.0

Type: Change Request Priority: Major
Reporter: Pavels Jelisejevs (Inactive) Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: frontend, history, housekeeper, item, trends
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Currently we can't set a global history storage period when internal housekeeping is disabled. If we remove this limitation, this parameter could be used as an alternative to ZBX_HISTORY_DATA_UPKEEP. We'll also need to do this for trends.

We'll need to make the following changes:

  • Allow to set global history and trend keeping periods even if internal housekeeping is disabled.
  • Change "Housekeeper" to "Housekeeping" everywhere on that page to make it clear, that it doesn't only affect the internal housekeeper.
  • Change "Enable housekeeper" to "Internal housekeeping".
  • Change "Keep data for" to "Data storage period".
  • Change "Keep history" to "History storage period" and "Keep trends" to "Trend storage period" in item and item prototype forms, mass update form and error messages.
  • Change "Keep history" and "Keep trends" to just "History" and "Trends" in the item configuration filter.


 Comments   
Comment by Pavels Jelisejevs (Inactive) [ 2013 Nov 14 ]

This needs to be done before continuing work on ZBX-4063.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Nov 25 ]

RESOLVED in svn://svn.zabbix.com/branches/dev/ZBXNEXT-2016.

Comment by Ivo Kurzemnieks [ 2013 Nov 26 ]

(1) Please, correct get_request() to getRequest() in the changed lines:
adm.housekeeper.php: 92,95,105-113,165,169
chart3.php: 139
chart7.php: 86

jelisejev RESOLVED in r40537.

iivs CLOSED.

Comment by Ivo Kurzemnieks [ 2013 Nov 27 ]

TESTED

Comment by Pavels Jelisejevs (Inactive) [ 2013 Nov 28 ]

I've made a minor change before merging: changed "Internal housekeeping" to "Enable internal housekeeping". It's clearer and more consistent with other labels that way.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Nov 28 ]

Available in 2.2.1rc1 r40559 and 2.3.0 r40560.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Nov 28 ]

(2) Noted it in:

We should probably also update the housekeeper settings docs.

martins-v Updated:

RESOLVED.

jelisejev These two changes need to be documented as well:

Change "Keep history" to "History storage period" and "Keep trends" to "Trend storage period" in item and item prototype forms, mass update form and error messages.
Change "Keep history" and "Keep trends" to just "History" and "Trends" in the item configuration filter.

martins-v Updated documentation:

The respective pages for 2.4, 3.0 have also been updated in this way. RESOLVED.

jelisejev CLOSED.

Comment by richlv [ 2014 Feb 20 ]

(3) also should be documented in :

martins-v For 2.2 noted that the constant is removed in 2.2.1; removed entirely from 2.4. RESOLVED.

jelisejev CLOSED.





[ZBXNEXT-1649] Fine grained control of tasks performed by housekeeper Created: 2013 Mar 07  Updated: 2018 May 11  Resolved: 2014 Jan 31

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Frontend (F), Server (S)
Affects Version/s: None
Fix Version/s: 2.1.0

Type: New Feature Request Priority: Trivial
Reporter: Yoav Steinberg Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: housekeeper, partitioning
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File fine_grained_disable_housekeeping.patch    
Issue Links:
Causes
causes ZBX-14312 Proxy->Agent communication drops inte... Closed
Duplicate
is duplicated by ZBXNEXT-1539 Housekeeper does not delete deleted i... Open
is duplicated by ZBXNEXT-699 Support for global history (and poten... Closed
is duplicated by ZBXNEXT-953 Zabbix write into housekeeper table w... Closed
is duplicated by ZBXNEXT-30 simplify housekeeper configuration Closed
is duplicated by ZBXNEXT-1039 Hide keep history and keep trends fro... Closed

 Description   

Disabling the housekeeper becomes a recommendation when it can't keep up with the size of the history tables. In such cases external housekeeping needs to be performed (like deleting table partitions). In many situations I'd like to keep most of the housekeeper functionality for tables that don't cause any problems (such as trends or alerts) while using an external cleanup for the problematic tables (history_uint) for example.
I suggest adding fine grained control over which tasks are performed by the housekeeper. Something like:
DisableHousekeeping=history_uint,history
instead of simply
DisableHousekeeping=1
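
A minimal sketch of how such a value could be interpreted. The parameter syntax is the reporter's proposal, not an existing Zabbix option, and the function name and the handling of "0" or an empty value are assumptions made for illustration. The table names are the history and trend tables mentioned elsewhere in this issue.

```python
KNOWN_TABLES = {
    "history", "history_uint", "history_str", "history_text", "history_log",
    "trends", "trends_uint",
}


def parse_disable_housekeeping(value: str) -> set:
    """Return the set of tables the housekeeper should skip.

    "1" disables housekeeping for every known table, "0" or an empty
    value disables nothing, anything else is a comma-separated list.
    """
    value = value.strip()
    if value == "1":
        return set(KNOWN_TABLES)
    if value in ("", "0"):
        return set()
    tables = {name.strip() for name in value.split(",") if name.strip()}
    unknown = tables - KNOWN_TABLES
    if unknown:
        raise ValueError("unknown table(s): " + ", ".join(sorted(unknown)))
    return tables


print(sorted(parse_disable_housekeeping("history_uint,history")))
# ['history', 'history_uint']
```

Rejecting unknown names early is a design choice of this sketch: a typo in the list would otherwise silently leave a large table under internal housekeeping.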



 Comments   
Comment by Yoav Steinberg [ 2013 Mar 07 ]

This can also resolve ZBXNEXT-953. In the supplied patch we delete from the housekeeper table everything related to the disabled (partitioned) tables, so the housekeeper table doesn't keep filling up.

Comment by Yoav Steinberg [ 2013 Mar 07 ]

Patch for this feature.

Comment by Yoav Steinberg [ 2013 Mar 07 ]

Feature implementation.

Comment by Oleksii Zagorskyi [ 2013 Mar 08 ]

I recall similar discussion (not very sure) here in Jira or on zabbix forum, but cannot find it at the moment.
added:
ohh, probably I meant ZBXNEXT-1539

Comment by Yoav Steinberg [ 2013 Mar 10 ]

Indeed this is similar. In ZBXNEXT-1539 there are two patches:
One makes sure we don't query from history tables more data than required which seems like a good idea in any case (regardless of partitioning).
The other patch removes some work from the housekeeper but doesn't seem to handle the other work done in "housekeeping_cleanup()" and, in my opinion, doesn't provide enough flexibility regarding what's partitioned.

Comment by Alexei Vladishev [ 2013 Mar 26 ]

Draft specification is available here https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-1649 for public review.

Comment by Yoav Steinberg [ 2013 Mar 27 ]

The spec seems promising and addresses the problem perfectly. One thing that comes to mind is that if there's willingness to redesign the housekeeper to that extent, it might be worth considering adding partitioning itself as a feature to Zabbix and making the housekeeper responsible for clearing the partitions.

In my main installation partition management is handled by external cron jobs. I've seen blog posts about how to manage partitions by adding stored procedures to the database too. But ideally this will all be handled by the housekeeper - perhaps as a second step or long term goal.

Comment by Alexei Vladishev [ 2013 Apr 03 ]

The major problem with partitioning is that it is database engine specific. Some engines (think of most of NoSQL solutions) handle data expiration differently with no or limited support of data partitioning.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 11 ]

(1) Let's increase the width of the label column on the housekeeper settings page so that the text would fit properly.

oleg.egorov RESOLVED IN r35001

jelisejev If you need to define a page-specific style, add a class to the widget div, not tab div.

oleg.egorov RESOLVED IN r35034

jelisejev To be more consistent with other class names please rename the class to "hk".

oleg.egorov RESOLVED IN r35062

jelisejev Please review some minor changes in r35072.

oleg.egorov CLOSED

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 11 ]

(2) Some text corrections:
Remove IT services history -> Remove IT service history
Remove IT services history older than (in days) -> Remove IT service history older than (in days)
Remove autoregistration events and alerts older than (in days) -> Remove auto-registration events and alerts older than (in days)
### - Overridden by global housekeeper settings. -> Overriden by global housekeeper settings (#### day/days)

The link in the last message remains as is. Also add a translation context to the "global housekeeper settings" link so it could be properly translated.

oleg.egorov RESOLVED IN r35001

jelisejev
1. _s('Overridden by') - don't use the _s() helper here, the string doesn't have any arguments.
2. _('global housekeeper settings') - add a translation context to this string so that translators know, that this string is a part of a sentence.
3. when translating the "1 day/2 days" string don't substitute the "days" word as a parameter, use the _n() helper: it also supports parameters.
4. Don't use original gettext functions like ngettext(), use our helper wrappers instead.
5. In line 374: $data['config'] = select_config(); - the $data variable contains data passed to the view from the controller. To store configuration data - introduce a separate variable.

oleg.egorov RESOLVED IN r35047

jelisejev Please review r35056. I've corrected the usage of the _n() function and split the "Overriden by global housekeeper settings (# day/days)" string into "Overriden by global housekeeper settings" and "# day/days" to simplify translation. I've also added code comments to our gettext functions so that their usage would be clearer.

oleg.egorov CLOSED

jelisejev Missed one more thing, these strings must also be corrected in validation and audit messages in adm.housekeeper.php. REOPENED.

jelisejev Another minor correction in r35076 and 35077.

oleg.egorov CLOSED

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 11 ]

(3) In configuration.item.edit.php:

1. No need for "== 1" in "$data['config']['hk_history_global'] == 1", just "$data['config']['hk_history_global']".
2. Please use the methods available in CWebUser: CWebUser::getType() instead of CWebUser::$data['type'].

oleg.egorov RESOLVED IN r35001

jelisejev CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 11 ]

(4) In administration.general.housekeeper.edit.php

1. Please format the include according to the guidelines https://www.zabbix.org/wiki/Docs/specs/coding_style#File_includes
2. The CCheckBox::__construct method can accept a boolean value for the $checked parameter. No need to pass "yes" or "no".

oleg.egorov RESOLVED IN r35001

jelisejev No need to write the full "($this->data['config']['hk_audit_mode'] == 0) ? false : true" expression, you can just write "!$this->data['config']['hk_audit_mode']".

oleg.egorov RESOLVED IN r35035

jelisejev CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 11 ]

(5) In CChart and CPie

1. Same as (3.1).
2. select_config(true) - true is already the default value for the first parameter, no need to pass it explicitly.
3. Let's pass the value of "hk_trends" to the CChart/CPie objects from the outside, either in the construct method or through a property. It's bad design for the graph object to access configuration data directly.

oleg.egorov RESOLVED IN r35001

jelisejev I meant that the "history" parameter should be passed to the CChart() and CPie() object when creating them, it shouldn't load them from the database itself. Also, if you add a property to a class, make sure to define it explicitly, with a corresponding PHP doc comment.

oleg.egorov RESOLVED IN r35035

jelisejev
1. Name "setHistory" is uncanonical, please rename it to something more meaningful, like "historyPeriod"; also explicitly define the property in both classes with the corresponding PHPDoc.
2. Add a PHP doc to the setHistory() method. Also don't forget to rename it when renaming the property.
3. In CChart.php line 221 "if ($this->setHistory) {" setHistory can also be "0", that means we need to always use history. You'll need to replace the if statement with "if ($this->setHistory !== null) {". Same thing in CPie.
4. When creating chart object, the setHistory property could be set simpler like this:

$graph->setHistory(($config['hk_history_global']) ? $config['hk_history'] : null);

Also note that it's more logical to pass NULL if global history overriding is disabled.
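
The difference between the truthiness check and the explicit null check from point 3 can be illustrated outside PHP as well. The sketch below uses Python, where None plays the role of NULL; both function names are hypothetical and only demonstrate the comparison, not the actual CChart code.

```python
def override_set_truthy(history_period):
    """Buggy variant: a configured period of 0 is treated as 'not set'."""
    return bool(history_period)


def override_set_explicit(history_period):
    """Correct variant: only None (NULL) means 'override disabled'."""
    return history_period is not None


for value in (None, 0, 90):
    print(value, override_set_truthy(value), override_set_explicit(value))
```

Only the explicit check keeps 0 distinguishable from "no override", which is why the review asks for "!== null" instead of a plain if.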

oleg.egorov RESOLVED IN r35067

jelisejev Please review my changes in r35073.

oleg.egorov CLOSED

Comment by Andris Zeila [ 2013 Apr 12 ]

Server side fixed and ready for testing in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-1649 r34985

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 15 ]

(10) Charts must use the "hk_history" field, not "hk_trends".

oleg.egorov RESOLVED IN r35035

jelisejev CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 15 ]

(11) [D] We'll need to remove the ZBX_HISTORY_DATA_UPKEEP constant from the docs

https://www.zabbix.com/documentation/2.2/manual/web_interface/definitions?s[]=zbx&s[]=history&s[]=data&s[]=upkeep

jelisejev We decided to leave the ZBX_HISTORY_DATA_UPKEEP constant as is for now. CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 15 ]

(12) The "Overriden by global housekeeper settings (#### day/days)" must be displayed in item prototype and template item forms.

oleg.egorov RESOLVED

jelisejev Sorry, typo, they must NOT be displayed.

oleg.egorov RESOLVED IN r35064

jelisejev CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 16 ]

Frontend TESTED.

Comment by Alexander Vladishev [ 2013 Apr 18 ]

Server side successfully tested!

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 18 ]

(15) [I] The upgrade patch only saves old configuration values for the "hk_events_trigger" field.

sasha RESOLVED in r35108.

jelisejev CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 22 ]

(16) I have the "Enable housekeeping" and "Override item history period" checkboxes for history enabled. When I unset the "Enable housekeeping" and save the form, the "Override item history period" checkbox is also unset. It shouldn't be updated.

oleg.egorov RESOLVED IN r35193

jelisejev CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 22 ]

(17) In default.css:

1. It would be better if instead of duplicating styles for .element-group and .element-group-first you would assign both classes to the first group, set the border to "0" in .element-group and then undo it .element-group-first. That way all of the other styles won't need to be duplicated.
2. Instead of introducing the "hkElementLabel" class you can use a generic selector, like ".hk .element-group label" and remove the extra div.

oleg.egorov RESOLVED IN r35183

jelisejev Regarding 1 - that's not exactly what I meant. Please review my changes in r35212.

oleg.egorov CLOSED

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 22 ]

(18) In administration.general.housekeeper.edit.js.php

Instead of manually enabling and disabling inputs when setting checkboxes, you can just set the checkbox and then call trigger('change') to do everything else.

oleg.egorov RESOLVED IN r35205

jelisejev CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 23 ]

Frontend TESTED. Please close (17) before merging.

Comment by Oleg Egorov (Inactive) [ 2013 Apr 23 ]

Available in version pre-2.1.0 (trunk) r35212
CLOSED

Comment by Pavels Jelisejevs (Inactive) [ 2013 Apr 24 ]

(19) Documentation needs to be updated.

martins-v Updated:

https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew220#finer_control_over_housekeeping_tasks
https://www.zabbix.com/documentation/2.2/manual/installation/upgrade_notes_220#housekeeper_changes
https://www.zabbix.com/documentation/2.2/manual/web_interface/frontend_sections/administration/general
https://www.zabbix.com/documentation/2.2/manual/appendix/config/zabbix_server (removed DisableHousekeeping parameter)

sasha CLOSED

Comment by Oleg Egorov (Inactive) [ 2013 Apr 30 ]

FIXED IN version pre-2.1.0 (trunk) r35343

Comment by richlv [ 2013 Apr 30 ]

(21) currently one label says "Override item trends period"

should be changed to "Override item trend period" - sounds better and is more consistent with "Keep trend data for" string

oleg.egorov FIXED IN version pre-2.1.0 (trunk) r35368

CLOSED

Comment by Oleg Egorov (Inactive) [ 2013 Apr 30 ]

FIXED IN version pre-2.1.0 (trunk) r35368
CLOSED

Comment by Andris Zeila [ 2013 May 10 ]

(22) [S] PostgreSQL errors when running server:

  7209:20130509:180125.988 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  column "i.history" must appear in the GROUP BY clause or be used in an aggregate function
 [select i.itemid,min(t.clock),i.history from history t,items i where t.itemid=i.itemid group by i.itemid]
  7219:20130509:180125.989 server #25 started [proxy poller #1]
  7209:20130509:180125.989 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  column "i.history" must appear in the GROUP BY clause or be used in an aggregate function
 [select i.itemid,min(t.clock),i.history from history_str t,items i where t.itemid=i.itemid group by i.itemid]
  7194:20130509:180125.989 server #0 started [main process]
  7209:20130509:180125.989 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  column "i.history" must appear in the GROUP BY clause or be used in an aggregate function
 [select i.itemid,min(t.clock),i.history from history_log t,items i where t.itemid=i.itemid group by i.itemid]
  7220:20130509:180125.989 server #26 started [self-monitoring #1]
  7209:20130509:180125.989 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  column "i.history" must appear in the GROUP BY clause or be used in an aggregate function
 [select i.itemid,min(t.clock),i.history from history_uint t,items i where t.itemid=i.itemid group by i.itemid]
  7209:20130509:180125.990 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  column "i.history" must appear in the GROUP BY clause or be used in an aggregate function
 [select i.itemid,min(t.clock),i.history from history_text t,items i where t.itemid=i.itemid group by i.itemid]
  7215:20130509:180125.990 server #21 started [history syncer #3]
  7209:20130509:180125.990 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  column "i.trends" must appear in the GROUP BY clause or be used in an aggregate function
 [select i.itemid,min(t.clock),i.trends from trends t,items i where t.itemid=i.itemid group by i.itemid]
  7209:20130509:180125.991 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR:  column "i.trends" must appear in the GROUP BY clause or be used in an aggregate function
 [select i.itemid,min(t.clock),i.trends from trends_uint t,items i where t.itemid=i.itemid group by i.itemid]
  7209:20130509:180125.999 housekeeper deleted: 0 records from history and trends, 0 records of deleted items, 0 events, 0 sessions, 0 service alarms, 0 audit items
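
PostgreSQL rejects these queries because i.history (or i.trends) is selected without being aggregated or listed in GROUP BY. One common way to satisfy that rule is to add the column to the GROUP BY clause. The sketch below shows that form of the first query; it is an assumption about a possible fix, not the change actually made in r35560. SQLite is used only so that the example runs without a server, and SQLite itself would not raise the original error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table items (itemid integer primary key, history integer);
    create table history (itemid integer, clock integer);
    insert into items values (1, 90), (2, 7);
    insert into history values (1, 100), (1, 50), (2, 70);
""")

# Every selected column is now either aggregated or part of GROUP BY.
rows = conn.execute(
    "select i.itemid,min(t.clock),i.history "
    "from history t,items i "
    "where t.itemid=i.itemid "
    "group by i.itemid,i.history "
    "order by i.itemid"
).fetchall()
print(rows)  # [(1, 50, 90), (2, 70, 7)]
```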

wiper RESOLVED in r35560

sasha TESTED

<richlv> this was merged -> CLOSED

Comment by Andris Zeila [ 2013 May 10 ]

Released in pre-2.1.0 r35581

Comment by richlv [ 2013 Sep 11 ]

this might have resulted in a translation-related regression : ZBX-6978

Comment by Oleksii Zagorskyi [ 2013 Sep 18 ]

Why didn't we mention in the docs that in 2.2 the housekeeper will have better performance?
https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew220#finer_control_over_housekeeping_tasks

The spec https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-1649#Performance_improvements says:

it should process one table after another (history->history_uint->etc) instead of grouping operations on per-itemid. In this case DB cache will be used with much better efficiency thus greatly reducing disk IO and seek operations.

Comment by Andris Zeila [ 2013 Sep 18 ]

Dunno, but there were multiple design changes that should positively affect the performance. Although the resulting performance improvements were not tested.

Comment by Oleksii Zagorskyi [ 2014 Feb 07 ]

I sent this once by email, but it's worth mentioning here too:

just got a stat from a prod installation (a middle power VM).
previously (in 2.0) housekeeper worked for 3,5 hours deleting ~ 3,3M of records.
now (in 2.2) it spends 1,5 hours for the same 3,3M records.
conclusion -> housekeeping in 2.2 is better

Comment by Javier [ 2014 May 02 ]

I'm still seeing same error on Zabbix 2.2.3:

140502 14:07:48 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. The statement is unsafe because it uses a LIMIT clause. This is unsafe because the set of rows included cannot be predicted. Statement: delete from history_uint where itemid=30748 limit 500

So, is it really fixed in Zabbix 2.2.0?

Thanks.

Comment by Oleksii Zagorskyi [ 2015 Apr 10 ]

Updated stat (server 2.2.9) as for my previous comment, just in case:

  3195:20150410:045134.982 housekeeper [deleted 5108319 hist/trends, 46500 items, 9427 events, 0 sessions, 0 alarms, 0 audit items in 7999.990979 sec, idle 1 hour(s)]
  3195:20150410:055134.982 executing housekeeper
  3195:20150410:075934.660 housekeeper [deleted 4929723 hist/trends, 46500 items, 10737 events, 0 sessions, 0 alarms, 0 audit items in 7679.676903 sec, idle 1 hour(s)]
  3195:20150410:085934.660 executing housekeeper
  3195:20150410:111026.979 housekeeper [deleted 4693741 hist/trends, 80009 items, 10044 events, 0 sessions, 0 alarms, 3 audit items in 7852.318823 sec, idle 1 hour(s)]

real NVPS year ago was ~210, now it's ~360
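
To compare runs like the ones above, the deletion rate can be computed directly from the log lines. This is a small sketch that assumes the exact "housekeeper [deleted ... in ... sec" wording shown in this comment; the helper name is hypothetical.

```python
import re

# Matches the record count and the duration in a housekeeper summary line.
LOG_PATTERN = re.compile(r"deleted (\d+) hist/trends.*? in ([\d.]+) sec")


def deletion_rate(log_line):
    """Return deleted history/trend records per second for one
    housekeeper summary line, or None if the line does not match."""
    match = LOG_PATTERN.search(log_line)
    if match is None:
        return None
    return int(match.group(1)) / float(match.group(2))


line = ("3195:20150410:045134.982 housekeeper [deleted 5108319 hist/trends, "
        "46500 items, 9427 events, 0 sessions, 0 alarms, 0 audit items "
        "in 7999.990979 sec, idle 1 hour(s)]")
print(round(deletion_rate(line)))  # about 639 records per second
```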

Comment by Oleksii Zagorskyi [ 2015 Oct 25 ]

Worth mentioning that related changes were made later in ZBXNEXT-2016





[ZBXNEXT-1544] retention period for it services (service_alarms) Created: 2012 Dec 13  Updated: 2012 Dec 24

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Change Request Priority: Trivial
Reporter: richlv Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: housekeeper, itservices
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

it service states (service_alarms table) currently pile up forever and are not cleared by the housekeeper. would be useful to have a separate parameter to define how long this data should be kept






[ZBXNEXT-1539] Housekeeper does not delete deleted items/hosts which should be cleaned up and other cleanup that should be done Created: 2012 Dec 10  Updated: 2013 Dec 28

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: 2.0.4
Fix Version/s: None

Type: New Feature Request Priority: Trivial
Reporter: Boris Manojlovic Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: housekeeper, partitioning, patch, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

All systems with partitioned databases should have this functionality included


Attachments: File zabbix-2.0.4_housekeeping_partitioned-002.patch     File zabbix-2.0.4_housekeeping_partitioned.patch    
Issue Links:
Duplicate
duplicates ZBXNEXT-1649 Fine grained control of tasks perform... Closed

 Description   

zabbix_server does not clean up any of the auxiliary tables, which should be cleaned up for performance reasons

Patch included



 Comments   
Comment by Boris Manojlovic [ 2013 Jan 10 ]

This additional patch removes full table scan of all tables in case of partitioned system.

Comment by Oleksii Zagorskyi [ 2013 Mar 08 ]

ZBXNEXT-1649 is related

Comment by Alexei Vladishev [ 2013 Mar 12 ]

It will be resolved under ZBXNEXT-1649.





[ZBXNEXT-953] Zabbix write into housekeeper table with disabled housekeeper option Created: 2011 Sep 08  Updated: 2014 Sep 26  Resolved: 2014 Sep 26

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Frontend (F), Server (S)
Affects Version/s: None
Fix Version/s: None

Type: New Feature Request Priority: Major
Reporter: Alexey Pustovalov Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

latest trunk revision


Issue Links:
Duplicate
duplicates ZBXNEXT-1649 Fine grained control of tasks perform... Closed

 Description   

I think there is no need to write information about removed items into the housekeeper table (within a few days we had 5 million records in this table, but we are not using the housekeeper)



 Comments   
Comment by richlv [ 2011 Sep 08 ]

hmm. the problem is that frontend has no idea that housekeeper is disabled - it currently can not query server for that, and server does not store that in the database...

Comment by Alexey Pustovalov [ 2011 Sep 08 ]

maybe move the housekeeper option into the database? into the config table?

Comment by richlv [ 2011 Sep 08 ]

so let's consider this as a feature request to keep housekeeper config in the db & for frontend not to write housekeeper entries if housekeeping is disabled.

Comment by richlv [ 2011 Sep 12 ]

this would solve ZBXNEXT-30

Comment by Alexei Vladishev [ 2012 Oct 10 ]

I think there is no need to write information about removed items into the housekeeper table (within a few days we had 5 million records in this table, but we are not using the housekeeper)

How do you know the housekeeper will not be enabled eventually? Let's move housekeeper setting to the front-end, but information regarding removed objects must be also stored regardless of housekeeper settings.

Comment by Alexei Vladishev [ 2013 Mar 12 ]

It will be resolved under ZBXNEXT-1649.

Comment by Pavels Jelisejevs (Inactive) [ 2014 Sep 26 ]

As far as I see, this issue can be closed.

CLOSED.





[ZBXNEXT-867] Keep history in hours Created: 2011 Jul 29  Updated: 2017 Jun 13  Resolved: 2017 Jun 13

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Change Request Priority: Minor
Reporter: azurIt Assignee: Unassigned
Resolution: Duplicate Votes: 1
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates ZBXNEXT-1675 Add macros support for update intervals Closed

 Description   

I have several thousand items for which I need to keep history for only 1 hour. Zabbix allows a minimum of 1 day, which is too long, and my database is very big because of this. Could you please allow the history storage period to be set in hours?



 Comments   
Comment by Glebs Ivanovskis (Inactive) [ 2017 Jun 13 ]

Implemented in ZBXNEXT-1675, closing as Duplicate.





[ZBXNEXT-207] Scheduled Housekeeping Created: 2010 Jan 19  Updated: 2015 Sep 29  Resolved: 2015 Feb 21

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: 2.5.0

Type: Change Request Priority: Major
Reporter: Peteris Assignee: Unassigned
Resolution: Fixed Votes: 45
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   

To manage the workload on the Zabbix server, the housekeeping feature should be more adjustable: by allowing the start time of housekeeping to be set and/or by introducing command line/frontend tools to trigger the housekeeping process.

At this moment there is only possibility to set amount of housekeeping operations in 24 hours. I would like to set also a start time of Housekeeping operations, for example, if housekeeping option is set to 1 you can specify that housekeeping starts at 00:00, so this operation doesn't load Zabbix server at day time.

It would be great if housekeeping event could start from Command line and/or Frontend. If there would be a command line tool for housekeeping, administrators could make their own schedulers of housekeeping using cron jobs.



 Comments   
Comment by richlv [ 2010 Jan 19 ]

most likely - a switch for zabbix_server which signals a running server instance to start housekeeping. that could also be added as a script from a frontend etc

Comment by chlunde [ 2010 Jul 13 ]

I think this should be handled as two separate issues:
1) Specifying the housekeeping period(s) in the configuration file
2) Reloading the configuration file (safe options) without restart, this would satisfy the need for users wanting cronjobs to do fancy stuff (if such users exist?)

Our system has a fairly large housekeeping backlog so I would like to implement part 1. Part 2 is not important for me.

Suggested specification/implementation of part 1:
New configuration parameter:
HousekeepingPeriod
Simple example, run nightly all days:
1-7,00:00-07:00
Complex example, run outside workhours:
1-5,00:00-08:00;1-5,16:00-23:59;6-7,00:00-23:59;
An empty or undefined value behaves as the current implementation - i.e. runs all day with the specified HousekeepingFrequency.
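
The proposed format can be sketched as a small parser plus a check. HousekeepingPeriod is only the parameter suggested in this comment, not an existing option; the function names are hypothetical, and the sketch assumes that day 1 is Monday and day 7 is Sunday and that the end time is inclusive (which fits the use of 23:59 in the examples).

```python
from datetime import datetime


def to_minutes(hhmm):
    """Convert 'hh:mm' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def parse_periods(spec):
    """Parse 'd1-d2,hh:mm-hh:mm;...' into a list of
    (first_day, last_day, start_minute, end_minute) tuples."""
    periods = []
    for part in spec.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing ';' as in the complex example
        days, hours = part.split(",")
        first_day, last_day = (int(d) for d in days.split("-"))
        start, end = hours.split("-")
        periods.append((first_day, last_day, to_minutes(start), to_minutes(end)))
    return periods


def housekeeping_allowed(periods, when):
    """True if 'when' falls inside a period; no periods means no restriction."""
    if not periods:
        return True
    day = when.isoweekday()  # 1 = Monday ... 7 = Sunday (assumed numbering)
    minute = when.hour * 60 + when.minute
    return any(d1 <= day <= d2 and start <= minute <= end
               for d1, d2, start, end in periods)


periods = parse_periods("1-5,00:00-08:00;1-5,16:00-23:59;6-7,00:00-23:59;")
print(housekeeping_allowed(periods, datetime(2010, 7, 13, 12, 0)))  # Tuesday noon: False
print(housekeeping_allowed(periods, datetime(2010, 7, 13, 17, 0)))  # Tuesday 17:00: True
```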

All loops in the housekeeper process should stop when they are outside the configured period and sleep. Ideally the process should wake at the beginning of each configured period. This parameter combined with high values for HousekeepingFrequency could give some weird results, so HousekeepingFrequency=1 is probably best?

Alexei/Rich - do you think such an implementation would be accepted?

Comment by Simon [ 2010 Aug 03 ]

Any news about this feature ?

I also think it would be great if we could define when housekeeping starts.

Comment by richlv [ 2011 Mar 04 ]

generic runtime server control : ZBXNEXT-416
forced item check : ZBXNEXT-653

Comment by Andy Goldschmidt [ 2011 Aug 03 ]

Please implement this - very useful.

Comment by Sergey Vinogradov [ 2011 Aug 05 ]

I agree, this functionality will be very useful.

Comment by Guy Inigo [ 2012 Aug 08 ]

I would like to know if the implementation of this feature is planned for a next release?

It is a vital principle to be able to schedule resources consuming tasks.

Comment by richlv [ 2012 Aug 08 ]

there actually is some discussion about redesigning retention configuration to make housekeeper potentially unnecessary in the future, so housekeeper improvements currently are on hold

Comment by Alex Vorona [ 2012 Aug 09 ]

Using SSD for the DB makes the impact of housekeeping on system load relatively low. We forgot about housekeeping after migrating to SSD.

Comment by Stefan [ 2012 Aug 09 ]

@richlv: table-partitioning?!

Comment by David Dixon [ 2013 Jul 12 ]

I agree, being able to schedule housekeeping for a specific time is very important - Zabbix is borderline unusable while housekeeping is running.

Comment by hdwtpl hdwtpl [ 2013 Oct 29 ]

The housekeeper starts when zabbix_server starts. Just create a cron job to restart zabbix_server and set HousekeepingFrequency to 24 in zabbix_server.conf.

Comment by Raymond Kuiper [ 2014 Sep 12 ]

Steve Mushero talked about this during the Zabbix Conference 2014. Apparently he has a solution for this.
I'd also like this feature implemented, perhaps his patch can be used as a basis?

Comment by Igors Homjakovs (Inactive) [ 2014 Oct 02 ]

Fixed in svn://svn.zabbix.com/branches/dev/ZBXNEXT-207

Comment by Oleksii Zagorskyi [ 2014 Oct 03 ]

missing specification ....

wiper right, that would have helped to avoid a misunderstanding in current implementation

wiper I added housekeeper information to https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-101 (we should copy it to some other place for generic specifications).

Comment by Andris Zeila [ 2014 Oct 03 ]

(1) The config_cache_reload message should keep the old format and also the new housekeeper_execute message should be sent without defining target scope (like config_cache_reload).

Please review my changes in r49585

igorsh Thank you. Looks good.

wiper CLOSED

Comment by Andris Zeila [ 2014 Oct 03 ]

(2) Housekeeper wakeup should be renamed to housekeeper execute (in command line options and constant definitions).

igorsh RESOLVED in r49588.

wiper CLOSED

Comment by Igors Homjakovs (Inactive) [ 2014 Oct 03 ]

(3) man pages and documentation have to be updated

man pages have been updated in r49592.

wiper CLOSED

Comment by Andris Zeila [ 2014 Oct 06 ]

(4) Allow setting the HousekeepingFrequency option in the configuration file to zero to disable automatic housekeeping. This makes it possible to use only manual (scheduled) housekeeping.

The following steps must be implemented:

  1. allow HousekeepingFrequency option to accept zero value
  2. add a function zbx_sleep_forever(), which would sleep until zbx_wakeup() is called
  3. update housekeeper process to use zbx_sleep_forever() if HousekeepingFrequency is set to zero

igorsh RESOLVED in r49907, r49912, r49913.

wiper CLOSED
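
The three steps of (4) can be illustrated with a small Python sketch. This is not the Zabbix C implementation: `housekeeper_loop`, `wakeup`, `stop` and `runs` are hypothetical names, and a `threading.Event` merely stands in for the zbx_sleep_forever()/zbx_wakeup() pair.

```python
import threading

def housekeeper_loop(frequency_hours, wakeup, stop, runs):
    """Run cleanup on a timer, or only on demand when frequency_hours is 0."""
    while not stop.is_set():
        # timeout=None blocks until wakeup is set, i.e. "sleep forever"
        timeout = None if frequency_hours == 0 else frequency_hours * 3600
        wakeup.wait(timeout)
        wakeup.clear()
        if stop.is_set():
            break
        runs.append("cleanup")  # placeholder for the real housekeeping work
```

With `frequency_hours` set to 0 the loop never wakes on its own, so cleanup happens only when something external sets `wakeup`, which is the role the housekeeper_execute runtime control option plays in this issue.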

Comment by Andris Zeila [ 2014 Oct 09 ]

The specifications were created https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-207

Comment by Janne Korkkula [ 2014 Oct 13 ]

It'd also be good to be able to configure the range the housekeeper cleans up in one run, not just a fixed 4x last sleeptime.

Comment by richlv [ 2014 Oct 13 ]

configuring the range would be a separate feature request, though

Comment by Janne Korkkula [ 2014 Oct 14 ]

The original requester may or may not know what's happening under the hood and how the change may affect performance. Going from short housekeeping intervals to a once-per-night run may kill some servers which have more than the normal amount of cleanup to do. Implementing at least a configurable factor at the same time would be smart.

Comment by Andris Zeila [ 2014 Oct 14 ]

The cleanup interval must be greater than the time between cleanups. So going from short intervals to a nightly run would require the cleanup interval to be at 24+ hours anyway.
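
A toy Python model of why the cleanup interval has to exceed the time between cleanups. It is an illustration only, not Zabbix code: `backlog_after` and its parameters are hypothetical, and it assumes data expires at a steady rate and each run removes at most one cleanup interval's worth of expired data.

```python
def backlog_after(runs, gap_hours, window_hours, initial=0.0):
    """Hours of expired data still waiting after `runs` cleanups."""
    backlog = float(initial)
    for _ in range(runs):
        backlog += gap_hours                   # data that expired since the last run
        backlog -= min(backlog, window_hours)  # each run removes at most one window
    return backlog
```

Under these assumptions, a nightly run (gap of 24 hours) with a 4-hour window falls further behind every day, a 24-hour window only holds an existing backlog steady, and a larger window lets the backlog shrink.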

Comment by Andris Zeila [ 2014 Oct 23 ]

(5) Documentation must be updated:

  1. new runtime control option -R housekeeper_execute
  2. disabling automatic housekeeping procedure by setting HousekeepingFrequency to 0 in server/proxy configuration files

igorsh RESOLVED in

https://www.zabbix.com/documentation/3.0/manual/introduction/whatsnew300
https://www.zabbix.com/documentation/3.0/manual/concepts/server
https://www.zabbix.com/documentation/3.0/manual/concepts/proxy
https://www.zabbix.com/documentation/3.0/manual/appendix/config/zabbix_server
https://www.zabbix.com/documentation/3.0/manual/appendix/config/zabbix_proxy

wiper CLOSED

Comment by Andris Zeila [ 2014 Oct 23 ]

Successfully tested, but please review coding style and HousekeepingFrequency description changes in r50116

igorsh Thank you. The changes look good.

Comment by Igors Homjakovs (Inactive) [ 2014 Oct 23 ]

(6) Perhaps the help info also has to be updated.

Now:

    Runtime control options:
      config_cache_reload               Reload configuration cache
      housekeeper_execute               Execute the housekeeper

but it should be like that

    Runtime control options:
      config_cache_reload               Reload configuration cache. Ignored if cache is being currently loaded.
      housekeeper_execute               Execute the housekeeper. Ignored if housekeeper is being currently executed.

wiper I'm not sure if I'd mention it even in manual.

<richlv> we mention it for the config cache reload in the manpages, but not in the help message - i would do the same for the housekeeper

igorsh Then it stays unchanged. CLOSED.

Comment by Andris Zeila [ 2014 Oct 24 ]

(7) Please take a look at r50146, r50147. It fixes a few issues on systems without sigqueue support.

igorsh Thank you. CLOSED

Comment by Igors Homjakovs (Inactive) [ 2014 Oct 24 ]

Fixed in 2.5.0 (trunk) r50157.

Comment by Igors Homjakovs (Inactive) [ 2014 Oct 24 ]

(8) Man pages have to be updated:

https://www.zabbix.com/documentation/3.0/manpages/zabbix_server
https://www.zabbix.com/documentation/3.0/manpages/zabbix_proxy

<richlv> synced web versions from the manpages, RESOLVED

igorsh CLOSED

Comment by Alexander Vladishev [ 2015 Feb 10 ]

(9) Broken reloading of configuration cache on proxy side

Available in svn://svn.zabbix.com/branches/dev/ZBXNEXT-207 r52148.

RESOLVED.

wiper CLOSED

This fix is available in 2.5.0 (trunk) r52196.





[ZBXNEXT-30] simplify housekeeper configuration Created: 2009 Jul 08  Updated: 2014 Apr 16  Resolved: 2014 Apr 16

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Change Request Priority: Major
Reporter: richlv Assignee: Unassigned
Resolution: Duplicate Votes: 2
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates ZBXNEXT-1649 Fine grained control of tasks perform... Closed

 Description   

currently zabbix server and proxy configs have :

# Default value is 1 hour
# Housekeeping is removing unnecessary information from
# tables history, alert, and alarms
# This parameter must be between 1 and 24

#HousekeepingFrequency=1

# Uncomment this line to disable housekeeping

#DisableHousekeeping=1

one parameter could be dropped if specifying 0 for HousekeepingFrequency disabled the housekeeper instead



 Comments   
Comment by richlv [ 2009 Jul 15 ]

another similar combo :
DisablePassive and StartAgents (which is for passive instances only)

Comment by richlv [ 2011 Sep 12 ]

ZBXNEXT-953 calls for moving housekeeper config into the db

Comment by Alexei Vladishev [ 2012 Oct 10 ]

I think that housekeeper settings must be configurable in the front-end under Administration->General->Housekeeper. See also my comment in ZBXNEXT-953.

Comment by Alexei Vladishev [ 2013 Mar 12 ]

It will be resolved under ZBXNEXT-1649.

Comment by richlv [ 2014 Apr 16 ]

ZBXNEXT-1649 removed disablehousekeeping, so this issue is not relevant anymore





[ZBX-20177] Slow housekeeping of events due to missing index on foreign key "c_alerts_6". Created: 2021 Nov 05  Updated: 2024 Apr 10  Resolved: 2022 Jun 15

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: 5.0.18rc1, 5.4.8rc1, 6.0.0alpha7, 6.0 (plan)

Type: Problem report Priority: Trivial
Reporter: Kazuo Ito Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: housekeeper, postgresql
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Zabbix 5.0.14
PostgreSQL 12.5


Issue Links:
Causes
Duplicate
Team: Team B
Sprint: Sprint 82 (Nov 2021)
Story Points: 1

 Description   
1928:20211005:045440.479 housekeeper [deleted 0 hist/trends, 0 items/triggers, 18029 events, 169 problems, 0 sessions, 0 alarms, 0 audit, 0 records in 44895.761061 sec, idle for 1 hour(s)]

It took over 12 hours for the housekeeper to delete 18,029 records of events and 169 records of problems.

I checked the "slow query" log entries and found that deleting events took over 12 hours.

  1928:20211004:211956.063 slow query: 17611.322038 sec, "delete from events where (eventid ...
  1928:20211005:002622.993 slow query: 11186.895792 sec, "delete from events where (eventid ...
  1928:20211005:015255.291 slow query: 5192.229874 sec, "delete from events where (eventid ...
  1928:20211005:045438.434 slow query: 10900.706001 sec, "delete from events where (eventid ...
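
A short Python sketch that adds up the durations of the four slow queries quoted above, to check how much of the 44895.761061 sec housekeeper run they explain. The log text is copied from this report; the helper name `total_slow_seconds` is hypothetical.

```python
import re

LOG = """\
1928:20211004:211956.063 slow query: 17611.322038 sec, "delete from events where (eventid ...
1928:20211005:002622.993 slow query: 11186.895792 sec, "delete from events where (eventid ...
1928:20211005:015255.291 slow query: 5192.229874 sec, "delete from events where (eventid ...
1928:20211005:045438.434 slow query: 10900.706001 sec, "delete from events where (eventid ...
"""

def total_slow_seconds(log_text):
    """Sum the durations reported on 'slow query' lines."""
    return sum(float(s) for s in re.findall(r"slow query: ([0-9.]+) sec", log_text))

total = total_slow_seconds(LOG)
print(f"{total:.1f} sec = {total / 3600:.2f} hours")
```

The four deletes add up to roughly 44891 seconds, so they account for nearly all of the housekeeper run time.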

I ran EXPLAIN ANALYZE, and the trigger for constraint c_alerts_6 appears to take a long time to execute.

Case where c_alerts_6 fires:

zabbix_db=> EXPLAIN ANALYZE delete from events where eventid=7755800;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Delete on events  (cost=0.43..5.45 rows=1 width=6) (actual time=1.130..1.131 rows=0 loops=1)
   ->  Index Scan using events_pkey on events  (cost=0.43..5.45 rows=1 width=6) (actual time=1.098..1.100 rows=1 loops=1)
         Index Cond: (eventid = 7755800)
 Planning Time: 0.069 ms
 Trigger for constraint c_alerts_2 on events: time=1.002 calls=1
 Trigger for constraint c_alerts_5 on events: time=0.083 calls=1
 Trigger for constraint c_acknowledges_2 on events: time=0.934 calls=1
 Trigger for constraint c_event_tag_1 on events: time=1.630 calls=1
 Trigger for constraint c_problem_1 on events: time=0.075 calls=1
 Trigger for constraint c_problem_2 on events: time=0.038 calls=1
 Trigger for constraint c_event_recovery_1 on events: time=0.844 calls=1
 Trigger for constraint c_event_recovery_2 on events: time=0.425 calls=1
 Trigger for constraint c_event_recovery_3 on events: time=0.055 calls=1
 Trigger for constraint c_event_suppress_1 on events: time=0.063 calls=1
 Trigger for constraint c_alerts_6 on acknowledges: time=8467.500 calls=1  <-- here
 Execution Time: 8473.814 ms
(16 rows)

Case where c_alerts_6 does not fire:

zabbix_db=> EXPLAIN ANALYZE delete from events where eventid=8331312;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Delete on events  (cost=0.43..5.45 rows=1 width=6) (actual time=0.085..0.086 rows=0 loops=1)
   ->  Index Scan using events_pkey on events  (cost=0.43..5.45 rows=1 width=6) (actual time=0.028..0.029 rows=1 loops=1)
         Index Cond: (eventid = 8331312)
 Planning Time: 0.093 ms
 Trigger for constraint c_alerts_2: time=0.043 calls=1
 Trigger for constraint c_alerts_5: time=0.016 calls=1
 Trigger for constraint c_acknowledges_2: time=0.024 calls=1
 Trigger for constraint c_event_tag_1: time=0.018 calls=1
 Trigger for constraint c_problem_1: time=0.016 calls=1
 Trigger for constraint c_problem_2: time=0.014 calls=1
 Trigger for constraint c_event_recovery_1: time=0.018 calls=1
 Trigger for constraint c_event_recovery_2: time=0.091 calls=1
 Trigger for constraint c_event_recovery_3: time=0.017 calls=1
 Trigger for constraint c_event_suppress_1: time=0.021 calls=1
 Execution Time: 0.392 ms
(15 rows)

The acknowledgeid column used by c_alerts_6 is not indexed.

ALTER TABLE `alerts` ADD CONSTRAINT `c_alerts_6` FOREIGN KEY (`acknowledgeid`) REFERENCES `acknowledges` (`acknowledgeid`) ON DELETE CASCADE;
zabbix=> \d alerts;
                                 Table "public.alerts"
    Column     |          Type           | Collation | Nullable |        Default        
---------------+-------------------------+-----------+----------+-----------------------
 alertid       | bigint                  |           | not null | 
 actionid      | bigint                  |           | not null | 
 eventid       | bigint                  |           | not null | 
 userid        | bigint                  |           |          | 
 clock         | integer                 |           | not null | 0
 mediatypeid   | bigint                  |           |          | 
 sendto        | character varying(1024) |           | not null | ''::character varying
 subject       | character varying(255)  |           | not null | ''::character varying
 message       | text                    |           | not null | ''::text
 status        | integer                 |           | not null | 0
 retries       | integer                 |           | not null | 0
 error         | character varying(2048) |           | not null | ''::character varying
 esc_step      | integer                 |           | not null | 0
 alerttype     | integer                 |           | not null | 0
 p_eventid     | bigint                  |           |          | 
 acknowledgeid | bigint                  |           |          | 
 parameters    | text                    |           | not null | '{}'::text
Indexes:
    "alerts_pkey" PRIMARY KEY, btree (alertid)
    "alerts_1" btree (actionid)
    "alerts_2" btree (clock)
    "alerts_3" btree (eventid)
    "alerts_4" btree (status)
    "alerts_5" btree (mediatypeid)
    "alerts_6" btree (userid)
    "alerts_7" btree (p_eventid)
Foreign-key constraints:
    "c_alerts_1" FOREIGN KEY (actionid) REFERENCES actions(actionid) ON DELETE CASCADE
    "c_alerts_2" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE
    "c_alerts_3" FOREIGN KEY (userid) REFERENCES users(userid) ON DELETE CASCADE
    "c_alerts_4" FOREIGN KEY (mediatypeid) REFERENCES media_type(mediatypeid) ON DELETE CASCADE
    "c_alerts_5" FOREIGN KEY (p_eventid) REFERENCES events(eventid) ON DELETE CASCADE
    "c_alerts_6" FOREIGN KEY (acknowledgeid) REFERENCES acknowledges(acknowledgeid) ON DELETE CASCADE

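A minimal Python sketch of the effect of the missing index, using an in-memory SQLite database as a stand-in for PostgreSQL. The cascading delete has to find rows in alerts by acknowledgeid, which is the same lookup shown here. The two-column schema is a simplification of the tables above, and the index name `alerts_ack_idx` is hypothetical, not the name used by the actual fix.

```python
import sqlite3

def plan_for_ack_lookup(with_index):
    """Return SQLite's query plan for finding alerts by acknowledgeid."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE acknowledges (acknowledgeid INTEGER PRIMARY KEY)")
    db.execute(
        "CREATE TABLE alerts ("
        " alertid INTEGER PRIMARY KEY,"
        " acknowledgeid INTEGER REFERENCES acknowledges (acknowledgeid) ON DELETE CASCADE)"
    )
    if with_index:
        # hypothetical index name, chosen only for this illustration
        db.execute("CREATE INDEX alerts_ack_idx ON alerts (acknowledgeid)")
    rows = db.execute(
        "EXPLAIN QUERY PLAN SELECT alertid FROM alerts WHERE acknowledgeid = ?", (3,)
    ).fetchall()
    db.close()
    return " ".join(row[-1] for row in rows)  # last column holds the plan text

print(plan_for_ack_lookup(with_index=False))
print(plan_for_ack_lookup(with_index=True))
```

Without the index the plan is a full scan of alerts for every deleted acknowledges row; with it, the plan becomes an index search. The exact wording of the plan text varies between SQLite versions.
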

 Comments   
Comment by Sergey Simonenko (Inactive) [ 2021 Nov 12 ]

Available in:

Comment by dimir [ 2022 Feb 02 ]

Shouldn't there be a check for SERVER? Do we want to have that index in proxy?

UPD: Oh, judging by the code we always create indexes without the check for component.

Comment by Alexei Vladishev [ 2022 Jun 15 ]

I am not sure why it is still open. All DB upgrades require special permissions, there is nothing new. I am closing it.





[ZBX-19100] Records in the task table are not deleted. Created: 2021 Mar 08  Updated: 2021 Mar 09  Resolved: 2021 Mar 09

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Kazuo Ito Assignee: Zabbix Support Team
Resolution: Duplicate Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

4.0.7


Issue Links:
Duplicate
duplicates ZBX-18802 Task manager constantly busy with clo... Closed

 Description   

It is thought to occur through the following sequence.

  1. A massive number of problem events occur.
  2. They are manually closed in bulk.
  3. The trigger of the problem events is deleted.
  4. The events are deleted after the trigger data storage period.

If the "manual close" has not finished by that point, a record that is never removed remains in the task table.

 

I reproduced it as follows.

  1. Create a problem event
  2. Stop Zabbix server
  3. Perform a manual close
    MariaDB [zabbix]> select * from task;
    +--------+------+--------+------------+-----+--------------+
    | taskid | type | status | clock      | ttl | proxy_hostid |
    +--------+------+--------+------------+-----+--------------+
    |      5 |    1 |      1 | 1615298733 |   0 |         NULL |
    |      6 |    4 |      1 | 1615298733 |   0 |         NULL |
    +--------+------+--------+------------+-----+--------------+
    2 rows in set (0.00 sec)
    
    MariaDB [zabbix]> SELECT
        ->     *
        -> FROM
        ->     task t
        -> JOIN
        ->     task_close_problem tcp
        -> ON  tcp.taskid = t.taskid
        -> WHERE
        ->     t.type = 1
        -> AND t.status = 1
        -> ;
    +--------+------+--------+------------+-----+--------------+--------+---------------+
    | taskid | type | status | clock      | ttl | proxy_hostid | taskid | acknowledgeid |
    +--------+------+--------+------------+-----+--------------+--------+---------------+
    |      5 |    1 |      1 | 1615298733 |   0 |         NULL |      5 |             3 |
    +--------+------+--------+------------+-----+--------------+--------+---------------+
    1 row in set (0.00 sec)
    
    MariaDB [zabbix]> select * from problem where eventid=20;
    +---------+--------+--------+----------+------------+-----------+-----------+---------+------+---------------+--------+------------------------------------------------------------+--------------+----------+
    | eventid | source | object | objectid | clock      | ns        | r_eventid | r_clock | r_ns | correlationid | userid | name                                                       | acknowledged | severity |
    +---------+--------+--------+----------+------------+-----------+-----------+---------+------+---------------+--------+------------------------------------------------------------+--------------+----------+
    |      20 |      0 |      0 |    13491 | 1615042800 | 640032169 |      NULL |       0 |    0 |          NULL |   NULL | Zabbix agent on Zabbix server is unreachable for 5 minutes |            0 |        3 |
    +---------+--------+--------+----------+------------+-----------+-----------+---------+------+---------------+--------+------------------------------------------------------------+--------------+----------+
    1 row in set (0.00 sec)
    
  4. Delete the rows from the problem and events tables (manually, instead of waiting for the housekeeper).
    MariaDB [zabbix]> delete from problem where eventid=20;
    Query OK, 1 row affected (0.01 sec)
    
    MariaDB [zabbix]> delete from events where eventid=20;
    Query OK, 1 row affected (0.03 sec)
    
    MariaDB [zabbix]> select * from acknowledges where acknowledgeid=3;
    Empty set (0.00 sec)
    
  5. Start Zabbix server

 

After that, the record remains in the task table.
The records in the acknowledges table have been deleted.

MariaDB [zabbix]> SELECT
    ->     *
    -> FROM
    ->     task t
    -> JOIN
    ->     task_close_problem tcp
    -> ON  tcp.taskid = t.taskid
    -> WHERE
    ->     t.type = 1
    -> AND t.status = 1
    -> AND NOT EXISTS (
    ->     SELECT
    ->         *
    ->     FROM
    ->         acknowledges an
    ->     WHERE
    ->         an.acknowledgeid = tcp.acknowledgeid
    ->     )
    -> ;
+--------+------+--------+------------+-----+--------------+--------+---------------+
| taskid | type | status | clock      | ttl | proxy_hostid | taskid | acknowledgeid |
+--------+------+--------+------------+-----+--------------+--------+---------------+
|      5 |    1 |      1 | 1615298733 |   0 |         NULL |      5 |             3 |
+--------+------+--------+------------+-----+--------------+--------+---------------+
1 row in set (0.01 sec)

 



 Comments   
Comment by Vladislavs Sokurenko [ 2021 Mar 08 ]

The issue looks similar to ZBX-18802

Comment by Kazuo Ito [ 2021 Mar 09 ]

Excuse me.
It's the same problem.

Confirmed.
It does not occur in Zabbix 4.0.29.





[ZBX-18206] Incorrect Housekeeping form behavior Created: 2020 Aug 06  Updated: 2024 Apr 10  Resolved: 2020 Sep 13

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F)
Affects Version/s: 5.0.3rc1
Fix Version/s: 5.0.4rc1

Type: Problem report Priority: Trivial
Reporter: Larisa Grigorjeva Assignee: Andrejs Griščenko
Resolution: Fixed Votes: 0
Labels: administration, housekeeper, housekeeping, settings
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Team: Team D
Sprint: Sprint 67 (Aug 2020), Sprint 68 (Sep 2020)
Story Points: 0.5

 Description   

Case 1:
1. Go to Housekeeping form
2. Reset it to default, save it.
3. Change settings:
Trigger data storage period => invalid value (5)
Services => unchecked
Audit => unchecked
User sessions => unchecked
4. Try to save form, get error about invalid Trigger data storage period
Result: other settings changed to default, Services, Audit and User sessions became checked.

Case 2:
1. Uncheck Events and alerts
2. Set Services Data storage period to an invalid value (1) and try to Save.
3. Error is present, but Events and alerts became checked again.

Case 3:
1. Go to Housekeeping form
2. Reset it to default, save it.
3. Change settings:
Override item history period => checked
Data storage period => some custom value (945d)
Override item trend period => checked
Data storage periods => some custom value (945d)
4. Save form.
5. Press reset to defaults - Notice that all fields have changed to default values, also:
Override item history period => unchecked
Data storage period => 90d - disabled
Override item trend period => unchecked
Data storage periods => 365d - disabled
6. Press update.
7. Result: both "Data storage period" values remained custom instead of reverting to the defaults.



 Comments   
Comment by Andrejs Griščenko [ 2020 Aug 12 ]

Resolved in development branch feature/ZBX-18206-5.0.

Comment by Andrejs Griščenko [ 2020 Sep 10 ]

Fixed in:





[ZBX-18169] housekeeper does not delete history/trends of deleted items if override period is used Created: 2020 Jul 30  Updated: 2024 Apr 10  Resolved: 2020 Aug 07

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 5.0.2
Fix Version/s: 5.0.3rc1, 5.2.0alpha1, 5.2 (plan)

Type: Problem report Priority: Blocker
Reporter: Oleksii Zagorskyi Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 1
Labels: housekeeper, regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Team: Team A
Sprint: Sprint 66 (Jul 2020), Sprint 67 (Aug 2020)
Story Points: 1

 Description   

I could not believe it, but it is true.

In function we have these lines:

	for (table = hk_cleanup_tables; NULL != table->name; table++)
	{
		if (ZBX_HK_MODE_REGULAR != *table->poption_mode || ZBX_HK_OPTION_ENABLED == *table->poption_global)
			continue;

Imagine that for some reason I decided to keep less history and/or trends than is set at the per-item level.
I've just enabled the "Override item history/trend period" checkbox(es) and set a custom number of days.
I kept "Enable internal housekeeping" enabled for history/trends, as it was.

As a result, the number of records in the "housekeeper" table started to grow.
Zabbix server executes this SQL to select records for cleanup:

select housekeeperid,tablename,field,value from housekeeper where tablename in ('events') order by tablename
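
A Python sketch of the quoted condition, to show why only 'events' ends up in the SQL above. It mirrors the `continue` test from the C fragment; the `Table` tuple, the sample table list and the string constant are hypothetical stand-ins, not the server's real data structures.

```python
from collections import namedtuple

MODE_REGULAR = "regular"  # stand-in for ZBX_HK_MODE_REGULAR
Table = namedtuple("Table", "name mode global_override")

def tables_selected_for_cleanup(tables):
    """Return the table names that survive the skip condition."""
    selected = []
    for table in tables:
        # same test as the C loop: skip unless mode is regular
        # and the global override option is not enabled
        if table.mode != MODE_REGULAR or table.global_override:
            continue
        selected.append(table.name)
    return selected

tables = [
    Table("history", MODE_REGULAR, True),   # override period enabled
    Table("trends", MODE_REGULAR, True),    # override period enabled
    Table("events", MODE_REGULAR, False),
]
print(tables_selected_for_cleanup(tables))
```

With the override enabled for history and trends, those tables are skipped even though internal housekeeping is still enabled for them, so their rows in the "housekeeper" table are never picked up.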

Why is this so?
How is history/trends of deleted items different from history/trends of existing items?
Why is it preserved for deleted ones but cleaned up for existing ones?

I'm pretty sure this is a regression.
If it's by design, it's absolutely not clear to the user why such logic is used.

As far as I can see, this logic was changed in version 4.2.
4.0 was fine in this regard.



 Comments   
Comment by Oleksii Zagorskyi [ 2020 Aug 03 ]

Btw, affected installations likely may log slow SQL warnings for this SQL:

select housekeeperid,tablename,field,value from housekeeper where tablename in ('events') order by tablename

as the table has become big.
I have already seen these slow queries on 2 installations.
They should go away some time after the upgrade, of course.

Comment by Vladislavs Sokurenko [ 2020 Aug 06 ]

Fixed in

  • pre-5.0.3rc1 9c3ca976629
  • pre-5.2.0alpha1 (master) 9078bba3fef




[ZBX-17471] Do not keep history and trends doesn't work for TimeScaleDB Created: 2020 Mar 18  Updated: 2024 Apr 10  Resolved: 2021 Feb 27

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 4.4.8rc1, 5.0.0alpha4
Fix Version/s: 5.0.10rc1, 5.2.6rc1, 5.4.0beta1, 5.4 (plan)

Type: Problem report Priority: Trivial
Reporter: Natalja Romancaka Assignee: Artjoms Rimdjonoks
Resolution: Fixed Votes: 2
Labels: TimescaleDB, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Team: Team C
Sprint: Sprint 62 (Mar 2020), Sprint 63 (Apr 2020), Sprint 64 (May 2020), Sprint 65 (Jun 2020), Sprint 66 (Jul 2020), Sprint 67 (Aug 2020), Sprint 68 (Sep 2020), Sprint 69 (Oct 2020), Sprint 70 (Nov 2020), Sprint 71 (Dec 2020), Sprint 72 (Jan 2021), Sprint 73 (Feb 2021)
Story Points: 0.25

 Description   

Steps to reproduce:

  1. Setup Zabbix with TimescaleDB support
  2. Collect some data from items
  3. Go to Administration -> General-> Housekeeping
  4. Check that option "Override item period" is selected for history and trends
  5. Type 0 in "Data storage period" for history and trends, which means do not keep history and trends
  6. Reload configuration cache and execute housekeeper "zabbix_server -R housekeeper_execute"
  7. Check log and items values in DB

Result: warnings in log:

invalid history storage period for table 'history'
invalid history storage period for table 'history_str'
...
invalid history storage period for table 'trends'

Items values remained in the database.
Expected:
Items values deleted from DB and no warnings in log



 Comments   
Comment by Artjoms Rimdjonoks [ 2021 Feb 24 ]




[ZBX-16925] Deadlocks on events when table is big and housekeeper running Created: 2019 Nov 15  Updated: 2020 Jun 03

Status: Confirmed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 4.2.8
Fix Version/s: None

Type: Problem report Priority: Trivial
Reporter: Elina Kuzyutkina (Inactive) Assignee: Zabbix Development Team
Resolution: Unresolved Votes: 4
Labels: events, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Story Points: 1

 Description   

MySQL is the backend DB. There are two points to review:
1. The deadlocks themselves. The events table has more than 9 million rows and the housekeeper is running; here is an example:

*** (1) TRANSACTION:
TRANSACTION 208532068729, ACTIVE 1 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 55502371, OS thread handle 48238530000640, query id 11973893554 some IP zabbix_server update
insert into problem (eventid,source,object,objectid,clock,ns,name,severity) values (.....)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 17517 page no 4384 n bits 1160 index problem_3 of table `zabbix`.`problem` trx id 208532068729 lock_mode X locks gap before rec insert intention waiting
*** (2) TRANSACTION:
TRANSACTION 208531990205, ACTIVE 565 sec fetching rows
mysql tables in use 1, locked 1
20493 lock struct(s), heap size 2203856, 3433920 row lock(s), undo log entries 76042
MySQL thread id 56387211, OS thread handle 48238574192384, query id 11972947767 some IP zabbix_server updating
delete from events where (eventid between ......
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 17517 page no 4384 n bits 1160 index problem_3 of table `zabbix`.`problem` trx id 208531990205 lock mode S locks gap before rec
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 17876 page no 23305 n bits 168 index PRIMARY of table `zabbix`.`events` trx id 208531990205 lock_mode X waiting
*** WE ROLL BACK TRANSACTION (1)

2. The storage period for internal events is configured to 1 day, but there are also a lot of unsupported items, so the housekeeper will not delete the events that are in problem state (still unsupported). It might make sense to review internal event storage. Would it be possible to stop generating recovery events for internal data elements, so that they can be deleted after the set period (1 day)?
If unsupported data elements cannot be eliminated, being able to control the size of the event tables is more important than being notified when an item (trigger/rule) status changes back to supported.
At the very least, this requires a note in the documentation.



 Comments   
Comment by Vladislavs Sokurenko [ 2019 Nov 15 ]

Regarding the second part of the issue, it would be nice not to generate internal events if a check in the cache shows that there are no actions to be executed for them.





[ZBX-16885] Housekeeper for trends not working properly Created: 2019 Nov 07  Updated: 2019 Nov 12  Resolved: 2019 Nov 12

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 4.2.7
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Mariana Alves Marques Assignee: Edgar Akhmetshin
Resolution: Won't fix Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Centos7 - Zabbix Server 4.2.2


Attachments: JPEG File Housekeeping Screenshot.JPG     Text File show_create_table.txt    

 Description   

We have a Zabbix environment that we migrated from version 3.0.5 to 4.2.2.

 Just after the migration we noticed a very different behavior in housekeeper, it started running for almost 4 hours and we detected that not all history and trends data was being deleted.

 Our approach to the problem was:

1) We stopped the housekeeper and deleted data manually;

2) We started housekeeper only for history and it started running for about 20/25 minutes and deleting all data needed.

3) We started housekeeper for history + trends and we had the same issue, housekeeping running almost 4 hours.

We found that after housekeeper starts it stays for almost 3 hours running this query: "select itemid,min(clock) from trends group by itemid"

 

After that query completes it starts deleting. We decided to disable the housekeeper for trends and do the cleanup manually. Is this a known issue?

Note:

we have some partitioned tables on our MariaDB Zabbix Database:

'trends','trends_uint','history','history_uint','history_str'



 Comments   
Comment by Edgar Akhmetshin [ 2019 Nov 08 ]

Hello,

If history* and trends* tables are partitioned, housekeeping for those tables should be disabled.
Please attach a screenshot of Administration - General - Housekeeping and the output of the following SQL as one single text file:

show create table history\G;
show create table trends\G;
... # add output for all history*/trends* tables, replace * with actual names
show create table trends_*\G;
show create table history_*\G;
select from_unixtime(min(clock)) from history_str limit 1;

Regards,
Edgar

Comment by Mariana Alves Marques [ 2019 Nov 08 ]

Hello,

information requested was attached.

Thank you,

Mariana

Comment by Edgar Akhmetshin [ 2019 Nov 12 ]

Hello,

All history*/trends* tables should be partitioned, and housekeeping for these tables should be done by stored procedures or scripts, not by the housekeeper. All other tables should be cleaned by the housekeeper.

Also please note, 4.2 is an unsupported version now.

Please be advised that this section of the tracker is for bug reports only. The case you have submitted can not be qualified as one, so please reach out to [email protected] for commercial support or consultancy services. Alternatively, you can also use our IRC channel or community forum (https://www.zabbix.com/forum) for assistance. With that said, we are closing this ticket. Thank you for understanding.

Regards,
Edgar





[ZBX-16688] No use of events having non-existing objectid (triggerid) Created: 2019 Sep 26  Updated: 2019 Dec 13  Resolved: 2019 Dec 13

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.0.28, 4.0.12, 4.2.6
Fix Version/s: None

Type: Problem report Priority: Trivial
Reporter: Aigars Kadikis Assignee: Aigars Kadikis
Resolution: Cannot Reproduce Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Timeline.png     PNG File image-2019-09-27-11-10-47-438.png    
Issue Links:
Sub-task
Sprint: Sprint 56 (Sep 2019), Sprint 57 (Oct 2019), Sprint 58 (Nov 2019), Sprint 59 (Dec 2019)

 Description   

It appears that the following SQL returns historical records that cannot be browsed through the graphical user interface.

select objectid,name from events where source=0 and objectid not in (select triggerid from triggers) order by clock\G

These records represent events from triggers that no longer exist in the instance.
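To illustrate what the query above selects, here is a minimal sketch that runs it against an in-memory SQLite database. The two-table schema is a heavily simplified assumption (the real Zabbix tables have many more columns), and the sample rows are made up for illustration.

```python
import sqlite3

# Simplified stand-ins for the Zabbix "triggers" and "events" tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE triggers (triggerid INTEGER PRIMARY KEY);
    CREATE TABLE events (eventid INTEGER PRIMARY KEY, source INTEGER,
                         objectid INTEGER, clock INTEGER, name TEXT);
    INSERT INTO triggers VALUES (1);
    INSERT INTO events VALUES
        (1, 0, 1, 1000, 'existing trigger fired'),
        (2, 0, 2, 1001, 'deleted trigger fired'),
        (3, 0, 2, 1002, 'deleted trigger recovered'),
        (4, 3, 2, 1003, 'different source, not selected');
""")

# Same condition as in the report: trigger events (source=0) whose
# objectid no longer matches any row in triggers.
orphaned = conn.execute("""
    SELECT objectid, name FROM events
    WHERE source = 0 AND objectid NOT IN (SELECT triggerid FROM triggers)
    ORDER BY clock
""").fetchall()

print(orphaned)
# [(2, 'deleted trigger fired'), (2, 'deleted trigger recovered')]
```

Only the two events that belonged to the removed trigger (objectid 2, source 0) are returned.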

 

Steps to reproduce:

create a new "Zabbix trapper" item with the key name "number" in host "Zabbix server"

link a trigger:

{Zabbix server:number.last()}>0

Send from the command line

zabbix_sender -z 127.0.0.1 -s "Zabbix server" -k number -o 9

Observe the trigger in problem state

Send another trap:

zabbix_sender -z 127.0.0.1 -s "Zabbix server" -k number -o 0

Observe problem is gone.

Delete trigger, leave the item.

The historical records are still in the database:

select objectid,name from events where source=0 and objectid not in (select triggerid from triggers) order by clock\G

The housekeeper does not clean these records.

 

Expectation:

After removing trigger in Zabbix, related events in the database must be removed too.



 Comments   
Comment by Aigars Kadikis [ 2019 Sep 27 ]

On version 4.2.6, the housekeeper settings are configured as:

I executed the housekeeper from the command line multiple times, but records with an old timestamp (older than 1d) remain in the database.

Comment by Alexander Vladishev [ 2019 Nov 28 ]

aigars.kadikis, I confirm problem with remaining records with old timestamp in version 4.0.15.

The initial issue should be documented: for performance reasons, the housekeeper shouldn't remove orphaned records that are still within the data storage period.

Comment by Andrejs Tumilovics [ 2019 Dec 02 ]

First, let's clarify the rules that the housekeeper follows.
As stated in HousekeepingFrequency parameter documentation:

Note: To prevent housekeeper from being overloaded (for example, when history and trend periods are greatly reduced), no more than 4 times HousekeepingFrequency hours of outdated information are deleted in one housekeeping cycle, for each item. Thus, if HousekeepingFrequency is 1, no more than 4 hours of outdated information (starting from the oldest entry) will be deleted per cycle.

Basically, there is a sliding time point for cleanup, and items to the left of it are removed.


After the Zabbix server is restarted and the first housekeeping takes place, the cleanup point is initialized relative to the oldest record that matches the "outdated" criteria (has no parent and is older than the data storage period).
If the time difference between the oldest item and the "cleanup point" is significant, it will take time (many housekeeper turns) for the housekeeper to reach it.
However, the process can be sped up with the "housekeeper_execute" runtime option, which moves the cleanup point right by 4 hours in one turn.
Perhaps we could always rely on the oldest existing item when moving the cleanup point right, but this would hurt performance, because the events table may be huge.
Normally, if the housekeeper runs periodically, it keeps cleaning outdated items properly.
If HousekeepingFrequency is set to 0, the user is responsible for starting the housekeeper with the runtime option at least once a day.
Also, if the server restarts often, the housekeeper might never run, because its first run is postponed for 30 minutes after server start.
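
As a rough illustration of the sliding cleanup point, the sketch below estimates how many housekeeper turns are needed to work through a backlog, assuming each turn advances the point by at most 4 x HousekeepingFrequency hours as in the quoted note. The function name and the 30-day backlog are hypothetical, and the real housekeeper may behave differently in detail.

```python
import math

def cycles_to_clear(backlog_hours: float, housekeeping_frequency: int = 1) -> int:
    """Estimate housekeeper turns needed to clear a backlog, assuming each
    turn covers at most 4 * HousekeepingFrequency hours of outdated data."""
    if housekeeping_frequency < 1:
        raise ValueError("this sketch only covers periodic housekeeping (frequency >= 1)")
    per_cycle_hours = 4 * housekeeping_frequency
    return math.ceil(backlog_hours / per_cycle_hours)

backlog = 30 * 24  # hypothetical backlog: 30 days of outdated records, in hours
cycles = cycles_to_clear(backlog, housekeeping_frequency=1)
print(cycles)  # 180
```

With HousekeepingFrequency=1 those 180 turns would take roughly 180 hours of normal operation, which is why forcing extra runs with "housekeeper_execute" shortens the wait.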

aigars.kadikis does it clarify your observations?
If so, I am going to close this ticket, because there is no problem with housekeeper.

Comment by Aigars Kadikis [ 2019 Dec 02 ]

Thank you atumilovics for the picture. 

If I execute the housekeeper manually a few times, then at some point the data should be cleared away.

But the events (not the plain metrics) that were generated by triggers that no longer exist still remain.

This is the last output when I manually execute the housekeeper:

  3696:20191202:131344.518 forced execution of the housekeeper
  3696:20191202:131344.520 executing housekeeper
  3696:20191202:131349.931 housekeeper [deleted 2324 hist/trends, 0 items/triggers, 43 events, 0 problems, 55 sessions, 0 alarms, 0 audit, 0 records in 5.396280 sec, idle for 1 hour(s)]
  3696:20191202:131403.992 forced execution of the housekeeper
  3696:20191202:131403.992 executing housekeeper
  3696:20191202:131409.421 housekeeper [deleted 12 hist/trends, 0 items/triggers, 8 events, 0 problems, 0 sessions, 0 alarms, 0 audit, 0 records in 5.420089 sec, idle for 1 hour(s)]
  3696:20191202:131433.420 forced execution of the housekeeper
  3696:20191202:131433.420 executing housekeeper
  3696:20191202:131435.636 housekeeper [deleted 15 hist/trends, 0 items/triggers, 14 events, 0 problems, 0 sessions, 0 alarms, 0 audit, 0 records in 2.213003 sec, idle for 1 hour(s)]
  3696:20191202:131500.763 forced execution of the housekeeper
  3696:20191202:131500.764 executing housekeeper
  3696:20191202:131502.872 housekeeper [deleted 15 hist/trends, 0 items/triggers, 8 events, 0 problems, 2 sessions, 0 alarms, 0 audit, 0 records in 2.107308 sec, idle for 1 hour(s)]
  3696:20191202:131613.983 forced execution of the housekeeper
  3696:20191202:131613.985 executing housekeeper
  3696:20191202:131618.497 housekeeper [deleted 38 hist/trends, 0 items/triggers, 14 events, 0 problems, 1 sessions, 0 alarms, 0 audit, 0 records in 4.499046 sec, idle for 1 hour(s)] 
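
To make the totals across these runs easier to read, here is a small sketch that sums the counters from the housekeeper summary lines above. The regular expression is an assumption based only on the log format shown in this comment.

```python
import re

# Housekeeper summary lines copied from the log output above.
LOG = """\
  3696:20191202:131349.931 housekeeper [deleted 2324 hist/trends, 0 items/triggers, 43 events, 0 problems, 55 sessions, 0 alarms, 0 audit, 0 records in 5.396280 sec, idle for 1 hour(s)]
  3696:20191202:131409.421 housekeeper [deleted 12 hist/trends, 0 items/triggers, 8 events, 0 problems, 0 sessions, 0 alarms, 0 audit, 0 records in 5.420089 sec, idle for 1 hour(s)]
  3696:20191202:131435.636 housekeeper [deleted 15 hist/trends, 0 items/triggers, 14 events, 0 problems, 0 sessions, 0 alarms, 0 audit, 0 records in 2.213003 sec, idle for 1 hour(s)]
  3696:20191202:131502.872 housekeeper [deleted 15 hist/trends, 0 items/triggers, 8 events, 0 problems, 2 sessions, 0 alarms, 0 audit, 0 records in 2.107308 sec, idle for 1 hour(s)]
  3696:20191202:131618.497 housekeeper [deleted 38 hist/trends, 0 items/triggers, 14 events, 0 problems, 1 sessions, 0 alarms, 0 audit, 0 records in 4.499046 sec, idle for 1 hour(s)]
"""

SUMMARY_RE = re.compile(
    r"housekeeper \[deleted (\d+) hist/trends, (\d+) items/triggers, (\d+) events"
)

totals = {"hist_trends": 0, "items_triggers": 0, "events": 0}
runs = 0
for line in LOG.splitlines():
    match = SUMMARY_RE.search(line)
    if match is None:
        continue  # not a summary line
    runs += 1
    totals["hist_trends"] += int(match.group(1))
    totals["items_triggers"] += int(match.group(2))
    totals["events"] += int(match.group(3))

print(runs, totals)
# 5 {'hist_trends': 2404, 'items_triggers': 0, 'events': 87}
```

Each forced run removes only a small batch of events, which fits the per-turn limit described earlier in this thread.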

Do I need to schedule a cron job to execute the housekeeper manually a few times per hour?

Comment by Andrejs Tumilovics [ 2019 Dec 02 ]

aigars.kadikis  I don't think you really need a cron job; the 1h period is configured by HousekeepingFrequency=1 (which is the default) and it should work fine. Yes, the cleanup might not happen on the next housekeeper run, but recently orphaned items will be removed after several housekeeper turns.

Comment by Aigars Kadikis [ 2019 Dec 04 ]

Thank you. Will try to increase HousekeepingFrequency= and wait a few days.

Comment by Aigars Kadikis [ 2019 Dec 13 ]

Cannot reproduce the issue.





[ZBX-16412] The description for "MaxHousekeeperDelete" parameter is not clear enough. Created: 2014 Nov 06  Updated: 2024 Apr 10  Resolved: 2020 Feb 17

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Documentation (D)
Affects Version/s: None
Fix Version/s: 5.0 (plan)

Type: Documentation task Priority: Trivial
Reporter: Oleg Ivanivskyi Assignee: Martins Valkovskis
Resolution: Fixed Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Team: Team D
Sprint: Sprint 56 (Sep 2019), Sprint 55 (Aug 2019), Sprint 54 (Jul 2019), Sprint 57 (Oct 2019), Sprint 58 (Nov 2019), Sprint 59 (Dec 2019), Sprint 60 (Jan 2020), Sprint 61 (Feb 2020)

 Description   

Please add some details or an example to the documentation for the "MaxHousekeeperDelete" parameter. I suggest adding the following example and note:

For example, suppose we have to remove 1 item prototype linked to 50 hosts, and for every host this item prototype is expanded into 100 real items; in total we have to remove 5000 items (1*50*100). If we configure MaxHousekeeperDelete=500, the housekeeper process will have to remove up to 2500000 values (5000*500) for the deleted items from the history and trends tables.
Note: when many items are deleted, the load on the database increases, because the housekeeper needs to remove all the history data that these items had.
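
The arithmetic in the suggested example can be checked with a few lines of Python. The variable names are illustrative only, and the numbers are the ones from the example above.

```python
# Figures taken from the suggested documentation example.
item_prototypes = 1
hosts = 50
items_per_host = 100
max_housekeeper_delete = 500  # value of MaxHousekeeperDelete in the example

deleted_items = item_prototypes * hosts * items_per_host
max_values_removed = deleted_items * max_housekeeper_delete

print(deleted_items)       # 5000
print(max_values_removed)  # 2500000
```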



 Comments   
Comment by Martins Valkovskis [ 2020 Feb 14 ]

Updated documentation for the MaxHousekeeperDelete parameter in versions 3.0, 4.0, 4.4, 5.0. More specifically, the additional information is added as a footnote that is clearly linked to from the parameter description.





[ZBX-15774] Server housekeeper memory leakage Created: 2019 Mar 06  Updated: 2024 Apr 10  Resolved: 2019 Mar 17

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 4.0.4
Fix Version/s: 4.0.6rc1, 4.2.0rc1, 4.2 (plan)

Type: Problem report Priority: Critical
Reporter: Oleg Morozov Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 0
Labels: elasticsearch, housekeeper, memoryleak
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File graph.png     PNG File hk-settings.png     File pmap.1     File pmap.2     File pmap.3     File zabbix_server.objdump.gz    
Team: Team A
Sprint: Sprint 50 (Mar 2019)
Story Points: 0.25

 Description   

After upgrading to version 4 with Elasticsearch, I see constantly rising memory usage by the housekeeper process. After some investigation (a few hk runs, with a pmap dump after each run) I found that the hk process grows by 15872 kbytes after each run. So with the housekeeper running every hour we get ~372 MB of memory leakage every day. For now we have to restart the server every month.
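
The daily figure follows directly from the per-run growth. A quick check, assuming one housekeeper run per hour and 1 MB = 1024 kbytes (the 30-day projection is an extrapolation, not a measured value):

```python
leak_per_run_kb = 15872  # growth observed after each housekeeper run
runs_per_day = 24        # housekeeper runs every hour

leak_per_day_mb = leak_per_run_kb * runs_per_day / 1024
print(leak_per_day_mb)  # 372.0

# Hypothetical projection over 30 days, in GB.
leak_per_month_gb = leak_per_day_mb * 30 / 1024
print(round(leak_per_month_gb, 1))  # 10.9
```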

Attached 3 pmap dumps and server memory usage graph.

Server configuration:

LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBHost=127.0.0.1
DBName=zabbix
DBUser=zabbix
DBPassword=***
DBPort=7001
HistoryStorageURL=http://localhost:9200
HistoryStorageDateIndex=1
StartPollers=4
StartTrappers=2
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
MaxHousekeeperDelete=100000
CacheSize=2G
StartDBSyncers=16
HistoryCacheSize=2G
HistoryIndexCacheSize=256M
TrendCacheSize=128M
ValueCacheSize=6G
Timeout=5
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=5000
ProxyConfigFrequency=60


 Comments   
Comment by Edgar Akhmetshin [ 2019 Mar 06 ]

Hello Oleg,

Thank you for reporting the issue. Please, provide the following information:

  1. operating system used and its version
  2. objdump -Dswx $(which zabbix_server) | gzip -c > zabbix_server.objdump.gz
  3. ldd $(which zabbix_server)

Regards,
Edgar

Comment by Oleg Morozov [ 2019 Mar 06 ]

Hi Edgar, thanks for the reply. Attached zabbix_server.objdump.gz

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial

# uname -a
Linux *** 4.15.0-38-generic #41~16.04.1-Ubuntu SMP Wed Oct 10 20:16:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

# ldd $(which zabbix_server)
linux-vdso.so.1 => (0x00007ffc6a3f5000)
libmysqlclient.so.20 => /usr/lib/x86_64-linux-gnu/libmysqlclient.so.20 (0x00007f821e9b4000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f821e797000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f821e57d000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f821e274000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f821e06c000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f821de68000)
libiksemel.so.3 => /usr/lib/x86_64-linux-gnu/libiksemel.so.3 (0x00007f821dc5a000)
libxml2.so.2 => /usr/lib/x86_64-linux-gnu/libxml2.so.2 (0x00007f821d89f000)
libodbc.so.2 => /usr/lib/x86_64-linux-gnu/libodbc.so.2 (0x00007f821d636000)
libnetsnmp.so.30 => /usr/lib/x86_64-linux-gnu/libnetsnmp.so.30 (0x00007f821d359000)
libssh2.so.1 => /usr/lib/x86_64-linux-gnu/libssh2.so.1 (0x00007f821d130000)
libOpenIPMI.so.0 => /usr/lib/libOpenIPMI.so.0 (0x00007f821ce22000)
libOpenIPMIposix.so.0 => /usr/lib/libOpenIPMIposix.so.0 (0x00007f821cc1c000)
libevent-2.0.so.5 => /usr/lib/x86_64-linux-gnu/libevent-2.0.so.5 (0x00007f821c9d6000)
libssl.so.1.0.0 => /lib/x86_64-linux-gnu/libssl.so.1.0.0 (0x00007f821c76d000)
libcrypto.so.1.0.0 => /lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x00007f821c328000)
libldap_r-2.4.so.2 => /usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2 (0x00007f821c0d7000)
liblber-2.4.so.2 => /usr/lib/x86_64-linux-gnu/liblber-2.4.so.2 (0x00007f821bec8000)
libcurl.so.4 => /usr/lib/x86_64-linux-gnu/libcurl.so.4 (0x00007f821bc59000)
libresolv.so.2 => /lib/x86_64-linux-gnu/libresolv.so.2 (0x00007f821ba3e000)
libpcre.so.3 => /lib/x86_64-linux-gnu/libpcre.so.3 (0x00007f821b7ce000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f821b404000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f821b082000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f821ae6c000)
/lib64/ld-linux-x86-64.so.2 (0x00007f821f498000)
libgnutls.so.30 => /usr/lib/x86_64-linux-gnu/libgnutls.so.30 (0x00007f821ab3c000)
libicuuc.so.55 => /usr/lib/x86_64-linux-gnu/libicuuc.so.55 (0x00007f821a7a8000)
liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f821a586000)
libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f821a37c000)
libgcrypt.so.20 => /lib/x86_64-linux-gnu/libgcrypt.so.20 (0x00007f821a09b000)
libOpenIPMIutils.so.0 => /usr/lib/libOpenIPMIutils.so.0 (0x00007f8219e92000)
libsasl2.so.2 => /usr/lib/x86_64-linux-gnu/libsasl2.so.2 (0x00007f8219c77000)
libgssapi.so.3 => /usr/lib/x86_64-linux-gnu/libgssapi.so.3 (0x00007f8219a36000)
libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007f8219803000)
librtmp.so.1 => /usr/lib/x86_64-linux-gnu/librtmp.so.1 (0x00007f82195e7000)
libgssapi_krb5.so.2 => /usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2 (0x00007f821939d000)
libp11-kit.so.0 => /usr/lib/x86_64-linux-gnu/libp11-kit.so.0 (0x00007f8219139000)
libtasn1.so.6 => /usr/lib/x86_64-linux-gnu/libtasn1.so.6 (0x00007f8218f26000)
libnettle.so.6 => /usr/lib/x86_64-linux-gnu/libnettle.so.6 (0x00007f8218cf0000)
libhogweed.so.4 => /usr/lib/x86_64-linux-gnu/libhogweed.so.4 (0x00007f8218abd000)
libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f821883d000)
libicudata.so.55 => /usr/lib/x86_64-linux-gnu/libicudata.so.55 (0x00007f8216d86000)
libgpg-error.so.0 => /lib/x86_64-linux-gnu/libgpg-error.so.0 (0x00007f8216b72000)
libheimntlm.so.0 => /usr/lib/x86_64-linux-gnu/libheimntlm.so.0 (0x00007f8216969000)
libkrb5.so.26 => /usr/lib/x86_64-linux-gnu/libkrb5.so.26 (0x00007f82166df000)
libasn1.so.8 => /usr/lib/x86_64-linux-gnu/libasn1.so.8 (0x00007f821643d000)
libcom_err.so.2 => /lib/x86_64-linux-gnu/libcom_err.so.2 (0x00007f8216239000)
libhcrypto.so.4 => /usr/lib/x86_64-linux-gnu/libhcrypto.so.4 (0x00007f8216006000)
libroken.so.18 => /usr/lib/x86_64-linux-gnu/libroken.so.18 (0x00007f8215df0000)
libkrb5.so.3 => /usr/lib/x86_64-linux-gnu/libkrb5.so.3 (0x00007f8215b1e000)
libk5crypto.so.3 => /usr/lib/x86_64-linux-gnu/libk5crypto.so.3 (0x00007f82158ef000)
libkrb5support.so.0 => /usr/lib/x86_64-linux-gnu/libkrb5support.so.0 (0x00007f82156e4000)
libffi.so.6 => /usr/lib/x86_64-linux-gnu/libffi.so.6 (0x00007f82154dc000)
libwind.so.0 => /usr/lib/x86_64-linux-gnu/libwind.so.0 (0x00007f82152b3000)
libheimbase.so.1 => /usr/lib/x86_64-linux-gnu/libheimbase.so.1 (0x00007f82150a4000)
libhx509.so.5 => /usr/lib/x86_64-linux-gnu/libhx509.so.5 (0x00007f8214e59000)
libsqlite3.so.0 => /usr/lib/x86_64-linux-gnu/libsqlite3.so.0 (0x00007f8214b84000)
libcrypt.so.1 => /lib/x86_64-linux-gnu/libcrypt.so.1 (0x00007f821494c000)
libkeyutils.so.1 => /lib/x86_64-linux-gnu/libkeyutils.so.1 (0x00007f8214748000)
Comment by Edgar Akhmetshin [ 2019 Mar 06 ]

Oleg,

One more thing, please. Which version of Elasticsearch is used?

Regards,
Edgar

Comment by Oleg Morozov [ 2019 Mar 06 ]
curl ***:9200
{
"name" : "***",
"cluster_name" : "zabbix",
"cluster_uuid" : "LihB0jtyTWiFsbwZXnyJ3w",
"version" : {
"number" : "6.1.4",
"build_hash" : "d838f2d",
"build_date" : "2018-03-14T08:28:22.470Z",
"build_snapshot" : false,
"lucene_version" : "7.1.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
Comment by Vladislavs Sokurenko [ 2019 Mar 06 ]

Does disabling housekeeping of history help?
Is there any history in the MySQL database?

select count(*) from history;
select count(*) from history_text;
select count(*) from history_uint;
select count(*) from history_str;
select count(*) from history_log;
Comment by Oleg Morozov [ 2019 Mar 06 ]

Vladislav, history tables are empty since we switched to elasticsearch. I've checked, no records in history* tables.

Housekeeping for history is currently enabled; I'll disable it now and make a few hk runs.

Comment by Oleg Morozov [ 2019 Mar 06 ]

Disabled history housekeeping via web-interface and made 20 runs.

The memory leak is still present, but now the process grows by 6500 kbytes per run instead of 15872 kbytes.

Comment by Oleg Morozov [ 2019 Mar 06 ]

10 runs with 5 sec delay (1 sec is enough according to log)

# for i in {1..10}; do zabbix_server -R housekeeper_execute; sleep 5; pmap -x 1286 > $i; done
zabbix_server [30896]: command sent successfully
zabbix_server [31050]: command sent successfully
zabbix_server [31163]: command sent successfully
zabbix_server [31277]: command sent successfully
zabbix_server [31420]: command sent successfully
zabbix_server [31626]: command sent successfully
zabbix_server [31847]: command sent successfully
zabbix_server [32013]: command sent successfully
zabbix_server [32210]: command sent successfully
zabbix_server [32325]: command sent successfully

# for i in {1..10}; do grep -m1 00005637a6b58000 $i; done
00005637a6b58000 3765408 3765224 3765224 rw--- [ anon ]
00005637a6b58000 3771908 3771724 3771724 rw--- [ anon ]
00005637a6b58000 3778408 3778224 3778224 rw--- [ anon ]
00005637a6b58000 3784908 3784724 3784724 rw--- [ anon ]
00005637a6b58000 3791408 3791224 3791224 rw--- [ anon ]
00005637a6b58000 3797908 3797724 3797724 rw--- [ anon ]
00005637a6b58000 3804408 3804224 3804224 rw--- [ anon ]
00005637a6b58000 3810908 3810724 3810724 rw--- [ anon ]
00005637a6b58000 3817408 3817224 3817224 rw--- [ anon ]
00005637a6b58000 3823908 3823724 3823724 rw--- [ anon ]
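
The per-run growth can be read off these lines by differencing the RSS column. A small sketch, assuming the usual pmap -x column order (Address, Kbytes, RSS, Dirty, Mode, Mapping):

```python
# Lines copied from the pmap output above (one per housekeeper run).
PMAP_LINES = """\
00005637a6b58000 3765408 3765224 3765224 rw--- [ anon ]
00005637a6b58000 3771908 3771724 3771724 rw--- [ anon ]
00005637a6b58000 3778408 3778224 3778224 rw--- [ anon ]
00005637a6b58000 3784908 3784724 3784724 rw--- [ anon ]
00005637a6b58000 3791408 3791224 3791224 rw--- [ anon ]
00005637a6b58000 3797908 3797724 3797724 rw--- [ anon ]
00005637a6b58000 3804408 3804224 3804224 rw--- [ anon ]
00005637a6b58000 3810908 3810724 3810724 rw--- [ anon ]
00005637a6b58000 3817408 3817224 3817224 rw--- [ anon ]
00005637a6b58000 3823908 3823724 3823724 rw--- [ anon ]
"""

# Third whitespace-separated field is RSS in kbytes (assumed column order).
rss_kb = [int(line.split()[2]) for line in PMAP_LINES.splitlines()]

# Growth between consecutive runs.
deltas = [later - earlier for earlier, later in zip(rss_kb, rss_kb[1:])]
print(deltas)  # [6500, 6500, 6500, 6500, 6500, 6500, 6500, 6500, 6500]
```

The constant step of 6500 kbytes matches the figure reported in the previous comment.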
Comment by Vladislavs Sokurenko [ 2019 Mar 06 ]

What if trends housekeeping is disabled ?

Comment by Oleg Morozov [ 2019 Mar 06 ]

With trends housekeeping disabled no leakage after 10 runs.

Comment by Oleg Morozov [ 2019 Mar 06 ]

Enabled trends housekeeping and the leak is back again, so the leak is definitely somewhere in that area.

Comment by Vladislavs Sokurenko [ 2019 Mar 06 ]

Reproduced. Please also provide a screenshot of the housekeeper configuration from the frontend.

steps:

  • Enable elastic
  • Override item history period and item trend period

Observe memory leak:

==11409== For counts of detected and suppressed errors, rerun with: -v
==11409== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 48 from 48)
==11413== 32,768 bytes in 1 blocks are possibly lost in loss record 70 of 75
==11413==    at 0x4838748: malloc (vg_replace_malloc.c:308)
==11413==    by 0x483AD63: realloc (vg_replace_malloc.c:836)
==11413==    by 0x849BF0: zbx_realloc2 (misc.c:550)
==11413==    by 0x810B58: zbx_default_mem_realloc_func (algodefs.c:331)
==11413==    by 0x8359C6: zbx_vector_ptr_reserve (vector.c:28)
==11413==    by 0x49B9D8: hk_history_delete_queue_prepare_global (housekeeper.c:491)
==11413==    by 0x49BCF8: hk_history_delete_queue_prepare_all (housekeeper.c:546)
==11413==    by 0x49C177: housekeeping_history_and_trends (housekeeper.c:651)
==11413==    by 0x49E427: housekeeper_thread (housekeeper.c:1197)
==11413==    by 0x83D318: zbx_thread_start (threads.c:132)
==11413==    by 0x4237EE: MAIN_ZABBIX_ENTRY (server.c:1165)
==11413==    by 0x80C8CD: daemon_start (daemon.c:392)
==11413== 
==11413== 196,608 bytes in 6 blocks are definitely lost in loss record 73 of 75
==11413==    at 0x4838748: malloc (vg_replace_malloc.c:308)
==11413==    by 0x483AD63: realloc (vg_replace_malloc.c:836)
==11413==    by 0x849BF0: zbx_realloc2 (misc.c:550)
==11413==    by 0x810B58: zbx_default_mem_realloc_func (algodefs.c:331)
==11413==    by 0x8359C6: zbx_vector_ptr_reserve (vector.c:28)
==11413==    by 0x49B9D8: hk_history_delete_queue_prepare_global (housekeeper.c:491)
==11413==    by 0x49BCF8: hk_history_delete_queue_prepare_all (housekeeper.c:546)
==11413==    by 0x49C177: housekeeping_history_and_trends (housekeeper.c:651)
==11413==    by 0x49E427: housekeeper_thread (housekeeper.c:1197)
==11413==    by 0x83D318: zbx_thread_start (threads.c:132)
==11413==    by 0x4237EE: MAIN_ZABBIX_ENTRY (server.c:1165)
==11413==    by 0x80C8CD: daemon_start (daemon.c:392)
==11413== 
==11413== 229,376 bytes in 7 blocks are definitely lost in loss record 75 of 75
==11413==    at 0x4838748: malloc (vg_replace_malloc.c:308)
==11413==    by 0x483AD63: realloc (vg_replace_malloc.c:836)
==11413==    by 0x849BF0: zbx_realloc2 (misc.c:550)
==11413==    by 0x810B58: zbx_default_mem_realloc_func (algodefs.c:331)
==11413==    by 0x8359C6: zbx_vector_ptr_reserve (vector.c:28)
==11413==    by 0x49ADE9: hk_history_prepare (housekeeper.c:294)
==11413==    by 0x49BD71: hk_history_delete_queue_prepare_all (housekeeper.c:553)
==11413==    by 0x49C177: housekeeping_history_and_trends (housekeeper.c:651)
==11413==    by 0x49E427: housekeeper_thread (housekeeper.c:1197)
==11413==    by 0x83D318: zbx_thread_start (threads.c:132)
==11413==    by 0x4237EE: MAIN_ZABBIX_ENTRY (server.c:1165)
==11413==    by 0x80C8CD: daemon_start (daemon.c:392)

Suspicious place to blame (note that the loop skips rules but does not clear the already allocated vectors):

		if (ZBX_HK_MODE_DISABLED == *rule->poption_mode || FAIL == zbx_history_requires_trends(rule->type))
			continue;

Workaround: disable history and trends housekeeping, since they are not stored in the MySQL database anyway.

Comment by Oleg Morozov [ 2019 Mar 06 ]

Attached screenshot.

We cannot disable trends housekeeping since Zabbix cannot use elasticsearch for trends, only for history.

Comment by Vladislavs Sokurenko [ 2019 Mar 06 ]

It looks like housekeeper also does not delete trends when elastic search is enabled.

Comment by Oleg Morozov [ 2019 Mar 06 ]

It looks like housekeeper also does not delete trends when elastic search is enabled.

Nice.

Comment by Vladislavs Sokurenko [ 2019 Mar 06 ]

Fixed in development branch:
svn://svn.zabbix.com/branches/dev/ZBX-15774

Fixed old trends not being deleted by the housekeeper, and a memory leak in the housekeeper, when Elasticsearch is used.

Trend and item deletions would be queued but never executed, so the vector would keep growing.
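
The failure mode described here can be modelled with a toy example. This is not Zabbix code: the class, its fields, and the item names are hypothetical, and the sketch only shows how a queue that is filled on every run but never drained keeps growing.

```python
class ToyHousekeeper:
    """Toy model: entries are queued every run, but the delete step can be skipped."""

    def __init__(self, history_in_sql: bool):
        self.history_in_sql = history_in_sql
        self.delete_queue = []  # stands in for the growing vector

    def run(self, outdated_items):
        # The prepare step always queues work.
        self.delete_queue.extend(outdated_items)
        if not self.history_in_sql:
            # Delete step skipped and queue not cleared: the leak pattern.
            return 0
        deleted = len(self.delete_queue)
        self.delete_queue.clear()
        return deleted


leaky = ToyHousekeeper(history_in_sql=False)
healthy = ToyHousekeeper(history_in_sql=True)
for _ in range(3):
    leaky.run(["item-1", "item-2"])
    healthy.run(["item-1", "item-2"])

print(len(leaky.delete_queue), len(healthy.delete_queue))  # 6 0
```

In this model the remedy is to drain or clear the queue on the skipped path as well.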

Comment by Vladislavs Sokurenko [ 2019 Mar 11 ]

Fixed in:

  • pre-4.0.6rc1 r90872
  • pre-4.2.0rc1 (trunk) r90873
Comment by Oleg Morozov [ 2019 Mar 11 ]

Thank you.





[ZBX-15210] "autoreg_host" table is never cleaned up Created: 2018 Nov 26  Updated: 2024 Apr 10  Resolved: 2023 Sep 12

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.0.23, 4.0.1
Fix Version/s: 6.0.22rc1, 6.4.7rc1, 7.0.0alpha5, 7.0 (plan)

Type: Problem report Priority: Major
Reporter: Oleksii Zagorskyi Assignee: dimir
Resolution: Fixed Votes: 2
Labels: autoregistration, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Team: Team I
Sprint: Sprint 47, Dec 2018, Sprint 48, Jan 2019, Sprint 51 (Apr 2019), Sprint 52 (May 2019), Sprint 53 (Jun 2019), Sprint 103 (Aug 2023), Sprint 104 (Sep 2023)
Story Points: 1

 Description   

The "autoreg_host" table contains technical runtime information about auto-registered hosts.
Later, even when such hosts are deleted from Zabbix, the related records stay in the table forever.

It would be nice to find some way to delete unneeded information from the table.

For example, delete records for hosts that are missing from the "hosts" table and/or for missing IP addresses.
I think some good and safe balance could be found.



 Comments   
Comment by Vladislavs Sokurenko [ 2018 Nov 26 ]

I know that this is not enough, but if the host is monitored by a Zabbix proxy and you remove the proxy, the related autoreg_host records will also be deleted.

zalex_ua Yes, I know that, because of constraints on DB level.

Comment by Konstantin Kornienko [ 2019 Jun 03 ]

We have more than 2000000 rows in this table.

Could you advise whether it is safe to perform something like this:

DELETE from autoreg_host where autoreg_hostid in (

select a.autoreg_hostid
from autoreg_host a left join hosts h
on a.host = h.host
where h.host is null

)

 

(not tested)

Comment by Andris Zeila [ 2019 Jul 22 ]

Depends. It won't crash or anything, but some autoregistration events might lose the associated autoreg_host record (not likely, but in theory it can happen). This means that if the autoregistration event has not yet been processed, it will be ignored. Of course, when the server receives the next active checks request from this host, it will create a new autoreg_host record and process the new event, so in most cases it would just delay host autoregistration by 2 minutes (the default RefreshActiveChecks interval).

So yes, it should be safe unless you have some very specific setup/requirements.

Comment by dimir [ 2023 Aug 17 ]

Dear users, would you be able to run the following SQL SELECT statement and provide the output:

UPD: please see the SQL in comment below.

Comment by Constantin Oshmyan [ 2023 Aug 17 ]

Hi dimir, you wrote:

would you be able to run the following SQL SELECT statement and provide the output:

The result on my system (with PostgreSQL) is following:

Error: ERROR: function from_unixtime(unknown) does not exist
Hint: No function matches the given name and argument types. You might need to add explicit type casts.
Position: 217
SQLState: 42883
ErrorCode: 0

Comment by dimir [ 2023 Aug 17 ]

Oh, sorry, my bad, this function is not standard SQL. Could you try this one:

UPD: please see the SQL in the comment below.

Comment by Constantin Oshmyan [ 2023 Aug 17 ]

Could you try this one

For me, the result is: 0 (zero).
At the same time, the table autoreg_host contains 621 records, and some of them are very old (these hosts have not existed for several years).

Comment by dimir [ 2023 Aug 17 ]

That probably means for those autoreg_host records the corresponding events are already deleted.

Please try the following one (but please include the whole output, where the query time is seen):

select count(ah.autoreg_hostid)
    from autoreg_host ah
    where not exists (
            select null
                from hosts h
                where ah.host=h.host)
        and not exists (
                select null
                    from events e
                    where ah.autoreg_hostid=e.objectid
                        and e.source=2
                        and e.object=3);

I'm interested in the number of returned records and the time spent on the statement.

Comment by Constantin Oshmyan [ 2023 Aug 17 ]

Please try the following one (but please include the whole output, where the query time is seen):

The result is:

336

Query 1 of 1, Rows read: 1, Elapsed time (seconds) - Total: 0.212, SQL query: 0.197, Reading results: 0.015

Comment by dimir [ 2023 Aug 17 ]

Thanks a lot! zalex_ua would it be possible for you to run that also?

<zalex_ua> asked to get. But I'm not sure about expectation.

Comment by dimir [ 2023 Aug 17 ]

konstantin.kornienko any chance to run the above mentioned SQL (this comment) to see the query execution time?

Comment by dimir [ 2023 Aug 25 ]

Fixed in development branch for 6.0.

Comment by dimir [ 2023 Sep 05 ]

Overview

The autoreg_host table was never cleaned up before this fix. After this fix, the records in the autoreg_host table are handled by the housekeeper. Whenever it runs, it deletes records that match neither any autoregistration event (autoreg_host.autoreg_hostid=events.objectid) nor any host (autoreg_host.host=hosts.host), respecting the MaxHousekeeperDelete option.
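
The cleanup rule described above can be sketched against an in-memory SQLite database. This is an illustration only, not the server's implementation: the schema is a simplified assumption, the sample rows are made up, the per-run limit stands in for MaxHousekeeperDelete, and the conditions are borrowed from the SELECT statement posted earlier in this ticket.

```python
import sqlite3

MAX_HOUSEKEEPER_DELETE = 2  # stand-in for the MaxHousekeeperDelete setting

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT);
    CREATE TABLE events (eventid INTEGER PRIMARY KEY, source INTEGER,
                         object INTEGER, objectid INTEGER);
    CREATE TABLE autoreg_host (autoreg_hostid INTEGER PRIMARY KEY, host TEXT);
    INSERT INTO hosts VALUES (1, 'web01');
    INSERT INTO autoreg_host VALUES
        (10, 'web01'), (11, 'old-a'), (12, 'old-b'), (13, 'old-c'), (14, 'pending');
    -- autoregistration event (source=2, object=3) still pointing at record 14
    INSERT INTO events VALUES (100, 2, 3, 14);
""")

def cleanup_autoreg(conn, limit):
    """Delete up to `limit` autoreg_host rows with no matching host and no
    matching autoregistration event; return the number of rows deleted."""
    cur = conn.execute("""
        DELETE FROM autoreg_host
        WHERE autoreg_hostid IN (
            SELECT ah.autoreg_hostid
            FROM autoreg_host ah
            WHERE NOT EXISTS (SELECT NULL FROM hosts h WHERE ah.host = h.host)
              AND NOT EXISTS (SELECT NULL FROM events e
                              WHERE ah.autoreg_hostid = e.objectid
                                AND e.source = 2 AND e.object = 3)
            ORDER BY ah.autoreg_hostid
            LIMIT ?)
    """, (limit,))
    conn.commit()
    return cur.rowcount

first = cleanup_autoreg(conn, MAX_HOUSEKEEPER_DELETE)
second = cleanup_autoreg(conn, MAX_HOUSEKEEPER_DELETE)
remaining = [row[0] for row in conn.execute(
    "SELECT autoreg_hostid FROM autoreg_host ORDER BY autoreg_hostid")]
print(first, second, remaining)  # 2 1 [10, 14]
```

Record 10 survives because its host still exists, and record 14 survives because an autoregistration event still refers to it; the three orphaned records are removed over two limited runs.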

Fixed in

Comment by Martins Valkovskis [ 2023 Sep 07 ]

Updated documentation:

Comment by Constantin Oshmyan [ 2023 Sep 07 ]

Thank you, guys!





[ZBX-15188] Housekeeper configuration (disabled for history and trends) not preserved after upgrade from 3.4 to 4.0 Created: 2018 Nov 20  Updated: 2018 Dec 11  Resolved: 2018 Dec 10

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F), Server (S)
Affects Version/s: 4.0.1
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Nicolas C. Assignee: Zabbix Support Team
Resolution: Cannot Reproduce Votes: 0
Labels: configuration, housekeeper, partitioning, upgrade
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Zabbix server on Debian 9 with official packages from http://repo.zabbix.com/zabbix/4.0/debian/, MariaDB database (Debian package).



 Description   

Steps to reproduce:

  1. On Zabbix 3.4.14, in the housekeeping configuration (Administration, General,  Housekeeping), uncheck "Enable internal housekeeping" for history and trends
  2. Upgrade to Zabbix 4.0.1

Result:
In the housekeeping configuration, previously unchecked "Enable internal housekeeping" boxes are now checked.

Expected:
Disabled housekeeping for history and trends should stay disabled after the upgrade. This is an issue for people using MySQL partitioning on those tables, because after the upgrade the housekeeper starts housekeeping them, triggering heavy I/O activity on the database.



 Comments   
Comment by dimir [ 2018 Nov 20 ]

Someone needs to try this. If it reproduces, this is a regression.

Comment by Aleksejs Petrovs [ 2018 Nov 20 ]

Hello Nicolas,

Unfortunately I can't reproduce in my test environment. I will try to do this on a bigger database. Are you able to reproduce this?

Regards,
Aleksejs!

Comment by Nicolas C. [ 2018 Nov 20 ]

Hello,

Unfortunately, I can't reproduce the issue. I've restored the snapshot from before the upgrade and re-did the upgrade and this time the housekeeper configuration is OK (unchanged).

We have a 240GB MySQL database with now close to 3 years of trends so there is no way we could have run with the housekeeper enabled without issues. I checked Apache's logs: no one made changes to the conf.

I found a similar issue where someone with database partitioning also had a busy housekeeper after upgrading: https://support.zabbix.com/browse/ZBX-15102

Regards,

Comment by Aleksejs Petrovs [ 2018 Dec 10 ]

Hello Nicolas,

I'm closing the issue since it's not reproducible.

The ZBX you've mentioned isn't related to the Server but related to the Proxy.

Regards,
Aleksejs!





[ZBX-14778] Housekeeper is trying to delete history for item prototypes Created: 2018 Aug 29  Updated: 2024 Apr 10  Resolved: 2018 Sep 03

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.0.21, 3.4.13, 4.0.0beta2
Fix Version/s: 3.0.22rc1, 3.4.14rc1, 4.0.0beta2, 4.0 (plan)

Type: Problem report Priority: Trivial
Reporter: Vladislavs Sokurenko Assignee: Andris Zeila
Resolution: Fixed Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Causes
caused by ZBXNEXT-1675 Add macros support for update intervals Closed
Team: Team A
Sprint: Sprint 41
Story Points: 0.5

 Description   

Steps 1:
1. Create discovery rule with item prototype

agent.ping["{#MACRO}"]

2. Start zabbix server and execute housekeeper:

zabbix_server -R housekeeper_execute

Actual (when specified history storage period passes)

query [txnlev:0] [delete from history_uint where itemid=28253 and clock<1535542285]

Expected:
Zabbix should not try to delete history for an itemid that cannot have history.

Steps 2:
1. Create discovery rule with item prototype

agent.ping["{#MACRO}"] with Update interval={#MACRO}

2. Start zabbix server and execute housekeeper

Actual:
Error is logged if debug log level is set

invalid history storage '{#MACRO}' for itemid '28253'

Expected:
There should be no macro substitution for item prototypes as they can't have history and no such errors should be logged
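
The expected behaviour can be illustrated with a minimal Python sketch that skips item prototypes before any storage period is resolved or any delete is issued. The `FLAG_PROTOTYPE` constant, the dictionary layout, the `housekeeping_candidates` helper and itemid 28254 are assumptions made for illustration; this is not the actual server code.

```python
# Assumed flag value marking an item prototype (illustrative constant).
FLAG_PROTOTYPE = 2


def housekeeping_candidates(items):
    """Yield (itemid, history) only for items that can actually hold history."""
    for item in items:
        if item["flags"] & FLAG_PROTOTYPE:
            # Prototypes never store values, so there is nothing to resolve or delete.
            continue
        yield item["itemid"], item["history"]


items = [
    {"itemid": 28253, "flags": FLAG_PROTOTYPE, "history": "{#MACRO}"},
    {"itemid": 28254, "flags": 0, "history": "90d"},  # hypothetical regular item
]
candidates = list(housekeeping_candidates(items))
print(candidates)
```

With such a filter, neither the delete query nor the "invalid history storage" message would be produced for itemid 28253.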



 Comments   
Comment by Andris Zeila [ 2018 Aug 29 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-14778

Comment by Andris Zeila [ 2018 Aug 30 ]

Released in:

  • pre-3.0.22rc1 r84341
  • pre-3.4.14rc1 r84342
  • pre-4.0.0beta2 r84343




[ZBX-14777] Items with invalid storage period are silently skipped by housekeeper Created: 2018 Aug 29  Updated: 2024 Apr 10  Resolved: 2018 Sep 03

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.4.13, 4.0.0beta1
Fix Version/s: 3.4.14rc1, 4.0.0beta2, 4.0 (plan)

Type: Problem report Priority: Trivial
Reporter: Vladislavs Sokurenko Assignee: Andris Zeila
Resolution: Fixed Votes: 0
Labels: history, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Causes
caused by ZBXNEXT-1675 Add macros support for update intervals Closed
Team: Team A
Sprint: Sprint 41
Story Points: 0.5

 Description   

1. Create item with history storage period={$MACRO}
2. Set macro {$MACRO}=1dd
3. Start Zabbix server
4. Execute housekeeper manually:

 ./sbin/zabbix_server -R housekeeper_execute 

Expected:
The item must become unsupported (possibly during configuration sync), or there should be a clear indication to the user that an error occurred.

Actual:
No housekeeping will be performed and no warnings will show up in the log.
If the log level is increased to debug, the following message appears:

invalid history storage '1dd' for itemid '28289'
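
A minimal Python sketch of the kind of explicit validation asked for above: the value is parsed strictly and a failure is reported rather than silently skipped. The helper names and the suffix table (s, m, h, d, w) are assumptions for illustration, not the actual server implementation.

```python
import re

# Assumed suffix multipliers (seconds); a bare number is treated as seconds.
_MULTIPLIERS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
_PERIOD_RE = re.compile(r"(\d+)([smhdw]?)")


def parse_storage_period(value):
    """Return the period in seconds, or raise ValueError with a clear message."""
    match = _PERIOD_RE.fullmatch(value)
    if match is None:
        raise ValueError("invalid history storage period '%s'" % value)
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]


def check_item(itemid, value):
    """Return (seconds, None) on success or (None, error text) on failure."""
    try:
        return parse_storage_period(value), None
    except ValueError as exc:
        return None, "item %s: %s" % (itemid, exc)


print(check_item(28289, "1d"))
print(check_item(28289, "1dd"))
```

The error text returned for '1dd' could then be surfaced to the user, for example as the item's error state or as a warning in the log.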


 Comments   
Comment by Andris Zeila [ 2018 Aug 29 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-14777

Comment by Andris Zeila [ 2018 Aug 30 ]

Released in:

  • pre-3.4.14rc1 r84349
  • pre-4.0.0beta2 r84350




[ZBX-13696] Removal limit for sessions table Created: 2018 Apr 04  Updated: 2024 Apr 10  Resolved: 2018 Apr 11

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.4.7
Fix Version/s: 3.4.9rc1, 4.0.0alpha6, 4.0 (plan)

Type: Incident report Priority: Trivial
Reporter: Alexey Pustovalov Assignee: Andris Zeila
Resolution: Fixed Votes: 1
Labels: housekeeper, sessions
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
Team: Team A
Sprint: Sprint 31
Story Points: 0.25

 Description   

Currently Zabbix performs SQL query:

DBexecute("delete from sessions where lastaccess<%d", now - cfg.hk.sessions);

It has no limit on the number of rows to remove. With a big table (for example, when the housekeeper was not enabled for sessions), the query can take a long time to remove records.
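
One way to bound the work done per statement is to delete in batches. The following is a minimal Python sketch using an in-memory SQLite database as a stand-in; the table layout, the `delete_old_sessions` helper and the batch size are assumptions for illustration and do not show the actual fix.

```python
import sqlite3


def delete_old_sessions(conn, cutoff, batch_size):
    """Delete rows with lastaccess < cutoff, at most batch_size per statement.

    Returns the number of rows removed by each batch.
    """
    batches = []
    while True:
        # Pick a bounded set of keys first, then delete by key.
        cur = conn.execute(
            "delete from sessions where sessionid in ("
            "select sessionid from sessions where lastaccess<? limit ?)",
            (cutoff, batch_size),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        batches.append(cur.rowcount)
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("create table sessions (sessionid text primary key, lastaccess integer)")
conn.executemany(
    "insert into sessions values (?, ?)",
    [("s%d" % i, i) for i in range(25)],
)
batches = delete_old_sessions(conn, cutoff=20, batch_size=8)
remaining = conn.execute("select count(*) from sessions").fetchone()[0]
print(batches, remaining)
```

Each statement touches at most `batch_size` rows, so no single query runs for long, even when the table has grown large.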



 Comments   
Comment by Andris Zeila [ 2018 Apr 05 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-13696

Comment by Andris Mednis [ 2018 Apr 09 ]

Successfully tested.

Comment by Andris Zeila [ 2018 Apr 09 ]

Released in:

  • pre-3.4.9rc1 r79478
  • pre-4.0.0alpha6 r79479




[ZBX-13362] Housekeeper potentially deleting the wrong row when PostgreSQL partitioning is used Created: 2018 Jan 18  Updated: 2024 Apr 10  Resolved: 2018 Mar 14

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 3.4.6
Fix Version/s: 3.0.16rc1, 3.4.8rc1, 4.0.0alpha5, 4.0 (plan)

Type: Problem report Priority: Minor
Reporter: Raymond Tau Assignee: Sergejs Paskevics
Resolution: Fixed Votes: 1
Labels: housekeeper, partitioning, postgresql
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Database: PostgreSQL 9.5


Team: Team C
Sprint: Sprint 27, Sprint 28, Sprint 29
Story Points: 1

 Description   

When the housekeeper is enabled and MaxHousekeeperDelete is not 0 (it defaults to 5000), SQL statements like this are issued to clean up deleted items:

delete from history_uint where ctid = any(array(select ctid from history_uint where itemid=36109 limit 5000))

However, if partitioning is used, such as https://www.zabbix.org/wiki/Higher_performant_partitioning_in_PostgreSQL, the ctid may not be unique across inherited tables. This is demonstrated in http://sqlfiddle.com/#!17/b9e7d/3/0, with the test table prepared as below:

CREATE OR REPLACE FUNCTION main_insert_trigger()
RETURNS TRIGGER AS 'BEGIN IF (NEW.id < 10) THEN INSERT INTO child_1 VALUES (NEW.*); ELSIF (NEW.id >= 10 and NEW.id < 20) THEN INSERT INTO child_2 VALUES (NEW.*); ELSE RETURN NEW; END IF; RETURN NULL; END;'
LANGUAGE plpgsql;

create table main (id INTEGER, value INTEGER);
create table child_1 (check (id < 10)) INHERITS (main);
create table child_2 (check (id >=10 and id < 20)) INHERITS (main);

CREATE TRIGGER insert_main_trigger
    BEFORE INSERT ON main
    FOR EACH ROW EXECUTE PROCEDURE main_insert_trigger();
insert into main values (1,1),(10,2),(100,3);

After that, the query

select tableoid, ctid,* from main;

would produce a result like:

tableoid	ctid	id	value
17361	(0,1)	100	3
17364	(0,1)	1	1
17368	(0,1)	10	2


 Comments   
Comment by Raymond Tau [ 2018 Jan 18 ]

The SQL should originate from src/zabbix_server/housekeeper/housekeeper.c, function DBdelete_from_table.

Comment by Marc [ 2018 Jan 19 ]

Good catch!
Since table partitioning is not officially supported, I wonder whether this is possibly rather a feature request than a bug...

Comment by Raymond Tau [ 2018 Jan 22 ]

Considering that one of the suggested methods to tackle high housekeeper usage is table partitioning, and that I think the problem would be rather easy to fix (by adding the same filter to the outer delete statement; the low cost of the tid scan would make the additional filter cost almost nothing), I wish they would be willing to fix it.

For example, the SQL mentioned could be changed to:

delete from history_uint where (itemid=36109) and ctid = any(array(select ctid from history_uint where itemid=36109 limit 5000))
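
The effect of the extra outer filter can be modelled with a small Python sketch. Rows are plain tuples that reuse the tableoid/ctid values from the description, with the `id` column standing in for itemid; the `delete_by_ctid` helper is hypothetical and only mimics the shape of the two statements.

```python
# Each row is (tableoid, ctid, itemid); ctid is only unique within one child table.
ROWS = [
    (17361, (0, 1), 100),
    (17364, (0, 1), 1),
    (17368, (0, 1), 10),
]


def delete_by_ctid(rows, itemid, limit, filter_outer):
    """Mimic: delete ... where [itemid=X and] ctid = any(select ctid ... limit N)."""
    # Inner select: ctids of rows that belong to the item, bounded by the limit.
    picked = [ctid for (_, ctid, item) in rows if item == itemid][:limit]
    survivors = []
    for row in rows:
        _, ctid, item = row
        # Outer delete: match on ctid, optionally also on itemid.
        matches = ctid in picked and (item == itemid or not filter_outer)
        if not matches:
            survivors.append(row)
    return survivors


without_filter = delete_by_ctid(ROWS, itemid=10, limit=5000, filter_outer=False)
with_filter = delete_by_ctid(ROWS, itemid=10, limit=5000, filter_outer=True)
print(len(without_filter), len(with_filter))
```

Without the outer filter, every row sharing ctid (0,1) is removed, including rows of other items; with it, only the row for the requested item is deleted.
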
Comment by Viktors Tjarve [ 2018 Feb 16 ]

Testing with more than 6 mil. values in history table:

# explain analyze delete from history where ctid=any(array(select ctid from history where (itemid=23676) limit 1000000));
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Delete on history  (cost=24653.59..24693.69 rows=10 width=6) (actual time=7287.174..7287.174 rows=0 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..24653.58 rows=1000000 width=6) (actual time=0.125..2122.481 rows=1000000 loops=1)
           ->  Seq Scan on history history_1  (cost=0.00..290320.90 rows=11776014 width=6) (actual time=0.122..1075.796 rows=1000000 loops=1)
                 Filter: (itemid = 23676)
   ->  Tid Scan on history  (cost=0.01..40.11 rows=10 width=6) (actual time=2765.850..3790.410 rows=1000000 loops=1)
         TID Cond: (ctid = ANY ($0))
 Planning time: 0.163 ms
 Execution time: 7289.960 ms
(9 rows)
# explain analyze delete from history where (itemid=23676) and ctid=any(array(select ctid from history where (itemid=23676) limit 1000000)); 
                                                                  QUERY PLAN                                                                  
----------------------------------------------------------------------------------------------------------------------------------------------
 Delete on history  (cost=24653.59..24693.72 rows=9 width=6) (actual time=7967.702..7967.702 rows=0 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..24653.58 rows=1000000 width=6) (actual time=9.761..2277.569 rows=1000000 loops=1)
           ->  Seq Scan on history history_1  (cost=0.00..290320.90 rows=11776014 width=6) (actual time=9.758..1217.197 rows=1000000 loops=1)
                 Filter: (itemid = 23676)
   ->  Tid Scan on history  (cost=0.01..40.14 rows=9 width=6) (actual time=2956.973..4110.316 rows=1000000 loops=1)
         TID Cond: (ctid = ANY ($0))
         Filter: (itemid = 23676)
 Planning time: 0.173 ms
 Execution time: 7969.142 ms
(10 rows)
# explain analyze delete from history where (itemid=23676) and ctid=any(array(select ctid from history limit 1000000));
                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Delete on history  (cost=18936.30..18976.42 rows=9 width=6) (actual time=7320.094..7320.094 rows=0 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..18936.29 rows=1000000 width=6) (actual time=0.188..1964.917 rows=1000000 loops=1)
           ->  Seq Scan on history history_1  (cost=0.00..256462.32 rows=13543432 width=6) (actual time=0.185..922.297 rows=1000000 loops=1)
   ->  Tid Scan on history  (cost=0.01..40.14 rows=9 width=6) (actual time=2615.487..3730.859 rows=1000000 loops=1)
         TID Cond: (ctid = ANY ($0))
         Filter: (itemid = 23676)
 Planning time: 0.163 ms
 Execution time: 7322.085 ms
(9 rows)

No significant increase in execution time with the introduced changes.

Comment by Viktors Tjarve [ 2018 Feb 16 ]

Successfully tested.

Comment by Sergejs Paskevics [ 2018 Mar 05 ]

Implemented:

  • 3.0.16rc1 in r78294.
Comment by Sergejs Paskevics [ 2018 Mar 12 ]

Implemented:

  • 3.4.8rc1 in r78533,
  • 4.0.0alpha5 (trunk) in r78534




[ZBX-12758] Postgresql problem table missing index on r_eventid while MySQL InnoDB automatically adds it Created: 2017 Sep 21  Updated: 2024 Apr 10  Resolved: 2017 Dec 28

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.2.8rc1, 3.4.2rc1
Fix Version/s: 3.4.6rc1, 4.0.0alpha2, 4.0 (plan)

Type: Problem report Priority: Critical
Reporter: JB Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 0
Labels: events, housekeeper, index, oracle, performance, postgresql, problem
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS Linux release 7.3.1611 (Core)
postgresql-9.2.18-1.el7.x86_64


Issue Links:
Sub-task
part of ZBX-11426 Events removed by housekeeper can cau... Closed
Team: Team A
Sprint: Sprint 18, Sprint 19, Sprint 21, Sprint 22, Sprint 23, Sprint 24
Story Points: 2

 Description   

The housekeeper hangs when trying to delete old events. Manually deleting events takes a long time.

Trying to delete one minute of events:

zabbix=# explain analyze delete FROM events where age(to_timestamp(events.clock)) > interval '7 days 18:59:00';
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on events  (cost=0.00..212828.92 rows=1978096 width=6) (actual time=12120.870..12120.870 rows=0 loops=1)
   ->  Seq Scan on events  (cost=0.00..212828.92 rows=1978096 width=6) (actual time=5015.858..12119.974 rows=282 loops=1)
         Filter: (age((('now'::cstring)::date)::timestamp with time zone, to_timestamp((clock)::double precision)) > '7 days 18:59:00'::interval)
         Rows Removed by Filter: 5919455
 Trigger for constraint c_acknowledges_2: time=3.971 calls=282
 Trigger for constraint c_alerts_2: time=3.933 calls=282
 Trigger for constraint c_event_tag_1: time=3.530 calls=282
 Trigger for constraint c_problem_1: time=4.266 calls=282
 Trigger for constraint c_event_recovery_1: time=5.458 calls=282
 Trigger for constraint c_event_recovery_2: time=4.280 calls=282
 Trigger for constraint c_problem_2: time=17571.926 calls=282
 Trigger for constraint c_event_recovery_3: time=12.291 calls=282
 Trigger for constraint c_alerts_5: time=4.937 calls=282
 Total runtime: 29736.092 ms

After adding a new index, I was able to delete 1 hour of data faster than 1 minute of data without the index:

zabbix=# create index problem_3 on problem(r_eventid);
zabbix=# explain analyze delete FROM events where age(to_timestamp(events.clock)) > interval '7 days 18:00:00';
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Delete on events  (cost=0.00..214479.72 rows=1993439 width=6) (actual time=12327.789..12327.789 rows=0 loops=1)
   ->  Seq Scan on events  (cost=0.00..214479.72 rows=1993439 width=6) (actual time=5173.513..12297.065 rows=26258 loops=1)
         Filter: (age((('now'::cstring)::date)::timestamp with time zone, to_timestamp((clock)::double precision)) > '7 days 18:00:00'::interval)
         Rows Removed by Filter: 5930573
 Trigger for constraint c_acknowledges_2: time=166.392 calls=26258
 Trigger for constraint c_alerts_2: time=183.011 calls=26258
 Trigger for constraint c_event_tag_1: time=163.226 calls=26258
 Trigger for constraint c_problem_1: time=184.327 calls=26258
 Trigger for constraint c_event_recovery_1: time=210.511 calls=26258
 Trigger for constraint c_event_recovery_2: time=200.982 calls=26258
 Trigger for constraint c_problem_2: time=188.991 calls=26258
 Trigger for constraint c_event_recovery_3: time=183.742 calls=26258
 Trigger for constraint c_alerts_5: time=185.132 calls=26258
 Total runtime: 14012.007 ms
(14 rows)


 Comments   
Comment by JB [ 2017 Sep 21 ]

After adding the index, the housekeeper finishes in no time!

Comment by Vladislavs Sokurenko [ 2017 Sep 21 ]

Could you please be so kind and run:
show create table problem;

Comment by JB [ 2017 Sep 21 ]
zabbix=# \d+ problem
                                   Table "public.problem"
    Column     |  Type   |         Modifiers          | Storage | Stats target | Description 
---------------+---------+----------------------------+---------+--------------+-------------
 eventid       | bigint  | not null                   | plain   |              | 
 source        | integer | not null default 0         | plain   |              | 
 object        | integer | not null default 0         | plain   |              | 
 objectid      | bigint  | not null default 0::bigint | plain   |              | 
 clock         | integer | not null default 0         | plain   |              | 
 ns            | integer | not null default 0         | plain   |              | 
 r_eventid     | bigint  |                            | plain   |              | 
 r_clock       | integer | not null default 0         | plain   |              | 
 r_ns          | integer | not null default 0         | plain   |              | 
 correlationid | bigint  |                            | plain   |              | 
 userid        | bigint  |                            | plain   |              | 
Indexes:
    "problem_pkey" PRIMARY KEY, btree (eventid)
    "problem_1" btree (source, object, objectid)
    "problem_2" btree (r_clock)
    "problem_3" btree (r_eventid)
Foreign-key constraints:
    "c_problem_1" FOREIGN KEY (eventid) REFERENCES events(eventid) ON DELETE CASCADE
    "c_problem_2" FOREIGN KEY (r_eventid) REFERENCES events(eventid) ON DELETE CASCADE
Referenced by:
    TABLE "problem_tag" CONSTRAINT "c_problem_tag_1" FOREIGN KEY (eventid) REFERENCES problem(eventid) ON DELETE CASCADE
Has OIDs: no
Comment by Vladislavs Sokurenko [ 2017 Sep 21 ]

Thank you very much for your report. Please note that the issue does not occur with InnoDB because it creates the indexes automatically; that's why it was missed!

Comment by JB [ 2017 Sep 21 ]

Thank you for the quick response!

Comment by Valdis Kauķis (Inactive) [ 2017 Dec 05 ]

Successfully tested, including fixed conflicts in r75416, ZBX-12758-4.0 branch.

Comment by Vladislavs Sokurenko [ 2017 Dec 28 ]

Fixed in:

  • pre-3.4.6rc1 r76394
  • pre-4.0.0alpha2 (trunk) r76395
Comment by Brian Beaulieu [ 2017 Dec 28 ]

This index was missing for me in 3.4.5 on MySQL (installing/updating from RPM).
I added it manually. Is there a post-update SQL script that needed to be run manually?

Comment by Vladislavs Sokurenko [ 2017 Dec 29 ]

Could you please be so kind and provide the output of:

show create table problem;

No, there is no post-update SQL; on MySQL the index is created automatically.
Are you using partitioning?

Comment by Brian Beaulieu [ 2017 Dec 29 ]

Hello,

The index didn't help: the query below has been running for 27780 seconds.

831 zabbix localhost zabbix Query 27780 Sending data select min(clock) from events where events.source=3 and events.object=4 and not exists (select null

No partitioning. Haven't had any issues with housekeeping until 3.4.5

CREATE TABLE `problem` (
  `eventid` bigint(20) unsigned NOT NULL,
  `source` int(11) NOT NULL DEFAULT '0',
  `object` int(11) NOT NULL DEFAULT '0',
  `objectid` bigint(20) unsigned NOT NULL DEFAULT '0',
  `clock` int(11) NOT NULL DEFAULT '0',
  `ns` int(11) NOT NULL DEFAULT '0',
  `r_eventid` bigint(20) unsigned DEFAULT NULL,
  `r_clock` int(11) NOT NULL DEFAULT '0',
  `r_ns` int(11) NOT NULL DEFAULT '0',
  `correlationid` bigint(20) unsigned DEFAULT NULL,
  `userid` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`eventid`),
  KEY `problem_1` (`source`,`object`,`objectid`),
  KEY `problem_2` (`r_clock`),
  KEY `problem_3` (`r_eventid`),
  CONSTRAINT `c_problem_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE,
  CONSTRAINT `c_problem_2` FOREIGN KEY (`r_eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 
Comment by Brian Beaulieu [ 2017 Dec 29 ]
mysql> select count(*) from events;
+----------+
| count(*) |
+----------+
| 14245852 |
+----------+
1 row in set (16.20 sec)

mysql> show create table events;
CREATE TABLE `events` (
  `eventid` bigint(20) unsigned NOT NULL,
  `source` int(11) NOT NULL DEFAULT '0',
  `object` int(11) NOT NULL DEFAULT '0',
  `objectid` bigint(20) unsigned NOT NULL DEFAULT '0',
  `clock` int(11) NOT NULL DEFAULT '0',
  `value` int(11) NOT NULL DEFAULT '0',
  `acknowledged` int(11) NOT NULL DEFAULT '0',
  `ns` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`eventid`),
  KEY `events_1` (`source`,`object`,`objectid`,`clock`),
  KEY `events_2` (`source`,`object`,`clock`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
Comment by Vladislavs Sokurenko [ 2017 Dec 29 ]

It would be great to see the whole slow query, if possible. You could also run explain on the query, but it looks like you simply have lots of events. How many problems do you have? Please also provide the MySQL version.

Comment by Ronald Schaten [ 2017 Dec 29 ]

Same problem here, after update from 3.2.6 to 3.4.5. My tables look like Brian described, the full query is:

select min(clock) from events where events.source=3 and events.object=0 and not exists (select null from problem where events.eventid=problem.eventid or events.eventid=problem.r_eventid)

The events table has 132463031 rows, problem table has 766423 rows. MySQL is "Ver 14.14 Distrib 5.7.20, for Linux (x86_64) using EditLine wrapper", running on Ubuntu 16.04.

mysql> explain select min(clock) from events where events.source=3 and events.object=0 and not exists (select null from problem where events.eventid=problem.eventid or events.eventid=problem.r_eventid)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: events
   partitions: NULL
         type: ref
possible_keys: events_1,events_2
          key: events_2
      key_len: 8
          ref: const,const
         rows: 15964577
     filtered: 100.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: problem
   partitions: NULL
         type: ALL
possible_keys: PRIMARY,c_problem_2
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 708846
     filtered: 19.00
        Extra: Range checked for each record (index map: 0x9)
2 rows in set, 3 warnings (0,00 sec)

The database is partitioned, but events and problem tables are not.

The upgrade notes for 3.4.0 recommend decreasing the storage period for some events from 365d to 1d. Coming from Zabbix 3.2, I did that on my system, but only after doing the update, so of course the tables still contain quite a lot of rows.

Comment by Vladislavs Sokurenko [ 2017 Dec 29 ]

Could you please also attach the output of show create table problem;?

Comment by Ronald Schaten [ 2017 Dec 29 ]

As mentioned, to me it looks like the one Brian attached this morning:

CREATE TABLE `problem` (
  `eventid` bigint(20) unsigned NOT NULL,
  `source` int(11) NOT NULL DEFAULT '0',
  `object` int(11) NOT NULL DEFAULT '0',
  `objectid` bigint(20) unsigned NOT NULL DEFAULT '0',
  `clock` int(11) NOT NULL DEFAULT '0',
  `ns` int(11) NOT NULL DEFAULT '0',
  `r_eventid` bigint(20) unsigned DEFAULT NULL,
  `r_clock` int(11) NOT NULL DEFAULT '0',
  `r_ns` int(11) NOT NULL DEFAULT '0',
  `correlationid` bigint(20) unsigned DEFAULT NULL,
  `userid` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`eventid`),
  KEY `problem_1` (`source`,`object`,`objectid`),
  KEY `problem_2` (`r_clock`),
  KEY `c_problem_2` (`r_eventid`),
  CONSTRAINT `c_problem_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE,
  CONSTRAINT `c_problem_2` FOREIGN KEY (`r_eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
Comment by Vladislavs Sokurenko [ 2017 Dec 29 ]

Can you try this, please:

explain select min(clock) from events where events.source=3 and events.object=0 and not exists (select null from problem where events.eventid=problem.eventid) and not exists (select null from problem where events.eventid=problem.r_eventid)\G
Comment by Ronald Schaten [ 2017 Dec 29 ]
mysql> explain select min(clock) from events where events.source=3 and events.object=0 and not exists (select null from problem where events.eventid=problem.eventid) and not exists (select null from problem where events.eventid=problem.r_eventid)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: events
   partitions: NULL
         type: ref
possible_keys: events_1,events_2
          key: events_2
      key_len: 8
          ref: const,const
         rows: 15973658
     filtered: 100.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 3
  select_type: DEPENDENT SUBQUERY
        table: problem
   partitions: NULL
         type: ref
possible_keys: c_problem_2
          key: c_problem_2
      key_len: 9
          ref: zabbix.events.eventid
         rows: 1
     filtered: 100.00
        Extra: Using index
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: problem
   partitions: NULL
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: zabbix.events.eventid
         rows: 1
     filtered: 100.00
        Extra: Using index
3 rows in set, 3 warnings (0,00 sec)
Comment by Vladislavs Sokurenko [ 2017 Dec 29 ]

Thanks,

Can you confirm that the second query is faster?

select min(clock) from events where events.source=3 and events.object=0 and not exists (select null from problem where events.eventid=problem.eventid or events.eventid=problem.r_eventid);
select min(clock) from events where events.source=3 and events.object=0 and not exists (select null from problem where events.eventid=problem.eventid) and not exists (select null from problem where events.eventid=problem.r_eventid);
Comment by Ronald Schaten [ 2017 Dec 29 ]

Yes. The second query took less than 18 minutes, the original one ran more than eight hours before I terminated it.
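
The two forms return the same result because "not exists (A or B)" is logically the same as "not exists (A) and not exists (B)"; the split form lets each subquery be resolved through its own index. A minimal Python sketch with an in-memory SQLite database and made-up rows checks the equivalence only; it does not reproduce the MySQL planner behaviour seen above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table events (eventid integer primary key, source integer,
                         object integer, clock integer);
    create table problem (eventid integer primary key, r_eventid integer);
    insert into events values (1, 3, 0, 100), (2, 3, 0, 200),
                              (3, 3, 0, 300), (4, 3, 0, 400), (5, 0, 0, 50);
    insert into problem values (1, null), (3, 2);
    """
)

# Original form: one subquery with OR.
single_or = conn.execute(
    "select min(clock) from events where events.source=3 and events.object=0"
    " and not exists (select null from problem where"
    " events.eventid=problem.eventid or events.eventid=problem.r_eventid)"
).fetchone()[0]

# Rewritten form: two subqueries, one per column.
split = conn.execute(
    "select min(clock) from events where events.source=3 and events.object=0"
    " and not exists (select null from problem where events.eventid=problem.eventid)"
    " and not exists (select null from problem where events.eventid=problem.r_eventid)"
).fetchone()[0]

print(single_or, split)
```

Events 1 and 3 are referenced through eventid and event 2 through r_eventid, so both queries settle on the clock of event 4.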

Comment by Vladislavs Sokurenko [ 2017 Dec 29 ]

Thanks a lot, created ZBX-13275.

Comment by Oleksii Zagorskyi [ 2018 Feb 19 ]

I'm on MySQL and started my Zabbix from 3.4.
Upgraded to 3.4.6 from 3.4.5.
Is it OK that I now have 2 identical indexes on the "problem" table?

  KEY `c_problem_2` (`r_eventid`),
  KEY `problem_3` (`r_eventid`),

Before upgrade I had only `c_problem_2` index.

Maybe the procedure should be smarter and not create duplicate indexes on MySQL?

vso this procedure does not create duplicate indexes: you can run the patch multiple times and it will not create another index with the name 'problem_3'. Could you please attach the output of show create table problem;?

zalex_ua those 2 lines are from that output. I already did "drop index c_problem_2 on problem;", as it's a production database which I had to dump and restore for maintenance reasons.
You can see that the table had an index named "c_problem_2", but the Zabbix code checks only for the index name:

static int      DBpatch_3040006(void)
{
        if (FAIL == DBindex_exists("problem", "problem_3"))
                return DBcreate_index("problem", "problem_3", "r_eventid", 0);

Ahh, I had a DB backup (because I had to manage it); here is the schema before I ran 3.4.6 after 3.4.5:

mysql> show create table problem \G
*************************** 1. row ***************************
       Table: problem
Create Table: CREATE TABLE `problem` (
  `eventid` bigint(20) unsigned NOT NULL,
  `source` int(11) NOT NULL DEFAULT '0',
  `object` int(11) NOT NULL DEFAULT '0',
  `objectid` bigint(20) unsigned NOT NULL DEFAULT '0',
  `clock` int(11) NOT NULL DEFAULT '0',
  `ns` int(11) NOT NULL DEFAULT '0',
  `r_eventid` bigint(20) unsigned DEFAULT NULL,
  `r_clock` int(11) NOT NULL DEFAULT '0',
  `r_ns` int(11) NOT NULL DEFAULT '0',
  `correlationid` bigint(20) unsigned DEFAULT NULL,
  `userid` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`eventid`),
  KEY `problem_1` (`source`,`object`,`objectid`),
  KEY `problem_2` (`r_clock`),
  KEY `c_problem_2` (`r_eventid`),
  CONSTRAINT `c_problem_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE,
  CONSTRAINT `c_problem_2` FOREIGN KEY (`r_eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_bin
1 row in set (0.00 sec)

Strange: I performed a clean test. I created a DB (schema+images+data) from 3.4.5 sources, upgraded it with the 3.4.7 binary that I had, and found the `c_problem_2` index magically replaced by the `problem_3` index.
Hmm, then I recreated such a DB again, dumped it, restored it and then upgraded, and voilà - I got duplicate indexes:

mysql> show create table problem \G
*************************** 1. row ***************************
       Table: problem
Create Table: CREATE TABLE `problem` (
  `eventid` bigint(20) unsigned NOT NULL,
  `source` int(11) NOT NULL DEFAULT '0',
  `object` int(11) NOT NULL DEFAULT '0',
  `objectid` bigint(20) unsigned NOT NULL DEFAULT '0',
  `clock` int(11) NOT NULL DEFAULT '0',
  `ns` int(11) NOT NULL DEFAULT '0',
  `r_eventid` bigint(20) unsigned DEFAULT NULL,
  `r_clock` int(11) NOT NULL DEFAULT '0',
  `r_ns` int(11) NOT NULL DEFAULT '0',
  `correlationid` bigint(20) unsigned DEFAULT NULL,
  `userid` bigint(20) unsigned DEFAULT NULL,
  PRIMARY KEY (`eventid`),
  KEY `problem_1` (`source`,`object`,`objectid`),
  KEY `problem_2` (`r_clock`),
  KEY `c_problem_2` (`r_eventid`),
  KEY `problem_3` (`r_eventid`),
  CONSTRAINT `c_problem_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE,
  CONSTRAINT `c_problem_2` FOREIGN KEY (`r_eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
1 row in set (0.00 sec)

For both DBs (just created from schema+images+data SQLs and recovered from dump), upgrade debug log is identical:

z345:
 20173:20180219:125955.039 starting automatic database upgrade
 20173:20180219:125955.039 query [txnlev:1] [begin;]
 20173:20180219:125955.039 query [txnlev:1] [show index from problem where key_name='problem_3']
 20173:20180219:125955.040 query [txnlev:1] [create index problem_3 on problem (r_eventid)]
 20173:20180219:125955.058 query [txnlev:1] [update dbversion set optional=3040006]
 20173:20180219:125955.059 query [txnlev:1] [commit;]
 20173:20180219:125955.059 completed 100% of database upgrade
 20173:20180219:125955.059 database upgrade fully completed

z345recovered:
 20351:20180219:130053.868 starting automatic database upgrade
 20351:20180219:130053.868 query [txnlev:1] [begin;]
 20351:20180219:130053.868 query [txnlev:1] [show index from problem where key_name='problem_3']
 20351:20180219:130053.869 query [txnlev:1] [create index problem_3 on problem (r_eventid)]
 20351:20180219:130053.886 query [txnlev:1] [update dbversion set optional=3040006]
 20351:20180219:130053.887 query [txnlev:1] [commit;]
 20351:20180219:130053.887 completed 100% of database upgrade
 20351:20180219:130053.887 database upgrade fully completed

Of course mysqldump, taken from created database, contains a line

  KEY `c_problem_2` (`r_eventid`),

while schema.sql does not.

Fuhh, probably it should not be ignored, as it really happened in production

vso it is very unfortunate that MySQL acts differently when you restore a backup. We could add another condition for 'c_problem_2' and would have to rename this index if it exists.

zalex_ua maybe. Please take care of it further. And probably the 3.4.6 schema should be checked the same way, because you have added the index creation explicitly to schema.sql.

vso It's best if you create a separate bug report.

zalex_ua Reported as ZBX-13498
CLOSED
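
The duplicate index discussed above appears because the upgrade patch looks for an index by name only. A different approach from the rename suggested in the thread is to look for any index covering the same columns. The Python sketch below uses SQLite pragmas as a stand-in; `index_on_columns` and `ensure_index` are hypothetical helpers, not Zabbix functions, and table and index names are assumed to be trusted constants.

```python
import sqlite3


def index_on_columns(conn, table, columns):
    """Return the name of an index covering exactly `columns`, or None."""
    for row in conn.execute("pragma index_list(%s)" % table).fetchall():
        name = row[1]
        info = conn.execute("pragma index_info(%s)" % name).fetchall()
        cols = [r[2] for r in sorted(info)]  # rows are (seqno, cid, column name)
        if cols == list(columns):
            return name
    return None


def ensure_index(conn, table, name, columns):
    """Create the index only if no equivalent one exists under any name."""
    existing = index_on_columns(conn, table, columns)
    if existing is not None:
        return existing
    conn.execute("create index %s on %s (%s)" % (name, table, ", ".join(columns)))
    return name


conn = sqlite3.connect(":memory:")
conn.execute("create table problem (eventid integer primary key, r_eventid integer)")
conn.execute("create index c_problem_2 on problem (r_eventid)")
kept = ensure_index(conn, "problem", "problem_3", ["r_eventid"])
names = [row[1] for row in conn.execute("pragma index_list(problem)")]
print(kept, names)
```

Here the existing `c_problem_2` index is recognised as equivalent, so no second index on r_eventid is created.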





[ZBX-11590] Zabbix Server stop data gathering Created: 2016 Dec 13  Updated: 2018 Apr 20  Resolved: 2018 Apr 20

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G), Server (S)
Affects Version/s: 3.2.1
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Jose Augusto Ferrronato Assignee: Unassigned
Resolution: Unsupported version Votes: 0
Labels: action, alerter, database, history, housekeeper, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Centos 6.8
MySQL 5.5.54
Zabbix Server 3.2.1


Attachments: PNG File cpu_user.png     PNG File cpu_user_graph.png     PNG File data_gathering.png     JPEG File history_syncer.jpg     PNG File internal_process.png     PNG File screenshot-1.png     File zabbix_agentd.log-20161210.gz     JPEG File zabbix_info.jpg     File zabbix_server.conf     Zip Archive zabbix_server_log.zip    

 Description   

Zabbix has consistently failed to send alerts due to collection issues.
Zabbix server also stops collecting data, generating the error "Zabbix agent on {HOST.NAME} is unreachable for 5 minutes" on the collection machines. Usually Zabbix recovers by itself after a while; other times we have to restart the Zabbix server service.

The graph shows "no data". When collection resumes automatically, after a certain amount of time (5 minutes on average) the graph is backfilled with the missing data and collection continues. After a manual restart, the graph is not backfilled and all collection data for that period is lost.

This error happened after a migration from zabbix 2.4 to zabbix 3.2

The housekeeper is disabled; the problem also persisted with the housekeeper enabled.



 Comments   
Comment by Aleksandrs Saveljevs [ 2016 Dec 13 ]

Related issue: ZBX-11586.

Comment by Aleksandrs Saveljevs [ 2016 Dec 13 ]

Similar to ZBX-11586, it seems to have to do with poller and history syncer busyness.

Would it be possible to post screenshots that compare their busyness before and after the upgrade? How many pollers and history syncers do you have? Did you change any configuration after the upgrade except Zabbix version? Do you use proxies?

Based on "internal_process.png", history syncers get loaded every hour. Do you have an idea what is causing this? Do you get unusual item traffic every hour? If so, what are those items?

Comment by Adriane Ázara [ 2016 Dec 13 ]

Attached an image covering the last 3 months.

The upgrade was performed on 11/21.

The only parameters changed were the cache and poller settings.

There are 2 Zabbix proxies in different locations.

Comment by Jose Augusto Ferrronato [ 2016 Dec 13 ]

Hi,
We have no specific items causing this; here are the items we have:

Items report
==================

[INFO] Total de items: 72698
[INFO] Items enabled: 70005
[INFO] Items disabled: 134
[ERRO] Items not supported: 2559

Items by type
==============
[INFO] Items Zabbix Agent (passive): 593
[INFO] Items Zabbix Agent (active): 0
[INFO] Items Zabbix Trapper: 213
[INFO] Items Zabbix Internal: 73
[INFO] Items Zabbix Agreggate: 0
[INFO] Items SNMPv1: 174
[INFO] Items SNMPv2: 66396
[INFO] Items SNMPv3: 0
[INFO] Items SNMNP Trap: 0
[INFO] Items JMX: 0
[INFO] Items IPMI: 0
[INFO] Items SSH: 0
[INFO] Items Telnet: 0
[INFO] Items Web: 180
[INFO] Items Simple Check: 1504
[INFO] Items Calculated: 1261
[INFO] Items External Check: 863
[INFO] Items Database: 1487

Additional info

[INFO] Number of items with icmpping key history for more than 7 days: 146
[INFO] Number of non-numeric items (active): 6822

NVPS: 158.7

Comment by Jose Augusto Ferrronato [ 2016 Dec 13 ]

The history syncer load increased a lot after the upgrade.

Comment by Matthew ISIDORE [ 2016 Dec 14 ]

I currently have the same problem. Did you try lowering
CacheSize=512M to something like 128M or less?
HistoryCacheSize=512M to default?
HistoryIndexCacheSize=128M to default?
TrendCacheSize=512M to default?
ValueCacheSize=512M to default?

Why am I telling you that? I suppose that if your cache is too big, the history syncer processes will have too many items to process and (referring to your config file) will do that every 120 seconds. I think you don't need that much cache. I have 480 NVPS and I use less cache than you.

That's my only clue for solving your problem ^^
Tell me if it changes anything for you.

Edit:
Take note that if the server crashes, it's because the values are too low.

Comment by Jose Augusto Ferrronato [ 2016 Dec 14 ]

The error already occurred before we changed the Cache values. It continued even after we adjusted with larger values. Thanks for the help.





[ZBX-11426] Events removed by housekeeper can cause trigger to be stuck in problem state Created: 2016 Nov 04  Updated: 2024 Apr 10  Resolved: 2017 Nov 28

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.2.1
Fix Version/s: 3.2.9rc1, 3.2.11rc1, 3.4.3rc1, 3.4.5rc1, 4.0.0alpha1, 4.0 (plan)

Type: Problem report Priority: Critical
Reporter: Andris Zeila Assignee: Andrea Biscuola (Inactive)
Resolution: Fixed Votes: 6
Labels: events, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Causes
causes ZBX-13140 Potential leak of problem and events Open
causes ZBX-13275 Slow Housekeeping of events Closed
causes ZBX-14312 Proxy->Agent communication drops inte... Closed
causes ZBX-13277 Housekeeper does not delete old event... Closed
Duplicate
Sub-task
depends on ZBX-12758 Postgresql problem table missing inde... Closed
Team: Team A
Sprint: Sprint 14, Sprint 15, Sprint 16, Sprint 17, Sprint 19, Sprint 21, Sprint 22
Story Points: 3.5

 Description   

When the housekeeper removes an open problem event, the trigger value/problem count is not updated. If this was the last open problem event, the trigger will be stuck in problem state and keep generating recovery events.

To fix the current situation, recovery events must update the trigger value/problem count even if there were no open problems.

To prevent this from happening in the future, the housekeeper must not remove open problem events.



 Comments   
Comment by Oleksii Zagorskyi [ 2016 Nov 08 ]

ZBX-11439 is similar/related.

Comment by Aleksandrs Saveljevs [ 2016 Nov 08 ]

ZBX-11412 may be related, too.

Comment by Andris Zeila [ 2016 Nov 10 ]

ZBX-11454 was created to deal with the fallout while this issue will be kept open to fix the housekeeper.

Comment by Alexander Vladishev [ 2017 Aug 10 ]

ZBX-11768 also may be related.

Comment by Andrea Biscuola (Inactive) [ 2017 Sep 20 ]

Fixed in svn://svn.zabbix.com/branches/dev/ZBX-11426

Modified the filters in the housekeeping_events() function to check, through a subquery, whether an event has an associated problem in the problem table. Only events without a corresponding record (open or closed) are removed.
Also reordered the deletion query to make adding the filter easier.

Comment by Andris Zeila [ 2017 Sep 20 ]

Successfully tested, please review minor changes in r72783

abs Looks good. CLOSED

Comment by Andrea Biscuola (Inactive) [ 2017 Sep 26 ]

Released in:

  • pre-3.2.9rc1 r72945-r72945
  • pre-3.4.3rc1 r72947
  • pre-4.0.0alpha1 (trunk) r72948

Comment by richlv [ 2017 Sep 28 ]

this might be worth documenting in the housekeeper section (and maybe also in the upgrade notes for 3.2.9 and 3.4.3)

Comment by Andrea Biscuola (Inactive) [ 2017 Sep 29 ]

richlv

Maybe a good idea, as now the housekeeper behaviour is explicit regarding how some types of events are kept or deleted. The issue itself was already mitigated in the past through another task and this is just the completion of that work.

Comment by richlv [ 2017 Oct 06 ]

indeed, currently the behaviour seems to be completely undocumented

Comment by Andris Zeila [ 2017 Oct 13 ]

With event housekeeping period set to 1d (or close to it) there is a danger of recovery events being removed while the recovered events are still in problem table.

I'm not sure if it's worth adding more complexity to the event deletion queries (although it would be the safest way). I think an acceptable workaround would be to call housekeeping_problems() before the housekeeping_events() function. As the event housekeeping period cannot be less than the problem cleanup period (24h), this would ensure that problems are removed from the problem table before the corresponding events are removed from the events table.

wiper So it was decided to have a proper fix. To do it we need to add a problem.r_eventid index and also check for r_eventid when removing events.
Still, it's better to swap the housekeeping_problems() and housekeeping_events() calls so that the problem table potentially has fewer records when housekeeping_events() is called.

Comment by Andrea Biscuola (Inactive) [ 2017 Nov 14 ]

Fixed in svn://svn.zabbix.com/branches/dev/ZBX-11426

Swap the calls to housekeeping_problems() and housekeeping_events().
Logically, it's safer to remove old problems first and after that the
related events if necessary.

Also added a filter to the events delete queries to check the
problem.r_eventid field.
In this way we ensure that any event that is associated with a problem
record in any way (be it a problem or recovery event) will not be
deleted before the problem record itself, but only after.

Comment by Andris Zeila [ 2017 Nov 16 ]

Successfully tested, please review coding style fixes in r74679

abs style fix ok. CLOSED

Comment by Andrea Biscuola (Inactive) [ 2017 Nov 17 ]

Released in:

  • pre-3.2.11rc1 r74716
  • pre-3.4.5rc1 r74717
  • pre-4.0.0alpha1 (trunk) r74718

Comment by Andrea Biscuola (Inactive) [ 2017 Nov 17 ]

martins-v

The final housekeeper behaviour after this change is that an event
will be deleted ONLY if it is not associated with a problem in any way.
This means that if an event is either a PROBLEM or RECOVERY event,
it will not be deleted until the related problem record is removed.

Also, the housekeeper now deletes problems first and events
afterwards, to avoid potential problems with stale events or problem
records.
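
The behaviour described above can be pictured with a small in-memory SQLite model. This is only an illustrative sketch under assumptions: the two-table schema is heavily simplified, the function names (make_db, housekeep_problems, housekeep_events, run_housekeeper) are invented for the example, and the SQL is not the query used in the actual fix.

```python
import sqlite3


def make_db():
    """Create a tiny in-memory model of the events and problem tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE events (eventid INTEGER PRIMARY KEY, clock INTEGER);
        CREATE TABLE problem (eventid INTEGER PRIMARY KEY,
                              r_eventid INTEGER, r_clock INTEGER);
    """)
    return db


def housekeep_problems(db, keep_from):
    """Remove resolved problems whose recovery is older than keep_from."""
    cur = db.execute(
        "DELETE FROM problem WHERE r_eventid IS NOT NULL AND r_clock < ?",
        (keep_from,))
    return cur.rowcount


def housekeep_events(db, keep_from):
    """Remove old events that no problem row references in either column."""
    cur = db.execute(
        """DELETE FROM events
           WHERE clock < ?
             AND NOT EXISTS (SELECT 1 FROM problem p
                             WHERE p.eventid = events.eventid)
             AND NOT EXISTS (SELECT 1 FROM problem p
                             WHERE p.r_eventid = events.eventid)""",
        (keep_from,))
    return cur.rowcount


def run_housekeeper(db, problems_keep_from, events_keep_from):
    """Problems first, then events, in the order described above."""
    deleted_problems = housekeep_problems(db, problems_keep_from)
    deleted_events = housekeep_events(db, events_keep_from)
    return deleted_problems, deleted_events
```

In this model an old event that is still referenced as problem.eventid or problem.r_eventid survives every run until its problem row has been removed, which is the property the comment describes.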





[ZBX-11109] Zabbix Proxy can't start with HousekeepingFrequency=0 Created: 2016 Aug 18  Updated: 2017 May 30  Resolved: 2016 Aug 25

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P)
Affects Version/s: 3.0.4
Fix Version/s: 3.0.5rc1, 3.2.0beta1

Type: Incident report Priority: Major
Reporter: Evgeny Kravchenko Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper, runtimecontrol
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 14.04.5 LTS
Linux 4.4.0-34-generic



 Description   

Zabbix Proxy cannot start if HousekeepingFrequency is set to 0



 Comments   
Comment by Aleksandrs Saveljevs [ 2016 Aug 22 ]

Seems to be the case indeed. Setting to "Confirmed".

Comment by Sergejs Paskevics [ 2016 Aug 23 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-11109

Comment by Andris Zeila [ 2016 Aug 24 ]

Successfully tested

Comment by Sergejs Paskevics [ 2016 Aug 24 ]

Fixed in:

  • pre3.0.5rc1 r61907
  • pre3.2.0alpha3 (trunk) r61908




[ZBX-10649] Schedule housekeeper manually on Zabbix 2.2 Created: 2016 Apr 11  Updated: 2017 May 30  Resolved: 2016 Apr 12

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F), Server (S)
Affects Version/s: 2.2.10
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Bezaleel Ramos Assignee: Unassigned
Resolution: Won't fix Votes: 0
Labels: database, housekeeper, zabbix_server
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Oracle 11g
Zabbix: 2.2.11
Apache: 2.2.15
Red Hat 6



 Description   

Hello,

We want to disable the housekeeper in Zabbix and run the cleanup manually. We would like to know which criteria the housekeeper uses, because the AWR report generated in Oracle 11g shows the statement below.

> delete from history_log where itemid=:"SYS_B_0" and clock<:"SYS_B_1"

What is SYS_B_0 and SYS_B_1?
Is there another way to delete?

Regards

Beza



 Comments   
Comment by richlv [ 2016 Apr 11 ]

this tracker is for bug reports. please see http://zabbix.org/wiki/Getting_help for support options

Comment by Aleksandrs Saveljevs [ 2016 Apr 12 ]

Closing as "Won't fix" as per the comment above.





[ZBX-10015] Global housekeeper settings do not work as they should to show possible time period on graphs without data Created: 2015 Oct 28  Updated: 2019 Dec 10

Status: Open
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F)
Affects Version/s: 2.2.10, 2.4.6, 3.0.0alpha3
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: graphs, housekeeper, patch, timeperiodselection
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File show_correct_range-disallow_edit.diff    
Issue Links:
Duplicate

 Description   

Here are two aspects which are related to each other.
(1)
Short history of the topic:
Fine-grained control was introduced in ZBXNEXT-1649.

In r40560 (a ZBXNEXT-2016 commit to trunk) $config['hk_history_global'] was taken into account, trends were skipped here:

// override item history setting with housekeeping settings
			if ($config['hk_history_global']) {
				$real_item['history'] = $config['hk_history'];

Later, in r40649 (a ZBX-4063 commit to trunk) there was a fix for trend:

$trendsEnabled = $config['hk_trends_global'] ? ($config['hk_trends'] > 0) : ($item['trends'] > 0);
			if (!$trendsEnabled

But both of these changes affected PNG image building only, not the display of selectable time periods on the Graph/SimpleGraph pages.

Yes, this logic is relevant only for graphs which don't have history/trends data in the database yet, because existing data (min(clock)) takes priority when estimating the possible graph range.

So now, if you set an item's keep history/trends settings and the item doesn't have any history/trends yet, then when you open a graph with the item you will see the maximum possible period taken from the item's settings, not the globally defined one.
That's wrong from a user-friendliness perspective, because if you open the item configuration form, the Zabbix frontend explicitly says that the value is "Overridden by global housekeeping settings (NN days)".

(2)
Spec for the ZBXNEXT-1649 has a "Discussed topics" section, quoting:

Housekeeper parameters are not disabled in the item configuration form to allow changing them before enabling housekeeper on global level - otherwise housekeeper could remove data before the configuration is changed. The parameters must be controlled even if housekeeper is disabled for items state.

I suppose there was a logical mistake in the quote. I think it should read as follows (the changed part, originally highlighted, is "enabling" replaced with "disabling"):

Housekeeper parameters are not disabled in the item configuration form to allow changing them before disabling housekeeper on global level - otherwise housekeeper could remove data before the configuration is changed. The parameters must be controlled even if housekeeper is disabled for items state.

but it doesn't matter anymore.

Note that after the initial ZBXNEXT-1649 there was ZBXNEXT-2016, which changed the picture a lot.
Currently the quote is no longer relevant, because internal housekeeping for history/trend tables may be enabled/disabled independently of the global "Data storage period" for the tables.

So, for better user-friendliness, I suggest disallowing editing of those values in the item configuration form if they are defined at the global level.
I believe it will be more consistent.

The current inconsistency can be misleading when, for example, you perform partitioning and then test graph building.
In such cases it's common practice to truncate the history tables before partitioning and to change the global housekeeper settings.

Suggested patch fixes all issues I've described.
Patch based on v2.4.6






[ZBX-10012] Housekeeper tries to delete data which could not exist (0 days keep history/trends) Created: 2015 Oct 27  Updated: 2017 May 30  Resolved: 2015 Nov 05

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.2.10, 2.4.6, 3.0.0alpha3
Fix Version/s: None

Type: Incident report Priority: Minor
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Won't fix Votes: 1
Labels: housekeeper, sql
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Investigated in zabbix server v 3.0.0 alpha4
Suppose we have numeric items which have keep history or keep trends set to 0 days.
According to documentation, zabbix server will not store history or trends if corresponding keep parameter is 0.
https://www.zabbix.com/documentation/2.4/manual/config/items/history_and_trends

But the housekeeper process still tries to delete data for such items on every run, even though they cannot have stored any data. From the 2nd execution onward:

 10286:20151027:175335.549 In housekeeping_history_and_trends() now:1445961215
 10286:20151027:175335.549 In hk_history_delete_queue_prepare_all()
 10286:20151027:175335.550 query [txnlev:0] [select i.itemid,i.value_type,i.history,i.trends from items i,hosts h where i.hostid=h.hostid and h.status in (0,1)]
 10286:20151027:175335.550 End of hk_history_delete_queue_prepare_all()
 10286:20151027:175335.550 query without transaction detected
 10286:20151027:175335.550 query [txnlev:0] [delete from history_uint where itemid=27785 and clock<1445961215]
 10286:20151027:175335.550 query without transaction detected
 10286:20151027:175335.550 query [txnlev:0] [delete from trends where itemid=27796 and clock<1445961215]
 10286:20151027:175335.551 query without transaction detected
...

Note that the timestamps in the "clock<1445961215" conditions are always
equal to now(), which is sort of correct.

The same happens if you change an item's "keep" parameter to 0 days while the server is running: those SQLs will start to appear.

One interesting point is that such unneeded delete SQLs are performed starting from the 2nd housekeeper execution only.

I've tested another case: an item has history/trends data, but on server start the item's keep history or trends setting is 0.
In this case the housekeeper correctly starts to delete the already outdated data in 4*HousekeepingFrequency batches (note: from the 1st housekeeper execution).
But it would be good if it stopped doing unneeded deletions once it has deleted everything (0 rows deleted during the last batch) and the item has 0 days in its settings.
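
The batching mentioned above can be pictured as a per-item cutoff that moves forward by at most 4*HousekeepingFrequency per run until it reaches the retention boundary. The sketch below is a simplified model built only from that description; next_cutoff, simulate and their parameter names are invented here and are not taken from the server source.

```python
HK_MAX_DELETE_PERIODS = 4  # the "4" in "4*HousekeepingFrequency" above


def next_cutoff(min_clock, now, keep_seconds, hk_period):
    """Return the new delete cutoff for one item, or None if nothing is due.

    min_clock    -- oldest timestamp believed to exist for the item
    now          -- current time (epoch seconds)
    keep_seconds -- retention period; 0 means "keep nothing"
    hk_period    -- HousekeepingFrequency expressed in seconds
    """
    keep_from = now - keep_seconds
    if keep_from <= min_clock:
        return None  # everything stored is still inside the retention period
    # Advance by at most HK_MAX_DELETE_PERIODS housekeeper periods per run.
    return min(keep_from, min_clock + HK_MAX_DELETE_PERIODS * hk_period)


def simulate(min_clock, start, keep_seconds, hk_period, runs):
    """Return the 'clock < cutoff' values used by successive housekeeper runs."""
    now = start
    cutoffs = []
    for _ in range(runs):
        cutoff = next_cutoff(min_clock, now, keep_seconds, hk_period)
        if cutoff is not None:
            cutoffs.append(cutoff)
            min_clock = cutoff
        now += hk_period
    return cutoffs
```

In this model, an item with keep_seconds set to 0 keeps producing a cutoff equal to the current time on every run once the backlog is gone, which matches the "clock<now" deletes seen in the log above. The model has no notion of "the last batch deleted 0 rows", which is the stop condition the report asks for.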

Yeah, I understand that this request will improve housekeeper just a little, but still ...
I always look with suspicion on unneeded SQL queries, and it has happened this time too

Also, taking into account that those deletes are performed from the 2nd housekeeper execution only, I suppose that there could be a small bug.



 Comments   
Comment by Alexander Vladishev [ 2015 Nov 05 ]

It works as expected! Between housekeeper executions a user can enable these parameters to collect some data and then set them back to 0.

I close the issue.

Comment by Oleksii Zagorskyi [ 2015 Nov 05 ]

Sasha, but the same may happen when, for example, after server is started someone sends data for periods older than keep history/trends.
In this case housekeeper (HK) will not see these outdated data until next server restart.

This is a real case I once saw: ~2000 specific hardware boxes without hardware clock are running zabbix agent active.
After reboot (it happens quite often as I see) the boxes set real time as 2000-01-01 11:00:00 and start to send agent active data with such outdated time stamps and these data are stored in database as is. After 1-6 hours the boxes update real time clock (supposedly by NTP) and then send data with correct time stamps.
HK will notice and delete these data only after server restart (after performing select min(clock)).
This real case is not covered by HK logic and I don't think we can do something with this.

What I want to say with this real example is that there will always be edge cases where the Zabbix server notices existing data (which should be deleted) only after a restart.
So I still think that everything I said in the issue description is still relevant and worth considering.

Related request is ZBXNEXT-2860, which can change existing HK logic a lot.

Comment by Oleksii Zagorskyi [ 2015 Nov 05 ]

I'd reopen it, but don't want to do it myself





[ZBX-10001] misleading duplicated log lines for housekeeper Created: 2015 Oct 26  Updated: 2017 May 30  Resolved: 2015 Dec 21

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 2.4.6, 3.0.0alpha4
Fix Version/s: 3.0.0alpha5

Type: Incident report Priority: Minor
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper, logging, troubleshooting
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Run housekeeper in DebugLevel=4, notice 2 duplicated lines after housekeeper activity:

 6384:20151026:170613.154 housekeeper [deleted 375 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 0.212998 sec, idle for 1 hour(s)]
  6384:20151026:170613.154 housekeeper [deleted 375 hist/trends, 0 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 0.212998 sec, idle for 1 hour(s)] 

They may be misleading, as with DebugLevel=3 we see only one such line.

The problem is in a function:

void	__zbx_zbx_setproctitle(const char *fmt, ...)
{
#if defined(HAVE_FUNCTION_SETPROCTITLE) || defined(PS_OVERWRITE_ARGV) || defined(PS_PSTAT_ARGV)
    char	title[MAX_STRING_LEN];
    va_list	args;

    va_start(args, fmt);
    zbx_vsnprintf(title, sizeof(title), fmt, args);
    va_end(args);

    zabbix_log(LOG_LEVEL_DEBUG, "%s", title);
#endif

which logs the same line that it receives as input.

I'd suggest adding a prefix before the value, for example "process title set: "

zabbix_log(LOG_LEVEL_DEBUG, "process title set: %s", title);


 Comments   
Comment by Oleksii Zagorskyi [ 2015 Oct 26 ]

It will help for many other cases too, for example where in Debug log a single line may look a bit unexpected, like:

 27180:20151026:180541.042 housekeeper [removing old history and trends]
...
 27180:20151026:180541.043 housekeeper [removing deleted items data]
...
 27180:20151026:180541.044 housekeeper [removing old events]

and not only for the housekeeper process!

Comment by Viktors Tjarve [ 2015 Nov 16 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-10001

Comment by Oleksii Zagorskyi [ 2015 Nov 16 ]

Discussed the suggested changes with Viktors. I'd still go for the simple change I suggested in the description.
Maybe it's just worth assigning the long string

 ("%s [deleted %d hist/trends, %d items, %d events, %d sessions, %d alarms, %d audit "
				"items in " ZBX_FS_DBL " sec, %s]",
				get_process_type_string(process_type), d_history_and_trends, d_cleanup, d_events,
				d_sessions, d_services, d_audit, sec, sleeptext)

to a variable and then using it in the zabbix_log and zbx_setproctitle calls.

Comment by Viktors Tjarve [ 2015 Nov 17 ]

Simplified fix in development branch svn://svn.zabbix.com/branches/dev/ZBX-10001 r56762

Comment by Oleksii Zagorskyi [ 2015 Nov 17 ]

I like the change now; in general it's what I requested.

Comment by Andris Zeila [ 2015 Nov 20 ]

Successfully tested

Comment by Viktors Tjarve [ 2015 Nov 20 ]

Released in:

  • pre-3.0.0alpha5 r56833

Comment by Aleksandrs Saveljevs [ 2015 Nov 20 ]

(1) This breaks formatting a bit:

 	char	title[MAX_STRING_LEN];
+	const char	*__function_name = "__zbx_zbx_setproctitle";
 	va_list	args;

By convention, __function_name should be the first variable definition in a function and variable names should be aligned, too.

asaveljevs It should also be discussed whether we should always have an "End of %s()" line if we have "In %s()" line.

zalex_ua Good point, it can indeed look suspicious that for some functions we see both "In ..." and "End of ..." but for other functions only "In ...".
As a user who loves to investigate Zabbix debug logs, I would suggest using "Start of ..." and "End of ..." when we log both lines, and "In ..." when we log only one line because the function does not do enough to be worth logging at both start and end.

viktors.tjarve Maybe in this case it is acceptable to leave it without "In" or "Start" and simply have it as:

zabbix_log(LOG_LEVEL_DEBUG, "%s() title:'%s'", __function_name, title);

viktors.tjarve RESOLVED in development branch svn://svn.zabbix.com/branches/dev/ZBX-10001 r56983.

asaveljevs Looks good! CLOSED.

viktors.tjarve Released in:

  • pre-3.0.0alpha5 r57001




[ZBX-9278] Better housekeeper logic for deleting history of deleted/changed items Created: 2015 Feb 03  Updated: 2019 Dec 10

Status: Open
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F), Server (S)
Affects Version/s: 2.4.3
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Unresolved Votes: 5
Labels: housekeeper, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   

2 sub issues described in comments below.



 Comments   
Comment by Oleksii Zagorskyi [ 2015 Feb 03 ]

(1) [server]
Below I'll use "HK" as shorthand for "housekeeper".

First HK invocation looks like this:

 24358:20150121:134433.198 housekeeper [removing old history and trends] 
 24358:20150121:134433.198 In housekeeping_history_and_trends() now:1421840673
 24358:20150121:134433.198 In hk_history_delete_queue_prepare_all()
 24358:20150121:134433.198 query [txnlev:0] [select itemid,min(clock) from history group by itemid]
 24358:20150121:134433.201 query [txnlev:0] [select itemid,min(clock) from history_str group by itemid]
 24358:20150121:134433.201 query [txnlev:0] [select itemid,min(clock) from history_log group by itemid]
 24358:20150121:134433.201 query [txnlev:0] [select itemid,min(clock) from history_uint group by itemid]
 24358:20150121:134433.206 query [txnlev:0] [select itemid,min(clock) from history_text group by itemid]
 24358:20150121:134433.207 query [txnlev:0] [select itemid,min(clock) from trends group by itemid]
 24358:20150121:134433.207 query [txnlev:0] [select itemid,min(clock) from trends_uint group by itemid]
 24358:20150121:134433.220 query [txnlev:0] [select i.itemid,i.value_type,i.history,i.trends from items i,hosts h where i.hostid=h.hostid and h.status in (0,1)]
 24358:20150121:134433.221 End of hk_history_delete_queue_prepare_all()
...
 24358:20150121:134433.871 housekeeper [removing deleted items data]
 24358:20150121:134433.871 In housekeeping_cleanup()
 24358:20150121:134433.871 query [txnlev:0] [select housekeeperid,tablename,field,value from housekeeper where tablename in ('history','history_log','history_str','history_text','history_uint','trends','trends_uint') order by tablename]
 24358:20150121:134433.871 query without transaction detected
 24358:20150121:134433.871 query [txnlev:0] [delete from history where itemid=27853 limit 5000]
...

next invocations look like this:

 24358:20150121:134557.509 housekeeper [removing old history and trends]
 24358:20150121:134557.509 In housekeeping_history_and_trends() now:1421840757
 24358:20150121:134557.509 In hk_history_delete_queue_prepare_all()
 24358:20150121:134557.509 query [txnlev:0] [select i.itemid,i.value_type,i.history,i.trends from items i,hosts h where i.hostid=h.hostid and h.status in (0,1)]
 24358:20150121:134557.511 End of hk_history_delete_queue_prepare_all()
...
 24358:20150121:134558.129 housekeeper [removing deleted items data]
 24358:20150121:134558.129 In housekeeping_cleanup()
 24358:20150121:134558.129 query [txnlev:0] [select housekeeperid,tablename,field,value from housekeeper where tablename in ('history','history_log','history_str','history_text','history_uint','trends','trends_uint') order by tablename]
 24358:20150121:134558.130 query without transaction detected
 24358:20150121:134558.130 query [txnlev:0] [delete from history_uint where itemid=27721 limit 5000]
 24358:20150121:134558.199 End of housekeeping_cleanup():5000
...

We see that HK scans all history tables (using indexes) only on the first invocation, to learn which itemids have history and which don't. Let's call this the "delete queues cache".
On subsequent invocations HK uses that internal cache to avoid scanning the tables again.

I want to draw your attention to the [removing deleted items data] stage.

For example, when I upgraded to 2.4 I forgot to enable HK in the Administration menu.
At some point I noticed that my HK table had 96M rows and was ~5GB in size.
Yes, it's an unusual case, caused by a wrong LLD rule that scanned the full OID tree, created hundreds of items and deleted them afterwards.
No history was collected at all for those items.

When I enabled HK, its first run took ~10 hours and deleted 99.99% of the rows in the HK table.
Of course that's not so bad, but Zabbix could still avoid wasting those DB resources.

According to the debuglog above the HK did 95M SQLs like this (for different history tables of course):

delete from history where itemid=27853 limit 5000

My biggest history tables are 300G, 200G and 150G, so you can imagine.
To perform each such query HK needs to scan the table every time. The InnoDB pool doesn't help much in such a case.

The question is: why does HK try to delete history for items when it already knows (from the "delete queues cache") that these items have no history at all?
It would be logical to just delete the corresponding records from the HK table.

Of course there is a question: is it possible to determine whether an item had any history before it was deleted while the Zabbix server was already running? I don't know.
I don't know whether the Zabbix server keeps the "delete queues cache" fresh for every received value; I need help from the devs.
If it does, then my suggestion could be implemented without significant code changes.
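
The suggestion can be sketched as follows. This is a hypothetical model, not server code: items_with_history stands in for the "delete queues cache", plan_cleanup is an invented name, and it assumes the cache is complete and kept fresh, which is exactly the open question raised above.

```python
def plan_cleanup(housekeeper_rows, items_with_history):
    """Split housekeeper-table rows into real deletes and no-op rows.

    housekeeper_rows   -- iterable of (housekeeperid, tablename, itemid)
    items_with_history -- dict: tablename -> set of itemids known to have data
    Returns (deletes, skipped_ids): deletes lists the (tablename, itemid)
    pairs that still need a DELETE; skipped_ids are housekeeper rows that
    could be dropped without touching the history tables.
    """
    deletes, skipped_ids = [], []
    for housekeeperid, tablename, itemid in housekeeper_rows:
        if itemid in items_with_history.get(tablename, set()):
            deletes.append((tablename, itemid))
        else:
            skipped_ids.append(housekeeperid)
    return deletes, skipped_ids
```

If the cache could not be trusted, skipping a row would leave orphaned history behind, so any real implementation would first need an answer to the freshness question.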

Feel free to ask me if something is not clear.

Comment by Oleksii Zagorskyi [ 2015 Feb 03 ]

(2) [frontend]
This is an addition to (1), but should be taken into account in any case.
When we delete an item in the Zabbix frontend, 7 records are inserted into the HK table (5 for histories, 2 for trends).
We do that regardless of the current item type, to be sure that all possible history, which could have been collected in the past when the item was of a different type, will be deleted from the database.

Such logic leads to the bad performance described in (1).

Even if (1) is fixed as suggested, here is another problem which will not be resolved.

Suppose I had an item in a template (linked to hundreds of hosts) with "integer" type. After one year I changed the history type for the item to "float".
I'm not absolutely sure, but it looks like in this case the existing history/trends for those hundreds of integer items will stay in the database FOREVER, which of course is very bad.

What I suggest is changing the logic for when an item's type is changed or the item is deleted.

When the item type changes, the frontend should do:

1.   delete from HK where tablename="new_item_type" and value="itemid"; <- sanity check in case the item type was changed back and forth
1.1. the same for trends if the item is numeric                         <- sanity check in case the item type was changed back and forth
2.   INSERT INTO housekeeper (tablename,field,value,housekeeperid) VALUES ('previous_item_type','itemid','29848','323')
2.1. the same for trends if the item is numeric

This covers the problem I described earlier in this comment.

Then, if an item is deleted, we only need to insert 1 record (or 2 if the item is numeric) into the HK table, for the current item type only.
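
The proposed bookkeeping can be sketched as below. It is only an illustration of the steps listed above: the type names ("float", "uint", "str", "log", "text"), the TABLES_BY_TYPE mapping and both function names are assumptions made for this example, and the functions merely return the rows to remove from or add to the HK table rather than running any SQL.

```python
# Assumed mapping from an item's value type to the tables that can hold its data.
TABLES_BY_TYPE = {
    "float": ("history", "trends"),
    "uint": ("history_uint", "trends_uint"),
    "str": ("history_str",),
    "log": ("history_log",),
    "text": ("history_text",),
}


def on_type_change(itemid, old_type, new_type):
    """Return (rows_to_unqueue, rows_to_queue) for the housekeeper table.

    rows_to_unqueue -- steps 1 and 1.1: pending deletions for the new type
                       are cancelled in case the type was changed back
    rows_to_queue   -- steps 2 and 2.1: data stored under the old type is
                       scheduled for deletion
    """
    if old_type == new_type:
        return [], []
    unqueue = [(table, "itemid", itemid) for table in TABLES_BY_TYPE[new_type]]
    queue = [(table, "itemid", itemid) for table in TABLES_BY_TYPE[old_type]]
    return unqueue, queue


def on_delete(itemid, current_type):
    """Rows to queue when an item is deleted: current type's tables only."""
    return [(table, "itemid", itemid) for table in TABLES_BY_TYPE[current_type]]
```

With this scheme a deleted numeric item queues 2 rows and a non-numeric item queues 1, instead of the 7 rows mentioned above.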

Comment by Oleksii Zagorskyi [ 2015 Nov 06 ]

Most likely all the ideas I posted here will be taken into account in ZBXNEXT-2860, but let's wait to make sure and then close this report.





[ZBX-8995] Housekeeper info in server log is misleading Created: 2014 Nov 05  Updated: 2017 May 30  Resolved: 2015 Jul 07

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Filipp Sudanov (Inactive) Assignee: Unassigned
Resolution: Won't fix Votes: 1
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Currently housekeeper writes the following to the log:

housekeeper [deleted 123 hist/trends, 456 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 26.273207 sec, idle 1 hour(s)]

The misleading thing is "456 items" - it actually means "456 values for deleted items"
Maybe it could look like:

housekeeper [deleted 123 hist/trends values, 456 values for del. items, 0 events, 0 sessions, 0 alarms, 0 audit items in 26.273207 sec, idle 1 hour(s)]

Also, in proxy log it's saying

 [deleted 789 records in ...

Shouldn't we use "values" instead of "records" ?



 Comments   
Comment by richlv [ 2014 Nov 05 ]

the first part has been discussed. supposedly it is the way it is to keep it short

Comment by Filipp Sudanov (Inactive) [ 2014 Nov 05 ]

Well, it's shorter as it is, but takes much longer for thousands of people to understand what it means.

BTW, do we have any guideline of how short the log messages and other stuff should be? Are we trying to squeeze into 80 characters of http://en.wikipedia.org/wiki/VT100 or we can feel a bit more relaxed?

Comment by richlv [ 2014 Nov 05 ]

i seem to recall length limit coming from the commandline status that we update - andris should know more

Comment by Andris Mednis [ 2014 Nov 05 ]

The text "housekeeper [deleted 123 hist/trends, 456 items, ..." IN LOG FILE and a similar text IN COMMANDLINE STATUS is produced independently (see file src/zabbix_server/housekeeper/housekeeper.c). One can be easily changed without affecting the other.
We do not try to squeeze into 80 characters, a comfortable size for viewing commandline status messages is approximately 190 characters per line. The length limit for "updatable" commandline status is determined at runtime.

Comment by dimir [ 2014 Nov 05 ]

How about combining obsoleted values and values of deleted items?

housekeeper [deleted 579 hist/trends, 0 events, 0 sessions, 0 alarms, 0 audit items in 26.273207 sec, idle 1 hour(s)]

And for consistency probably change "records" to "values" in proxy.

Comment by Filipp Sudanov (Inactive) [ 2014 Nov 05 ]

No, old values and values of deleted items should be reported separately - then it's easier to see why housekeeper load suddenly increased: because someone deleted items. Otherwise it will not be seen.
Ideally they could be even more detailed, e.g.:

deleted 50000 history values for 100 items, 50000 trend values for 100 items, 50000 values for 100 deleted items

In this way everyone would understand how MaxHousekeeperDelete configuration parameter works.

Comment by richlv [ 2014 Nov 05 ]

having different text in commandline & log would be quite confusing. combining them (obsolete values and values for deleted items) would lose a very important detail.
if we can't clarify it while keeping the text the same, i'd just go with detailed description in the manpage

Comment by Filipp Sudanov (Inactive) [ 2014 Nov 05 ]

Then something like

housekeeper [deleted 50000(100), 50000 (100), 50000 (100) values (items), ...

or

housekeeper [deleted 50000/100, 50000/100, 50000/100 values/items, ...

could be fine, see documentation or man page for decoding.

But it should not say "456 items" when it actually means "values" - this is a lie

Comment by Filipp Sudanov (Inactive) [ 2014 Nov 06 ]

Results of today's discussion with Sasha:

Original format:

housekeeper [deleted 1234 hist/trends, 567 items, 0 events, 0 sessions, 0 alarms, 0 audit items in 26.273207 sec, idle 1 hour(s)]

Option one:

housekeeper [deleted 1234 hist/trends, 567 hist/trends (del items), 0 events, 0 sessions, 0 alarms, 0 audit in 26.273207 sec, idle 1 hour(s)]

Option two (length is kept intact!):

housekeeper [deleted 1234(567) hist/trends (del items), 0 events, 0 sessions, 0 alarms, 0 audit in 26.273207 sec, idle 1 hour(s)]

<filipp.sudanov> I would vote for option one - option two may make one think that 567 is included in 1234.

<richlv> "hist/trends" -> values ?

<filipp.sudanov> values are omitted, but implied. The reason is that sessions, alarms, etc. are also a sort of values, or records, or something. So we don't write what was deleted, but only where it was deleted from.

<dimir> +1 for rich's idea and option 1. Also I noticed not very experienced users are often far more comfortable with term "value" rather than "history" or "trends".

<richlv> for the record, i suggested replacing "hist/trends" with plain "values" as that's a bit shorter, and we use word "values" in zabbix context to mean history/trend values usually, not when talking about events etc (it's clear that it could be used, but we haven't done so and it seems to be a nice shorthand to me )

<filipp.sudanov> I'd rather agree with dimir's idea that a novice may not even know that data is stored in places named history and trends, but they know that they have values that are stored and deleted somehow. So this goes as option three:

housekeeper [deleted 1234 values, 567 values (del items), 0 events, 0 sessions, 0 alarms, 0 audit in 26.273207 sec, idle 1 hour]

<richlv> oh, the bikeshedding

housekeeper [deleted 1234 values, 567 del item values, 0 events, 0 sessions, 0 alarms, 0 audit in 26.273207 sec, idle 1 hour]

<dimir> Looks great.

<filipp.sudanov> Yeee, bikeshedding is a great thing. Just tested the above two options on training students - "567 values (del items)" is more understandable than "567 del item values" - it's hard to understand what "del" is.

Comment by Alexander Vladishev [ 2015 Jul 07 ]

There is nothing to do. I close the issue.





[ZBX-8949] Possible deadlock on ids table on "housekeeper" row Created: 2014 Oct 24  Updated: 2017 May 30  Resolved: 2015 Jun 30

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F), Server (S)
Affects Version/s: 2.2.7rc1, 2.4.1
Fix Version/s: 2.2.10rc1, 2.4.6rc1, 2.5.0

Type: Incident report Priority: Blocker
Reporter: Alexey Pustovalov Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: deadlock, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate

 Description   

It can happen because Zabbix frontend and server can use the same table row for updating at the same time:

31628:20141020:094957.796 [Z3005] query failed: [1213] Deadlock found when trying to get lock; try restarting transaction [update ids set nextid=nextid+14 where nodeid=0 and table_name='housekeeper' and field_name='housekeeperid']
zabbix_server [31628]: ERROR [file:db.c,line:999] Something impossible has just happened.

 31616:20141020:094930.468 [Z3005] query failed: [1213] Deadlock found when trying to get lock; try restarting transaction [update ids set nextid=nextid+14 where nodeid=0 and table_name='housekeeper' and field_name='housekeeperid']
zabbix_server [31616]: ERROR [file:db.c,line:999] Something impossible has just happened.


 Comments   
Comment by Andris Zeila [ 2015 May 13 ]

During item removal we are deleting from screens_items (also profiles) table by using non-indexed fields in where clause. With mysql this results in all table records being locked, which can easily lead to deadlocks.

To avoid it we should first select the corresponding identifiers (screenitemid, profileid) and perform the SQL delete based on those identifiers.
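
The approach described above can be sketched as follows. This is only an illustration of the query shape, not the actual fix from the ZBX-8949 branch: it assumes Python with SQLite as a stand-in for MySQL (SQLite does not reproduce MySQL's row locking), and the helper name delete_screen_items_by_id is hypothetical. The table and column names are taken from the queries logged in this issue.

```python
import sqlite3


def delete_screen_items_by_id(conn, resourcetypes, resourceid):
    """Select matching screenitemids first, then delete by primary key.

    Hypothetical helper: it illustrates the two-step pattern only.
    """
    placeholders = ",".join("?" for _ in resourcetypes)
    # Step 1: a plain select collects the identifiers of the rows to remove.
    rows = conn.execute(
        "select screenitemid from screens_items"
        f" where resourcetype in ({placeholders}) and resourceid=?"
        " order by screenitemid",
        (*resourcetypes, resourceid),
    ).fetchall()
    ids = [row[0] for row in rows]
    # Step 2: each delete addresses exactly one row through its primary key,
    # so the where clause no longer relies on non-indexed fields.
    for screenitemid in ids:
        conn.execute(
            "delete from screens_items where screenitemid=?", (screenitemid,)
        )
    return ids
```

The helper returns the deleted identifiers, so a caller can log them or skip further work when the list is empty.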

Comment by Andris Zeila [ 2015 May 18 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-8949

Comment by Andris Zeila [ 2015 Jun 04 ]

Backported fixes to 2.2 branch (svn://svn.zabbix.com/branches/dev/ZBX-8949_2.2)

Comment by dimir [ 2015 Jun 04 ]

Here is the scenario wiper proposed (lld).

process1: deletes item1
- item1 is in graph1
- graph1 is not on any screen
- item1 is in screen1 as simple graph

process2: deletes item2
- item2 is in a graph2
- graph2 is in a screen2

What actually happens and in which order:

- process1: delete from screens_items by graphid - NOTHING TO DO
- process1: update ids ("housekeeper") - ids LOCKED
- process2: delete from screens_items by graphid - screens_items LOCKED
- process1: delete from screens_items by itemid - WAIT ON screens_items LOCK
- process2: update ids ("housekeeper") - WAIT ON ids LOCK (deadlock)

To reproduce that, I added code that removes items from 2 different poller processes, each removing one item. I also added some sleep() calls to ensure the needed order. This is what could be seen in the server log before the fix:

 26549:20150604:124320.090 server #3 started [poller #1]

 26550:20150604:124320.098 server #4 started [poller #2]
 [sleep 2]

 26549:20150604:124320.093 query [txnlev:1] [begin;]
 26549:20150604:124320.095 query [txnlev:1] [update ids set nextid=nextid+7 where table_name='housekeeper' and field_name='housekeeperid']
 [sleep 4]

 26550:20150604:124322.114 query [txnlev:1] [begin;]
 26550:20150604:124322.116 query [txnlev:1] [delete from screens_items where resourcetype=0 and resourceid=547;
 26550:20150604:124322.117 query [txnlev:1] [update ids set nextid=nextid+7 where table_name='housekeeper' and field_name='housekeeperid']

 26549:20150604:124324.096 query [txnlev:1] [delete from screens_items where resourcetype in (3,1) and resourceid=23662;

 26550:20150604:124324.098 query [txnlev:1] [delete from screens_items where resourcetype in (3,1) and resourceid=23663;
 26550:20150604:124324.099 query [txnlev:1] [commit;]

 26549:20150604:124324.137 [Z3005] query failed: [1213] Deadlock found when trying to get lock; try restarting transaction [delete from screens_items where resourcetype in (3,1) and resourceid=23662;
 26549:20150604:124324.137 query [delete from screens_items where resourcetype in (3,1) and resourceid=23662;
 26549:20150604:124324.137 query [txnlev:1] [rollback;]

After the fix:

 13969:20150604:150549.489 server #3 started [poller #1]

 13970:20150604:150549.483 server #4 started [poller #2]
 [sleep 2]

 13969:20150604:150549.576 query [txnlev:1] [begin;]
 13969:20150604:150549.599 query [txnlev:1] [update ids set nextid=nextid+7 where table_name='housekeeper' and field_name='housekeeperid']
 [sleep 4]

 13970:20150604:150551.575 query [txnlev:1] [begin;]
 13970:20150604:150551.578 query [txnlev:1] [delete from screens_items where screenitemid=77;
 13970:20150604:150551.595 query [txnlev:1] [update ids set nextid=nextid+7 where table_name='housekeeper' and field_name='housekeeperid']

 13969:20150604:150553.600 query [txnlev:1] [delete from screens_items where screenitemid=76;
 13969:20150604:150553.600 query [txnlev:1] [commit;]

 13970:20150604:150553.607 query [txnlev:1] [update ids set nextid=nextid+7 where table_name='housekeeper' and field_name='housekeeperid']
 13970:20150604:150553.608 query [txnlev:1] [delete from screens_items where screenitemid=78;
 13970:20150604:150553.608 query [txnlev:1] [commit;]

Comment by dimir [ 2015 Jun 04 ]

Tested. Please review my changes in r53954.

wiper thanks

Comment by Andris Zeila [ 2015 Jun 08 ]

Released in:

  • pre-2.2.10rc1 r53970
  • pre-2.4.6rc1 r53971
  • pre-2.5.0 r53972




[ZBX-8557] Postpone first housekeeper activity from start to 30th minute. Created: 2014 Jul 31  Updated: 2017 May 30  Resolved: 2014 Aug 01

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: 2.3.4

Type: Incident report Priority: Minor
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper, performance
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

When Zabbix server/proxy starts, it runs the housekeeper, which starts housekeeping immediately.
This causes trouble in different cases, as at server start there are other tasks (like ZBX-8042) and in general the server has a lot of work.

It's suggested to postpone the first housekeeper activity to the 30th minute of an hour.



 Comments   
Comment by Oleksii Zagorskyi [ 2014 Jul 31 ]

We discussed this idea a bit with Sasha and he agreed in general.

Comment by Marc [ 2014 Jul 31 ]

Should that possibly be configurable?

Comment by Oleksii Zagorskyi [ 2014 Jul 31 ]

Configurable - then it would be closer to other already existing ZBXNEXTs, so it would be duplicate.

The main idea is simply not to start housekeeping at the same time the server is starting.

Comment by Alexander Vladishev [ 2014 Jul 31 ]

(1) documentation

arturs.galapovs Done. Please review changes:

asaveljevs The last two pages say that "if your HousekeepingFrequency is configured to 1: the very first housekeeping procedure after server start will run after 1 hour and 30 minutes". That does not seem to be true. Instead, the first housekeeping will be performed after the initial delay of 30 minutes.

When the text for HousekeepingFrequency note is settled upon, it would also be nice to add it to conf/zabbix_server.conf and conf/zabbix_proxy.conf.

Also, it would be nice to replace "start-up" with "startup" - the latter version is what we seem to use in our documentation. REOPENED.

arturs.galapovs Done. Please review updated documentation and changes in configuration files - svn://svn.zabbix.com/branches/dev/ZBX-8557 -r47834

asaveljevs I have updated the following two pages by making the note style consistent with the note above it and also by adding the version where this postponing behavior got introduced:

I have also changed "repeat every hour" to "repeat with one hour delay" for server, because the former is only true for proxy - they have different delay algorithms.

Regarding the configuration files, you added information about the usage of "4xHousekeepingFrequency" algorithm for proxy, but proxy does its housekeeping in a different way and that note is not applicable there. Hence, I have removed it and also changed outdated information in r47845. Please take a look. RESOLVED.

arturs.galapovs Should I remove usage of "4xHousekeepingFrequency" algorithm in https://www.zabbix.com/documentation/2.4/manual/appendix/config/zabbix_proxy as well? NEED INFO

asaveljevs It turns out the proxy follows the same "4xHousekeepingFrequency" approach, but it does not define HK_MAX_DELETE_PERIODS, so I missed it. The note on the proxy page is thus correct, except the proxy acts on different tables - "proxy_history", "proxy_dhistory", "proxy_autoreg_host". So it is incorrect to claim that proxy removes unnecessary information from "history, alert, and alarms tables". Just "proxy history" tables, as mentioned in the amended proxy configuration file, would probably be OK.

asaveljevs You might also wish to bring the "4xHousekeepingFrequency" note back into the proxy configuration file (I removed it in r47845), but correct the mentioned tables.

arturs.galapovs Corrected tables affected by housekeeper - https://www.zabbix.com/documentation/2.4/manual/appendix/config/zabbix_proxy. Updated HousekeepingFrequency option description in proxy configuration file - r47875. RESOLVED

asaveljevs Looks good, except proxy's housekeeper does not depend on history and trend settings, so I have removed their mention - both on the linked page and in the configuration file in r47880. Please take a look. RESOLVED.

arturs.galapovs Looks good. CLOSED

Comment by Alexander Vladishev [ 2014 Jul 31 ]

Quick spec:

  • it will affect housekeepers on server and proxy sides
  • the delay (30 min) will not be configurable

zalex_ua Could you clarify - is the delay just a delay from start, or is it similar to the timer process and, as I requested, postponed to the 30th minute?
If it is just a delay, it's great that it doesn't depend on the moment when the server was started, and we can guarantee the delay, for example, for building up the value cache.

But initially I created this issue because trends are calculated at the start of every hour, and that was an issue in some Zabbix installations.

Of course, either of the mentioned approaches "decreases the guaranteed delay" for the other one.

Now, after your quick spec, I'd agree to just a delay (sleep(30m)), as the initial value cache filling is the more critical task.
Also, the housekeeper can eventually work longer than 30 minutes and easily reach the trends calculation moment.

Do, as you wish.

asaveljevs Currently, after housekeeper has done its work, it sleeps for exactly HousekeepingFrequency hours. Since it might take housekeeper a long time to do its job, its starting time drifts in time. So scheduling the initial check at 30th minute after hour to avoid it coinciding with trend calculation has little value in the current implementation.

zalex_ua Correct! But as Sasha explained to me, the 1st trends calculation after a server restart is a harder task (regarding DB load) than all subsequent calculations. So it was relevant only for the 1st trends calculation.
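
The drift asaveljevs describes can be illustrated with a small simulation. This is a sketch under stated assumptions, not server code: the function name and parameters are hypothetical, the first run is assumed to start after a fixed startup delay (the 30 minutes from the quick spec), and after each run the housekeeper is assumed to sleep for exactly HousekeepingFrequency hours.

```python
def housekeeper_start_times(run_durations, frequency_hours=1, startup_delay=30 * 60):
    """Return the start time (seconds since server start) of each run.

    run_durations: how long each housekeeping run takes, in seconds.
    frequency_hours: sleep between runs (HousekeepingFrequency), in hours.
    startup_delay: delay before the first run, in seconds.
    """
    starts = []
    clock = startup_delay
    for duration in run_durations:
        starts.append(clock)
        # The sleep starts only after the run has finished, so the run
        # duration is added on top of the configured frequency.
        clock += duration + frequency_hours * 3600
    return starts


# Three runs taking 10, 20 and 5 minutes with HousekeepingFrequency=1.
starts = housekeeper_start_times([600, 1200, 300])
print([s % 3600 // 60 for s in starts])  # minutes past the hour: [30, 40, 0]
```

With these made-up durations the third run already starts at the top of an hour (counted from server start), which shows why pinning only the first run to a particular minute gives little guarantee for later runs.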

Comment by Arturs Galapovs (Inactive) [ 2014 Aug 01 ]

Implemented in svn://svn.zabbix.com/branches/dev/ZBX-8557

Comment by Oleksii Zagorskyi [ 2014 Aug 01 ]

So you did it as just (sleep(30m)). Ok.

Comment by Aleksandrs Saveljevs [ 2014 Aug 05 ]

(2) Looks good, but please review r47819 before merging - according to Sasha's suggestion, it replaces STARTUP_IDLE_TIME with HOUSEKEEPER_STARTUP_DELAY in include/common.h, similar to POLLER_DELAY and DISCOVERER_DELAY.

arturs.galapovs Done. CLOSED

Comment by Arturs Galapovs (Inactive) [ 2014 Aug 07 ]

Fixed in pre-2.3.4 (trunk) r47887

Comment by Oleksii Zagorskyi [ 2015 Aug 07 ]

Just to keep things together - I asked Sasha for details on how the first trends calculation is done. As I understood it:
When db syncer flushes a value to the DB, it checks the current hour, and if it's a new hour, db syncer inserts a trend value for the past hour into the trend* table.

But for the very first trends calculation db syncer needs to perform a select from trend* (to check for a possibly existing value for the previous hour) and then perform an update of the found value in the trend* table, or an insert if the value is missing.
So there are selects and updates on trend* tables ... which is not an easy task.

Item update interval is important here. For example, if all items have a 60-second update interval, we may suppose that all the items will cause a select/update|insert at the start of the next hour.
But if items mostly have a big update interval (5+ minutes etc.), the task will be spread over the 5+ minute period at the start of the next hour.





[ZBX-7105] very slow query in housekeeper Created: 2013 Oct 07  Updated: 2017 May 30  Resolved: 2013 Oct 23

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.1.7
Fix Version/s: 2.1.9

Type: Incident report Priority: Critical
Reporter: richlv Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper, performance, regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

new housekeeper is much more flexible, but it does some selects against events table that are very slow :

select min(clock) from events where source=1

on a not-too-fast test system with 11 million events this query took 2 hours.

mysql> explain extended select min(clock) from events where source=1;
+----+-------------+--------+------+---------------+----------+---------+-------+---------+----------+-------+
| id | select_type | table  | type | possible_keys | key      | key_len | ref   | rows    | filtered | Extra |
+----+-------------+--------+------+---------------+----------+---------+-------+---------+----------+-------+
| 1  | SIMPLE      | events | ref  | events_1      | events_1 | 4       | const | 5593118 | 100.00   |       |
+----+-------------+--------+------+---------------+----------+---------+-------+---------+----------+-------+



 Comments   
Comment by Alexander Vladishev [ 2013 Oct 07 ]

Related issues: ZBX-7103, ZBX-6869, ZBX-6763

Comment by Alexander Vladishev [ 2013 Oct 16 ]

Fixed in the development branch svn://svn.zabbix.com/branches/dev/ZBX-7105

Comment by richlv [ 2013 Oct 16 ]

(1) we'll have to describe this in whatsnew, and it would be helpful in general to know what was changed
my guess :

a) patches 2010041 and 2010042 have been removed (they removed & recreated events_1 index);
b) both events indexes have been redone :
events_1 - source,object,objectid,clock
events_2 - source,clock
c) some frontend pages have been changed, so i assume performance in those has been improved ?

  • monitoring -> triggers
  • monitoring -> events
  • monitoring -> events -> event details

sasha Performance of these pages is not improved. Only the order was changed:

  • before the change:
    all events were sorted by eventid
  • after the change:
    all events are sorted by clock field

sasha CLOSED. There is nothing to describe in "what's new".
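
For illustration only, the query from the description can be tried against an index laid out as in guess (b) above (events_2 on source, clock). This sketch assumes Python with SQLite as a stand-in; its query planner differs from MySQL's, so it says nothing about the real timings, and the helper name oldest_event_clock and the sample rows are made up.

```python
import sqlite3


def oldest_event_clock(conn, source):
    """Return (min clock, query plan details) for events of one source."""
    query = "select min(clock) from events where source=?"
    # The last column of each "explain query plan" row is the readable detail.
    plan = [row[-1] for row in conn.execute("explain query plan " + query, (source,))]
    oldest = conn.execute(query, (source,)).fetchone()[0]
    return oldest, plan


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table events (eventid integer primary key, source integer,"
    " object integer, objectid integer, clock integer)"
)
# Index layout from guess (b): source first, then clock.
conn.execute("create index events_2 on events (source, clock)")
conn.executemany(
    "insert into events (source, object, objectid, clock) values (?,?,?,?)",
    [(0, 0, 10, 500), (1, 1, 20, 300), (1, 1, 21, 200), (2, 3, 30, 100)],
)
oldest, plan = oldest_event_clock(conn, 1)
print(oldest)  # 200
print(plan)    # plan text depends on the SQLite version
```

The point of such an index is that, for a fixed source, the entries are ordered by clock, so the oldest clock can be read from the start of the matching range instead of scanning every row with that source.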

Comment by Andris Zeila [ 2013 Oct 21 ]

[IS] server side tested

sasha The second index was changed in r39483. Please retest the server side.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Oct 22 ]

(2) I suggest we add a source,object,clock index. It may be used by the API.

sasha RESOLVED in r39483.

wiper Database upgrade patch and housekeeper changes reviewed and tested.
CLOSED

Comment by Pavels Jelisejevs (Inactive) [ 2013 Oct 22 ]

(3) I've made some changes in r39485, please review.

sasha REVIEWED

Please review my changes in r39506.

jelisejev CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Oct 22 ]

(4) The object and objectid sort columns must be restored.

sasha RESOLVED in r39507 and r39521.

jelisejev CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Oct 23 ]

(5) The parameters and examples of event.get need to be updated in https://www.zabbix.com/documentation/2.2/manual/api/reference/event/get

sasha RESOLVED

Also updated:

jelisejev CLOSED.

Comment by Pavels Jelisejevs (Inactive) [ 2013 Oct 23 ]

(6) I've fixed ZBX-6389 (7) in r39520.

sasha CLOSED

Comment by Pavels Jelisejevs (Inactive) [ 2013 Oct 24 ]

Frontend TESTED.

Comment by Alexander Vladishev [ 2013 Oct 24 ]

Fixed in version pre-2.1.9 r39547





[ZBX-6716] Zabbix housekeeper fails with mysql binlog errors Created: 2013 Jun 17  Updated: 2017 May 30  Resolved: 2013 Aug 21

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.0.4
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Stephen Wood Assignee: Unassigned
Resolution: Won't fix Votes: 1
Labels: housekeeper, mysql
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

After some sleuthing into why my zabbix server went down today, I realized it had to do with the housekeeper and mysql. The mysql error logs had the following:

130617 18:03:35 [Note] The following warning was suppressed 50 times during the last 33 seconds in the error log
130617 18:03:35 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. The statement is unsafe because it uses a LIMIT clause. This is unsafe because the set of rows included cannot be predicted. Statement: delete from history where itemid=215621 limit 100

I first tried disabling MySQL binlogs, but that didn't work. The only solution I've found is simply to disable housekeeping by setting "DisableHousekeeping=1" in zabbix_server.conf

After setting that and restarting Zabbix it seems to be behaving normally.



 Comments   
Comment by Noah Leaman [ 2013 Jun 27 ]

I too am seeing this (v2.0.6). Haven't tried anything to resolve the issue yet.

Comment by Alexei Vladishev [ 2013 Jul 19 ]

I do not think there is something we should fix. MySQL considers it as an unsafe SQL statement. Ok, fine.

Comment by Alexander Vladishev [ 2013 Aug 21 ]

It is already fixed under ZBXNEXT-1649 for Zabbix 2.2.0 (not released yet).

I'm closing the issue.

Comment by Javier [ 2014 May 02 ]

I'm still seeing same error on Zabbix 2.2.3:

140502 14:07:48 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. The statement is unsafe because it uses a LIMIT clause. This is unsafe because the set of rows included cannot be predicted. Statement: delete from history_uint where itemid=30748 limit 500

So, is it really fixed in Zabbix 2.2.0?

Thanks.





[ZBX-6542] Document the potential role of HousekeepingFrequency in connection with the range of history/trends cleanups Created: 2013 Apr 26  Updated: 2017 May 30  Resolved: 2014 Apr 15

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Documentation (D)
Affects Version/s: 2.0.6
Fix Version/s: 2.0.14rc1, 2.2.8rc1, 2.4.3rc1, 2.5.0

Type: Incident report Priority: Trivial
Reporter: Marc Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

If I understand delete_history() right it doesn't always delete everything older than the configured history of an item.
It's only done this way if the configured history subtracted from the current time is lower than the time of the oldest history entry added with four times HousekeepingFrequency.

Maybe this should be mentioned in the documentation - on condition that I'm not wrong



 Comments   
Comment by Oleksii Zagorskyi [ 2013 Apr 27 ]

Should be probably fixed together with ZBX-6541

Comment by Oleksii Zagorskyi [ 2013 Apr 27 ]

Also - not everything older than the configured history of an item !
There is one sanity check to save performance if you for example change history/trends from 720 days to 2 days.

It will be deleted not in one go, but in several/many small batches.
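
The rule discussed in this issue can be sketched as a small calculation. This is a reading of the description above, not a copy of delete_history(): the function and parameter names are hypothetical, and the behavior in the capped case (deleting only up to the oldest entry plus 4 times HousekeepingFrequency) is an assumption that matches the "several/many small batches" remark.

```python
def history_delete_cutoff(now, keep_history, min_clock, housekeeping_frequency):
    """Return the timestamp below which history values are deleted in one run.

    now: current time (seconds).
    keep_history: configured history period of the item (seconds).
    min_clock: timestamp of the oldest stored value of the item (seconds).
    housekeeping_frequency: HousekeepingFrequency (hours).
    """
    # What the item configuration asks for: drop everything older than this.
    wanted = now - keep_history
    # Sanity cap: at most 4 x HousekeepingFrequency hours of data per run.
    capped = min_clock + 4 * housekeeping_frequency * 3600
    return min(wanted, capped)


# History reduced from 720 days to 2 days, oldest value stored at time 0.
day = 86400
print(history_delete_cutoff(720 * day, 2 * day, 0, 1))  # 14400
```

In this example only the first 4 hours of data are removed in the run, so under these assumptions the remaining backlog is worked off gradually over the following runs.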

Comment by Martins Valkovskis [ 2014 Apr 04 ]

This detail added to documentation:

https://www.zabbix.com/documentation/2.0/manual/appendix/config/zabbix_server
https://www.zabbix.com/documentation/2.2/manual/appendix/config/zabbix_server

(see 'HousekeepingFrequency' parameter description)

Comment by Marc [ 2014 Apr 04 ]

I'm happy with that

Comment by richlv [ 2014 Apr 12 ]

(1) i was considering for some time whether we should add the same info to the config files to make them more useful & be more consistent.
i got to the conclusion that we should

sasha RESOLVED in:

  • svn://svn.zabbix.com/branches/2.0 r44359:44360
  • svn://svn.zabbix.com/branches/2.2 r44361
  • svn://svn.zabbix.com/trunk r44362

please review

martins-v Looking good to me.

<richlv>
a) sorry for nitpicking, but "4xHousekeepingFrequency" doesn't look very readable to me. "4 x HousekeepingFrequency" is a bit strange as well, so i'd suggest to make it "4 times HousekeepingFrequency"
b) while we're at this, there is a line saying "Housekeeping is removing unnecessary information from history, alert, and alarms tables.". we don't have 'alarms' table, and this is missing quite a bunch of tables. we should either list all the tables or just change it to something more generic like "...removing outdated information from the database."

martins-v Introduced to:

(and also the same docs for 2.0, 2.2 and 3.0 branches). Please review. Again, for consistency, this would have to be synced with config files.

sasha Very good! But Zabbix proxy doesn't apply "4 times HousekeepingFrequency" for "each item". It is applied to all values in history tables.

.. and "when history periods are greatly reduced" should be replaced by "when configuration parameters ProxyLocalBuffer or ProxyOfflineBuffer are greatly reduced"

martins-v Thanks, fixed according to suggestions: https://www.zabbix.com/documentation/2.4/manual/appendix/config/zabbix_proxy

RESOLVED.

sasha Thank you. CLOSED

Comment by Alexander Vladishev [ 2014 Oct 27 ]

(2) For consistency, this would have to be synced with config files.

sasha RESOLVED directly in 2.0, 2.2, 2.4 and trunk branches in r50247.

(2a) asaveljevs In trunk configuration file for proxy, it says "In this case the period of outdated history...". For server, it says "In this case the period of outdated information...". It should be "information" in both cases.

sasha RESOLVED in r50566.

asaveljevs CLOSED

(2b) asaveljevs Also, let's change "Housekeeper" to "the housekeeper". REOPENED.

sasha RESOLVED in 1.8, 2.0, 2.2, 2.4 and 3.0 documentation pages. Also changed "Housekepper Frequency" to "HousekepperFrequency" in 3.0.

asaveljevs Additionally changed "Housekeeper" to "housekeeper" at https://www.zabbix.com/documentation/2.4/manual/appendix/config/zabbix_server - that seems to have been missed. RESOLVED.

sasha Many thanks! CLOSED





[ZBX-6541] Point out that MaxHousekeeperDelete corresponds to deleted items only Created: 2013 Apr 26  Updated: 2017 May 30  Resolved: 2014 Jan 14

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Documentation (D)
Affects Version/s: 2.0.6
Fix Version/s: 1.8.20, 2.0.11, 2.2.2

Type: Incident report Priority: Trivial
Reporter: Marc Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Before I looked at the source code it was never really clear to me what MaxHousekeeperDelete actually does.
Now, if I understand it right MaxHousekeeperDelete limits only the deletion of data related to already deleted items and isn't considered by the other housekeeping tasks (e.g. removing outdated history and trends of existing items).

It would be nice if the documentation pointed out more clearly which housekeeping task(s) consider MaxHousekeeperDelete and maybe which tasks don't.
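
As a rough illustration of the behavior described above, a limited delete for the values of one deleted item could look like the sketch below. It is not the housekeeper's code: it assumes Python with SQLite as a stand-in database, assumes the limit applies to each delete statement, and uses a hypothetical helper name and a two-table whitelist. SQLite builds usually lack "delete ... limit", so the limit is applied through a rowid subquery.

```python
import sqlite3

# Illustrative subset of value tables; the real list is longer.
VALUE_TABLES = ("history", "history_uint")


def delete_values_of_deleted_item(conn, table, itemid, max_delete):
    """Delete at most max_delete rows of one deleted item from one table.

    Returns the number of rows actually removed.
    """
    if table not in VALUE_TABLES:
        raise ValueError("unexpected table: " + table)
    cur = conn.execute(
        f"delete from {table} where rowid in"
        f" (select rowid from {table} where itemid=? limit ?)",
        (itemid, max_delete),
    )
    return cur.rowcount
```

Calling the helper once per housekeeping cycle removes the leftover values of a deleted item in bounded steps, while the cleanup of outdated history and trends of existing items would go through a separate code path that does not use this limit.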



 Comments   
Comment by Oleksii Zagorskyi [ 2013 Apr 27 ]

ZBX-6542 is related

Comment by Martins Valkovskis [ 2014 Jan 14 ]

Documented in:

See 'MaxHousekeeperDelete' parameter.

Comment by Marc [ 2014 Jan 14 ]

Still corresponds to 'tasks' and doesn't point out that it only affects the task of cleaning data from deleted items.
Or has the behavior changed in the meantime?

Comment by Oleksii Zagorskyi [ 2014 Jan 14 ]

Marc, you probably need to check documentation change diff once more.

I consider issue as resolved.

Comment by Marc [ 2014 Jan 14 ]

My bad!

Tend to stop reading when reading 1.8....
Have to work on that

Comment by Aleksandrs Saveljevs [ 2014 Jan 31 ]

Documentation update looks good.





[ZBX-6493] After upgrade to 2.0.5 (with Oracle DB) housekeeper is always busy at 100% Created: 2013 Apr 16  Updated: 2017 May 30  Resolved: 2013 Apr 30

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.0.5, 2.1.0
Fix Version/s: 2.0.7rc1, 2.1.0

Type: Incident report Priority: Blocker
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper, oracle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File zab_int_proc_after_2.0.5_patch.png     PNG File zab_queue_after_2.0.5_patch.png    
Issue Links:
Duplicate

 Description   

2.0.5 housekeeper with Oracle is always 100% busy - for days+! - and never drops back to 0%
A small part of the housekeeper strace with 2.0.5 can be seen here: http://pastebin.com/9tERRGUE

After one week with 2.0.5 I downgraded to 2.0.4 and it solved the problem immediately.

For the last day the housekeeper has been deleting 4 times more records than usual (there is special code for this).

This morning I tried 2.0.5 sources BUT without changes from ZBX-5920 (thanks dimir for preparing tar.gz)
And housekeeper is working as it should !

So I suppose the ZBX-5920 introduced some side effect for housekeeper on Oracle.



 Comments   
Comment by Oleksii Zagorskyi [ 2013 Apr 16 ]

How zabbix_server.log looks (from another log file monitoring, in reverse order):

16.Апр.2013 14:10:41	16.Апр.2013 14:11:16	27739:20130416:141116.189 housekeeper deleted: 9364938 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions
16.Апр.2013 13:05:40	16.Апр.2013 13:04:38	27739:20130416:130438.786 executing housekeeper
16.Апр.2013 13:05:40	16.Апр.2013 13:04:38	27739:20130416:130438.786 server #65 started [housekeeper #1]
>>> upgrade to 2.0.5 without ZBX-5920
16.Апр.2013 12:00:41	16.Апр.2013 11:58:53	5370:20130416:115853.016 housekeeper deleted: 9468885 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions
16.Апр.2013 10:55:41	16.Апр.2013 10:56:50	5370:20130416:105650.863 executing housekeeper
16.Апр.2013 07:55:39	16.Апр.2013 07:55:53	5370:20130416:075553.847 housekeeper deleted: 9446067 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions
16.Апр.2013 07:20:39	16.Апр.2013 07:21:37	5370:20130416:072137.771 executing housekeeper
16.Апр.2013 04:20:38	16.Апр.2013 04:20:31	5370:20130416:042031.821 housekeeper deleted: 9462472 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions
16.Апр.2013 03:35:38	16.Апр.2013 03:37:37	5370:20130416:033737.843 executing housekeeper
16.Апр.2013 00:35:38	16.Апр.2013 00:36:41	5370:20130416:003641.820 housekeeper deleted: 9518184 records from history and trends, 316 records of deleted items, 0 events, 0 alerts, 0 sessions
15.Апр.2013 23:55:38	15.Апр.2013 23:53:34	5370:20130415:235334.072 executing housekeeper
15.Апр.2013 20:50:37	15.Апр.2013 20:52:43	5370:20130415:205243.417 housekeeper deleted: 9485817 records from history and trends, 3000 records of deleted items, 0 events, 0 alerts, 0 sessions
15.Апр.2013 20:20:37	15.Апр.2013 20:23:07	5370:20130415:202307.051 executing housekeeper
15.Апр.2013 17:20:36	15.Апр.2013 17:22:20	5370:20130415:172220.977 housekeeper deleted: 9520144 records from history and trends, 9399 records of deleted items, 0 events, 0 alerts, 0 sessions
15.Апр.2013 16:40:36	15.Апр.2013 16:38:16	5370:20130415:163816.000 executing housekeeper
15.Апр.2013 13:35:33	15.Апр.2013 13:37:13	5370:20130415:133713.758 housekeeper deleted: 8330024 records from history and trends, 23350 records of deleted items, 0 events, 0 alerts, 0 sessions
15.Апр.2013 13:05:32	15.Апр.2013 13:04:40	5370:20130415:130440.870 executing housekeeper
15.Апр.2013 13:05:32	15.Апр.2013 13:04:40	5370:20130415:130440.870 server #65 started [housekeeper #1]
>>> downgrade to 2.0.4
12.Апр.2013 14:25:18	12.Апр.2013 14:25:12	19212:20130412:142512.326 executing housekeeper
12.Апр.2013 14:25:18	12.Апр.2013 14:25:12	19212:20130412:142512.325 server #65 started [housekeeper #1]
10.Апр.2013 12:57:18	10.Апр.2013 12:58:39	17685:20130410:125839.758 executing housekeeper
10.Апр.2013 12:57:18	10.Апр.2013 12:58:39	17685:20130410:125839.758 server #65 started [housekeeper #1]
08.Апр.2013 08:54:15	08.Апр.2013 08:53:58	16753:20130408:085358.122 executing housekeeper
08.Апр.2013 08:54:15	08.Апр.2013 08:53:58	16753:20130408:085358.122 server #65 started [housekeeper #1]
06.Апр.2013 12:34:07	06.Апр.2013 12:31:54	18502:20130406:123154.750 executing housekeeper
06.Апр.2013 12:34:07	06.Апр.2013 12:31:54	18502:20130406:123154.749 server #65 started [housekeeper #1]
>>> upgrade to   2.0.5
06.Апр.2013 11:04:35	06.Апр.2013 11:04:44	11247:20130406:110444.834 housekeeper deleted: 2461891 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions
06.Апр.2013 10:54:35	06.Апр.2013 10:56:33	11247:20130406:105633.762 executing housekeeper
06.Апр.2013 07:54:31	06.Апр.2013 07:55:24	11247:20130406:075524.761 housekeeper deleted: 2452628 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions
06.Апр.2013 07:44:31	06.Апр.2013 07:46:42	11247:20130406:074642.105 executing housekeeper
06.Апр.2013 04:44:29	06.Апр.2013 04:45:36	11247:20130406:044536.642 housekeeper deleted: 2452794 records from history and trends, 0 records of deleted items, 0 events, 0 alerts, 0 sessions
06.Апр.2013 04:39:29	06.Апр.2013 04:37:31	11247:20130406:043731.563 executing housekeeper
  • While running the 2.0.5 release I observed that the number of records in the "housekeeper" table did not change, and I know that this table is processed after the outdated data is cleaned up (examples in ZBX-4298)
  • Also, the housekeeper now deletes 4 times more data

So I suppose that on 2.0.5 the housekeeper deleted almost nothing.

Comment by Oleksii Zagorskyi [ 2013 Apr 17 ]

Just in case - strace of working housekeeper on the modified 2.0.5 without ZBX-5920
http://pastebin.com/TkQL4qcn

Comment by dimir [ 2013 Apr 17 ]

Simple case to reproduce:

while (1)
{
        result = DBselect("select 1 from hosts where hostid is null");
        DBfree_result(result);
}

with prefetch (2 MB):

  • zabbix_server process cpu usage is constantly 96-100 %

without prefetch:

  • zabbix_server process cpu usage is constantly 6-9 %
Comment by dimir [ 2013 Apr 18 ]

Changing prefetching from memory based (2 MB) to row count based decreases CPU usage:

100 rows: ~7 %
200 rows: ~7 %
300 rows: ~10 %
500 rows: ~17 %

Based on those measurements, 200 rows seems to be a pretty good candidate.

Comment by Oleksii Zagorskyi [ 2013 Apr 19 ]

On the attached graphs you can see the difference between 2.0.4 and 2.0.5 with dimir's latest patch.
Here are several excerpts from the server log:

 32004:20130419:101627.484 In DCsync_configuration()
 32004:20130419:101628.513 DCsync_configuration() config     : sql:0.002860 sync:0.000034 sec.
 32004:20130419:101628.513 DCsync_configuration() items      : sql:0.003365 sync:0.594233 sec.
 32004:20130419:101628.513 DCsync_configuration() triggers   : sql:0.084339 sync:0.129877 sec.
 32004:20130419:101628.513 DCsync_configuration() trigdeps   : sql:0.001640 sync:0.002999 sec.
 32004:20130419:101628.513 DCsync_configuration() functions  : sql:0.025511 sync:0.158740 sec.
 32004:20130419:101628.513 DCsync_configuration() hosts      : sql:0.002165 sync:0.007971 sec.
 32004:20130419:101628.513 DCsync_configuration() templates  : sql:0.001424 sync:0.003118 sec.
 32004:20130419:101628.513 DCsync_configuration() globmacros : sql:0.000835 sync:0.000018 sec.
 32004:20130419:101628.513 DCsync_configuration() hostmacros : sql:0.000704 sync:0.000007 sec.
 32004:20130419:101628.513 DCsync_configuration() interfaces : sql:0.001616 sync:0.007051 sec.
 32004:20130419:101628.513 DCsync_configuration() total sync : 0.904048 sec.
 32004:20130419:101628.513 DCsync_configuration() total      : 1.028507 sec.
 32004:20130419:101628.513 End of DCsync_configuration()
...
 32004:20130419:101730.144 In DCsync_configuration()
 32004:20130419:101731.137 DCsync_configuration() config     : sql:0.002176 sync:0.000034 sec.
 32004:20130419:101731.137 DCsync_configuration() items      : sql:0.003458 sync:0.560860 sec.
 32004:20130419:101731.137 DCsync_configuration() triggers   : sql:0.088649 sync:0.120709 sec.
 32004:20130419:101731.137 DCsync_configuration() trigdeps   : sql:0.001391 sync:0.002424 sec.
 32004:20130419:101731.137 DCsync_configuration() functions  : sql:0.027063 sync:0.161773 sec.
 32004:20130419:101731.137 DCsync_configuration() hosts      : sql:0.002222 sync:0.008198 sec.
 32004:20130419:101731.137 DCsync_configuration() templates  : sql:0.001452 sync:0.003387 sec.
 32004:20130419:101731.137 DCsync_configuration() globmacros : sql:0.000856 sync:0.000018 sec.
 32004:20130419:101731.137 DCsync_configuration() hostmacros : sql:0.000736 sync:0.000006 sec.
 32004:20130419:101731.137 DCsync_configuration() interfaces : sql:0.001477 sync:0.005783 sec.
 32004:20130419:101731.137 DCsync_configuration() total sync : 0.863194 sec.
 32004:20130419:101731.138 DCsync_configuration() total      : 0.992673 sec.
 32004:20130419:101731.138 End of DCsync_configuration()
...
 32004:20130419:101831.754 In DCsync_configuration()
 32004:20130419:101833.006 DCsync_configuration() config     : sql:0.002465 sync:0.000021 sec.
 32004:20130419:101833.006 DCsync_configuration() items      : sql:0.003598 sync:0.808478 sec.
 32004:20130419:101833.006 DCsync_configuration() triggers   : sql:0.074323 sync:0.131056 sec.
 32004:20130419:101833.006 DCsync_configuration() trigdeps   : sql:0.001218 sync:0.003015 sec.
 32004:20130419:101833.006 DCsync_configuration() functions  : sql:0.026347 sync:0.172022 sec.
 32004:20130419:101833.006 DCsync_configuration() hosts      : sql:0.002403 sync:0.009451 sec.
 32004:20130419:101833.006 DCsync_configuration() templates  : sql:0.001491 sync:0.003945 sec.
 32004:20130419:101833.006 DCsync_configuration() globmacros : sql:0.000851 sync:0.000030 sec.
 32004:20130419:101833.006 DCsync_configuration() hostmacros : sql:0.001537 sync:0.000010 sec.
 32004:20130419:101833.006 DCsync_configuration() interfaces : sql:0.001809 sync:0.008543 sec.
 32004:20130419:101833.006 DCsync_configuration() total sync : 1.136571 sec.
 32004:20130419:101833.006 DCsync_configuration() total      : 1.252614 sec.
 32004:20130419:101833.007 End of DCsync_configuration()
Comment by richlv [ 2013 Apr 19 ]

(1) added as a whatsnew item for 2.0.7; to be reviewed when the issue is finished
https://www.zabbix.com/documentation/2.0/manual/introduction/whatsnew207#improved_performance_with_oracle

zalex_ua hmm, are you sure that it will not be included in 2.0.6 ?

Comment by dimir [ 2013 Apr 19 ]

I do not see any speed difference of simple selects that return single value when 200 rows prefetch is enabled or disabled:

-----------------------------------------------------------

for (itemid = 1; itemid < 100001; itemid++)
{
        result = DBselect("select 1 from dual");
        DBfree_result(result);
}

prefetching disabled
100000 'select 1 from dual' selects took 101.193656 seconds

prefetching enabled
100000 'select 1 from dual' selects took 101.422586 seconds

-----------------------------------------------------------

for (itemid = 1; itemid < 100001; itemid++)
{
        result = DBselect("select min(clock) from history where itemid=%d", itemid);
        DBfree_result(result);
}

prefetching disabled
100000 'select min(clock) from history where itemid=n' selects took 202.616726 seconds

prefetching enabled
100000 'select min(clock) from history where itemid=n' selects took 202.443773 seconds

Comment by dimir [ 2013 Apr 30 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-6493 .

Comment by Andris Zeila [ 2013 May 02 ]

Reviewed, successfully tested. Please review my changes in r35378

dimir Looks good!

Comment by dimir [ 2013 May 02 ]

Fixed in pre-2.0.7 r35379, pre-2.1.0 r35380 .

Prefetching is needed when working with Oracle because otherwise selects fetch only 1 row at a time (the default behavior). There are 2 ways to do prefetching: memory based and row based.

According to the study, the optimal (speed-wise) memory based prefetch is 2 MB. But with many subsequent selects, CPU usage jumps up to 100 %. Row based prefetch with up to 200 rows does not affect CPU usage; it stays the same as without prefetching at all.

Before this fix:

  • 2 MB memory based prefetch was used with Oracle for selects
  • in case of big database housekeeper hits 100 % CPU usage

After this fix:

  • 200 rows prefetch is used with Oracle for selects
  • in case of big database CPU usage is not affected by housekeeper

More info:
http://docs.oracle.com/cd/B28359_01/appdev.111/b28395/oci04sql.htm





[ZBX-6297] History and Trends data keeps growing Created: 2013 Feb 20  Updated: 2017 May 30  Resolved: 2013 Feb 20

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.8.15
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: john matchett Assignee: Unassigned
Resolution: Won't fix Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

RHEL 6.3 Postgres 9.0



 Description   

Originally I had History set to 365 days. I noticed the database was over 23G in size and changed history to 2 days. Housekeeper does not seem to be doing its job of removing old history data.
How can I verify Housekeeper runs to automatically correct the
Is there also a manual way to correct this?

Here is my zabbix status showing the number of items and triggers.
Zabbix server is running Yes localhost:10051
Number of hosts (monitored/not monitored/templates) 130 84 / 7 / 39
Number of items (monitored/disabled/not supported) 7065 7000 / 19 / 46
Number of triggers (enabled/disabled)[problem/unknown/ok] 6372 6372 / 0 [168 / 5281 / 923]
Number of users (online) 5 2
Required server performance, new values per second 124.75 -



 Comments   
Comment by richlv [ 2013 Feb 20 ]

unless disabled on purpose, housekeeper most likely runs and the issue is with the db not releasing used diskspace.
please use zabbix forums, irc and other channels for community support

Comment by richlv [ 2013 Feb 20 ]

reopen to change resolution





[ZBX-6089] Increase the housekeeper limit for deleting items (or make it a config variable) Created: 2013 Jan 09  Updated: 2017 May 30  Resolved: 2013 Jan 14

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.0.4
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Mattias Geniar Assignee: Unassigned
Resolution: Won't fix Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

In src/zabbix_server/housekeeper/housekeeper.c the following is defined on line 236:
min_clock = MIN(now - keep_history * SEC_PER_DAY, min_clock + 4 * CONFIG_HOUSEKEEPING_FREQUENCY * SEC_PER_HOUR);

This has a drawback: if you change an item from the default "keep history in days" of 60 to, say, 2, it will take a very long time for all the old data to be cleared. I suggest to change that to:
min_clock = MIN(now - keep_history * SEC_PER_DAY, min_clock + 32 * CONFIG_HOUSEKEEPING_FREQUENCY * SEC_PER_HOUR);

In the current code, if you have hourly housekeeping running, each run only deletes the "currently lowest clock value + 4 hours" worth of data. After changing the history retention from 60 days to 2, clearing the backlog takes a very long time.

Either the hard-coded value of 4 should be a config parameter with sufficient warnings about the possibility of table- and transaction logs, or the default value should be increased slightly just to speed up the clearing of old history.
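To put rough numbers on the request, the following sketch simulates how many housekeeper runs are needed before the deletion bound reaches the new retention cutoff. It reuses the min_clock expression quoted above. The function name runs_to_catch_up() and the arbitrary start time are made up for illustration, and the model assumes the stored data is continuous, so that after each run the oldest remaining value sits exactly at the previous deletion bound.

```c
#include <assert.h>

#define SEC_PER_HOUR 3600L
#define SEC_PER_DAY  86400L
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Hypothetical helper: number of housekeeper runs until the deletion bound
 * reaches the retention cutoff. old_days is the age of the oldest stored
 * value, keep_history the new retention in days, multiplier the hard-coded
 * factor (4 in the report) and frequency the housekeeping interval in hours.
 */
static int runs_to_catch_up(int old_days, int keep_history, int multiplier, int frequency)
{
	long now = 1000L * SEC_PER_DAY;	/* arbitrary start time */
	long min_clock = now - old_days * SEC_PER_DAY;
	int  runs = 0;

	/* multiplier must exceed 1, otherwise the backlog never shrinks */
	assert(1 < multiplier && 0 < frequency && keep_history <= old_days);

	for (;;)
	{
		long cutoff = now - keep_history * SEC_PER_DAY;

		runs++;
		/* same expression as in housekeeper.c quoted in the report */
		min_clock = MIN(cutoff, min_clock + multiplier * frequency * SEC_PER_HOUR);

		if (min_clock == cutoff)
			return runs;	/* backlog cleared in this run */

		/* time moves on until the next run, so the cutoff moves too */
		now += frequency * SEC_PER_HOUR;
	}
}
```

Under these assumptions, runs_to_catch_up(60, 2, 4, 1) gives 464 hourly runs (a little over 19 days), while runs_to_catch_up(60, 2, 32, 1) gives 45 runs (under 2 days).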



 Comments   
Comment by Alexei Vladishev [ 2013 Jan 14 ]

The existing constant is a trade-off between the impact on database performance and the speed of removal of older historical data. I believe the existing value is fine for most setups and shouldn't be changed.

At the same time, I think we could make it configurable in the future to address the various needs Zabbix users may have.

Anyway, I am closing it.





[ZBX-6065] housekeeper does seemingly useless deletes Created: 2013 Jan 07  Updated: 2017 May 30  Resolved: 2013 Jan 08

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: 2.1.0

Type: Incident report Priority: Minor
Reporter: richlv Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

2.0 branch rev 32487



 Description   

starting on a fresh system, zabbix housekeeper still does execute a large amount of delete queries. excerpt from debug log :

5374:20130107:151212.512 In delete_history() table:'trends' itemid:30668 keep_history:365 now:1357564317
5374:20130107:151212.512 query [txnlev:0] [select min(clock) from trends where itemid=30668]
5374:20130107:151212.512 query without transaction detected
5374:20130107:151212.512 query [txnlev:0] [delete from trends where itemid=30668 and clock<1326028317]

as can be seen, first a select is made, and even though we do not have any data that should be deleted, a delete query is executed.

also, for every item warning "query without transaction detected" is printed, which can't be good
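The redundant delete described above can be avoided with a simple guard. This is only a sketch and not necessarily what the fix in the development branch does; history_cutoff() and delete_needed() are hypothetical names. The numbers in the usage note come from the debug log excerpt.

```c
#include <assert.h>

#define SEC_PER_DAY 86400L

/* Oldest timestamp that still has to be kept for the given retention period. */
static long history_cutoff(long now, int keep_history_days)
{
	assert(0 <= keep_history_days);

	return now - keep_history_days * SEC_PER_DAY;
}

/*
 * Returns 1 when a delete query is worth issuing, 0 otherwise.
 * have_min is 0 when "select min(clock)" returned NULL (no rows for the item).
 */
static int delete_needed(int have_min, long min_clock, long now, int keep_history_days)
{
	if (0 == have_min)
		return 0;	/* nothing stored for this item */

	return min_clock < history_cutoff(now, keep_history_days);
}
```

With now:1357564317 and keep_history:365 from the log, history_cutoff() yields 1326028317, which matches the bound in the logged delete query. When the preceding select finds no rows, or only rows newer than that bound, delete_needed() returns 0 and the delete can be skipped.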



 Comments   
Comment by Alexander Vladishev [ 2013 Jan 07 ]

Rich,

I cannot reproduce it on my system. What database do you use?

Fresh database doesn't have item with id=30668.

Comment by Alexander Vladishev [ 2013 Jan 08 ]

Fixed in the development branch svn://svn.zabbix.com/branches/dev/ZBX-6065

Comment by dimir [ 2013 Jan 09 ]

Successfully tested!

Comment by Alexander Vladishev [ 2013 Jan 11 ]

Fixed in version pre-2.1.0 r32688.





[ZBX-5887] Housekeeper delete_history function does not respect MaxHousekeeperDelete - in slow databases results in broken DM sync Created: 2012 Nov 23  Updated: 2017 Oct 17  Resolved: 2017 Oct 17

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.0.3
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Johan Venter Assignee: Unassigned
Resolution: Unsupported version Votes: 0
Labels: dm, housekeeper, patch
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Hand-compiled Zabbix 1.8.12, Ubuntu 12.04, MySQL 5.5.28

Master runs in KVM VM with 16 GB RAM, 8 vCPU cores, all virtio disks/networking with cache='writeback' on disks for performance.

MySQL tweaks include:
innodb_buffer_pool_size = 10G
innodb_log_buffer_size = 10M
innodb_log_file_size = 2G
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT


Attachments: Text File housekeeper.c.patch    

 Description   

We have a 1 master, 3 slave node DM configuration and I noticed that some data from one of the slave nodes was simply not being synced - it was consistently trying to send the same ~300K of history_sync data over and over again.

After much investigation I discovered that the "INSERT INTO history ..." query of the sync process was unable to proceed because of row locks held for a long time by the housekeeper while performing a "DELETE FROM history WHERE itemid = ___ AND clock < ____". These queries sometimes took thousands of seconds to complete (we have about 40G of data in our master's history table) and held the row locks much longer than the innodb_lock_wait_timeout default of 50 seconds - even increasing this to 10 or 15 minutes was not enough.

I discovered the MaxHousekeeperDelete option of the Zabbix Server and implemented that at 1000 as our MySQL instance seems OK with LIMIT 1000 on the history table deletes (they complete anywhere between 0 and 5 seconds).

However, the housekeeper delete_history function does not respect MaxHousekeeperDelete, only the housekeeping_cleanup function checks for it.

After applying the attached patch we now have working DM sync and while the housekeeper is probably taking longer to clean up it's no longer keeping rows locked for long periods and preventing other queries on history from running.

The patch works with MySQL however doesn't implement the alternative limiting queries for other database engines such as PostgreSQL or Oracle, but it certainly solved our problem and hopefully brings to light a potential issue of the housekeeping process for resolution in future versions.

The patch was against 1.8.12, but it doesn't look like the code is any different in 2.0.3.
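The attached patch itself is not reproduced in this export. As a rough illustration of the idea, a MySQL-only delete with a row limit could be built as in the sketch below. build_limited_delete() is a hypothetical helper, not a Zabbix function, and treating a limit of 0 as "no limit" is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes a history delete statement into buf, adding
 * the MySQL-specific "limit" clause when max_delete is positive.
 * Returns 0 on success, -1 on bad input or if the buffer is too small.
 */
static int build_limited_delete(char *buf, size_t size, const char *table,
		unsigned long long itemid, long min_clock, int max_delete)
{
	int written;

	if (NULL == table || 0 == strlen(table))
		return -1;

	if (0 < max_delete)
	{
		written = snprintf(buf, size, "delete from %s where itemid=%llu and clock<%ld limit %d",
				table, itemid, min_clock, max_delete);
	}
	else
	{
		/* assumption: 0 means the number of deleted rows is not limited */
		written = snprintf(buf, size, "delete from %s where itemid=%llu and clock<%ld",
				table, itemid, min_clock);
	}

	return (0 <= written && (size_t)written < size) ? 0 : -1;
}
```

A caller would repeat the statement on later housekeeper runs until fewer than max_delete rows are affected, so each statement holds its row locks only briefly. PostgreSQL and Oracle need a different way of limiting the delete, as the description notes.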



 Comments   
Comment by richlv [ 2012 Nov 23 ]

discussed in http://www.zabbix.com/forum/showthread.php?t=24983





[ZBX-5862] DELETE query from housekeeper table is too long on Oracle database Created: 2012 Nov 16  Updated: 2017 May 30  Resolved: 2013 Jan 03

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 2.0.4rc1, 2.1.0
Fix Version/s: 2.0.5rc1, 2.1.0

Type: Incident report Priority: Critical
Reporter: Alexey Pustovalov Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: housekeeper, oracle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Oracle


Attachments: Text File oracle_error.log     File zbx-5862-patch.bz2    
Issue Links:
Duplicate
is duplicated by ZBX-5949 SELECT query is too long on Oracle da... Closed
is duplicated by ZBX-4190 Need to break apart the compounding o... Closed

 Description   

46133:20121116:172221.922 [Z3005] query failed: [-1] ORA-00913: too many values [delete from housekeeper where (housekeeperid in (1001000000223308,1001000000223309,1001000000223310,...1001000000330563,1001000000330565,1001000000330566,1001000000330567,1001000000330568))]

Query length is 1729345 characters.
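One generic way to keep such a statement within database limits is to split the id list into several shorter IN lists joined with OR. The sketch below shows that technique only. It is not a description of what DBadd_condition_alloc() does after the fix, and build_in_condition(), append() and the chunk size are hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Appends formatted text at *off; returns -1 if it does not fit into buf. */
static int append(char *buf, size_t size, size_t *off, const char *fmt, ...)
{
	va_list args;
	int     written;

	va_start(args, fmt);
	written = vsnprintf(buf + *off, size - *off, fmt, args);
	va_end(args);

	if (0 > written || (size_t)written >= size - *off)
		return -1;

	*off += (size_t)written;
	return 0;
}

/*
 * Hypothetical helper: builds "(field in (..) or field in (..))" with at
 * most chunk ids per IN list. Returns 0 on success, -1 on bad input or
 * if the buffer is too small.
 */
static int build_in_condition(char *buf, size_t size, const char *field,
		const unsigned long long *ids, size_t n, size_t chunk)
{
	size_t off = 0, i;

	if (0 == size || 0 == n || 0 == chunk || NULL == field || 0 == strlen(field))
		return -1;

	if (0 != append(buf, size, &off, "("))
		return -1;

	for (i = 0; i < n; i++)
	{
		if (0 == i % chunk)
		{
			/* close the previous list, if any, and open a new one */
			if (0 != i && 0 != append(buf, size, &off, ") or "))
				return -1;
			if (0 != append(buf, size, &off, "%s in (", field))
				return -1;
		}
		else if (0 != append(buf, size, &off, ","))
			return -1;

		if (0 != append(buf, size, &off, "%llu", ids[i]))
			return -1;
	}

	return append(buf, size, &off, "))");
}
```

For the ids 1 to 5 with a chunk size of 3 and the field housekeeperid, this produces "(housekeeperid in (1,2,3) or housekeeperid in (4,5))". Because most ids in the failing query are consecutive, collapsing such runs into range conditions would shorten the statement further.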



 Comments   
Comment by Andris Mednis [ 2012 Nov 30 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-5862

Comment by richlv [ 2012 Nov 30 ]

does the change that was mentioned in commit message in rev 31795 affect all databases ?

Comment by Andris Mednis [ 2012 Nov 30 ]

Yes, the change is not particular to Oracle. It should be carefully tested on different databases as the changed DBadd_condition_alloc() function is used in many places and impact on performance is not yet known.

Comment by Alexey Pustovalov [ 2012 Dec 06 ]

(7) The frontend should also use a similar algorithm.

Eduards RESOLVED

sasha CLOSED

Comment by Alexander Vladishev [ 2012 Dec 07 ]

Server side is successfully tested. Please review my changes in r31960.
andris Reviewed r31960.

Comment by Eduards Samersovs (Inactive) [ 2012 Dec 13 ]

Frontend fixed in same development branch svn://svn.zabbix.com/branches/dev/ZBX-5862

Comment by Alexey Fukalov [ 2012 Dec 14 ]

(8)
I added tests for the dbConditionInt function; one of them does not pass.

Eduards RESOLVED, please retest with phpunit

sasha CLOSED

Comment by Alexey Fukalov [ 2012 Dec 14 ]

(9)
Seems dbConditionInt can be used in more places in Caction.php, CDrule.php and maybe more, where ids are used.

Eduards Varchar fields will use old DbCondition function.

Vedmak CLOSED

Comment by Alexey Fukalov [ 2012 Dec 14 ]

(10)
In CItem.php:973 function has 4 params, but it supports only 3.

Eduards RESOLVED

Vedmak CLOSED

Comment by Alexey Fukalov [ 2012 Dec 14 ]

(11)
In class.frontendsetup.php:454 seems wrong line inserted.

Eduards RESOLVED

Vedmak CLOSED

Comment by Alexey Pustovalov [ 2012 Dec 16 ]

(12)
dbConditionInt and dbCondition should return field = value (field != value) and with quotes for string values for one value.

Eduards RESOLVED

Vedmak CLOSED

Comment by Alexey Fukalov [ 2012 Dec 17 ]

(13)
I added more tests, some fail.

Eduards Very good! 2 fail tests fixed

Vedmak CLOSED

Comment by Alexey Fukalov [ 2012 Dec 18 ]

(14)
There are some places with no space before dbCondition calls, which results in incorrect SQL:

' WHERE'.dbConditionInt('hostid', $allids).

Eduards RESOLVED r.32185

Vedmak CLOSED

Comment by Eduards Samersovs (Inactive) [ 2012 Dec 18 ]

Fixed in versions pre-2.0.5rc1 r32192

Comment by Eduards Samersovs (Inactive) [ 2012 Dec 18 ]

Fixed in versions pre-2.1.0 (beta) r32218, pre-2.0.5rc1 r32192

Comment by Eduards Samersovs (Inactive) [ 2012 Dec 20 ]

Fixed in re-created development branch svn://svn.zabbix.com/branches/dev/ZBX-5862

Comment by Eduards Samersovs (Inactive) [ 2013 Jan 03 ]

Tested!

Comment by Alexander Vladishev [ 2013 Jan 03 ]

Fixed in versions pre-2.0.5 r32404 and pre-2.1.0 (trunk) r32405.





[ZBX-5240] server shuts down if housekeeper can not access the db Created: 2012 Jun 25  Updated: 2017 May 30  Resolved: 2012 Aug 23

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 2.0.0
Fix Version/s: 2.0.3rc1, 2.1.0

Type: Incident report Priority: Major
Reporter: richlv Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: crash, database, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

when a db disappears, server works for a while, trying to survive. that succeeds, but then housekeeper starts, can not connect to the db & dies

5148:20120619:104339.372 server #16 started housekeeper #1
5131:20120619:104339.377 server #0 started [main process]
...
5157:20120625:125317.955 Sending configuration data to proxy 'proxy.company.local'. Datalen 28616
5134:20120625:125413.663 [Z3001] connection to database 'zabbix' failed: [2005] Unknown MySQL server host 'mysql.company.local' (1)
5134:20120625:125413.663 watchdog: database is down
5134:20120625:125527.660 [Z3001] connection to database 'zabbix' failed: [2005] Unknown MySQL server host 'mysql.company.local' (1)
5134:20120625:125527.660 watchdog: database is down
5134:20120625:125641.658 [Z3001] connection to database 'zabbix' failed: [2005] Unknown MySQL server host 'mysql.company.local' (1)
5134:20120625:125641.658 watchdog: database is down
5134:20120625:125755.655 [Z3001] connection to database 'zabbix' failed: [2005] Unknown MySQL server host 'mysql.company.local' (1)
5134:20120625:125755.655 watchdog: database is down
5157:20120625:125817.433 Sending configuration data to proxy 'proxy.company.local'. Datalen 28616
5134:20120625:125909.653 [Z3001] connection to database 'zabbix' failed: [2005] Unknown MySQL server host 'mysql.company.local' (1)
5134:20120625:125909.653 watchdog: database is down
5148:20120625:125932.691 executing housekeeper
5148:20120625:125946.652 [Z3001] connection to database 'zabbix' failed: [2005] Unknown MySQL server host 'mysql.company.local' (1)
5148:20120625:125946.652 Cannot connect to the database. Exiting...
5131:20120625:125946.653 One child process died (PID:5148,exitcode/signal:255). Exiting ...
5131:20120625:130002.651 [Z3001] connection to database 'zabbix' failed: [2005] Unknown MySQL server host 'mysql.company.local' (1)
5131:20120625:130002.651 Cannot connect to the database. Exiting...

related question : how does proxy data sending work in this case ? is it present in server config cache ?
is it the same for active and passive proxies (this one is passive) ?



 Comments   
Comment by Andris Mednis [ 2012 Aug 23 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-5240

Comment by dimir [ 2012 Aug 28 ]

Successfully tested. The MySQL error

[2005] Unknown MySQL server host

is handled properly. Please review my small logging and code formatting changes in proxy housekeeper (r29911).

<Andris>Thanks for testing! I reviewed your changes and agree.

Comment by Andris Mednis [ 2012 Aug 29 ]

Fixed in versions pre-2.0.3 rev. 29915 and pre-2.1.0 rev.29916.





[ZBX-5059] Usage of Housekeeper keeps 100% for a long time until restarting zabbix server Created: 2012 May 24  Updated: 2017 May 30  Resolved: 2015 Feb 23

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.9.9 (beta)
Fix Version/s: None

Type: Incident report Priority: Major
Reporter: Guoyu Li Assignee: Unassigned
Resolution: Won't fix Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

App: redhat-release-5Server-5.5.0.2/8 CPU/16GBRAM
DB: redhat-release-5Server-5.5.0.2/16CPU/32GBRAM/Oracle 11g

Zabbix 1.9.7
Number of hosts (monitored/not monitored/templates) 103 / 0 / 61
Number of items (monitored/disabled/not supported) 4069 / 1119 / 57
Number of triggers (enabled/disabled)[problem/unknown/ok] 104 / 0 [0 / 0 / 104]
Required server performance, new values per second 96.25 -


Attachments: PNG File zabbix_app_cpu.png     PNG File zabbix_app_network.png     PNG File zabbix_db_cpu.png     PNG File zabbix_db_network.png     PNG File zabbix_poller.png    
Issue Links:
Duplicate

 Description   

The usage of the Zabbix housekeeper was pegged at 100% for a long time on our Zabbix server, until we restarted the Zabbix server.
But from the AWR report we took, it seems the busiest queries don't come from the housekeeper, and we didn't find any housekeeper queries among the AWR top queries either. Also, we didn't see any peak in CPU utilization of the Zabbix server or the database while the housekeeper usage was at 100%, nor in network usage (our Oracle data file is on NFS).

Please see the attachment for details



 Comments   
Comment by Guoyu Li [ 2012 May 24 ]

Feel free to let me know if you need any further information.

Comment by Alexei Vladishev [ 2012 Jun 13 ]

What do you see on MySQL side? What queries are running there?

Comment by Guoyu Li [ 2012 Jun 14 ]

I'm using Oracle 11gR1 instead of MySQL. I can't tell what queries are running there, but judging from the IO and CPU usage of the DB server, there is no additional pressure on the DB server.

Comment by Alexei Vladishev [ 2014 Mar 28 ]

Please check if the issue can be reproduced with the latest 2.2.x.

Comment by Alexander Vladishev [ 2015 Feb 23 ]

Some Housekeeper improvements were implemented in version 2.2.0 under ZBXNEXT-1649.

I am closing the issue.





[ZBX-4322] sessions table not cleared Created: 2011 Nov 06  Updated: 2017 May 30  Resolved: 2012 Feb 25

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Frontend (F)
Affects Version/s: 1.8.8
Fix Version/s: None

Type: Incident report Priority: Minor
Reporter: Robert Jerzak Assignee: Unassigned
Resolution: Won't fix Votes: 0
Labels: database, housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Table 'sessions' is not properly cleared:

> select count(*) from sessions;
+---------+
| count   |
+---------+
| 2331717 |
+---------+

Oldest record in our system is from 06 Nov 2010:

> select * from sessions order by lastaccess limit 2;
+----------------------------------+--------+------------+--------+
| sessionid                        | userid | lastaccess | status |
+----------------------------------+--------+------------+--------+
| 6b8b53fc8a067e724098c816e9ae69d1 |     19 | 1289071306 |      0 |
| 1572001ce4c8bd6e34af34671d99e11f |     19 | 1289071500 |      0 |
+----------------------------------+--------+------------+--------+

Is there any place to configure clearing sessions behavior?



 Comments   
Comment by richlv [ 2011 Nov 06 ]

what is auto-logout set to for user with id 19, for example ?

Comment by Robert Jerzak [ 2011 Nov 06 ]

No, the user with id 19 does not have the 'Auto-logout' option enabled.

Does it mean that sessions for this user never expire?

Comment by Igor Danoshaites (Inactive) [ 2011 Nov 15 ]

Hi,

According to the source code, the Zabbix housekeeper should delete sessions that are older than 1 year, regardless of the "Auto-logout" setting.

<zalex> I can confirm this behavior:
Configuration:
mysql> SELECT alert_history, event_history FROM config;
+---------------+---------------+
| alert_history | event_history |
+---------------+---------------+
|             2 |             2 |
+---------------+---------------+

Debuglog:
21142:20111127:115348.681 In housekeeping_sessions() now:1322387614
21142:20111127:115348.681 query without transaction detected
21142:20111127:115348.681 query [txnlev:0] [delete from sessions where lastaccess<1290851614]
21142:20111127:115348.682 deleted 0 records from table 'sessions'
21142:20111127:115348.682 End of housekeeping_sessions():0

Comment by Alexei Vladishev [ 2012 Feb 25 ]

I am closing it, nothing to fix here.





[ZBX-4298] No messages in the LEVEL_WARNING log about deleted values by housekeeper (table "housekeeper") Created: 2011 Oct 31  Updated: 2017 May 30  Resolved: 2011 Dec 19

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.8.8
Fix Version/s: 1.8.10, 1.8.11, 1.9.9 (beta)

Type: Incident report Priority: Minor
Reporter: Oleksii Zagorskyi Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: housekeeper
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

DebugLevel=3



 Description   

When zabbix_server runs with DebugLevel=3 it writes this message to the log:
"Deleted <NNN> records from history and trends"

But this count does not include values deleted by the housekeeper for items already deleted from the configuration (table "housekeeper").
1. This lack of information could be confusing when troubleshooting DB performance.
2. It's inconsistent that the housekeeper reports the count of deleted outdated values but does not report the count of values for items already deleted from the configuration.

Imagine a situation where a user has performed an "Unlink and clear" action on a big template linked to several hosts, notices a problem with DB performance within the next hour, and wonders what happened.

Here is part of a DebugLevel=4 log (I added several line breaks for readability):

15500:20111031:150250.029 End of housekeeping_history_and_trends():0

15500:20111031:150250.029 Deleted 0 records from history and trends

15500:20111031:150250.029 In housekeeping_process_log()
15500:20111031:150250.029 query [txnlev:0] [select housekeeperid,tablename,field,value from housekeeper order by table name]
15500:20111031:150250.029 query without transaction detected

15500:20111031:150250.029 query [txnlev:0] [delete from history where itemid=22578 limit 100]
15500:20111031:150250.044 deleted 100 records from table 'history'

15500:20111031:150250.044 query without transaction detected

15500:20111031:150250.044 query [txnlev:0] [delete from history_uint where itemid=22577 limit 100]
15500:20111031:150250.060 deleted 100 records from table 'history_uint'

15500:20111031:150250.060 End of housekeeping_process_log():SUCCEED
15500:20111031:150250.060 In housekeeping_events() now:1320062561
15500:20111031:150250.060 query [txnlev:0] [select event_history from config]
15500:20111031:150250.060 query [txnlev:0] [select eventid from events where clock<1288526561]
15500:20111031:150250.060 End of housekeeping_events():SUCCEED

15500:20111031:150250.060 In housekeeping_alerts() now:1320062561
15500:20111031:150250.060 query [txnlev:0] [select alert_history from config]
15500:20111031:150250.060 query without transaction detected
15500:20111031:150250.060 query [txnlev:0] [delete from alerts where clock<1288526561]
15500:20111031:150250.060 deleted 0 records from table 'alerts'
15500:20111031:150250.060 End of housekeeping_alerts():SUCCEED

15500:20111031:150250.060 In housekeeping_sessions() now:1320062561
15500:20111031:150250.060 query without transaction detected
15500:20111031:150250.060 query [txnlev:0] [delete from sessions where lastaccess<1288526561]
15500:20111031:150250.061 deleted 0 records from table 'sessions'
15500:20111031:150250.061 End of housekeeping_sessions():SUCCEED
15500:20111031:150250.061 sleeping for 3600 seconds

(in this example "MaxHousekeeperDelete=100")

As you can see, the function "housekeeping_process_log()" deletes values from the tables history, history_uint (and others), but logs no messages at LOG_LEVEL_WARNING.

I ask that these messages be added to the log. They could be reported individually per table or summarized for history+trends; I don't know which is better.

If no such values were deleted, there is no need to report anything.
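A minimal sketch of the requested behaviour, written in Python for brevity rather than in the server's C. It assumes a hypothetical callback `delete_batch` that runs one limited delete (as in the log above) and returns the number of rows removed; the function names and the wording of the summary are illustrative only, not the actual housekeeper code.

```python
from collections import Counter


def cleanup_deleted_items(rows, delete_batch, max_delete=100):
    """Process rows of the "housekeeper" table and count deleted records.

    rows         -- iterable of (tablename, field, value) tuples
    delete_batch -- hypothetical callback that runs
                    "delete from <tablename> where <field>=<value> limit <max_delete>"
                    and returns the number of rows it removed
    """
    deleted = Counter()
    for tablename, field, value in rows:
        deleted[tablename] += delete_batch(tablename, field, value, max_delete)
    return deleted


def summary_line(deleted):
    """Return one log line, or None when nothing was deleted."""
    total = sum(deleted.values())
    if total == 0:
        return None
    per_table = ", ".join(
        f"{count} from '{table}'" for table, count in sorted(deleted.items()) if count
    )
    return f"housekeeper deleted {total} records of deleted items ({per_table})"
```

Returning `None` for an empty run mirrors the request that nothing be reported when no values were deleted.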



 Comments   
Comment by Oleksii Zagorskyi [ 2011 Oct 31 ]

(1) By the way, the function name "housekeeping_process_log()" does not seem optimal.

I suggest changing it to "housekeeping_process_deleted_items()" or similar.

<dimir> I chose housekeeping_cleanup(), RESOLVED in r23553

<zalex> Excellent ! CLOSED

Comment by dimir [ 2011 Nov 25 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4298

Comment by dimir [ 2011 Nov 25 ]

I've chosen "housekeeping_cleanup()", if nobody minds. So the report will look like this:

11921:20111125:180134.824 housekeeper deleted 0 records from history and trends, 28561 records of deleted items, 0 events, 0 alerts and 0 sessions

This will be logged on every housekeeper step even if nothing was removed. Agreed with sasha that this way there will be no question of whether an attempt to delete the data was actually made.

Comment by Oleksii Zagorskyi [ 2011 Nov 27 ]

Dev branch tested. Works as expected. A single-line report is very good.

(2) I would suggest writing that line as:
housekeeper deleted: 0 records from history and trends, 28561 records of deleted items, 0 events, 0 alerts, 0 sessions
I think it will be more readable in this form.

<dimir> RESOLVED in r23560

<zalex> Many thanks. CLOSED.
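For illustration, the agreed single-line format can be sketched as a small formatting helper. This is Python, not the server's C code, and the function and parameter names are hypothetical; only the output wording is taken from the comment above.

```python
def housekeeper_report(history_and_trends, deleted_items, events, alerts, sessions):
    """Build the one-line housekeeper summary in the colon-separated form."""
    return (
        "housekeeper deleted: "
        f"{history_and_trends} records from history and trends, "
        f"{deleted_items} records of deleted items, "
        f"{events} events, {alerts} alerts, {sessions} sessions"
    )


print(housekeeper_report(0, 28561, 0, 0, 0))
```

The line is produced on every run, even when all counters are zero, which matches the decision to always log the report.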

Comment by dimir [ 2011 Nov 30 ]

Fixed in pre-1.8.10 r23632, pre-1.9.9 r23633.

Comment by Alexander Vladishev [ 2011 Nov 30 ]

(1) Broken compilation of the latest trunk.

housekeeper.c: In function ‘housekeeping_cleanup’:
housekeeper.c:135: error: ‘ids_alloc’ undeclared (first use in this function)
housekeeper.c:135: error: (Each undeclared identifier is reported only once
housekeeper.c:135: error: for each function it appears in.)
housekeeper.c:135: error: ‘ids_num’ undeclared (first use in this function)
housekeeper.c: In function ‘housekeeping_alerts’:
housekeeper.c:186: warning: ‘return’ with a value, in function returning void
housekeeper.c: In function ‘housekeeping_events’:
housekeeper.c:213: warning: ‘return’ with a value, in function returning void
housekeeper.c: In function ‘main_housekeeper_loop’:
housekeeper.c:328: error: void value not ignored as it ought to be
housekeeper.c:331: error: void value not ignored as it ought to be

<dimir> sorry for the broken trunk, RESOLVED in r23663 directly in trunk

<zalex> tested. trunk r23663 compiled ok and it works.

<sasha> CLOSED with small change in r23688.

Comment by richlv [ 2011 Nov 30 ]

(2) also :
what's the difference between "history and trends" and "records of deleted items" ? i assume the latter is not the amount of items, but the amount of history and trends values for those items ?

<dimir> "Old history and trends" is the outdated information (as configured in the item keep this and that), "records of deleted items" is all the data (basically what's in "housekeeper" table) related to removed item.

<zalex> maybe it would be better to replace the word "records" with "values" everywhere? It would be clearer.

<dimir> For me "deleted 2 values from history and trends" is no clearer than "deleted 2 records from history and trends". What I'd add is support for singular forms.

<dimir> if there are no objections, RESOLVED in r23681

<zalex> dev branch r23681 tested. it works (see 1 event):
"housekeeper deleted: 7056 records from history and trends, 0 records of deleted items, 1 event, 0 alerts, 0 sessions"
he-he, it's some "alternative" to gettext

<dimir> We decided to discard these changes as we don't have anything like it anywhere. CLOSED
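As an illustration only, the singular/plural handling tried in r23681 (and later discarded) amounts to something like the following Python sketch. The helper name `plural` is hypothetical and the rule is the naive English "add s" one; it is not the code from that revision.

```python
def plural(count, noun):
    """Return "<count> <noun>", adding a trailing "s" unless count is exactly 1."""
    return f"{count} {noun}" if count == 1 else f"{count} {noun}s"


parts = [plural(1, "event"), plural(0, "alert"), plural(0, "session")]
print(", ".join(parts))
```

A hand-rolled rule like this only works for English, which is presumably why it was compared to gettext above.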

Comment by dimir [ 2011 Dec 01 ]

Oleksiy, thank you for testing!

Comment by dimir [ 2011 Dec 01 ]

Fixed in trunk r23663.

Comment by richlv [ 2011 Dec 01 ]

(3) i suspect "d_clenup" is a typo

<dimir> Right, RESOLVED in pre-1.8.10 r23732, pre-1.9.9 r23733.
<sasha> CLOSED

Comment by Alexander Vladishev [ 2011 Dec 02 ]

Closing resolved issue

Comment by dimir [ 2011 Dec 02 ]

Reopening to assign to myself.

Comment by dimir [ 2011 Dec 02 ]

Closed.

Comment by richlv [ 2011 Dec 19 ]

48 deleted events reported as 1 in 1.8.10rc1

strace output :

event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=11", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=18", 40) = 40
event_strace.13938:write(6, "#\0\0\0\3delete from events where eventid=5", 39) = 39
event_strace.13938:write(6, "#\0\0\0\3delete from events where eventid=9", 39) = 39
event_strace.13938:write(6, "#\0\0\0\3delete from events where eventid=1", 39) = 39
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=47", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=38", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=44", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=43", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=36", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=42", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=34", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=35", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=31", 40) = 40
event_strace.13938:write(6, "#\0\0\0\3delete from events where eventid=8", 39) = 39
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=30", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=40", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=37", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=13", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=39", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=33", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=46", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=29", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=48", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=41", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=14", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=32", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=10", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=27", 40) = 40
event_strace.13938:write(6, "#\0\0\0\3delete from events where eventid=6", 39) = 39
event_strace.13938:write(6, "#\0\0\0\3delete from events where eventid=7", 39) = 39
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=12", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=26", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=45", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=24", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=25", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=17", 40) = 40
event_strace.13938:write(6, "#\0\0\0\3delete from events where eventid=2", 39) = 39
event_strace.13938:write(6, "#\0\0\0\3delete from events where eventid=3", 39) = 39
event_strace.13938:write(6, "#\0\0\0\3delete from events where eventid=4", 39) = 39
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=15", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=16", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=20", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=21", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=22", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=23", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=19", 40) = 40
event_strace.13938:write(6, "$\0\0\0\3delete from events where eventid=28", 40) = 40
event_strace.13938:write(7, " 13938:20111217:175347.261 housekeeper deleted: 0 records from history and trends, 0 records of deleted items, 1 events, 0 alerts, 0 sessions\n", 142) = 142
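The mismatch can be checked mechanically. The sketch below (Python, with hypothetical names) counts the distinct event IDs in the "delete from events" statements of an strace capture like the one above and extracts the event count from the housekeeper summary line. It only compares the two numbers and makes no assumption about the cause of the bug.

```python
import re

# Matches the SQL text inside the strace write() payload.
DELETE_RE = re.compile(r"delete from events where eventid=(\d+)")
# Matches the event counter in the housekeeper summary line.
REPORT_RE = re.compile(r"housekeeper deleted: .*?(\d+) events")


def compare_counts(strace_lines):
    """Return (number of distinct deleted event IDs, count reported in the log)."""
    deleted_ids = set()
    reported = None
    for line in strace_lines:
        match = DELETE_RE.search(line)
        if match:
            deleted_ids.add(int(match.group(1)))
            continue
        match = REPORT_RE.search(line)
        if match:
            reported = int(match.group(1))
    return len(deleted_ids), reported
```

Feeding it the lines of a saved strace file, for example `compare_counts(open(path))` with `path` pointing at the capture, returns both numbers for comparison.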

Comment by Alexander Vladishev [ 2011 Dec 19 ]

Fixed in the development branch svn://svn.zabbix.com/branches/dev/ZBX-4298

Comment by dimir [ 2011 Dec 19 ]

Tested successfully.

Comment by Alexander Vladishev [ 2011 Dec 28 ]

Available in version pre-1.8.11, r24309.





[ZBX-4190] Need to break apart the compounding of deletes followed by a logical or up in housekeeper.c Created: 2011 Sep 30  Updated: 2017 May 30  Resolved: 2013 Dec 31

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.8.8
Fix Version/s: None

Type: Incident report Priority: Minor
Reporter: Kam Lane Assignee: Unassigned
Resolution: Duplicate Votes: 1
Labels: housekeeper, mysql
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MySQL Community Server 5.5.15-log
Scientific Linux release 6.1 (Carbon)


Issue Links:
Duplicate
duplicates ZBX-5862 DELETE query from housekeeper table i... Closed

 Description   

I'm not sure if this is a bug or an improvement, but I'm seeing a significant amount of this in my zabbix_server.log. The most interesting part is that my mysql server is on the same box with my data dir sitting on a SAN LUN and zabbix is connecting via a socket connection. The database isn't restarting; I have a feeling the code is losing its connection due to the query below in addition to a timeout.

Is there any way to break this query up into multiple statements that operate on a smaller number of rows...like every time an "OR" condition is thrown in the example below? That logical OR is actually a pain point in this query due to the number of items it has to look up in the index.

26063:20110930:130423.330 [Z3005] query failed: [2006] MySQL server has gone away [delete from housekeeper where (housekeeperid in (100100000120079,100100000120080,100100000120081,100100000120082,100100000120083,100100000120084,100100000120085,100100000120086,100100000120087,100100000120088,100100000120089,100100000120090,100100000120091,100100000120092,100100000120093,100100000120094,100100000120095,100100000120096,100100000120097,100100000120098,100100000120099,100100000120100,100100000120101,100100000120102,100100000120103,100100000120104,100100000120105,100100000120106,100100000120107,100100000120108,100100000120109,100100000120110,100100000120111,100100000120112,100100000120113,100100000120114,100100000120115,100100000120116,100100000120117,100100000120118,100100000120119,100100000120120,100100000120121,100100000120122,100100000120123,100100000120124,100100000120125,100100000120126,100100000120127,100100000120128,100100000120129,100100000120130,100100000120131,100100000120132,100100000120133,100100000120134,100100000120135,100100000120136,100100000120137,100100000120138,100100000120139,100100000120140,100100000120141,100100000120142,100100000120143,100100000120144,100100000120145,100100000120146,100100000120147,100100000120148,100100000120149,100100000120150,100100000120151,100100000120152,100100000120153,100100000120154,100100000120155,100100000120156,100100000120157,100100000120158,100100000120159,100100000120160,100100000120161,100100000120162,100100000120163,100100000120164,100100000120165,100100000120166,100100000120167,100100000120168,100100000120169,100100000120170,100100000120171,100100000120172,100100000120173, ...
0000121932,100100000121933,100100000121934,100100000121935,100100000121936,100100000121937,100100000121938,100100000121939,100100000121940,100100000121941,100100000121942,100100000121943,100100000121944,100100000121945,100100000121946,100100000121947,100100000121948,100100000121949,100100000121950,100100000121951,100100000121952,100100000121953,100100000121954,100100000121955,100100000121956,100100000121957,100100000121958,100100000121959,100100000121960,100100000121961,100100000121962,100100000121963,100100000121964,100100000121965,100100000121966,100100000121967,100100000121968,100100000121969,100100000121970,100100000121971,100100000121972,100100000121973,100100000121974,100100000121975,100100000121976,100100000121977,100100000121978) or housekeeperid in (100100000121979,100100000121980,100100000121981,100100000121982,100100000121983,100100000121984,100100000121985,100100000121986,100100000121987,100100000121988,100100000121989,100100000121990,100100000121991,100100000121992,100100000121993,100100000121994,100100000121995,100100000121996,100100000121997,100100000121998,100100000121999,100100000122000,100100000122001,100100000122002,100100000122003,100100000122004,100100000122005,100100000122006,100100000122007,100100000122008,100100000122009,100100000122010,100100000122011,100100000122012,100100000122013,100100000122014,100100000122015,100100000122016,100100000122017,100100000122018,100100000122019,100100000122020,10010000
...
0000123844,100100000123845,100100000123846,100100000123847,100100000123848,100100000123849,100100000123850,100100000123851,100100000123852,100100000123853,100100000123854,100100000123855,100100000123856,100100000123857,100100000123858,100100000123859,100100000123860,100100000123861,100100000123862,100100000123863,100100000123864,100100000123865,100100000123866,100100000123867,100100000123868,100100000123869,100100000123870,100100000123871,100100000123872,100100000123873,100100000123874,100100000123875,100100000123876,100100000123877,100100000123878) or housekeeperid in (100100000123879,100100000123880,100100000123881,100100000123882,100100000123883,100100000123884,100100000123885,100100000123886,100100000123887,100100000123888,100100000123889,100100000123890,100100000123891,100100000123892,100100000123893,100100000123894,100100000123895,100100000123896,100100000123897,100100000123898,100100000123899,100100000123900,100100000123901,100100000123902,100100000123903,100100000123904,100100000123905,100100000123906,100100000123907,100100000123908,100100000123909,100100000123910,100100000123911,100100000123912,100100000123913,100100000123914,100100000123915,100100000123916,100100000123917,100100000123918,100100000123919,100100000

zabbix[;]> show table status where Name = 'housekeeper';
+-------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| Name        | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
+-------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| housekeeper | InnoDB | 10      | Compact    | 220902 | 59             | 13123584    | 0               | 0            | 27262976  | NULL           | 2011-09-23 21:10:55 | NULL        | NULL       | utf8_general_ci | NULL     |                |         |
+-------------+--------+---------+------------+--------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
1 row in set (0.00 sec)

zabbix[;]> show index from housekeeper;
+-------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name | Seq_in_index | Column_name   | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| housekeeper | 0          | PRIMARY  | 1            | housekeeperid | A         | 222050      | NULL     | NULL   |      | BTREE      |         |               |
+-------------+------------+----------+--------------+---------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
1 row in set (0.00 sec)

zabbix[;]> select count(*) from housekeeper;
+----------+
| count(*) |
+----------+
| 221172   |
+----------+
1 row in set (0.00 sec)



 Comments   
Comment by Kam Lane [ 2011 Oct 10 ]

Further analysis, executing by hand the same queries that end up in the log [as seen above], led me to discover that even when run manually, these queries [or subsets of them, split at the 'OR'] were too big in terms of character/string size, and the mysql server would kill my connection. I was able to cut down on this occurrence significantly, but not completely, by modifying the mysql my.cnf file and adding:

max_allowed_packet = 16M

I tried many other values for packet size, but 16M was really the only value that I could change "max_allowed_packet" to in MySQL 5.5.15 and actually have that value hold through the server restart and be visible when calling the variable [show variables like 'max_allowed_%';]. The MySQL developer site talks about the max_allowed_packet size [the maximum size of one packet or any generated/intermediate string] by default being 1MB and obviously these queries can exceed that. I've also been wondering if zabbix is holding on to open database connections too long and round-robining them, but that's for another investigation.

To work around this, I propose that the code that generates the housekeeper delete queries be refactored to keep all queries under 1MB, as that's the default for the MySQL server. I also recommend that the documentation be updated in the interim to state that 'max_allowed_packet' should be increased from the default of 1MB.

The big issue here is that if the code isn't refactored to break apart the compound queries, it only takes a few failed queries before the housekeeper table starts queuing up and the process can never clean up, because the query has exceeded the max_allowed_packet size. The maximum max_allowed_packet size is 1GB on both the client and server. If the table were ever to queue up to this point, manual intervention would be required to clean it up.
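The proposed refactoring can be sketched as follows. This is a hypothetical Python illustration, not the Zabbix implementation: it splits a list of housekeeperid values into several DELETE statements so that the text of each statement stays within a size budget. It assumes the statement length in bytes is a good enough proxy for the packet size checked against max_allowed_packet, and the 1MB default is the figure quoted above.

```python
def chunked_delete_statements(ids, max_bytes=1024 * 1024,
                              table="housekeeper", column="housekeeperid"):
    """Split ids into DELETE statements whose text is at most max_bytes long.

    The statement text is pure ASCII, so character count equals byte count.
    A single ID that does not fit on its own is still emitted as one statement.
    """
    prefix = f"delete from {table} where {column} in ("
    suffix = ")"
    base_len = len(prefix) + len(suffix)

    statements = []
    current = []            # ID tokens collected for the statement being built
    current_len = base_len  # length of that statement so far

    for id_ in ids:
        token = str(id_)
        extra = len(token) + (1 if current else 0)  # +1 for the comma
        if current and current_len + extra > max_bytes:
            # Budget reached: close this statement and start a new one.
            statements.append(prefix + ",".join(current) + suffix)
            current = []
            current_len = base_len
            extra = len(token)
        current.append(token)
        current_len += extra

    if current:
        statements.append(prefix + ",".join(current) + suffix)
    return statements
```

Each resulting statement can be executed on its own, so a single oversized query no longer blocks cleanup of the whole table.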

Comment by Alexander Vladishev [ 2013 Dec 31 ]

Closed as duplicate.





Generated at Fri Apr 26 00:53:36 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.