Details
Type: Problem report
Status: Closed
Priority: Trivial
Resolution: Duplicate
Affects version: 6.0.4
Description
At some point, the proxy's MySQL database started growing very quickly. Nothing had changed on the Zabbix side, and communication between the proxy and the server was working fine. After some digging, I found that a few records in the proxy_history table had a wrong timestamp (up to 1 day behind) in the clock field.
MariaDB [zabbix_proxy]> select * from proxy_history order by clock limit 1\G
*************************** 1. row ***************************
         id: 77883976481
     itemid: 3599450
      clock: 1652602629   // Sun May 15 14:17:09 +06 2022
  timestamp: 0
     source:
   severity: 0
      value: 1.9942587644402614
 logeventid: 0
         ns: 10687388
      state: 0
lastlogsize: 0
      mtime: 0
      flags: 0
write_clock: 1652689030   // Mon May 16 14:17:10 +06 2022
1 row in set (0.01 sec)
Using such records, I tracked down a monitored host with wrong clock settings.
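One way to find such records systematically is to compare clock with write_clock for each row. This is a minimal sketch that assumes write_clock holds the proxy's own time when the row was written, and that the rows were fetched beforehand (e.g. with select itemid, clock, write_clock from proxy_history). The helper names to_local and skewed_items and the 5-minute threshold are hypothetical. Legitimately delayed data (for example, values buffered by an active agent) can also show a lag, so the threshold needs tuning.

```python
from datetime import datetime, timedelta, timezone


def to_local(ts, offset_hours=6):
    """Format a Unix timestamp in a fixed UTC offset (the report uses +06)."""
    tz = timezone(timedelta(hours=offset_hours))
    return datetime.fromtimestamp(ts, tz).strftime("%a %b %d %H:%M:%S %z %Y")


def skewed_items(rows, threshold_s=300):
    """rows: iterable of (itemid, clock, write_clock) tuples.

    Returns {itemid: lag_seconds} for items whose clock differs from
    write_clock by more than threshold_s, keeping the largest-magnitude
    lag per item. Assumption: a positive lag means the host's clock is
    behind the proxy's clock.
    """
    suspects = {}
    for itemid, clock, write_clock in rows:
        lag = write_clock - clock
        if abs(lag) > threshold_s and abs(lag) > abs(suspects.get(itemid, 0)):
            suspects[itemid] = lag
    return suspects


# The row from the query output in this report
row = (3599450, 1652602629, 1652689030)
print(to_local(row[1]))     # Sun May 15 14:17:09 +0600 2022
print(to_local(row[2]))     # Mon May 16 14:17:10 +0600 2022
print(skewed_items([row]))  # {3599450: 86401}
```

The flagged item IDs can then be mapped to their hosts through the items table (items.hostid).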
Steps to reproduce:
1. Run a proxy with HousekeepingFrequency=1. Each housekeeper run then removes up to 4 hours of old history, starting from the oldest record (select min(clock) from proxy_history).
2. Set up monitoring for a few hosts with correct time settings and one host with wrong time settings (for example, a clock set back by 1 day).
3. Collect metrics for more than 1 hour.
4. Wait for the housekeeper to run, or trigger it manually (zabbix_proxy -R housekeeper_execute).
5. Observe that the housekeeper deletes records from the problem host, because by the clock field they are the "oldest". Records from the hosts with correct time settings remain.
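The effect in these steps can be reproduced with a toy model. This is a simplified sketch, not Zabbix's actual housekeeper code: it assumes, as described above, that one run deletes every record whose clock is less than min(clock) + 4 hours, and it ignores any other conditions the real proxy housekeeper may apply (for example, whether records were already sent to the server). The names Record, collect and housekeep are hypothetical.

```python
from dataclasses import dataclass

HOUR = 3600


@dataclass
class Record:
    host: str
    clock: int  # Unix timestamp as reported by the monitored host


def collect(hosts, start, hours, per_hour=60):
    """Simulate `hours` of collection; `hosts` maps host name -> clock skew in seconds."""
    records = []
    for name, skew in hosts.items():
        for i in range(hours * per_hour):
            true_time = start + i * HOUR // per_hour
            records.append(Record(name, true_time + skew))
    return records


def housekeep(records, window_hours=4):
    """One simplified run: delete records with clock < min(clock) + window."""
    if not records:
        return records, 0
    cutoff = min(r.clock for r in records) + window_hours * HOUR
    kept = [r for r in records if r.clock >= cutoff]
    return kept, len(records) - len(kept)


start = 1_652_572_800  # 2022-05-15 00:00:00 UTC
healthy = {"host-a": 0, "host-b": 0, "host-c": 0}
skewed = {**healthy, "host-d": -24 * HOUR}  # host-d's clock is 1 day behind

_, deleted = housekeep(collect(healthy, start, hours=10))
print("all clocks correct:", deleted)  # 720: first 4 hours of every host

kept, deleted = housekeep(collect(skewed, start, hours=10))
print("one skewed host:", deleted)  # 240: only host-d's oldest 4 hours
```

In the skewed case none of the correctly timed records are deleted, even though they are just as old in real time. In this model, repeated runs would first work through host-d's backlog 4 hours at a time before reaching the correctly timed data.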
Result:
Collected metrics are stored in proxy_history with the monitored host's time in the clock field. The database keeps growing, because cleanup of most records is delayed by the problem host's clock lag.
Example of a normal housekeeper run:
1774:20220514:113111.182 housekeeper [deleted 4136184 records in 35.317049 sec, idle for 1 hour(s)]
Housekeeper run after the time was changed on one monitored host:
1774:20220514:123111.708 housekeeper [deleted 20293 records in 0.136450 sec, idle for 1 hour(s)]
Expected:
Monitored hosts should not be able to break server-side logic.
PS. I didn't check whether the server's housekeeper is also affected.
PPS. Observed on 5.2.5, but the latest version, 6.0.4, is also affected, if I understood the source code correctly.