[ZBX-19204] Large trend cache breaks history sync Created: 2021 Apr 02 Updated: 2025 Mar 04 Resolved: 2024 Feb 01
Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: None
Affects Version/s: None
Fix Version/s: None
Type: Problem report
Priority: Major
Reporter: Aaron Whiteman
Assignee: Vladislavs Sokurenko
Resolution: Duplicate
Votes: 20
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Issue Links:
Description
Environment: Zabbix Server 5.0.10, CentOS Linux, 2 cores, 16 GB RAM. Dedicated DB: PostgreSQL 12, TimescaleDB 2, one-month shards as per the recommendation of Zabbix (see
Dedicated front-end server, Zabbix 5.0.10, nginx. The server monitors minimal items (primarily itself and the database); all primary monitoring is performed by a set of 9 proxies.
Steps to reproduce:
Result: Zabbix will use all history syncers to store trend data until the trend data is flushed. For particularly large databases, using the recommended TimescaleDB setting of one shard per month (see
Expected:
Possible solutions (there are likely others, but these are possibilities):
Comments
Comment by Tomáš Heřmánek [ 2021 May 18 ]
Hi, same issue with Zabbix 5.2. Is this improved in Zabbix 5.4? https://support.zabbix.com/browse/ZBXNEXT-782 https://support.zabbix.com/browse/ZBXNEXT-6280 https://support.zabbix.com/browse/ZBXNEXT-6503 Thank you, Tom
Comment by Aaron Whiteman [ 2021 Aug 06 ]
Since I updated the OS on my zabbix-server last night, I got to experience this joy again.
Currently, the only thing I can do is wait for zabbix to go catatonic, then kill -9 the service to flush the trend cache, and start again. This is certainly not an expected feature of a monitoring tool that claims to be enterprise class.
For reference, I have 6 db syncers (much testing shows this is the optimal count for "most of the time") to support 10 proxies, monitoring 7308 enabled hosts (113546 enabled items), with a required performance of 2589.51. Each of those syncers can push 70000 items in about 10 seconds when the server comes online, as long as there are no trends to sync.
But if I have to sync existing trend data, performance falls to about 700 synced items per 70 seconds. That's so slow that the syncers have no possible way to catch up.
Unfortunately, the three issues resolved in 5.4 above don't really help this case, because the database continues to have excellent performance for other queries; the bottleneck is in how the history syncers specifically handle trend writes. Another option may be to stop using the trend cache at all; the following is an example of how I populated trends for a one-hour period after I had to restart the database (pgsql 13, with timescaledb). I use ON CONFLICT here because Zabbix may have partially written data that I need to replace.
insert into trends
insert into trends_uint
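A backfill of this kind can be sketched roughly as follows. This is a sketch only, assuming the stock Zabbix trends/history schema (primary key on (itemid, clock)); the epoch timestamps are placeholders for the hour being rebuilt, not values from the original comment:

```sql
-- Rebuild one hour of float trends from raw history data.
-- 1628233200 .. 1628236800 is a placeholder one-hour window (3600 s).
INSERT INTO trends (itemid, clock, num, value_min, value_avg, value_max)
SELECT itemid,
       1628233200 AS clock,          -- start of the hour being rebuilt
       count(value),
       min(value),
       avg(value),
       max(value)
FROM history
WHERE clock >= 1628233200
  AND clock <  1628236800
GROUP BY itemid
ON CONFLICT (itemid, clock) DO UPDATE
   SET num       = excluded.num,     -- replace partially written rows
       value_min = excluded.value_min,
       value_avg = excluded.value_avg,
       value_max = excluded.value_max;

-- The trends_uint backfill follows the same pattern,
-- selecting from history_uint instead of history.
```

The ON CONFLICT clause is what lets this run safely over an hour Zabbix may have already partially flushed: conflicting rows are overwritten with the recomputed aggregates instead of aborting the insert.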
Comment by bunkzilla [ 2021 Dec 10 ]
Having a similar issue in 5.4. It brought ingestion to a crawl and set off a trigger storm due to triggers set up with nodata(). I heard there may be some magic about when one can stop Zabbix and then start it, but ideally I'd like to be able to restart Zabbix at any time and not have the top of the hour cripple everything and cause false triggers.
Comment by GOID [ 2023 Apr 11 ]
Same behavior on Zabbix 6 (PostgreSQL + TimescaleDB) and on the older Zabbix 5 as well.
For now the solution is to let all the queries finish. Until then, monitoring is overloaded by triggers for unreachable hosts.
Comment by Sergey [ 2023 May 25 ]
6.2.3 too
Comment by Tomáš Heřmánek [ 2023 Jun 27 ]
Maybe vso can give us some help or a clue. I think Zabbix first needs a history syncer that writes in parallel. Definitely not an easy task to solve.
Comment by GOID [ 2023 Jul 26 ]
Nothing changed in 6.0.19: after a restart, the "updates" at the beginning of the next hour take 20 minutes.
Comment by Vladislavs Sokurenko [ 2024 Jan 17 ]
Comment by Vladislavs Sokurenko [ 2024 Feb 01 ]
Issue was fixed under
Comment by Vladislavs Sokurenko [ 2025 Mar 03 ]
Please check if