[ZBX-19204] Large trend cache breaks history sync Created: 2021 Apr 02  Updated: 2025 Mar 04  Resolved: 2024 Feb 01

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Problem report Priority: Major
Reporter: Aaron Whiteman Assignee: Vladislavs Sokurenko
Resolution: Duplicate Votes: 20
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Causes
Duplicate
duplicates ZBX-23064 Problem with restart server - syncing... Closed
Sub-task
part of ZBX-25651 Add an opportunity to disable trend s... Closed

 Description   

Environment:

Zabbix Server 5.0.10, CentOS Linux, 2 cores, 16 GB RAM.

Dedicated DB: PostgreSQL 12, TimescaleDB 2, 1-month shards as per the Zabbix recommendation (see ZBX-16347), 8 cores, 125 GB RAM. The database is approximately 1.7 TB on disk.

Dedicated Front End server, Zabbix 5.0.10, nginx.

Server monitors minimal items (primarily itself and the database), all primary monitoring performed by a set of 9 proxies.

Steps to reproduce:

  1. Deploy Zabbix to a relatively large environment. Our Zabbix implementation has 1126208 items, with an NVPS of just over 2600. We see approximately 23142 "trends" rows per hour and approximately 750000 "trends_uint" rows per hour (with an overall average of approximately 45 history data points per item).
  2. Wait for trends to be flushed (half-past the hour is a good time)
  3. Shut down Zabbix. This will be slow because it will write out the current-hour trends.
  4. Verify that you have trend data associated with the partial hour:

     select 'trends', count(itemid), sum(num), TO_TIMESTAMP(clock), clock
       from trends
      where clock >= extract(epoch from now() - INTERVAL '4 HOUR')::INTEGER
      group by clock
     UNION
     select 'trends_uint', count(itemid), sum(num), TO_TIMESTAMP(clock), clock
       from trends_uint
      where clock >= extract(epoch from now() - INTERVAL '4 HOUR')::INTEGER
      group by clock
      order by 1, 4 desc;
  5. Start Zabbix.
  6. Wait for the top of the next hour
  7. Watch your Zabbix implementation stop inserting history data.
  8. Re-run the query above and note how the sum value slowly increases as trends are updated.

Result:

Zabbix will use all history syncers to store trend data until the trend cache is flushed.

For particularly large databases using the recommended TimescaleDB setting of one shard per month (see ZBX-16347), the select() query that determines whether an item is already in the database takes 6 seconds to return, for each call. The individual update queries can also be significant (exceeding 60 seconds). The end result is that the sync can take many minutes: last night, I gave up at 35 minutes and forcibly killed the Zabbix server, judging it better to lose trends than more history. The 6 PM trend write had not yet completed, and my estimate was that it was only one-third complete.
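To make the bottleneck concrete, each hourly flush follows roughly the select-then-update pattern below (a simplified sketch of the behaviour described above, not the server's actual SQL; the itemids, counts, and values are made up):

-- Sketch of the per-hour trend flush pattern (illustrative only).
-- 1. Check which cached items already have a trend row for this hour;
--    on a large partitioned table this is the select() that takes ~6 s per call.
select itemid, num, value_min, value_avg, value_max
  from trends
 where clock = 1617372000                    -- hypothetical hour boundary
   and itemid in (10001, 10002 /* , ...thousands more */);

-- 2. Merge the cached aggregates into each existing row; here a cached batch
--    of 60 values with average 0.50 (made-up numbers) is folded in.
--    Every right-hand side sees the pre-update values, so the order is safe.
update trends
   set value_min = least(value_min, 0.01),
       value_avg = (value_avg * num + 0.50 * 60) / (num + 60),
       value_max = greatest(value_max, 0.99),
       num = num + 60
 where itemid = 10001 and clock = 1617372000;

-- 3. Insert rows for items that had no trend row for this hour yet.
insert into trends (itemid, clock, num, value_min, value_avg, value_max)
values (10002, 1617372000, 60, 0.02, 0.50, 0.98);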

Expected:
Trend writing does not impact history writing

 

 

Possible solutions (there are likely others, but these are possibilities):

  • Use a dedicated trend-writing daemon rather than the history syncer, or limit trend writes to a subset of the available syncer processes.
  • Use prepare/exec semantics so the SQL server only needs to prepare each query once (note: this would require Zabbix to stop sending multiple statements in a single query), and use UPSERT semantics to eliminate the select() query that checks for existing trend data (see the sketch after this list).
  • Make trend inserts smaller and use a queue model so that history data is not blocked by large selects and inserts during the trend transaction.
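For the second bullet, PostgreSQL supports both server-side prepared statements and UPSERT. A minimal sketch of what that could look like (the statement name, parameter list, and values are hypothetical, not actual Zabbix code):

-- Prepared once per session; each flush then only sends parameters.
PREPARE trend_upsert (bigint, integer, integer, double precision,
                      double precision, double precision) AS
INSERT INTO trends (itemid, clock, num, value_min, value_avg, value_max)
VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (itemid, clock) DO UPDATE
   SET value_min = LEAST(trends.value_min, EXCLUDED.value_min),
       value_avg = (trends.value_avg * trends.num
                    + EXCLUDED.value_avg * EXCLUDED.num)
                   / (trends.num + EXCLUDED.num),
       value_max = GREATEST(trends.value_max, EXCLUDED.value_max),
       num = trends.num + EXCLUDED.num;

-- Example execution with made-up values:
EXECUTE trend_upsert (10001, 1617372000, 60, 0.01, 0.50, 0.99);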


 Comments   
Comment by Tomáš Heřmánek [ 2021 May 18 ]

Hi, we have the same issue with Zabbix 5.2. Is this improved in Zabbix 5.4?

https://support.zabbix.com/browse/ZBXNEXT-782

https://support.zabbix.com/browse/ZBXNEXT-6280

https://support.zabbix.com/browse/ZBXNEXT-6503

Thank you,

Tom

Comment by Aaron Whiteman [ 2021 Aug 06 ]

Since I updated the OS on my zabbix-server last night, I got to experience this joy again.

 

Currently, the only thing I can do is wait for Zabbix to go catatonic, then kill -9 the service to discard the trend cache, and start again. This is certainly not an expected feature of a monitoring tool that claims to be enterprise class.

 

For reference, I have 6 DB syncers (much testing shows this is the optimal count "most of the time") to support 10 proxies, monitoring 7308 enabled hosts (113546 enabled items), with a required performance of 2589.51 NVPS. Each of those syncers can push 70000 items in about 10 seconds (roughly 7000 values per second) when the server comes online, as long as there are no trends to sync.

 

But if I have to sync existing trend data, performance falls to about 700 synced items per 70 seconds (roughly 10 values per second). That is so slow that the syncers have no possible way to catch up.

 

Unfortunately, the three issues resolved in 5.4 above don't really help this case, because the database continues to have excellent performance for other queries; the issue is the bottleneck in how the history syncers specifically handle trend writes.

Another option may be to stop using the trend cache at all. The following is an example of how I populated trends for a one-hour period after I had to restart the database (PostgreSQL 13, with TimescaleDB). I use the ON CONFLICT clause here because Zabbix may have partially written data that I need to replace.

insert into trends (itemid, clock, num, value_min, value_avg, value_max)
select
    itemid
    ,clock - (clock % 3600) as clock    -- truncate to the start of the hour
    ,count(*) as num
    ,min(value) as value_min
    ,avg(value) as value_avg
    ,max(value) as value_max
from history
where
    history.clock >= (extract(epoch from '2021-08-06 06:00:00-07'::timestamptz))::bigint
    and history.clock < (extract(epoch from '2021-08-06 07:00:00-07'::timestamptz))::bigint
group by itemid, 2
ON CONFLICT (itemid, clock) DO UPDATE
    SET num = EXCLUDED.num,
        value_min = EXCLUDED.value_min,
        value_avg = EXCLUDED.value_avg,
        value_max = EXCLUDED.value_max;

insert into trends_uint (itemid, clock, num, value_min, value_avg, value_max)
select
    itemid
    ,clock - (clock % 3600) as clock    -- truncate to the start of the hour
    ,count(*) as num
    ,min(value) as value_min
    ,avg(value) as value_avg
    ,max(value) as value_max
from history_uint
where
    history_uint.clock >= (extract(epoch from '2021-08-06 06:00:00-07'::timestamptz))::bigint
    and history_uint.clock < (extract(epoch from '2021-08-06 07:00:00-07'::timestamptz))::bigint
group by itemid, 2
ON CONFLICT (itemid, clock) DO UPDATE
    SET num = EXCLUDED.num,
        value_min = EXCLUDED.value_min,
        value_avg = EXCLUDED.value_avg,
        value_max = EXCLUDED.value_max;
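A note on this workaround: as far as I know, Zabbix keeps trends only for numeric item types, so the two statements above (history into trends, history_uint into trends_uint) cover all trend data; string, log, and text history have no trend tables. It is also safest to run them while the server is stopped, so the history syncers do not write the same hour concurrently.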

Comment by bunkzilla [ 2021 Dec 10 ]

Having a similar issue in 5.4. It brought ingestion to a crawl and set off a trigger storm due to triggers set up with nodata(). I heard there is perhaps some magic about when one can stop Zabbix and then start it, but ideally I'd like to be able to restart Zabbix at any time and not have the top of the hour cripple everything and cause false triggers.

Comment by GOID [ 2023 Apr 11 ]

Same behavior on Zabbix 6 (PostgreSQL + TimescaleDB), and on the older Zabbix 5 too.

Number of hosts (enabled/disabled):                  1847 (1724 / 123)
Number of templates:                                 323
Number of items (enabled/disabled/not supported):    332317 (303556 / 25964 / 2797)
Number of triggers (enabled/disabled [problem/ok]):  297151 (255789 / 41362 [990 / 254799])
Number of users (online):                            155 (20)
Required server performance, new values per second:  3177.43

For now the only solution is to let all the queries finish. Until then, monitoring is overloaded by triggers for unreachable hosts.

Comment by Sergey [ 2023 May 25 ]

6.2.3 too

Comment by Tomáš Heřmánek [ 2023 Jun 27 ]

Maybe vso can give us some help or a clue. I think Zabbix first needs a parallel-writing history syncer. Definitely not an easy task to solve.

Comment by GOID [ 2023 Jul 26 ]

Nothing changed in 6.0.19: after a restart, the "updates" at the beginning of the next hour take 20 minutes.

Comment by Vladislavs Sokurenko [ 2024 Jan 17 ]

Could you please check if ZBX-23064 or ZBX-22126 help?

Comment by Vladislavs Sokurenko [ 2024 Feb 01 ]

The issue was fixed under ZBX-23064; if the issue still occurs, please feel free to reopen the ticket.

Comment by Vladislavs Sokurenko [ 2025 Mar 03 ]

Please check if ZBX-25651 helps.
