[ZBX-21705] History write cache leak Created: 2022 Sep 28 Updated: 2022 Oct 14 |
|
Status: | Confirmed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 6.2.3 |
Fix Version/s: | None |
Type: | Problem report | Priority: | Trivial |
Reporter: | Evgeny Molchanov | Assignee: | Zabbix Development Team |
Resolution: | Unresolved | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Ubuntu 20.04 |
Attachments: |
Description |
Steps to reproduce:
Upgrade from 5.0.27 to 6.2.3.
Result:
The history write cache leaks (visible in the output of zabbix_server -R diaginfo=historycache).
Expected:
No history write cache leak. |
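For reference, the runtime control command above makes the running server write the cache diagnostics into its own log file rather than to the console; a minimal sketch of collecting the report, assuming the log path from the configuration quoted in the comments below:

zabbix_server -R diaginfo=historycache      # ask the server to log a history cache report
tail -f /var/log/zabbix/zabbix_server.log   # the report, including a Top.values section, shows up here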
Comments |
Comment by Evgeny Molchanov [ 2022 Sep 29 ] |
I also tried updating to 6.0.6, 6.0.7, 6.0.8 and 6.0.9 and see the same problem there. The database is PostgreSQL 14.5-1 + TimescaleDB 2.7.2. Server and database performance metrics show no problem (10-15% load), the same as on version 5.0.27. Data in the frontend does not lag and there are no queues. The server does not collect any data itself; it receives all data from about 10 proxies.

Server config:
ListenPort=10051
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBHost=127.0.0.1
DBName=zabbix_server
DBUser=zabbix
DBPassword=secret
DBPort=5432
StartPollers=30
StartPollersUnreachable=10
StartPreprocessors=5
StartHistoryPollers=30
StartTrappers=30
StartDiscoverers=8
StartTimers=4
CacheSize=8G
StartDBSyncers=10
StartLLDProcessors=8
HistoryCacheSize=2G
HistoryIndexCacheSize=2G
TrendCacheSize=2G
ValueCacheSize=2G
Timeout=20
AlertScriptsPath=/opt/zabbix/alertscripts
ExternalScripts=/opt/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=3000
StatsAllowedIP=127.0.0.1 |
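The history write cache usage can also be tracked over time with the server's standard internal items (a sketch using keys documented for Zabbix internal checks, not something taken from this ticket):

zabbix[wcache,history,pfree]    # percentage of history write cache free
zabbix[wcache,history,used]     # history write cache space used, in bytes
zabbix[wcache,index,pfree]      # percentage of history index cache free

Graphing these against the HistoryCacheSize/HistoryIndexCacheSize settings above makes a leak visible as a steady drop in free space.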
Comment by Evgeny Molchanov [ 2022 Sep 29 ] |
The only slow query longer than 3 seconds in the database:

2022-09-29 22:59:22.590 MSK [14018] zabbix@zabbix_server LOG: duration: 12997.090 ms
statement:
select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.snmp_oid,i.ipmi_sensor,i.delay,
       i.trapper_hosts,i.logtimefmt,i.params,ir.state,i.authtype,i.username,i.password,
       i.publickey,i.privatekey,i.flags,i.interfaceid,ir.lastlogsize,ir.mtime,i.history,i.trends,
       i.inventory_link,i.valuemapid,i.units,ir.error,i.jmx_endpoint,i.master_itemid,i.timeout,
       i.url,i.query_fields,i.posts,i.status_codes,i.follow_redirects,i.post_type,i.http_proxy,
       i.headers,i.retrieve_mode,i.request_method,i.output_format,i.ssl_cert_file,i.ssl_key_file,
       i.ssl_key_password,i.verify_peer,i.verify_host,i.allow_traps,i.templateid,null
from items i
inner join hosts h on i.hostid=h.hostid
join item_rtdata ir on i.itemid=ir.itemid
where h.status in (0,1) and i.flags<>2 |
Comment by Vladislavs Sokurenko [ 2022 Sep 30 ] |
Please provide more information about the items that accumulate. What kind of values do they have? |
Comment by Evgeny Molchanov [ 2022 Sep 30 ] |
Items found in cache (Top.values):
itemid:9858563 values:298566
itemid:10690270 values:242934
itemid:9829034 values:239930
itemid:9829055 values:236880
itemid:9829051 values:235248

They have an update interval of 0 seconds for the period 1-5,00:00-24:00. In the DB:

-[ RECORD 1 ]----+---------------------------------
itemid           | 9858563
type             | 7
snmp_oid         |
hostid           | 15873
name             | Validation Status
key_             | oracle[val_status,{$ORACLE_SID}]
delay            | 3m;0/1-5,00:00-24:00
history          | 30d
trends           | 365d
status           | 1
value_type       | 3

After disabling these items the cache growth stopped; after enabling them again the cache usage growth resumed, so the problem was localized to them. After removing the custom update period the problem also no longer reproduces.
1109639:20220929:192028.228 item "my-server-1:oracle[val_status,{$ORACLE_SID}]" became not supported: Incorrect update interval.
The items go into an unsupported state with the error: Value of type "string" is not suitable for value type "Numeric (unsigned)". |
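Other items configured with the same pattern can be located directly in the database; the query below is only a sketch against the items table referenced in the slow-query comment above, not part of the original report:

-- items whose delay contains a flexible interval with an update interval of 0,
-- for example "3m;0/1-5,00:00-24:00"
SELECT i.itemid, i.hostid, i.key_, i.delay
  FROM items i
 WHERE i.delay LIKE '%;0/%';

Per the comment above, replacing such a delay with a plain interval (for example 3m instead of 3m;0/1-5,00:00-24:00) stops the cache growth.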
Comment by Vladislavs Sokurenko [ 2022 Sep 30 ] |
Sounds similar to |
Comment by Evgeny Molchanov [ 2022 Sep 30 ] |
The agent logs are clean, no errors. In the web interface, I see the value (string) that comes when the script is executed. |
Comment by Vladislavs Sokurenko [ 2022 Sep 30 ] |
If you could please provide the Zabbix server, Zabbix proxy and Zabbix agent versions, we could try to reproduce the issue if it still persists with the latest versions. |
Comment by Evgeny Molchanov [ 2022 Sep 30 ] |
Zabbix-server: 6.0.9
Zabbix-proxy: 6.0.9
Zabbix-agent: 2.4.5 |
Comment by Vladislavs Sokurenko [ 2022 Sep 30 ] |
Reproduced. Steps: send an "agent data" request such as:

'{"request":"agent data","data":[{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664106223},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664107848},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664108557},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664109098},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664109682},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664110265},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664110848},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664111390},{"host":"Zabbix server","key":"agent.ping","value":"1","clo

This happens because the 0 update interval is treated as no interval:

79589:20220930:165804.632 send_list_of_active_checks_json() sending [{"response":"success","data":[{"key":"agent.ping","delay":0,"lastlogsize":0,"mtime":0}]}]

Workaround: do not use a flexible update interval; it is not supported by older agents anyway.
|
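For anyone who wants to replay this against a test setup, below is a minimal sketch only, not the reproduction script used here: it assumes a server trapper reachable on 127.0.0.1:10051 and a host named "Zabbix server" with an active-agent item agent.ping configured with the flexible 0 interval described above, and it pushes a batch of "agent data" values over the Zabbix trapper protocol the way an old active agent would.

#!/usr/bin/env python3
# Sketch: replay an "agent data" request like the one captured in the comment above.
# Assumptions (not from the ticket): server at 127.0.0.1:10051, host "Zabbix server"
# with an active-agent item agent.ping that uses a flexible interval of 0.
import json
import socket
import struct
import time

SERVER = ("127.0.0.1", 10051)

def recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed by server")
        buf += chunk
    return buf

def zbx_request(payload):
    """Send one Zabbix-protocol packet ("ZBXD" + flag + 8-byte length + JSON) and return the JSON reply."""
    data = json.dumps(payload).encode()
    with socket.create_connection(SERVER, timeout=10) as sock:
        sock.sendall(b"ZBXD\x01" + struct.pack("<Q", len(data)) + data)
        header = recv_exact(sock, 13)                  # "ZBXD" + flag + 8-byte little-endian length
        length = struct.unpack("<Q", header[5:13])[0]
        return json.loads(recv_exact(sock, length))

clock = int(time.time())
values = [{"host": "Zabbix server", "key": "agent.ping", "value": "1",
           "clock": clock, "ns": i} for i in range(1000)]   # distinct ns per value, as in the capture
print(zbx_request({"request": "agent data", "data": values}))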