[ZBX-21705] History write cache leak Created: 2022 Sep 28  Updated: 2022 Oct 14

Status: Confirmed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 6.2.3
Fix Version/s: None

Type: Problem report Priority: Trivial
Reporter: Evgeny Molchanov Assignee: Zabbix Development Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 20.04


Attachments: PNG File 2022-09-28_23-17-32-1.png     PNG File image-2022-09-28-23-53-28-256.png    

 Description   

Steps to reproduce:

Upgrade from 5.0.27 to 6.2.3

Result:

The history write cache leaks.

 

zabbix_server -R diaginfo=historycache
== history cache diagnostic information ==
Items:967 values:15211481 time:0.165088
Memory.data:
  size: free:672386656 used:984689904
  chunks: free:227458 used:30422962 min:24 max:1241344
    buckets:
      24:2
      80:192
      96:4
      120:8
      128:2
      136:33
      176:18
      192:2
      232:2
      256+:227195
Memory.index:
  size: free:2100265384 used:47201912
  chunks: free:26 used:972 min:6576 max:1991610192
    buckets:
      256+:26
Top.values:
  itemid:9858563 values:1543699
  itemid:9829055 values:1285728
  itemid:10690270 values:1280093
  itemid:9829051 values:1251979
  itemid:9829034 values:1230478
  itemid:9829052 values:1116630
  itemid:9829045 values:1114413
  itemid:9829046 values:1072323
  itemid:9829040 values:1071086
  itemid:12624782 values:1029427
  itemid:9829036 values:996521
  itemid:9829056 values:935154
  itemid:9829035 values:833783
  itemid:9829044 values:448364
  itemid:9858558 values:98
  itemid:12624777 values:85
  itemid:4560302 values:79
  itemid:9682895 values:73
  itemid:768217 values:73
  itemid:671391 values:73
  itemid:3842244 values:54
  itemid:10690265 values:54
  itemid:4560547 values:53
  itemid:5626535 values:53
  itemid:9682628 values:53
==
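To track the growth over time, the diagnostic output above can be parsed with a small script. The following is a minimal sketch, not part of Zabbix: the function names are hypothetical, and it assumes the output keeps the `free:N used:N` and `itemid:N values:N` layout shown above.

```python
import re

# Excerpt of the diagnostic output shown above (only the lines this sketch needs).
DIAG = """\
Items:967 values:15211481 time:0.165088
Memory.data:
  size: free:672386656 used:984689904
Top.values:
  itemid:9858563 values:1543699
  itemid:9829055 values:1285728
  itemid:10690270 values:1280093
  itemid:9858558 values:98
"""


def data_cache_usage(text):
    """Return the used fraction (0..1) of the Memory.data region."""
    m = re.search(r"Memory\.data:\s+size: free:(\d+) used:(\d+)", text)
    if m is None:
        raise ValueError("Memory.data size line not found")
    free, used = int(m.group(1)), int(m.group(2))
    return used / (free + used)


def top_items(text):
    """Return (itemid, values) pairs listed under Top.values."""
    section = text.split("Top.values:", 1)[1]
    return [(int(i), int(v))
            for i, v in re.findall(r"itemid:(\d+) values:(\d+)", section)]


if __name__ == "__main__":
    print(f"data cache used: {data_cache_usage(DIAG):.1%}")
    for itemid, values in top_items(DIAG):
        print(itemid, values)
```

For the numbers above this reports roughly 59% of the data region in use. Running it periodically against fresh `diaginfo=historycache` output would show whether the same item IDs keep accumulating values.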

Expected:

No history write cache leak.



 Comments   
Comment by Evgeny Molchanov [ 2022 Sep 29 ]

I also tried upgrading to versions 6.0.6, 6.0.7, 6.0.8 and 6.0.9 and see the same problem.

The database is PostgreSQL 14.5-1 with TimescaleDB 2.7.2.

Server and database performance metrics show no problems (10-15% load), the same as on version 5.0.27.

Data in the frontend does not lag behind, and there are no queues.

The server does not process any data itself - it receives all data from about 10 proxies.

Server config:

ListenPort=10051
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBHost=127.0.0.1
DBName=zabbix_server
DBUser=zabbix
DBPassword=secret
DBPort=5432
StartPollers=30
StartPollersUnreachable=10
StartPreprocessors=5
StartHistoryPollers=30
StartTrappers=30
StartDiscoverers=8
StartTimers=4
CacheSize=8G
StartDBSyncers=10
StartLLDProcessors=8
HistoryCacheSize=2G
HistoryIndexCacheSize=2G
TrendCacheSize=2G
ValueCacheSize=2G
Timeout=20
AlertScriptsPath=/opt/zabbix/alertscripts
ExternalScripts=/opt/zabbix/externalscripts
FpingLocation=/usr/bin/fping
Fping6Location=/usr/bin/fping6
LogSlowQueries=3000
StatsAllowedIP=127.0.0.1 
Comment by Evgeny Molchanov [ 2022 Sep 29 ]

The only query in the database that took longer than 3 seconds:

2022-09-29 22:59:22.590 MSK [14018] zabbix@zabbix_server LOG:  duration: 12997.090 ms  statement: select i.itemid,i.hostid,i.status,i.type,i.value_type,i.key_,i.snmp_oid,i.ipmi_sensor,i.delay,i.trapper_hosts,i.logtimefmt,i.params,ir.state,i.authtype,i.username,i.password,i.publickey,i.privatekey,i.flags,i.interfaceid,ir.lastlogsize,ir.mtime,i.history,i.trends,i.inventory_link,i.valuemapid,i.units,ir.error,i.jmx_endpoint,i.master_itemid,i.timeout,i.url,i.query_fields,i.posts,i.status_codes,i.follow_redirects,i.post_type,i.http_proxy,i.headers,i.retrieve_mode,i.request_method,i.output_format,i.ssl_cert_file,i.ssl_key_file,i.ssl_key_password,i.verify_peer,i.verify_host,i.allow_traps,i.templateid,null from items i inner join hosts h on i.hostid=h.hostid join item_rtdata ir on i.itemid=ir.itemid where h.status in (0,1) and i.flags<>2 
Comment by Vladislavs Sokurenko [ 2022 Sep 30 ]

Please provide more information about the items that accumulate. What kind of values do they have?

Comment by Evgeny Molchanov [ 2022 Sep 30 ]

Items found in cache:

Top.values:
  itemid:9858563 values:298566
  itemid:10690270 values:242934
  itemid:9829034 values:239930
  itemid:9829055 values:236880
  itemid:9829051 values:235248 

They have a flexible update interval of 0 seconds with the period 1-5,00:00-24:00.

In the database:

 -[ RECORD 1 ]----+---------------------------------
itemid           | 9858563
type             | 7
snmp_oid         |
hostid           | 15873
name             | Validation Status
key_             | oracle[val_status,{$ORACLE_SID}]
delay            | 3m;0/1-5,00:00-24:00
history          | 30d
trends           | 365d
status           | 1
value_type       | 3 
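The `delay` value in the record above combines a base interval (`3m`) with one flexible interval (`0` for the period `1-5,00:00-24:00`). The sketch below splits such a string and flags zero-length flexible intervals. It is an illustration only: the helper names are hypothetical, and the parsing is deliberately simplified (entries without a `/`, such as scheduling intervals, are skipped, and user macros are not handled).

```python
def parse_delay(delay):
    """Split a delay string into (base interval, [(interval, period), ...]).

    Simplified: only 'interval/period' entries are treated as flexible
    intervals; anything without a '/' after the base interval is ignored.
    """
    base, *custom = delay.split(";")
    flexible = []
    for entry in custom:
        if "/" not in entry:
            continue  # e.g. a scheduling interval, out of scope for this sketch
        interval, period = entry.split("/", 1)
        flexible.append((interval, period))
    return base, flexible


def has_zero_flexible_interval(delay):
    """Return True if any flexible entry sets the interval to 0."""
    _, flexible = parse_delay(delay)
    # Strip an optional time suffix so that "0s" is also recognised as zero.
    return any(interval.rstrip("smhdw") == "0" for interval, _ in flexible)


if __name__ == "__main__":
    delay = "3m;0/1-5,00:00-24:00"
    print(parse_delay(delay))
    print(has_zero_flexible_interval(delay))
```

A check like this could be run over the `delay` column of active agent items (`type` 7) to list candidates for the same behaviour.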

After disabling these items, the cache stopped growing; after re-enabling them, cache usage started growing again, so the problem is localized to these items.

After removing the flexible update period, the problem also no longer reproduces.
Immediately after the upgrade, the server log contained the following lines:

1109639:20220929:192028.228 item "my-server-1:oracle[val_status,{$ORACLE_SID}]" became not supported: Incorrect update interval. 

The items are in the "not supported" state:

Value of type "string" is not suitable for value type "Numeric (unsigned)"
Comment by Vladislavs Sokurenko [ 2022 Sep 30 ]

This sounds similar to ZBX-20487. Could you please check the agent logs?

Comment by Evgeny Molchanov [ 2022 Sep 30 ]

The agent logs are clean, no errors.

In the web interface, I see the (string) value that the script returns when it is executed.

Comment by Vladislavs Sokurenko [ 2022 Sep 30 ]

Could you please provide the Zabbix server, Zabbix proxy and Zabbix agent versions? Then we could try to reproduce the issue, if it still persists with the latest versions.

Comment by Evgeny Molchanov [ 2022 Sep 30 ]

Zabbix-server: 6.0.9

Zabbix-proxy: 6.0.9

Zabbix-agent: 2.4.5

Comment by Vladislavs Sokurenko [ 2022 Sep 30 ]

Reproduced.

Steps:
In Zabbix server 6.0, create a Zabbix agent (active) item, for example agent.ping, and add a flexible update interval:
3m;0/1-5,00:00-24:00
Launch Zabbix agent 2.4.5.
Notice that the agent floods the server with values:

'{"request":"agent data","data":[{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664106223},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664107848},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664108557},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664109098},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664109682},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664110265},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664110848},{"host":"Zabbix server","key":"agent.ping","value":"1","clock":1664545981,"ns":664111390},{"host":"Zabbix server","key":"agent.ping","value":"1","clo 

The agent does so because it treats an update interval of 0 as no interval:

79589:20220930:165804.632 send_list_of_active_checks_json() sending [{"response":"success","data":[{"key":"agent.ping","delay":0,"lastlogsize":0,"mtime":0}]}] 

The workaround is not to use a flexible update interval, as older agents do not support it anyway.
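To spot affected checks, the active check list that the server sends can be scanned for a zero delay. The sketch below is a hedged illustration, not Zabbix code: the function name is hypothetical, and the input is the response body copied from the debug log line above.

```python
import json

# Response body copied from the send_list_of_active_checks_json() log line above.
RESPONSE = (
    '{"response":"success","data":'
    '[{"key":"agent.ping","delay":0,"lastlogsize":0,"mtime":0}]}'
)


def zero_delay_keys(response_text):
    """Return the keys of active checks whose delay is 0."""
    payload = json.loads(response_text)
    keys = []
    for check in payload.get("data", []):
        # Compare as a string so numeric 0 and "0" are both caught,
        # while a missing delay field is not flagged.
        if str(check.get("delay", "")) == "0":
            keys.append(check["key"])
    return keys


if __name__ == "__main__":
    print(zero_delay_keys(RESPONSE))
```

Any key returned here is one that an old agent, per the reproduction above, would collect without pausing between checks.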

 

Generated at Sat Jun 14 17:15:37 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.