[ZBX-24974] History write cache depleted after Zabbix upgrade Created: 2024 Aug 06  Updated: 2024 Dec 12  Resolved: 2024 Dec 12

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 6.0.31
Fix Version/s: None

Type: Problem report Priority: Trivial
Reporter: t.oshima Assignee: Aigars Kadikis
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:
  • OS: Amazon Linux 2023 (Fedora-based) / 6.1.77-99.164.amzn2023.x86_64
  • DB: Aurora 8.0.mysql_aurora.3.05.2 (MySQL 8.0.32 compatible)
  • Package versions on the Zabbix server:
      • zabbix-agent2 6.0.31
      • zabbix-agent2-plugin-mongodb 6.0.31
      • zabbix-agent2-plugin-postgresql 6.0.31
      • zabbix-apache-conf 6.0.31
      • zabbix-sender 6.0.31
      • zabbix-server-mysql 6.0.31
      • zabbix-sql-scripts 6.0.31
      • zabbix-web 6.0.31
      • zabbix-web-deps 6.0.31
      • zabbix-web-japanese 6.0.31
      • zabbix-web-mysql 6.0.31
  • Package versions on the monitored host:
      • zabbix-agent2 6.0.27

Attachments: 20240725-graph.pdf (PDF), Graph from the log monitoring test implementation.png (PNG), diaginfo.txt (text), diaginfo_d002.txt (text), peak.png (PNG)

 Description   

Steps to reproduce:
Upgrade Zabbix server from version 6.0.25 to 6.0.31.
The Zabbix agent version on the monitored host was not changed.
 
Result:
After the upgrade, History write cache usage started to increase at around 9:45 and the cache eventually became full.
Usage started to rise again at around 9:45 the next day, so we enlarged the History write cache; even so, usage showed no sign of decreasing, so we downgraded Zabbix from 6.0.31 to 6.0.25.
After the downgrade, History write cache usage stabilized close to 0%, which suggests the issue was introduced by the upgrade.
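For reference, a minimal illustration of how the cache usage shown in the attached graphs can be tracked, assuming the standard Zabbix internal item keys (shown only as an example, not the exact items configured here):

  zabbix[wcache,history,pused]   (History write cache, % used)
  zabbix[wcache,history,pfree]   (History write cache, % free)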

 

We had already upgraded Zabbix in our development environment to version 6.0.31, and the same issue has not occurred there.

Expected:
After upgrading Zabbix, we expect History write cache usage to stabilize close to 0% instead of continuously increasing.
To address the vulnerability CVE-2024-22120, we are considering upgrading Zabbix to version 6.0.28 or later.



 Comments   
Comment by Vladislavs Sokurenko [ 2024 Aug 06 ]

Please provide the output of "zabbix_server -R diaginfo" collected while the issue is occurring.
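A minimal sketch of how the snapshot could be captured while the cache is filling up (the log path below is the default for packaged installs and only illustrative; the dump is written to the server log):

  # dump all diagnostic sections, or only the history cache section
  zabbix_server -R diaginfo
  zabbix_server -R diaginfo=historycache
  # the output goes to the Zabbix server log, e.g.
  tail -n 300 /var/log/zabbix/zabbix_server.log > diaginfo_peak.txt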

Comment by t.oshima [ 2024 Aug 07 ]

Is it difficult to determine the cause without running "zabbix_server -R diaginfo" while the problem is occurring?
We have rolled back from 6.0.31 to 6.0.25 because of the impact on monitoring in the production environment, so we cannot obtain "zabbix_server -R diaginfo" while the issue is occurring.

Comment by Vladislavs Sokurenko [ 2024 Aug 07 ]

It is currently unknown why this happened. In the past, similar issues were caused by items that spammed the Zabbix server with values; such items would show up in diaginfo.

Comment by t.oshima [ 2024 Aug 08 ]

Get "zabbix_server -R diaginfo" in development environment Zabbix, upgraded from "6.0.25" to "6.0.31".

Comment by t.oshima [ 2024 Aug 09 ]

Please find attached the diaginfo obtained from Zabbix 6.0.31 in the development environment.
We have not observed any History write cache exhaustion in the development environment.

Comment by Aigars Kadikis [ 2024 Aug 09 ]

It will be required to have "zabbix_server -R diaginfo" output captured at the peak level.

By running "zabbix_server -R diaginfo" at the peak, we can see which itemid holds many values in memory. This can reveal the problematic item and its type.

Comment by t.oshima [ 2024 Aug 15 ]

I am looking for a way to determine the cause of the problem from information other than diaginfo.

You mentioned that items have spammed the Zabbix server in the past; if possible, please let us know what kinds of items caused that.

Is it difficult to investigate the cause from Zabbix server logs or DB?

Comment by t.oshima [ 2024 Sep 25 ]

We have prepared a verification environment and were able to reproduce the issue. Could you please investigate the cause based on the attached diaginfo from the peak time?

Comment by Vladislavs Sokurenko [ 2024 Sep 30 ]

This part might explain it: as can be seen, the item with itemid 126643 has 1025153 values.

Values:1386357 done:1202367 queued:199 processing:15 pending:183776 time:1.026531
Top.values:
  itemid:126643 values:1025153 steps:0
  itemid:56472 values:11404 steps:2
  itemid:56494 values:11404 steps:2
  itemid:56488 values:11400 steps:2
  itemid:56483 values:11400 steps:2
  itemid:56470 values:11398 steps:2
  itemid:132621 values:11395 steps:2
  itemid:132622 values:11392 steps:2
  itemid:56491 values:11391 steps:2
  itemid:56473 values:11390 steps:2
  itemid:132619 values:11388 steps:2
  itemid:56471 values:11388 steps:2
  itemid:56476 values:11388 steps:2
  itemid:132620 values:11387 steps:2
  itemid:56501 values:11387 steps:2
  itemid:56489 values:11384 steps:2
  itemid:56490 values:11381 steps:2
  itemid:126772 values:378 steps:0
  itemid:126677 values:377 steps:0
  itemid:41239 values:377 steps:0
  itemid:69793 values:377 steps:0
  itemid:88884 values:377 steps:0
  itemid:92699 values:377 steps:0
  itemid:91374 values:377 steps:0
  itemid:68404 values:377 steps:0
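For completeness, a minimal sketch of mapping the itemid from the dump above to a concrete host and item key directly in the database (this assumes the standard Zabbix schema and the default database name, both of which may differ in your setup):

  mysql zabbix -e "SELECT h.host, i.name, i.key_, i.delay
                   FROM items i JOIN hosts h ON h.hostid = i.hostid
                   WHERE i.itemid = 126643;"
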
Comment by t.oshima [ 2024 Oct 04 ]

Thank you for your comment.
Itemid 126643 is a log monitoring item. Testing under the following conditions showed that only version 6.0.31 has a tendency for the History write cache usage to increase.

Test Conditions

Created a log monitoring item on Linux, writing 30 lines per second to the monitored log file (for 2 hours).
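A minimal sketch of how a comparable load can be generated on the monitored host (the log path and message text are illustrative, not the exact files used in this test):

  # append roughly 30 lines per second to the monitored log file for 2 hours
  for i in $(seq 1 7200); do
    for j in $(seq 1 30); do
      echo "$(date '+%Y-%m-%dT%H:%M:%S') test line $i-$j" >> /var/log/testapp.log
    done
    sleep 1
  done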

Results

  • 6.0.31: The history write cache (% used) increased to about 15%.
  • 6.0.25: The history write cache (% used) remained around 0.2%.

Additional Information

This phenomenon was confirmed in Linux log monitoring, but no increase in the history write cache (% used) was observed in Windows log monitoring with version 6.0.31.

Consideration

Based on these results, we suspect a performance degradation specific to version 6.0.31. Could you please share your thoughts on this matter?

Comment by t.oshima [ 2024 Dec 12 ]

We implemented the following two measures:

  • Removed log monitoring items with large data volumes
  • Upgraded to version 6.0.34 where no performance degradation is observed

We will close this case.

 
