[ZBXNEXT-4661] Excessive unexplained value cache hits leads to poor zabbix server performance Created: 2018 Aug 01 Updated: 2018 Aug 17 Resolved: 2018 Aug 17 |
|
Status: | Closed |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | Server (S) |
Affects Version/s: | 3.4.11 |
Fix Version/s: | None |
Type: | Change Request | Priority: | Major |
Reporter: | James Cook | Assignee: | Unassigned |
Resolution: | Won't fix | Votes: | 1 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Zabbix 3.4.11 Server, 8 * Zabbix 3.4.11 Proxies |
Attachments: |
Description |
Hi, We have an issue where an unexplained, huge jump in value cache hits is followed by a build-up in the history cache that continues until we restart the server. This happens regularly, i.e. every week or two. I have included several graphs for the last 7-day period. You can clearly see that at approximately 2-3 pm yesterday there was a massive jump in value cache hits without a corresponding jump in monitoring configuration, and the history cache started to rise at exactly the same time. The attached graphs show value cache hits, process performance, cache capacity, value performance and monitoring configuration. Regards James |
Comments |
Comment by James Cook [ 2018 Aug 01 ] |
Hi, I have attached the value cache graph as well to show that the value cache did not really grow during that period, which may suggest that something is hitting the value cache too frequently rather than returning too much data. Cheers James |
Comment by James Cook [ 2018 Aug 01 ] |
Hi, I would like to know if there is a way to dump out what is in the value cache and what is in the configuration cache in order to identify potential triggers in memory that could be causing the excessive hits? Cheers James |
Comment by James Cook [ 2018 Aug 01 ] |
Hi, I have attached the template that was applied to 750 hosts yesterday at 2-3 pm, which appears to correlate with the issue. We have tons of identical triggers that do not cause the same issue, so I am puzzled why this is the case. I also removed this template from the hosts it was applied to yesterday, and even after a configuration cache reload the issue still exists. I am wondering whether the triggers have actually been deleted from the live configuration cache, which is why I would like to dump a list of triggers in the live configuration cache. Cheers James |
Comment by Alexey Pustovalov [ 2018 Aug 01 ] |
James, Unfortunately, it is impossible to dump the value cache. I suppose the reason is log items. Do you have log monitoring? |
Comment by James Cook [ 2018 Aug 01 ] |
Hi Alexey, We do monitor the syslog (/var/log/messages) on our Linux systems and do some light Windows event log monitoring; however, this has been in place for years. Cheers James |
Comment by Alexey Pustovalov [ 2018 Aug 01 ] |
James, I did not say that there is something new. Maybe you changed a trigger expression, or log monitoring sometimes receives many more records than usual. You can check this in the history_log table. |
Comment by James Cook [ 2018 Aug 01 ] |
Hi Alexey,
Sorry, my misinterpretation...
I have counted the history_log rows per day as follows:
26/07/2018 - 56790
27/07/2018 - 65303
28/07/2018 - 52207
29/07/2018 - 51363
30/07/2018 - 65147
31/07/2018 - 256293
01/08/2018 - 41361 (so far, only 11 hours)
There does seem to be an increase at some point yesterday, so I will check whether it happened during 2-3 pm, where the graph shows the jump.
If so, I will then identify which items have increased and look at disabling them temporarily.
Cheers James |
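A per-day count like the one above could be produced along the following lines. This is only a sketch under stated assumptions: it relies on the `itemid` and `clock` (Unix epoch seconds) columns of `history_log`, it runs against any DB-API connection that accepts this SQL (SQLite is a convenient stand-in for the real Zabbix database), and the function name `rows_per_day` is hypothetical. Days are bucketed in UTC, so the totals will not match local-time day boundaries exactly.

```python
import sqlite3

SECONDS_PER_DAY = 86400


def rows_per_day(conn):
    """Count history_log rows per UTC day.

    Returns a list of (day_start_epoch, row_count) tuples ordered by day.
    `conn` is assumed to be a DB-API connection whose database has a
    history_log table with an integer `clock` column (epoch seconds).
    """
    query = """
        SELECT clock - (clock % 86400) AS day_start,
               COUNT(*) AS row_count
        FROM history_log
        GROUP BY day_start
        ORDER BY day_start
    """
    return conn.execute(query).fetchall()
```

A day whose count is several times higher than its neighbours is the one worth drilling into per item.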
Comment by Alexey Pustovalov [ 2018 Aug 01 ] |
James, It is better to add group by itemid + clock (by hour). It will help you understand which item is guilty. |
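The grouping suggested here (by `itemid` and by `clock` rounded down to the hour) might look roughly like the sketch below. It is an illustration, not the query that was actually run in this ticket: the function names, the demo data and the use of an in-memory SQLite database are all assumptions, and only the `itemid` and `clock` columns of `history_log` are taken from the Zabbix schema.

```python
import sqlite3

HOUR = 3600


def busiest_items_by_hour(conn, since, limit=20):
    """Rank (itemid, hour) buckets in history_log by number of rows.

    `since` is an epoch timestamp; only rows at or after it are counted.
    Returns (itemid, hour_start_epoch, row_count) tuples, busiest first.
    """
    query = """
        SELECT itemid,
               clock - (clock % 3600) AS hour_start,
               COUNT(*) AS row_count
        FROM history_log
        WHERE clock >= ?
        GROUP BY itemid, hour_start
        ORDER BY row_count DESC, itemid, hour_start
        LIMIT ?
    """
    return conn.execute(query, (since, limit)).fetchall()


def demo_connection():
    """Build a tiny in-memory stand-in for history_log (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history_log (itemid INTEGER, clock INTEGER, value TEXT)"
    )
    base = 17743 * 86400  # an arbitrary midnight, UTC
    rows = [(1001, base + 14 * HOUR + i, "noisy") for i in range(50)]
    rows += [(1002, base + 14 * HOUR + 60 * i, "quiet") for i in range(3)]
    rows += [(1001, base + 13 * HOUR + 10, "noisy")]
    conn.executemany("INSERT INTO history_log VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for itemid, hour_start, count in busiest_items_by_hour(demo_connection(), 0):
        print(itemid, hour_start, count)
```

An item whose hourly count suddenly dwarfs its own earlier hours is the likely culprit; its triggers are then the first candidates to inspect or disable.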
Comment by James Cook [ 2018 Aug 01 ] |
Hi Alexey, Wow, you were spot on... I found an individual item that had increased its submission rate (using SQL). I disabled the trigger and it went straight back to normal. I will keep an eye on it for a couple of hours and then we can close it. Something for me to look at in the future for similar problems. Cheers James |
Comment by Alexey Pustovalov [ 2018 Aug 01 ] |
Ok. Also, please do not forget that support.zabbix.com is used for bug reports, not for configuration problems, performance problems, etc. |
Comment by James Cook [ 2018 Aug 01 ] |
Hi Alexey, No problems and I will remember ... Thumbs up for being so quick. Cheers James |