[ZBX-26116] Excessive Event Log Data from vmware.eventlog key crashes zabbix Created: 2025 Feb 28 Updated: 2025 Apr 09 |
|
Status: | Confirmed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 7.0.9 |
Fix Version/s: | None |
Type: | Problem report | Priority: | Blocker |
Reporter: | Michal Kudlacz | Assignee: | Michael Veksler |
Resolution: | Unresolved | Votes: | 2 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: |
![]() ![]() ![]() |
||||
Issue Links: |
|
||||
Sprint: | Support backlog |
Description |
While analyzing zabbix_server -R diaginfo, identified three item IDs dominating the history write cache. Found that all three corresponded to the official vmware.eventlog[url,skip] key, assigned to three different VMware hosts.
Fix: The issue is that if someone who is unaware of that downloads the updated template and set up vmware monitoring, can crash the entire infrastructure. |
Comments |
Comment by Marco Hofmann [ 2025 Mar 03 ] |
On Friday, we upgraded from 6.4.21 to 7.0.10. Since then, 3 out of 9 vCenter report a Timeout on the item 'vmware.eventlog[\{$VMWARE.URL},skip]'. I suspect this might be related, but for us, it doesn't crash, but never finishes. |
Comment by Michael Veksler [ 2025 Mar 03 ] |
Hi starko, Please confirm that you are successfully receiving event log for all vc before upgrading. I suspect that the 3 vc with issues have not sent events for a long time and the delayed events are overloading the server (when they started coming in) |
Comment by Marco Hofmann [ 2025 Mar 04 ] |
Hi MVekslers ! Long time no read I try to gather as much information as possible. Zabbix server 7.0.10 was exactly started for the first time on: The Zabbix proxies were all upgraded to 7.0.10 a few hours later with saltstack. The last one at about 28.02.2025 17:00:00. Our main vCenter datacenter, with 33 vSphere Hosts and 787 virtual machines, still reports its event log without issues. One of our key customers has a vCenter datacenter with 6 vSphere hosts and 108 virtual machines, still reports its event log without issues. The reason I know event log monitoring stopped working is not because I double-checked latest data, it's because I created a nodata trigger for vmware.eventlog years ago: VMware: Event log monitoring stopped nodata(/vcenter.customer.de/vmware.eventlog[{$VMWARE.URL},skip],8h)=1 Affected customer 1 has stopped reporting events 2025-03-01 10:28:22 PM Affected customer 2 stopped reporting events 2025-03-01 11:06:42 PM Affected customer 3 has stopped reporting events 2025-03-02 08:02:26 PM All three have stopped reporting event logs several hours after the Zabbix Proxy upgrade. Please tell me, what else debug information you need. EDIT 04.03.2025 11:48: I have to apologize, the error is gone. It just came to my mind, that I haven't restarted all Zabbix proxy VMs after the upgrade from 6.4.21 -> 7.0.10 and therefore I just rebooted the three responsible Debian Linux VMs. A few minutes later the nodata Trigger Events closed themselves as new event log data was being delivered. |
Comment by Michael Veksler [ 2025 Mar 04 ] |
Hi starko, Thanks for info |
Comment by Michael Veksler [ 2025 Mar 07 ] |
Hi starko, |
Comment by Marco Hofmann [ 2025 Mar 13 ] |
Sorry for the late update, but I had to be sure. The nodata Trigger for the vmware.eventlog item key triggers for (nearly) each vCenter every other day. So there is defenetly something wrong since switching from 6.4.21 to 7.0.10.
How can I provide the data you need to troubleshoot this? |
Comment by Michael Veksler [ 2025 Mar 13 ] |
To protect HistoryCache a "fuse" was introduced that stop "even collection" when HistoryCache usage > 80% |
Comment by Marco Hofmann [ 2025 Mar 18 ] |
Please tell me, if these are the values you need. If not, please tell me, where to find them. Zabbix Proxy is 7.0.10 on Debian 12. Zabbix proxy is monitored with version 7.0-1 of the official template: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/zabbix_proxy?at=refs%2Fheads%2Frelease%2F7.0 For HistoryCache, I guess you mean the metrics of the Zabbix Proxy? If yes, this Zabbix proxy is affected (vmware.eventlog nodata) at the moment, the values look not relevant to the problem: |
Comment by Michael Veksler [ 2025 Mar 19 ] |
Hi starko, Please increase the log level: zabbix_server -R log_level_increase="vmware collector" Send me a log with the problem, please. Btw, can you test dev build ? (I am not sure that current logs are enough) |
Comment by Marco Hofmann [ 2025 Mar 21 ] |
I send you the VMware debug log via E-Mail. I don't think I can test a dev build, I only have access to this prod environment, I'm sorry. |
Comment by Vladislavs Sokurenko [ 2025 Apr 09 ] |
Do you have any information how much memory was used for example by preprocessing manager ? mkudlacz ? |
Comment by Vladislavs Sokurenko [ 2025 Apr 09 ] |
Shouldn't template also be updated so it have some kind of filtering for event log or be disabled by default ? |