For the past few weeks (starting with Zabbix 2.2.3 and continuing with Zabbix 2.2.5) we have observed the number of "Not supported items" slowly creeping up in the Zabbix internal graph.
After some investigation, it turned out that a lot of our VMware checks had become unsupported, and the only reason given was "Timeout was reached". Interestingly, at the beginning (about 3 weeks ago) it happened only a few times, but then it started to happen more and more often, gradually driving up the number of "unsupported items". As of today, the server manages to get VMware values only a few times a day (when it does, most items become "supported" again for a short time, which is also visible in the logs).
I researched the issue at length, with several debug sessions and even a search for hints in the vast vCenter logs (we get the values from the vCenter appliance), without finding a smoking gun ...
I also poked around a lot in the source code and found a single line that made me curious (in vmware\vmware.c ):
That's where the timeout for CURL is set (CURL is evidently what is used to get the data from the VMware API) ... and the timeout is hard coded to 10.
So, as a last experiment (a shot in the dark), I raised that timeout value to 30 and recompiled the Zabbix server.
After an hour of watching the server, it seems that this single setting completely "fixed" the problem. The number of "unsupported items" went back to normal (without any spikes since then), all VMware related items show up as "supported" again in the log, and I now regularly get updated values for my items.
So, in the end, I would recommend:
- either raising the value of this hard coded setting to something like 30
- or, even better, making it another config value in zabbix_server.conf
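To illustrate the second option, here is a rough sketch of how such a config value could be validated before it is used. This is only my assumption of what it might look like, not code from the Zabbix source: the parameter name "VMwareTimeout", the function parse_vmware_timeout() and the allowed range of 1 to 300 are all made up for illustration, and the fallback of 10 simply mirrors the value that is hard coded today.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical default and limits; not taken from the Zabbix source. */
#define VMWARE_TIMEOUT_DEFAULT 10L   /* mirrors the currently hard coded value */
#define VMWARE_TIMEOUT_MIN     1L    /* assumed lower bound */
#define VMWARE_TIMEOUT_MAX     300L  /* assumed upper bound */

/*
 * Parse the text of a hypothetical "VMwareTimeout" config parameter.
 * Returns the parsed timeout, or the default when the value is missing,
 * is not a plain decimal number, or lies outside the allowed range.
 */
long parse_vmware_timeout(const char *value)
{
    char *end = NULL;
    long timeout;

    /* Parameter not present in the config file: keep the old behaviour. */
    if (value == NULL || *value == '\0')
        return VMWARE_TIMEOUT_DEFAULT;

    errno = 0;
    timeout = strtol(value, &end, 10);

    /* Reject overflow, non-numeric input and trailing characters. */
    if (errno != 0 || end == value || *end != '\0')
        return VMWARE_TIMEOUT_DEFAULT;

    /* Reject values outside the assumed range. */
    if (timeout < VMWARE_TIMEOUT_MIN || timeout > VMWARE_TIMEOUT_MAX)
        return VMWARE_TIMEOUT_DEFAULT;

    return timeout;
}
```

With something like this, parse_vmware_timeout("30") gives 30, while a missing or invalid value falls back to 10, so existing installations would behave as before. The result could then be used wherever the hard coded 10 is currently handed to CURL.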
This problem could be related to: