[ZBX-8552] VMWare CURL Timeout sometimes too low .... Created: 2014 Jul 30  Updated: 2017 May 30  Resolved: 2014 Jul 30

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.2.5
Fix Version/s: None

Type: Incident report Priority: Minor
Reporter: Andras Fabian Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: timeout, vmware
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux nbg-web07 3.2.0-67-generic #101-Ubuntu SMP Tue Jul 15 17:46:11 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux


Issue Links:
Duplicate
duplicates ZBX-7719 VMWare monitoring bug (timeout reached) Closed

 Description   

For some weeks now (starting with Zabbix 2.2.3 and continuing with Zabbix 2.2.5), we have observed the number of "Not supported items" in the Zabbix internal graph slowly creeping up.
After some investigation, it turned out that many of our VMware checks had become unsupported, and the reason given was simply "Timeout was reached". Interestingly, at the beginning (about 3 weeks ago) it happened only a few times, but then it happened more and more often, gradually raising the number of "unsupported items". As of today, the server manages to get VMware values only a few times a day (at which point, as is visible in the logs, most items become "supported" for a short time).
I researched the issue at length, with several debug sessions, and even searched the vast vCenter logs for hints (we get the values from the vCenter appliance), without finding a smoking gun ...

I also did a lot of poking around in the source code and found a single line that made me curious (in vmware\vmware.c):

	int		err, opt, timeout = 10, ret = FAIL;

			CURLE_OK != (err = curl_easy_setopt(easyhandle, opt = CURLOPT_TIMEOUT, (long)timeout)) ||

That's where the timeout for cURL is set (cURL is evidently what fetches the data from the VMware API) ... and the timeout is hard-coded to 10 seconds.
So, as a last experiment, just hit or miss, I raised that timeout value to 30:

	int		err, opt, timeout = 30, ret = FAIL;

And recompiled the Zabbix Server.

After an hour of watching the server, it seems this single setting completely "fixed" the problem. The number of "unsupported items" went back to normal (without any spikes since then); in the log, all VMware-related items became "supported" again, and I now regularly get updated values for my items.

So, after all, I would recommend:

  • either raising the value of this hard-coded setting to something like 30
  • or, even better, making it a configurable value in zabbix_server.conf
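
The second suggestion could be sketched roughly as follows. This is only an illustration under stated assumptions: the parameter name "VMwareTimeout", the 1 to 300 second bounds and the function vmware_timeout_from_line() are hypothetical and are not taken from the Zabbix source; only the default of 10 comes from the line quoted above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define VMWARE_TIMEOUT_KEY	"VMwareTimeout"	/* hypothetical parameter name */
#define VMWARE_TIMEOUT_DEFAULT	10		/* the hard-coded value quoted above */
#define VMWARE_TIMEOUT_MIN	1		/* assumed lower bound, in seconds */
#define VMWARE_TIMEOUT_MAX	300		/* assumed upper bound, in seconds */

/*
 * Parse one "Key=Value" configuration line (hypothetical helper).
 * Returns the timeout in seconds, or 'fallback' when the line is not
 * the timeout parameter or its value is empty, malformed or out of range.
 */
int	vmware_timeout_from_line(const char *line, int fallback)
{
	size_t		key_len = strlen(VMWARE_TIMEOUT_KEY);
	const char	*value_str;
	char		*end;
	long		value;

	if (NULL == line || 0 != strncmp(line, VMWARE_TIMEOUT_KEY, key_len) || '=' != line[key_len])
		return fallback;

	value_str = line + key_len + 1;

	errno = 0;
	value = strtol(value_str, &end, 10);

	/* reject empty values, trailing garbage and overflow */
	if (end == value_str || ('\0' != *end && '\n' != *end) || 0 != errno)
		return fallback;

	if (VMWARE_TIMEOUT_MIN > value || VMWARE_TIMEOUT_MAX < value)
		return fallback;

	return (int)value;
}
```

The returned value would then take the place of the hard-coded "timeout = 10" before it is passed to CURLOPT_TIMEOUT, e.g. vmware_timeout_from_line(line, VMWARE_TIMEOUT_DEFAULT). Falling back to the default on any invalid input keeps the current behaviour for installations that do not set the parameter.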

This problem could be related to:
https://support.zabbix.com/browse/ZBX-8057
https://www.zabbix.com/forum/showthread.php?t=43776



 Comments   
Comment by richlv [ 2014 Jul 30 ]

seems to be a duplicate of ZBX-7719, closing
