[ZBX-4393] Zabbix_server ignores Timeout config option for snmp checks Created: 2011 Nov 25 Updated: 2022 Oct 08 Resolved: 2013 May 16 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 1.8.8 |
Fix Version/s: | 2.1.0 |
Type: | Incident report | Priority: | Major |
Reporter: | Pavel Timofeev | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 2 |
Labels: | snmp, timeout | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
FreeBSD 8.2 RELEASE amd64, zabbix-server-1.8.8, net-snmp-5.7 |
Attachments: |
Issue Links: |
|
Description |
Looks like zabbix_server doesn't honor the Timeout config option for SNMP checks on FreeBSD (I don't have Linux, so I can't check). I now have Timeout=30, but I see zabbix log messages like:
Ok, let's set the debug level to 4.
[root@octans ~]# tcpdump -N -i bce0 host 192.168.9.42
Again about 1 second of waiting for a response. 1 second is the default timeout parameter for net-snmp. So let's try net-snmp's snmpget: same tcpdump output. But if I set -t 30, everything works every time.
P.S. [root@octans /usr/ports/net-mgmt/net-snmp]# make showconfig |
Comments |
Comment by richlv [ 2011 Nov 25 ] |
There was an issue about fixing the SNMP timeout - |
Comment by Pavel Timofeev [ 2011 Nov 28 ] |
cat /usr/local/etc/snmp/snmp.conf
It seems this can help as a workaround. |
Comment by Eric Gearhart [ 2012 Apr 10 ] |
Just wanted to quickly mention here that the token 'timeout' is apparently not recognized by net-snmp: if I set 'timeout 30' in /etc/snmp/snmp.conf, I get "Warning: Unknown token timeout" errors in zabbix_server.log. I have no idea how to set net-snmp's default client timeout. I've been through a bunch of man pages and I've googled, and all I see online are references to setting the session timeout via the net-snmp API (which would be bundled into the poller_snmp source in Zabbix). |
Comment by Pavel Timofeev [ 2012 Apr 11 ] |
It works for me. I advise you to see `man snmp.conf`.
Comment by Oleksii Zagorskyi [ 2012 Apr 11 ] |
My FreeBSD 8.1 box:
timeout INTEGER
My Debian 6.0.4 box:
CHANGELOG of net-snmp for 5.7:
|
Comment by Eric Gearhart [ 2012 Apr 11 ] |
Good catch, Oleksiy. I can try to find or build some net-snmp 5.7 RPMs for CentOS, rebuild Zabbix against them, and see if 'timeout' in snmp.conf works (it should, in theory). |
Comment by Eric Gearhart [ 2012 Apr 21 ] |
Just to follow up on this one: I "rolled my own" net-snmp packages and rebuilt Zabbix against them, and the timeout problem seems to persist. |
Comment by Jiann-Ming Su [ 2012 Sep 09 ] |
Seems like a problem with snmp_synch_response() in net-snmp. I cooked up the attached poller_timeout.patch for 2.0.2. Ugly, but seems to work. It cleared up my queue backlog and no more "network error" log messages. |
Comment by Jiann-Ming Su [ 2012 Sep 12 ] |
Though, my patch may behave weird if an unknown or invalid OID is passed in... |
Comment by Eric Gearhart [ 2012 Sep 20 ] |
YOUR PATCH JUST COMPLETELY FIXED THE TIMEOUT PROBLEM THAT HAS BEEN PLAGUING ME FOR MONTHS! Many thanks. I just applied your patch, and the SNMPv3-based queue backups that I have been experiencing on and off for months (ever since I was on Zabbix 2.0 pre-release versions!) are completely gone. Can we please, PLEASE get this patch into 2.0.3? |
Comment by Jiann-Ming Su [ 2012 Sep 22 ] |
I'm testing another patch that may behave better. The one I have may get caught up in a loop. I'm letting this other patch run over the weekend to see if it works better. I still think the problem is related to net-snmp. |
Comment by Jiann-Ming Su [ 2012 Sep 25 ] |
New poller timeout patch dated 2012-09-22. I found it easier not to make additional alarm() calls and simply use a loop counter. The number of "tries" is completely arbitrary. The loop counter behaves better than calling alarm() multiple times, since doing so could cause an infinite loop. |
Comment by Oleksii Zagorskyi [ 2012 Dec 07 ] |
Jiann, does your patch depend on a particular libsnmp version? |
Comment by Jiann-Ming Su [ 2012 Dec 07 ] |
No dependency on libsnmp. My patch is really an ugly hack around some misbehaving versions of libsnmp. I'm convinced it's libsnmp that's prematurely closing connections before the timeout. The proper solution may be to find a libsnmp version that works properly and make zabbix dependent on that version. |
Comment by dimir [ 2012 Dec 14 ] |
The net-snmp structure snmp_session has the fields "timeout" and "retries", which we are not using in the Zabbix code: http://www.net-snmp.org/dev/agent/structsnmp__session.html I think it's worth trying to set the timeout when creating the SNMP session in checks_snmp.c:snmp_open_session(). On my Debian the net-snmp version is 5.4.3~dfsg-2, and the structure has both "timeout" and "retries". |
Comment by dimir [ 2012 Dec 14 ] |
I have attached the patch zabbix-2.0.5rc1-snmp-timeout.patch that should enable server config "Timeout" parameter for snmp checks. Please check if that works for you. |
Comment by dimir [ 2012 Dec 18 ] |
Eric, could you please try the attached patch (zabbix-2.0.5rc1-snmp-timeout.patch)? It should enable your server config parameter "Timeout" for snmp items. |
Comment by Eric Gearhart [ 2012 Dec 21 ] |
dmir - I am on holiday vacation right now, but when I get back home on the 27th I will apply the patch against a clean Zabbix 2.0.5 source tree and see if it makes a difference |
Comment by dimir [ 2012 Dec 21 ] |
Thanks a lot, that'd be great. |
Comment by dimir [ 2012 Dec 28 ] |
We'll fix it only for 2.2. |
Comment by dimir [ 2012 Dec 28 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4393 . |
Comment by Alexander Vladishev [ 2013 Jan 04 ] |
Great! Successfully tested. |
Comment by dimir [ 2013 Jan 04 ] |
Fixed in pre-2.1.0 r32463. |
Comment by dimir [ 2013 Jan 04 ] |
Reopen to fix comment. |
Comment by Oleksii Zagorskyi [ 2013 Jan 04 ] |
Added a dirty note here https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew220#miscellaneous_daemon_improvements |
Comment by Oleksii Zagorskyi [ 2013 Mar 15 ] |
Heya, with this fix we also got another change that we HAVE to know about:
What does it mean? See an SNMPv3 check with incorrect credentials:
No.    Time             Source      Destination  Protocol  Info
2277   23:27:29.634748  10.20.0.32  10.20.0.6    SNMP      get-request
2278   23:27:29.639042  10.20.0.6   10.20.0.32   SNMP      report 1.3.6.1.6.3.15.1.1.4.0
2279   23:27:29.639352  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
2280   23:27:29.645606  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
2300   23:27:30.640517  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
2301   23:27:30.646411  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
2334   23:27:31.641693  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
2335   23:27:31.648537  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
2356   23:27:32.641925  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
2357   23:27:32.647837  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
2367   23:27:33.642349  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
2368   23:27:33.648242  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
2383   23:27:34.643528  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
2384   23:27:34.649399  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
3939   23:27:50.593321  10.20.0.32  10.20.0.6    SNMP      get-request
3940   23:27:50.597571  10.20.0.6   10.20.0.32   SNMP      report 1.3.6.1.6.3.15.1.1.4.0
3941   23:27:50.597889  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
3942   23:27:50.604240  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
4019   23:27:51.598891  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
4020   23:27:51.607692  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
4154   23:27:52.600020  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
4155   23:27:52.605958  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
4233   23:27:53.600172  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
4236   23:27:53.606036  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
4388   23:27:54.601381  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
4389   23:27:54.607274  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
4432   23:27:55.601589  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
4433   23:27:55.607517  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
6600   23:28:11.670657  10.20.0.32  10.20.0.6    SNMP      get-request
6601   23:28:11.674939  10.20.0.6   10.20.0.32   SNMP      report 1.3.6.1.6.3.15.1.1.4.0
6602   23:28:11.675129  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
6603   23:28:11.681417  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
6672   23:28:12.676035  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
6673   23:28:12.682392  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
6782   23:28:13.676372  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
6784   23:28:13.682304  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
6967   23:28:14.676594  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
6978   23:28:14.765060  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
7100   23:28:15.677785  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
7101   23:28:15.684098  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
7167   23:28:16.679028  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
7168   23:28:16.695648  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
8948   23:28:32.767679  10.20.0.32  10.20.0.6    SNMP      get-request
8949   23:28:32.771994  10.20.0.6   10.20.0.32   SNMP      report 1.3.6.1.6.3.15.1.1.4.0
8950   23:28:32.772182  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
8951   23:28:32.778545  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
9018   23:28:33.773320  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
9019   23:28:33.779215  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
9080   23:28:34.774416  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
9081   23:28:34.780316  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
9365   23:28:35.774602  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
9366   23:28:35.780547  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
9490   23:28:36.775784  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
9491   23:28:36.781783  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
9521   23:28:37.776962  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
9522   23:28:37.782906  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
trunk (2.2):
No.    Time             Source      Destination  Protocol  Info
7992   22:38:05.358694  10.20.0.32  10.20.0.6    SNMP      get-request
7993   22:38:05.362968  10.20.0.6   10.20.0.32   SNMP      report 1.3.6.1.6.3.15.1.1.4.0
7994   22:38:05.363219  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
7995   22:38:05.369622  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
9351   22:38:23.699214  10.20.0.32  10.20.0.6    SNMP      get-request
9352   22:38:23.703510  10.20.0.6   10.20.0.32   SNMP      report 1.3.6.1.6.3.15.1.1.4.0
9353   22:38:23.703805  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
9355   22:38:23.709979  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
12026  22:38:42.767192  10.20.0.32  10.20.0.6    SNMP      get-request
12027  22:38:42.771789  10.20.0.6   10.20.0.32   SNMP      report 1.3.6.1.6.3.15.1.1.4.0
12028  22:38:42.771932  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
12029  22:38:42.778254  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
13801  22:39:00.847818  10.20.0.32  10.20.0.6    SNMP      get-request
13802  22:39:00.852116  10.20.0.6   10.20.0.32   SNMP      report 1.3.6.1.6.3.15.1.1.4.0
13803  22:39:00.852310  10.20.0.32  10.20.0.6    SNMP      encryptedPDU: privKey Unknown
13804  22:39:00.858567  10.20.0.6   10.20.0.32   SNMP      encryptedPDU: privKey Unknown
As you can see, previously we had 5 retries after the first request. In 2.2 there are no retries: there is only one request, but with a 3-second timeout (the Zabbix server default). Conclusion: previously the Zabbix poller spent at least 6 seconds on every check of an unavailable SNMP agent; now it will spend only 3 seconds.
dimir, btw, does this comment sound correct? |
Comment by dimir [ 2013 Mar 15 ] |
Yeah, it should be "microseconds". Fixed in r34391. I have added comments to the code:
session.retries = 0; /* number of retries if first attempt fails (default = 5) */
session.timeout = CONFIG_TIMEOUT * 1000 * 1000; /* timeout of one attempt in microseconds (default = 1 second) */
So previously, after a failed first attempt, it made 5 additional attempts at 1-second intervals, which resulted in the 6 seconds you mention. After this fix it performs only one attempt, with a timeout of "Timeout" seconds as specified in the Zabbix server configuration file. I guess we could instead fix it by leaving the timeout at the default (1 second) and using the "Timeout" value as the number of retries. I'm not sure which is the better approach here. |
Comment by Alexey Pustovalov [ 2013 Mar 15 ] |
I suppose that approach would be preferable, because SNMP runs over UDP. So I think the number of retries should be more than one: if retries are zero (only one request), that single packet can be dropped or lost.
zalex_ua I don't like 5 retries at all. Why, then, don't we use 5 retries for Zabbix agent checks (TCP based) when trying to establish a TCP session, or when the agent doesn't respond within 3 seconds (default)? Note that the Zabbix server is not a command-line tool where you need to perform additional attempts if, for example, the first attempt timed out. Why do we have to handle retries at two levels at once (libnetsnmp = 5 and zabbix_server = 15 seconds)? I think we didn't consider this previously because we actually didn't know about the 5 libnetsnmp retries. I remember investigating network discovery using an SNMP check; I'm quoting myself:
It was a FreeBSD host, and I'm not sure why there are 5 seconds and not 6. Is this good? If you have serious problems in the network (routers, switches) and some traffic loss, then 5 retries within 5 seconds probably won't help much (again, remember zabbix_server's own retries). Finally, about my SNMPv3 example: I absolutely don't like that libnetsnmp is killing my SNMP-monitored device with 5 extra requests carrying incorrect credentials. Why do I need them? This issue should be closed again, optionally with another ZBXNEXT created to make retries configurable at the user level. Thanks
<richlv> split out as
zalex_ua |
Comment by richlv [ 2013 Mar 15 ] |
zalex, thanks for digging this up - that change could help with performance in some cases, too. |
Comment by Oleksii Zagorskyi [ 2013 Mar 15 ] |
(1) In any case, we additionally need to update the documentation: https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew220 Let's just wait for this issue to be closed to know the final resolution.
<richlv> and add a note on the changed retry count in https://www.zabbix.com/documentation/2.2/manual/installation/upgrade_notes_220
zalex_ua All mentioned pages updated. Please review.
dimir CLOSED |
Comment by Pavel Timofeev [ 2013 Mar 15 ] |
I'm sorry, I have a small question about the comments in the patch that mention the default number of retries and the default timeout of one attempt. Default value of what? Zabbix_server or libnetsnmp? I think you need to mention that in those comments. |
Comment by Oleksii Zagorskyi [ 2013 Mar 15 ] |
Pavel, they are defaults for libnetsnmp. |
Comment by dimir [ 2013 Mar 18 ] |
Comments fixed in r34432:
session.retries = 0; /* number of retries after failed attempt */
/* (net-snmp default = 5) */
session.timeout = CONFIG_TIMEOUT * 1000 * 1000; /* timeout of one attempt in microseconds */
/* (net-snmp default = 1 second) */ |
Comment by dimir [ 2013 May 16 ] |
Thanks, closing the issue. |
Comment by Oleksii Zagorskyi [ 2014 Jul 28 ] |
In |