[ZBX-4393] Zabbix_server ignores Timeout config option for snmp checks Created: 2011 Nov 25  Updated: 2022 Oct 08  Resolved: 2013 May 16

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.8.8
Fix Version/s: 2.1.0

Type: Incident report Priority: Major
Reporter: Pavel Timofeev Assignee: Unassigned
Resolution: Fixed Votes: 2
Labels: snmp, timeout
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

FreeBSD 8.2 RELEASE amd64, zabbix-server-1.8.8, net-snmp-5.7


Attachments: Text File poller_timeout.patch     Text File poller_timeout.patch     Text File poller_timeout.patch     File zabbix-2.0.5rc1-snmp-timeout.patch    
Issue Links:
Duplicate
is duplicated by ZBX-4847 Low level discovery items that are au... Closed
is duplicated by ZBX-5554 SNMP Source Port Closed Before Timeou... Closed

 Description   

It looks like zabbix_server doesn't honor the Timeout config option for SNMP checks on FreeBSD (I don't have Linux, so I can't check there).

Now I have Timeout=30, but I see zabbix log messages like:
1749:20111125:172042.270 SNMP Host [spb9124-F1S2]: first network error, wait for 15 seconds
1749:20111125:172043.271 SNMP Host [spb9124-F2S2]: first network error, wait for 15 seconds
1754:20111125:172058.267 SNMP Host [spb9124-F1S2]: another network error, wait for 15 seconds

OK, let's set the debug level to 4.
...
47693:20111123:111039.182 In get_values()
47693:20111123:111039.182 In DCconfig_get_poller_items() poller_type:0
47693:20111123:111039.182 End of DCconfig_get_poller_items():1
47693:20111123:111039.182 In substitute_simple_macros() data:'snmp.cpu'
47693:20111123:111039.182 In substitute_simple_macros() data:'iowa'
47693:20111123:111039.182 In substitute_simple_macros() data:'CISCO-SYSTEM-EXT-MIB::cseSysCPUUtilization.0'
47693:20111123:111039.182 In get_value() key:'snmp.cpu'
47693:20111123:111039.182 In get_value_snmp() key:'snmp.cpu' oid:'CISCO-SYSTEM-EXT-MIB::cseSysCPUUtilization.0'
47693:20111123:111039.182 In snmp_open_session()
47693:20111123:111039.182 SNMP [iowa@192.168.9.42:161]
47693:20111123:111039.183 End of snmp_open_session()
47693:20111123:111039.183 Standard processing
47693:20111123:111039.183 In snmp_normalize(oid:CISCO-SYSTEM-EXT-MIB::cseSysCPUUtilization.0)
47693:20111123:111039.183 End of snmp_normalize():CISCO-SYSTEM-EXT-MIB::cseSysCPUUtilization.0
47693:20111123:111039.183 In get_snmp(oid:CISCO-SYSTEM-EXT-MIB::cseSysCPUUtilization.0)
47695:20111123:111039.199 In get_values()
47695:20111123:111039.199 In DCconfig_get_poller_items() poller_type:0
47695:20111123:111039.199 End of DCconfig_get_poller_items():0
47695:20111123:111039.199 End of get_values()
47695:20111123:111039.199 poller #3 spent 0.000112 seconds while updating 0 values
47695:20111123:111039.199 In DCconfig_get_poller_nextcheck() poller_type:0
47695:20111123:111039.199 End of DCconfig_get_poller_nextcheck():1322032240
47695:20111123:111039.199 sleeping for 1 seconds
....
47693:20111123:111040.185 End of get_snmp():NETWORK_ERROR
47693:20111123:111040.185 In snmp_close_session()
47693:20111123:111040.185 End of snmp_close_session()
47693:20111123:111040.185 End of get_value_snmp():NETWORK_ERROR
47693:20111123:111040.185 Item [spb9124-F1S1:snmp.cpu] error: Timeout while connecting to [192.168.9.42:161]
47693:20111123:111040.185 In zabbix_log()
47693:20111123:111040.185 In DCconfig_get_items() hostid:0 key:'zabbix[log]'
47693:20111123:111040.185 End of DCconfig_get_items():0
47693:20111123:111040.185 End of zabbix_log()
47693:20111123:111040.186 End of get_value():NETWORK_ERROR
47693:20111123:111040.186 query [txnlev:1] [begin;]
47693:20111123:111040.186 query [txnlev:1] [update hosts set snmp_errors_from=1322032240,snmp_disable_until=1322032255 where hostid=10182]
47693:20111123:111040.186 query [txnlev:1] [commit;]
47693:20111123:111040.186 SNMP Host [spb9124-F1S1]: first network error, wait for 15 seconds
.....
The check times out after about 1 second. Why?

[root@octans ~]# tcpdump -N -i bce0 host 192.168.9.42
....
17:25:39.675758 IP monitoring.59216 > spb9124-f1s1.snmp: C=iowa GetRequest(33) E:cisco.9.305.1.1.1.0
17:25:39.850199 IP monitoring > spb9124-f1s1: ICMP echo request, id 15879, seq 1280, length 76
17:25:39.850451 IP spb9124-f1s1 > monitoring: ICMP echo reply, id 15879, seq 1280, length 76
17:25:40.050191 IP monitoring > spb9124-f1s1: ICMP echo request, id 15879, seq 1280, length 76
17:25:40.050429 IP spb9124-f1s1 > monitoring: ICMP echo reply, id 15879, seq 1280, length 76
17:25:40.251189 IP monitoring > spb9124-f1s1: ICMP echo request, id 15879, seq 1280, length 76
17:25:40.251428 IP spb9124-f1s1 > monitoring: ICMP echo reply, id 15879, seq 1280, length 76
17:25:40.452795 IP monitoring > spb9124-f1s1: ICMP echo request, id 15879, seq 1280, length 76
17:25:40.453040 IP spb9124-f1s1 > monitoring: ICMP echo reply, id 15879, seq 1280, length 76
17:25:40.651187 IP monitoring > spb9124-f1s1: ICMP echo request, id 15879, seq 1280, length 76
17:25:40.651425 IP spb9124-f1s1 > monitoring: ICMP echo reply, id 15879, seq 1280, length 76
17:25:40.692946 IP spb9124-f1s1.snmp > monitoring.59216: C=iowa GetResponse(34) E:cisco.9.305.1.1.1.0=0
17:25:40.692963 IP monitoring > spb9124-f1s1: ICMP monitoring udp port 59216 unreachable, length 36
....

Again, about 1 second of waiting for the response. 1 second is the default timeout in net-snmp.
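
The capture can be checked with a little arithmetic: the GetRequest leaves at 17:25:39.675758 and the GetResponse arrives at 17:25:40.692946, just over 1 second later. A minimal C sketch of that calculation follows; the helper names `to_usec` and `arrived_in_time` are made up for this illustration and are not part of Zabbix or net-snmp.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: time of day to microseconds since midnight. */
long long to_usec(int h, int m, int s, long usec)
{
    assert(h >= 0 && m >= 0 && s >= 0 && usec >= 0);
    return ((h * 60LL + m) * 60LL + s) * 1000000LL + usec;
}

/* Hypothetical helper: did the reply arrive within the timeout? */
bool arrived_in_time(long long sent_us, long long received_us, long long timeout_us)
{
    return received_us - sent_us <= timeout_us;
}

/* Timestamps copied from the tcpdump output above. */
long long request_us(void)  { return to_usec(17, 25, 39, 675758); }
long long response_us(void) { return to_usec(17, 25, 40, 692946); }
```

The difference is 1,017,188 microseconds, so `arrived_in_time()` is false for a 1-second timeout and true for a 30-second one.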

So let's try net-snmp's snmpget: the tcpdump output is the same.
And this:
[root@octans ~]# time snmpget -c iowa -v1 192.168.9.42 1.3.6.1.4.1.9.9.305.1.1.1.0
Timeout: No Response from 192.168.9.42.
1.02s real 0.01s user 0.00s sys

But if I set -t 30, everything works fine every time.
[root@octans ~]# time snmpget -t 30 -c iowa -v1 192.168.9.42 1.3.6.1.4.1.9.9.305.1.1.1.0
SNMPv2-SMI::enterprises.9.9.305.1.1.1.0 = Gauge32: 0
1.60s real 0.02s user 0.00s sys

P.S.
[root@octans /usr/ports/net-mgmt/zabbix-server]# make showconfig
===> The following configuration options are available for zabbix-server-1.8.8,2:
MYSQL=on "Use MySQL backend"
PGSQL=off "Use PostgreSQL backend"
SQLITE=off "Use SQLite backend"
IPV6=on "Support for IPv6"
FPING=on "Use fping for pinging hosts"
JABBER=on "Support for jabber media type"
CURL=on "Support web monitoring with cURL"
LDAP=off "Support for checking LDAP servers"
IPMI=on "Support for IPMI"
SSH=on "Support for SSH-based checks"
IODBC=off "Support for iODBC"
UNIXODBC=off "Support for unixODBC"
===> Use 'make config' to modify these settings

[root@octans /usr/ports/net-mgmt/net-snmp]# make showconfig
===> The following configuration options are available for net-snmp-5.7_5:
IPV6=on "Build with IPv6 support"
MFD_REWRITES=off "Build with 64-bit Interface Counters"
PERL=on "Install additional perl modules"
PERL_EMBEDDED=on "Build embedded perl"
DUMMY=on "Enable dummy values as placeholders"
TKMIB=off "Install graphical MIB browser"
DMALLOC=off "Enable dmalloc debug memory allocator"
UNPRIVILEGED=off "Allow unprivileged users to execute net-snmp"
===> Use 'make config' to modify these settings



 Comments   
Comment by richlv [ 2011 Nov 25 ]

There was an issue about fixing the SNMP timeout: ZBX-693

Comment by Pavel Timofeev [ 2011 Nov 28 ]

cat /usr/local/etc/snmp/snmp.conf
timeout 30

It seems this can help as a workaround.

Comment by Eric Gearhart [ 2012 Apr 10 ]

Just wanted to quickly mention here the token 'timeout' is apparently not recognized by net-snmp... if I set 'timeout 30' in /etc/snmp/snmp.conf I get "Warning: Unknown token timeout" errors in zabbix_server.log

I have no idea how to set net-snmp's default client timeout. I've been all over a bunch of man pages and I've googled, and all I see online are references to how to set the session timeout via the net-snmp API (which would be bundled into the poller_snmp source in Zabbix).

Comment by Pavel Timofeev [ 2012 Apr 11 ]

it works for me
FreeBSD 9.0 RELEASE amd64
net-snmp-5.7.1_6
zabbix-(server|agent|frontend)-1.8.10_1,2

I advise you to see `man snmp.conf`

Comment by Oleksii Zagorskyi [ 2012 Apr 11 ]

my FreeBSD 8.1 box:
net-snmp-5.7_1

  1. man snmp.conf
    ...
    retries INTEGER
    Specifies the number of retries to be used in the requests.

    timeout INTEGER
    Specifies the timeout in seconds between retries.
    ...
    V5.7 21 Apr 2010 SNMP.CONF(5)

my Debian 6.0.4 box:
ii libsnmp-base 5.4.3~dfsg-2 SNMP (Simple Network Management Protocol) MIBs and documentation
ii libsnmp-dev 5.4.3~dfsg-2 SNMP (Simple Network Management Protocol) development files
ii libsnmp-perl 5.4.3~dfsg-2 SNMP (Simple Network Management Protocol) Perl5 support
ii libsnmp15 5.4.3~dfsg-2 SNMP (Simple Network Management Protocol) library
ii snmp 5.4.3~dfsg-2

  1. man snmp.conf
    "timeout" missing

CHANGELOG of net-snmp for 5.7:

  • add snmp.conf tokens for timeouts and retries

Comment by Eric Gearhart [ 2012 Apr 11 ]

Good catch Oleksiy - I can try to find or build some net-snmp 5.7 RPMs for CentOS and rebuild Zabbix against them, and see if 'timeout' in snmp.conf works (it should work in theory)

Comment by Eric Gearhart [ 2012 Apr 21 ]

Just to follow up on this one: I "rolled my own" net-snmp packages and rebuilt Zabbix against them, and the timeout problem seems to persist.

Comment by Jiann-Ming Su [ 2012 Sep 09 ]

Seems like a problem with snmp_synch_response() in net-snmp. I cooked up the attached poller_timeout.patch for 2.0.2. Ugly, but seems to work. It cleared up my queue backlog and no more "network error" log messages.

Comment by Jiann-Ming Su [ 2012 Sep 12 ]

Though, my patch may behave weirdly if an unknown or invalid OID is passed in...

Comment by Eric Gearhart [ 2012 Sep 20 ]

YOUR PATCH JUST COMPLETELY FIXED THE TIMEOUT PROBLEM THAT HAS BEEN PLAGUING ME FOR MONTHS!

Many thanks. I just applied your patch, and the SNMPv3-based queue backups that I have been experiencing on and off for months (ever since I was on Zabbix 2.0 pre versions!) are completely gone.

Can we please, PLEASE get this patch into 2.0.3?

Comment by Jiann-Ming Su [ 2012 Sep 22 ]

I'm testing another patch that may behave better. The one I have may get caught up in a loop. I'm letting this other patch run over the weekend to see if it works better. I still think the problem is related to net-snmp.

Comment by Jiann-Ming Su [ 2012 Sep 25 ]

New poller timeout patch dated 2012-09-22. I found it easier to avoid additional alarm() calls and simply add a loop counter. The number of "tries" is completely arbitrary. The loop counter behaves better than calling alarm() multiple times, since doing so could cause an infinite loop.
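
The attached patch is not reproduced here. Purely as an illustration of the loop-counter idea, a bounded retry loop might look like the C sketch below. `MAX_TRIES`, `poll_fn`, `poll_with_limit` and `succeed_after` are hypothetical names invented for this example, and the actual patch may differ.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRIES 5  /* arbitrary cap for this sketch; the patch's real value is not known here */

/* Hypothetical callback: returns 1 when a response was received, 0 otherwise. */
typedef int (*poll_fn)(void *ctx);

/* Call fn() until it succeeds or MAX_TRIES attempts have been made.
 * Returns the number of attempts used, or -1 if every attempt failed.
 * The counter bounds the loop, so it always terminates. */
int poll_with_limit(poll_fn fn, void *ctx)
{
    assert(fn != NULL);
    for (int tries = 1; tries <= MAX_TRIES; tries++) {
        if (fn(ctx))
            return tries;
    }
    return -1;
}

/* Demo callback: fails until the counter behind ctx reaches zero. */
int succeed_after(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}
```

With a counter of 2, `poll_with_limit(succeed_after, &counter)` succeeds on the third attempt; with a counter larger than `MAX_TRIES` it gives up and returns -1.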

Comment by Oleksii Zagorskyi [ 2012 Dec 07 ]

Jiann, does your patch depend on the libsnmp version?

Comment by Jiann-Ming Su [ 2012 Dec 07 ]

No dependency on libsnmp. My patch is really an ugly hack around some misbehaving versions of libsnmp. I'm convinced it's libsnmp that's prematurely closing connections before the timeout. The proper solution may be to find a libsnmp version that works properly and make zabbix dependent on that version.

Comment by dimir [ 2012 Dec 14 ]

The net-snmp structure snmp_session has the fields "timeout" and "retries", which we are not using in the Zabbix code:

http://www.net-snmp.org/dev/agent/structsnmp__session.html

I think it's worth trying to set the timeout when creating the SNMP session in checks_snmp.c:snmp_open_session().

On my Debian the net-snmp version is 5.4.3~dfsg-2 and the structure has both "timeout" and "retries".

Comment by dimir [ 2012 Dec 14 ]

I have attached the patch

zabbix-2.0.5rc1-snmp-timeout.patch

that should make the server config parameter "Timeout" apply to SNMP checks. Please check if that works for you.

Comment by dimir [ 2012 Dec 18 ]

Eric, could you please try the attached patch (zabbix-2.0.5rc1-snmp-timeout.patch)? It should enable your server config parameter "Timeout" for snmp items.

Comment by Eric Gearhart [ 2012 Dec 21 ]

dimir - I am on holiday vacation right now, but when I get back home on the 27th I will apply the patch against a clean Zabbix 2.0.5 source tree and see if it makes a difference.

Comment by dimir [ 2012 Dec 21 ]

Thanks a lot, that'd be great.

Comment by dimir [ 2012 Dec 28 ]

We'll fix it only for 2.2.

Comment by dimir [ 2012 Dec 28 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-4393 .

Comment by Alexander Vladishev [ 2013 Jan 04 ]

Great! Successfully tested.

Comment by dimir [ 2013 Jan 04 ]

Fixed in pre-2.1.0 r32463.

Comment by dimir [ 2013 Jan 04 ]

Reopened to fix a comment.

Comment by Oleksii Zagorskyi [ 2013 Jan 04 ]

Added a dirty note here https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew220#miscellaneous_daemon_improvements

Comment by Oleksii Zagorskyi [ 2013 Mar 15 ]

Heya, this fix also brought another change that we HAVE to know about:
the line "session.retries = 0;" was added to checks_snmp.c

What does it mean? See an SNMPv3 check with incorrect credentials:
2.0 branch:

No.     Time            Source                Destination           Protocol Info
   2277 23:27:29.634748 10.20.0.32            10.20.0.6             SNMP     get-request
   2278 23:27:29.639042 10.20.0.6             10.20.0.32            SNMP     report 1.3.6.1.6.3.15.1.1.4.0
   2279 23:27:29.639352 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   2280 23:27:29.645606 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   2300 23:27:30.640517 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   2301 23:27:30.646411 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   2334 23:27:31.641693 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   2335 23:27:31.648537 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   2356 23:27:32.641925 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   2357 23:27:32.647837 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   2367 23:27:33.642349 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   2368 23:27:33.648242 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   2383 23:27:34.643528 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   2384 23:27:34.649399 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   3939 23:27:50.593321 10.20.0.32            10.20.0.6             SNMP     get-request
   3940 23:27:50.597571 10.20.0.6             10.20.0.32            SNMP     report 1.3.6.1.6.3.15.1.1.4.0
   3941 23:27:50.597889 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   3942 23:27:50.604240 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   4019 23:27:51.598891 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   4020 23:27:51.607692 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   4154 23:27:52.600020 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   4155 23:27:52.605958 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   4233 23:27:53.600172 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   4236 23:27:53.606036 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   4388 23:27:54.601381 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   4389 23:27:54.607274 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   4432 23:27:55.601589 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   4433 23:27:55.607517 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   6600 23:28:11.670657 10.20.0.32            10.20.0.6             SNMP     get-request
   6601 23:28:11.674939 10.20.0.6             10.20.0.32            SNMP     report 1.3.6.1.6.3.15.1.1.4.0
   6602 23:28:11.675129 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   6603 23:28:11.681417 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   6672 23:28:12.676035 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   6673 23:28:12.682392 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   6782 23:28:13.676372 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   6784 23:28:13.682304 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   6967 23:28:14.676594 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   6978 23:28:14.765060 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   7100 23:28:15.677785 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   7101 23:28:15.684098 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   7167 23:28:16.679028 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   7168 23:28:16.695648 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   8948 23:28:32.767679 10.20.0.32            10.20.0.6             SNMP     get-request
   8949 23:28:32.771994 10.20.0.6             10.20.0.32            SNMP     report 1.3.6.1.6.3.15.1.1.4.0
   8950 23:28:32.772182 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   8951 23:28:32.778545 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   9018 23:28:33.773320 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   9019 23:28:33.779215 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   9080 23:28:34.774416 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   9081 23:28:34.780316 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   9365 23:28:35.774602 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   9366 23:28:35.780547 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   9490 23:28:36.775784 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   9491 23:28:36.781783 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   9521 23:28:37.776962 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   9522 23:28:37.782906 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown

trunk (2.2):

No.     Time            Source                Destination           Protocol Info
   7992 22:38:05.358694 10.20.0.32            10.20.0.6             SNMP     get-request
   7993 22:38:05.362968 10.20.0.6             10.20.0.32            SNMP     report 1.3.6.1.6.3.15.1.1.4.0
   7994 22:38:05.363219 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   7995 22:38:05.369622 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
   9351 22:38:23.699214 10.20.0.32            10.20.0.6             SNMP     get-request
   9352 22:38:23.703510 10.20.0.6             10.20.0.32            SNMP     report 1.3.6.1.6.3.15.1.1.4.0
   9353 22:38:23.703805 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
   9355 22:38:23.709979 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
  12026 22:38:42.767192 10.20.0.32            10.20.0.6             SNMP     get-request
  12027 22:38:42.771789 10.20.0.6             10.20.0.32            SNMP     report 1.3.6.1.6.3.15.1.1.4.0
  12028 22:38:42.771932 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
  12029 22:38:42.778254 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown
  13801 22:39:00.847818 10.20.0.32            10.20.0.6             SNMP     get-request
  13802 22:39:00.852116 10.20.0.6             10.20.0.32            SNMP     report 1.3.6.1.6.3.15.1.1.4.0
  13803 22:39:00.852310 10.20.0.32            10.20.0.6             SNMP     encryptedPDU: privKey Unknown
  13804 22:39:00.858567 10.20.0.6             10.20.0.32            SNMP     encryptedPDU: privKey Unknown

As you can see, previously we had 5 retries after the first basic request. In 2.2 we don't have retries: there is only one request, but with a 3-second timeout (the zabbix server default).
It looks related, and opposite, to ZBXNEXT-1096.

Conclusion: previously a zabbix poller spent at least 6 seconds on every check of an unavailable SNMP agent; now it will spend only 3 seconds.
I consider this an additional improvement.

dimir, btw, is this comment correct?
session.timeout = CONFIG_TIMEOUT * 1000 * 1000; /* milliseconds */
Shouldn't it actually be "microseconds"?

Comment by dimir [ 2013 Mar 15 ]

Yeah, it should be "microseconds". Fixed in r34391. I have added comments to the code:

session.retries = 0;                            /* number of retries if first attempt fails (default = 5)  */
session.timeout = CONFIG_TIMEOUT * 1000 * 1000; /* timeout of one attempt in microseconds (default = 1 second) */

So before the fix, after a failed first attempt it made 5 additional attempts at 1-second intervals, which resulted in the 6 seconds you mention. After this fix it performs only one attempt, with a timeout of "Timeout" seconds as specified in the zabbix server configuration file.

I guess we could instead fix it by leaving timeout to default (1 second) and setting "Timeout" value as number of retries. I'm not sure which is the better approach here.
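
To make the comparison concrete, here is a small C sketch of the arithmetic. It assumes each attempt waits for the full per-attempt timeout and that no backoff is applied between attempts. All function names are made up for this illustration.

```c
#include <assert.h>

#define USEC_PER_SEC 1000000L

/* Worst-case time spent on an unresponsive agent: the first attempt plus
 * every retry each wait for the full per-attempt timeout.
 * Assumes a constant timeout per attempt (no backoff). */
long worst_case_wait_us(int retries, long timeout_us)
{
    assert(retries >= 0 && timeout_us >= 0);
    return (long)(retries + 1) * timeout_us;
}

/* Old behaviour: net-snmp defaults (5 retries, 1 second each). */
long old_wait_us(void)
{
    return worst_case_wait_us(5, 1 * USEC_PER_SEC);
}

/* Current fix: no retries, one attempt lasting Timeout seconds. */
long fixed_wait_us(int timeout_s)
{
    return worst_case_wait_us(0, timeout_s * USEC_PER_SEC);
}

/* Alternative: 1 second per attempt, Timeout used as the retry count. */
long alternative_wait_us(int timeout_s)
{
    return worst_case_wait_us(timeout_s, 1 * USEC_PER_SEC);
}
```

With the server default of Timeout=3, this gives 6 seconds before the fix, 3 seconds with the current fix, and 4 seconds for the alternative.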

Comment by Alexey Pustovalov [ 2013 Mar 15 ]

I guess we could instead fix it by leaving timeout to default (1 second) and setting "Timeout" value as number of retries. I'm not sure which is the better approach here.

I suppose that way is preferable, because SNMP runs over UDP. So I think the number of retries should be more than one, because if retries are zero (only one request), the packet can be dropped or missed.

zalex_ua I don't like 5 retries at all. Why then don't we use 5 retries for zabbix agent checks (TCP-based) when trying to establish a TCP session, or when the agent doesn't respond within 3 seconds (the default)?
Please tell me why.

Note that zabbix server is not a "command line" tool, where you need to perform additional attempts if the first attempt, for example, timed out.
Zabbix server will perform additional attempts after 15 seconds (zabbix_server.conf) if the previous attempt timed out.

Why do we have to handle retries at two levels at once (libnetsnmp = 5 retries and zabbix_server = 15 seconds)?
We have to turn OFF retries at the library level, as dimir already did.

I think we did not consider this previously because we actually didn't know about the 5 libnetsnmp retries.

I remember investigating network discovery using an SNMP check; I'm quoting myself:

...
Now let's check SNMP
39007:20110410:015334.836 process_rule() IP:'10.20.0.154'
39007:20110410:015334.840 End of snmp_normalize():.1.3.6.1.2.1.1.5.0
39007:20110410:015334.840 In get_snmp(oid:.1.3.6.1.2.1.1.5.0)
...
39007:20110410:015339.998 End of get_value_snmp():NETWORK_ERROR
39007:20110410:015339.998 Discovery: Item [.1.3.6.1.2.1.1.5.0] error: Timeout while connecting to [10.20.0.154:161]
once more
39007:20110410:015340.000 End of snmp_normalize():.1.3.6.1.2.1.1.5.0
39007:20110410:015340.000 In get_snmp(oid:.1.3.6.1.2.1.1.5.0)
...
39007:20110410:015345.169 End of get_value_snmp():NETWORK_ERROR
39007:20110410:015345.170 Discovery: Item [.1.3.6.1.2.1.1.5.0] error: Timeout while connecting to [10.20.0.155:161]
As we can see, the global timeout (30 seconds) has no effect here. It looks like a fixed 5 seconds is hard-coded.

It was a FreeBSD host, and I'm not sure why it is 5 seconds rather than 6 seconds.

Is this good?
Why do we need to perform retries for network discovery?
I'm sure we don't need them there. It is unclear why it works this way.

If you have serious problems in the network (routers, switches) and some traffic loss, then 5 retries within 5 seconds probably won't help much (again, remember zabbix_server's own retries).

Finally, about my example with SNMPv3: I absolutely don't like that libnetsnmp is killing my SNMP-monitored device with 5 extra requests with incorrect credentials. Why do I need them?

This issue should be closed again, optionally with another ZBXNEXT created to make retries configurable at the user level.

Thanks

<richlv> split out as ZBXNEXT-1668

zalex_ua
Is the discussion finished?
I believe yes, so CLOSED.
CLOSED

Comment by richlv [ 2013 Mar 15 ]

zalex, thanks for digging this up - that change could help with performance in some cases, too.
I'd object to making the timeout parameter affect SNMP retries - if we want retries to be configurable, it should be a separate parameter.

Comment by Oleksii Zagorskyi [ 2013 Mar 15 ]

(1) In any case, we additionally need to update the documentation.

https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew220
and here as well:
https://www.zabbix.com/documentation/2.2/manual/config/items/itemtypes/snmp

Let's just wait for this issue to be closed to know the final resolution.

<richlv> and note on changed retry count in https://www.zabbix.com/documentation/2.2/manual/installation/upgrade_notes_220

zalex_ua All mentioned pages updated. Please review.
RESOLVED

dimir CLOSED

Comment by Pavel Timofeev [ 2013 Mar 15 ]

I'm sorry, I have a small question about the comments in the patch that mention the default number of retries and the default timeout of one attempt. Defaults of what? Zabbix_server or libnetsnmp? I think you need to mention that in those comments.

Comment by Oleksii Zagorskyi [ 2013 Mar 15 ]

Pavel, they are the defaults for libnetsnmp.
Indeed, it would be good to clarify - I had the same thought.

Comment by dimir [ 2013 Mar 18 ]

Comments fixed in r34432:

session.retries = 0;                            /* number of retries after failed attempt */
                                                /* (net-snmp default = 5) */
session.timeout = CONFIG_TIMEOUT * 1000 * 1000; /* timeout of one attempt in microseconds */
                                                /* (net-snmp default = 1 second) */

Comment by dimir [ 2013 May 16 ]

Thanks, closing the issue.

Comment by Oleksii Zagorskyi [ 2014 Jul 28 ]

ZBX-8538 contains a suggestion to allow a single retry.
