uname -a: Linux zabbix5-net-proxy2.doit.missouri.edu 2.6.32-504.1.3.el6.x86_64 #1 SMP Fri Oct 31 11:37:10 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Memory: 4 GB real, 2.5 GB swap.
Zabbix proxy package: zabbix-proxy-2.4.1-1.el6.x86_64
500 SNMP pollers
We have a number of network switches assigned to proxies where each switch has 1000-4000 Items, based on the number of physical ports on the switch. In one test case, the switch has 1800 Items.
We saw that the queues for the proxies were getting quite long, with tens of thousands of Items over 10 minutes old. We looked in the log and found lots of these messages:
Nov 26 07:42:32 zabbix5-net-proxy2 zabbix_proxy: SNMP agent item "mib-2.ifoutdiscards.["10115"]" on host "c2960s202-AlphaEpsilonPi-1" failed: first network error, wait for 1 seconds
SNMP tests from the command line and packet captures confirmed that SNMP queries are working. There are no access issues and the OIDs queried do exist. We see this same Host appear in the logs throughout the day, but the specific OID listed in the error changes.
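To check how often each Host shows up and how many different Items are named, the proxy log can be tallied with a short script. This is only a sketch: the regular expression is modeled on the single log line quoted above, and the function name and sample list are made up for illustration. Other Zabbix versions may word the message differently.

```python
import re
from collections import defaultdict

# Pattern modeled on the one log line quoted in this report (an assumption;
# it is not taken from the Zabbix source).
LINE_RE = re.compile(
    r'SNMP agent item "(?P<item>.+)" on host "(?P<host>[^"]+)" failed: '
    r'first network error'
)


def tally_network_errors(lines):
    """Map each host to the set of item keys named in its error messages."""
    items_by_host = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            items_by_host[match.group("host")].add(match.group("item"))
    return dict(items_by_host)


# Hypothetical input: in practice, pass the lines of the proxy log file.
sample = [
    'Nov 26 07:42:32 zabbix5-net-proxy2 zabbix_proxy: SNMP agent item '
    '"mib-2.ifoutdiscards.["10115"]" on host "c2960s202-AlphaEpsilonPi-1" '
    'failed: first network error, wait for 1 seconds',
]

for host, items in tally_network_errors(sample).items():
    print(host, len(items), sorted(items))
```

A Host with many distinct Items in its set matches what we see: the same Host keeps failing, but on a different OID each time.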
When bulk requests were disabled on the host, the log messages went away.
I performed a packet capture on another monitored host that had similar log messages and where bulk requests were left enabled. None of the SNMP queries in the capture requested more than one OID per packet. So bulk requests were enabled in the configuration, but were not actually being sent.
Another difference is that with bulk requests enabled, all of the SNMP requests to the Host were sent at the same time. With bulk requests disabled, the SNMP requests for the various Items were spread out over the entire polling period. (Most of the SNMP Items are queried every 300 seconds for all Hosts.)
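To put rough numbers on that difference, here is a small calculation for the 1800-Item test switch. It assumes every Item uses the 300-second interval and that each Item costs one SNMP request (which is what the packet capture suggests); the function name is only for illustration.

```python
def spread_requests_per_second(item_count, interval_seconds):
    """Average request rate when polls are spread evenly over the interval."""
    return item_count / interval_seconds


items = 1800      # Items on the test switch
interval = 300    # polling interval in seconds

# Bulk requests disabled: requests spread over the whole polling period.
print(spread_requests_per_second(items, interval))  # 6.0

# Bulk requests enabled (as observed): all requests arrive in one burst.
print(items)  # 1800
```

So the switch sees roughly 6 requests per second in the spread-out case, versus a single burst of 1800 requests when everything is sent at once.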
ZBX-8538 (allow a single retry at the libnetsnmp level) should have a positive effect here.