[ZBX-5028] Unusable SNMPv3 performance Created: 2012 May 21  Updated: 2017 May 30  Resolved: 2013 Mar 20

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.0.0rc6
Fix Version/s: None

Type: Incident report Priority: Critical
Reporter: Marc Herren Assignee: Unassigned
Resolution: Duplicate Votes: 1
Labels: performance, snmpv3
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: JPEG File snmpv3_cpu.jpg     JPEG File snmpv3_data_gathering.jpg     JPEG File snmpv3_queue.jpg    
Issue Links:
Duplicate
duplicates ZBX-2152 multiple SNMPv3 checks get unexpected... Closed

 Description   

I'm currently monitoring 150 network devices (Brocade switches/routers), collecting only some key information (uplinks, CPU, temperature). Overall I have ~3500 SNMP items, and the rate of new values per second is around 35, so everything seems to be OK.

Using SNMPv3 on all checks results in a lot of network errors in the server log. Roughly every second I get an entry like these:

31229:20120521:104212.070 SNMP item [.1.3.6.1.2.1.2.2.1.16.65] on host [SDEGFE17] failed: first network error, wait for 15 seconds
31250:20120521:104212.270 resuming SNMP checks on host [PTEGFE08]: connection restored
31231:20120521:104213.114 SNMP item [.1.3.6.1.2.1.2.2.1.16.66] on host [SDEGFE17] failed: another network error, wait for 15 seconds
31249:20120521:104218.291 resuming SNMP checks on host [FAEGFE06]: connection restored
31230:20120521:104220.432 SNMP item [.1.3.6.1.2.1.2.2.1.16.65] on host [PTEGFE09] failed: first network error, wait for 15 seconds
31232:20120521:104222.501 SNMP item [.1.3.6.1.2.1.2.2.1.20.65] on host [PTEGFE09] failed: another network error, wait for 15 seconds
31248:20120521:104222.516 SNMP item [.1.3.6.1.2.1.2.2.1.10.3] on host [TEEGFE02] failed: first network error, wait for 15 seconds
31238:20120521:104223.569 SNMP item [cpuUtil1Min] on host [STGEGFE01] failed: first network error, wait for 15 seconds
31249:20120521:104228.342 resuming SNMP checks on host [SDEGFE17]: connection restored
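
To see which hosts are hit most often, entries like the ones above can be tallied per host. The following is a minimal Python sketch, not part of Zabbix: the regular expression and the function name `count_network_errors` are assumptions derived only from the sample lines quoted in this report, and other log formats may need a different pattern.

```python
import re
from collections import Counter

# Assumed pattern, based solely on the log lines quoted in this report.
ERROR_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) "
    r"SNMP item \[(?P<item>[^\]]+)\] on host \[(?P<host>[^\]]+)\] failed: "
    r"(?P<kind>first|another) network error"
)


def count_network_errors(lines):
    """Return a Counter mapping host name -> number of network-error entries.

    Lines that do not match (e.g. "resuming SNMP checks ...") are ignored.
    """
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            counts[match.group("host")] += 1
    return counts


# The sample entries from the server log above.
SAMPLE = [
    "31229:20120521:104212.070 SNMP item [.1.3.6.1.2.1.2.2.1.16.65] on host [SDEGFE17] failed: first network error, wait for 15 seconds",
    "31250:20120521:104212.270 resuming SNMP checks on host [PTEGFE08]: connection restored",
    "31231:20120521:104213.114 SNMP item [.1.3.6.1.2.1.2.2.1.16.66] on host [SDEGFE17] failed: another network error, wait for 15 seconds",
    "31249:20120521:104218.291 resuming SNMP checks on host [FAEGFE06]: connection restored",
    "31230:20120521:104220.432 SNMP item [.1.3.6.1.2.1.2.2.1.16.65] on host [PTEGFE09] failed: first network error, wait for 15 seconds",
    "31232:20120521:104222.501 SNMP item [.1.3.6.1.2.1.2.2.1.20.65] on host [PTEGFE09] failed: another network error, wait for 15 seconds",
    "31248:20120521:104222.516 SNMP item [.1.3.6.1.2.1.2.2.1.10.3] on host [TEEGFE02] failed: first network error, wait for 15 seconds",
    "31238:20120521:104223.569 SNMP item [cpuUtil1Min] on host [STGEGFE01] failed: first network error, wait for 15 seconds",
    "31249:20120521:104228.342 resuming SNMP checks on host [SDEGFE17]: connection restored",
]

if __name__ == "__main__":
    for host, errors in count_network_errors(SAMPLE).most_common():
        print(f"{host}: {errors} network error(s)")
```

To run it against a real log, pass an open file object instead of `SAMPLE`; the path to the server log depends on the installation.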

The update queue also has a lot of entries, with waits of up to 5 minutes.

Then I changed all my SNMP items to SNMPv2 and the errors were gone: not a single network error anymore! The update queue is also empty!

I did a quick test to make sure that the network devices are not overloaded by those SNMPv3 settings, and I could easily retrieve 100 values within seconds without a single error.

The Zabbix server itself is also under no load at all.



 Comments   
Comment by Marc Herren [ 2012 May 22 ]

I've raised the SNMP timeout to 15 seconds, without success.

Timeout=15

Server load is still very low, but the queue is filled with delayed requests.

Comment by Alexander Vladishev [ 2012 Jun 12 ]

Can you please execute the following command and attach its output:

time snmpget -v3 -l ... ... SDEGFE17 1.3.6.1.2.1.2.2.1.16.65

Comment by Marc Herren [ 2012 Jun 12 ]

I made 3 consecutive requests with SNMPv3 and with SNMPv2c for comparison:

ejpdxt4040:~ # time snmpwalk -v3 -u v3admin -A*** -X*** -l authPriv -O n 10.0.2.112 1.3.6.1.2.1.2.2.1.16.65
.1.3.6.1.2.1.2.2.1.16.65 = Counter32: 4105962327

real 0m0.054s
user 0m0.048s
sys 0m0.000s
ejpdxt4040:~ # time snmpwalk -v3 -u v3admin -A*** -X*** -l authPriv -O n 10.0.2.112 1.3.6.1.2.1.2.2.1.16.65
.1.3.6.1.2.1.2.2.1.16.65 = Counter32: 4105963562

real 0m0.057s
user 0m0.048s
sys 0m0.000s
ejpdxt4040:~ # time snmpwalk -v3 -u v3admin -A*** -X*** -l authPriv -O n 10.0.2.112 1.3.6.1.2.1.2.2.1.16.65
.1.3.6.1.2.1.2.2.1.16.65 = Counter32: 4105964219

real 0m0.052s
user 0m0.044s
sys 0m0.000s
ejpdxt4040:~ # time snmpwalk -v2c -c public 10.0.2.112 1.3.6.1.2.1.2.2.1.16.65
IF-MIB::ifOutOctets.65 = Counter32: 4105964858

real 0m0.033s
user 0m0.024s
sys 0m0.004s
ejpdxt4040:~ # time snmpwalk -v2c -c public 10.0.2.112 1.3.6.1.2.1.2.2.1.16.65
IF-MIB::ifOutOctets.65 = Counter32: 4105965754

real 0m0.046s
user 0m0.040s
sys 0m0.000s
ejpdxt4040:~ # time snmpwalk -v2c -c public 10.0.2.112 1.3.6.1.2.1.2.2.1.16.65
IF-MIB::ifOutOctets.65 = Counter32: 4105965954

real 0m0.033s
user 0m0.024s
sys 0m0.004s
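
For a quick comparison of the six runs above, the `real` times can be averaged per protocol version. This is a minimal Python sketch; the helper name `real_seconds` and the regular expression are assumptions made for illustration, and the input values are copied from the output above.

```python
import re
from statistics import mean

# Matches the "real" line printed by the shell's `time`, e.g. "real 0m0.054s".
REAL_RE = re.compile(r"^real\s+(\d+)m([\d.]+)s$")


def real_seconds(line):
    """Convert a `real XmY.YYYs` line into seconds as a float."""
    match = REAL_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a 'real' timing line: {line!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Timings copied from the runs above.
V3_RUNS = ["real 0m0.054s", "real 0m0.057s", "real 0m0.052s"]
V2C_RUNS = ["real 0m0.033s", "real 0m0.046s", "real 0m0.033s"]

v3_avg = mean(real_seconds(line) for line in V3_RUNS)
v2c_avg = mean(real_seconds(line) for line in V2C_RUNS)

if __name__ == "__main__":
    print(f"SNMPv3 mean:  {v3_avg * 1000:.1f} ms")
    print(f"SNMPv2c mean: {v2c_avg * 1000:.1f} ms")
```

With these six samples the means come out at roughly 54 ms for SNMPv3 and 37 ms for SNMPv2c, so a single SNMPv3 request from the command line stays well under a second.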

Comment by Oleksii Zagorskyi [ 2013 Jan 09 ]

ZBXNEXT-98 asks for SNMP GETBULK to be used for OID retrieval.

Comment by Eric Gearhart [ 2013 Feb 06 ]

Just FYI for the benefit of this bug, there is a known bug where Zabbix ignores the SNMP timeout in its SNMP poller code... Zabbix 2.2 should contain a patch to the poller code that fixes it.

See https://support.zabbix.com/browse/ZBX-4393 for lots and lots of detail

Comment by Oleksii Zagorskyi [ 2013 Feb 06 ]

The ZBX-4393 is not related to this issue.

I have made some progress in debugging this one and will post more details soon, once I have summarized them.

Comment by Oleksii Zagorskyi [ 2013 Mar 18 ]

This all comes down to ZBX-2152.
I'm closing this one as a duplicate.
