[ZBX-8096] incorrect processing of "error-status: noSuchName" response using SNMPv2 protocol for devices behaving like SNMPv1 Created: 2014 Apr 16 Updated: 2017 May 30 Resolved: 2016 Jun 06 |
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | 2.2.3 |
Fix Version/s: | 2.2.14rc1, 3.0.4rc1, 3.2.0alpha1 |
Type: | Incident report | Priority: | Major |
Reporter: | Cristian Mammoli | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | bulk, patch, snmp | ||
Environment: |
CentOS 6.5 x86_64, PostgreSQL 9.1. |
Description |
2016-05-18 ADDED by zabbix team (zalex): this issue description does not reflect the actual problem; see the end of the discussion in the comments to learn what is actually wrong.

After upgrading to 2.2.3, I started noticing lots of the following errors in the console log on some devices (SNMPv1):

SNMP error: (noSuchName) There is no such variable name in this MIB

Examples:

31420:20140416:091857.217 item [patton2291.apra.it:IF-MIB-ifInOctets.[8]] became supported

The affected devices seem to be only:

The items keep going unsupported and back. I have a notification for unsupported items, and I received more than 2500 emails last night alone. |
Comments |
Comment by Aleksandrs Saveljevs [ 2014 Apr 16 ] |
It would be wonderful if you could attach a bit of a DebugLevel=4 log, which would show what Zabbix receives from your devices, and ideally a traffic dump showing the packets themselves. |
Comment by Tobias Wigand [ 2014 Apr 16 ] |
You can at least add Lotus Notes running on Windows Server to the list. Since upgrading to 2.2.3 (Ubuntu via apt-get), we are missing a lot of data from our Lotus servers. I'm not an expert on SNMP, but shouldn't this request use GetBulk instead of GetRequest?

172.xxx.xxx.33.37282 > 172.xxx.xxx.53.161: [bad udp cksum 0x5c2f -> 0xe6fa!] { SNMPv2c C=public { GetRequest(128) R=963109867 .1.3.6.1.4.1.334.72.1.1.4.34.0 .1.3.6.1.4.1.334.72.1.1.4.18.0 .1.3.6.1.4.1.334.72.1.1.4.4.0 .1.3.6.1.4.1.334.72.1.1.9.5.0 .1.3.6.1.4.1.334.72.1.1.4.6.0 .1.3.6.1.4.1.334.72.1.1.13.10.0 } } }

Anyway, all of those OIDs except number 6 (.1.3.6.1.4.1.334.72.1.1.13.10.0) work if I query them directly via snmpwalk. But Lotus/Windows seems to leave out all data for the other OIDs in the response. |
Comment by Cristian Mammoli [ 2014 Apr 16 ] |
Actually I downgraded to 2.2.2. As soon as I have some spare time I'll try to dig further. |
Comment by Tobias Wigand [ 2014 Apr 16 ] |
Tried again: I set up a new host with just a Windows SNMP template, and it works fine. As soon as I add the Lotus Domino template, which has some OIDs that might not (yet) be present on that Lotus server, everything stops working. The problem with Lotus SNMP seems to be that the OIDs are empty until Lotus has something to fill in, i.e. if no dead mail has been found after a reboot, the OID is simply not there. When there is dead mail, the OID becomes available and is filled correctly. So I think we might need a per-host switch that allows us to disable bulk requests for that device, because in my case Windows refuses to return any data at all when a single OID in the request yields noSuchName.

Just the Windows template:

172.xxx.xxx.33.35786 > 172.xxx.xxx.53.161: [bad udp cksum 0x5d46 -> 0x1212!] { SNMPv2c C=public { GetRequest(405) R=866709499 .1.3.6.1.2.1.25.2.3.1.6.2 .1.3.6.1.2.1.2.2.1.8.14 .1.3.6.1.2.1.2.2.1.16.16 .1.3.6.1.2.1.25.2.3.1.6.6 .1.3.6.1.2.1.2.2.1.16.15 .1.3.6.1.2.1.2.2.1.8.15 .1.3.6.1.2.1.2.2.1.7.11 .1.3.6.1.2.1.2.2.1.16.11 .1.3.6.1.2.1.2.2.1.8.11 .1.3.6.1.2.1.25.2.3.1.6.3 .1.3.6.1.2.1.25.2.3.1.6.1 .1.3.6.1.2.1.25.2.3.1.6.5 .1.3.6.1.2.1.2.2.1.10.15 .1.3.6.1.2.1.2.2.1.7.15 .1.3.6.1.2.1.2.2.1.7.16 .1.3.6.1.2.1.2.2.1.10.11 .1.3.6.1.2.1.2.2.1.8.16 .1.3.6.1.2.1.25.2.3.1.6.4 .1.3.6.1.2.1.2.2.1.10.14 .1.3.6.1.2.1.2.2.1.10.16 .1.3.6.1.2.1.25.3.3.1.2.1 .1.3.6.1.2.1.2.2.1.7.14 .1.3.6.1.2.1.1.3.0 .1.3.6.1.2.1.2.2.1.16.14 }}

172.xxx.xxx.53.161 > 172.xxx.xxx.33.35786: [udp sum ok] { SNMPv2c C=public { GetResponse(468) R=866709499 .1.3.6.1.2.1.25.2.3.1.6.2=5522690 .1.3.6.1.2.1.2.2.1.8.14=1 .1.3.6.1.2.1.2.2.1.16.16=2160174782 .1.3.6.1.2.1.25.2.3.1.6.6=45586 .1.3.6.1.2.1.2.2.1.16.15=2160174782 .1.3.6.1.2.1.2.2.1.8.15=1 .1.3.6.1.2.1.2.2.1.7.11=1 .1.3.6.1.2.1.2.2.1.16.11=2160174782 .1.3.6.1.2.1.2.2.1.8.11=1 .1.3.6.1.2.1.25.2.3.1.6.3=6484432 .1.3.6.1.2.1.25.2.3.1.6.1=0 .1.3.6.1.2.1.25.2.3.1.6.5=66865 .1.3.6.1.2.1.2.2.1.10.15=228796428 .1.3.6.1.2.1.2.2.1.7.15=1 .1.3.6.1.2.1.2.2.1.7.16=1 .1.3.6.1.2.1.2.2.1.10.11=228796428 .1.3.6.1.2.1.2.2.1.8.16=1 .1.3.6.1.2.1.25.2.3.1.6.4=0 .1.3.6.1.2.1.2.2.1.10.14=228796428 .1.3.6.1.2.1.2.2.1.10.16=228796428 .1.3.6.1.2.1.25.3.3.1.2.1=4 .1.3.6.1.2.1.2.2.1.7.14=1 .1.3.6.1.2.1.1.3.0=131335255 .1.3.6.1.2.1.2.2.1.16.14=2160174782 }}

Right after adding the Lotus template:

172.xxx.xxx.33.47428 > 172.xxx.xxx.53.161: [bad udp cksum 0x5eaf -> 0x4714!] { SNMPv2c C=public { GetRequest(766) R=444667481 .1.3.6.1.2.1.25.2.3.1.6.2 .1.3.6.1.4.1.334.72.1.1.4.34.0 .1.3.6.1.2.1.2.2.1.8.11 .1.3.6.1.2.1.25.3.3.1.2.1 .1.3.6.1.2.1.2.2.1.10.11 .1.3.6.1.2.1.2.2.1.7.14 .1.3.6.1.4.1.334.72.1.1.4.11.0 .1.3.6.1.2.1.1.3.0 .1.3.6.1.2.1.2.2.1.16.14 .1.3.6.1.4.1.334.72.1.1.4.1.0 .1.3.6.1.4.1.334.72.1.1.5.2.0 .1.3.6.1.2.1.2.2.1.8.16 .1.3.6.1.4.1.334.72.1.1.4.31.0 .1.3.6.1.2.1.25.2.3.1.6.4 .1.3.6.1.2.1.2.2.1.10.14 .1.3.6.1.2.1.2.2.1.10.16 .1.3.6.1.2.1.2.2.1.7.16 .1.3.6.1.2.1.25.2.3.1.6.3 .1.3.6.1.4.1.334.72.1.1.4.9.0 .1.3.6.1.4.1.334.72.1.1.4.7.0 .1.3.6.1.4.1.334.72.1.1.5.1.0 .1.3.6.1.4.1.334.72.1.1.4.21.0 .1.3.6.1.4.1.334.72.1.1.4.3.0 .1.3.6.1.4.1.334.72.1.1.4.6.0 .1.3.6.1.2.1.2.2.1.10.15 .1.3.6.1.2.1.25.2.3.1.6.5 .1.3.6.1.2.1.2.2.1.7.15 .1.3.6.1.4.1.334.72.1.1.4.5.0 .1.3.6.1.2.1.25.2.3.1.6.1 .1.3.6.1.2.1.2.2.1.8.14 .1.3.6.1.4.1.334.72.1.1.5.5.0 .1.3.6.1.4.1.334.72.1.1.4.2.0 .1.3.6.1.4.1.334.72.1.1.5.4.0 .1.3.6.1.2.1.2.2.1.16.16 .1.3.6.1.2.1.2.2.1.16.15 .1.3.6.1.4.1.334.72.1.1.4.19.0 .1.3.6.1.2.1.2.2.1.16.11 .1.3.6.1.2.1.2.2.1.8.15 .1.3.6.1.4.1.334.72.1.1.4.18.0 .1.3.6.1.2.1.2.2.1.7.11 .1.3.6.1.4.1.334.72.1.1.4.4.0 .1.3.6.1.2.1.25.2.3.1.6.6 .1.3.6.1.4.1.334.72.1.1.5.3.0 }}

172.xxx.xxx.53.161 > 172.xxx.xxx.33.47428: [udp sum ok] { SNMPv2c C=public { GetResponse(766) R=444667481 noSuchName@11 .1.3.6.1.2.1.25.2.3.1.6.2= .1.3.6.1.4.1.334.72.1.1.4.34.0= .1.3.6.1.2.1.2.2.1.8.11= .1.3.6.1.2.1.25.3.3.1.2.1= .1.3.6.1.2.1.2.2.1.10.11= .1.3.6.1.2.1.2.2.1.7.14= .1.3.6.1.4.1.334.72.1.1.4.11.0= .1.3.6.1.2.1.1.3.0= .1.3.6.1.2.1.2.2.1.16.14= .1.3.6.1.4.1.334.72.1.1.4.1.0= .1.3.6.1.4.1.334.72.1.1.5.2.0= .1.3.6.1.2.1.2.2.1.8.16= .1.3.6.1.4.1.334.72.1.1.4.31.0= .1.3.6.1.2.1.25.2.3.1.6.4= .1.3.6.1.2.1.2.2.1.10.14= .1.3.6.1.2.1.2.2.1.10.16= .1.3.6.1.2.1.2.2.1.7.16= .1.3.6.1.2.1.25.2.3.1.6.3= .1.3.6.1.4.1.334.72.1.1.4.9.0= .1.3.6.1.4.1.334.72.1.1.4.7.0= .1.3.6.1.4.1.334.72.1.1.5.1.0= .1.3.6.1.4.1.334.72.1.1.4.21.0= .1.3.6.1.4.1.334.72.1.1.4.3.0= .1.3.6.1.4.1.334.72.1.1.4.6.0= .1.3.6.1.2.1.2.2.1.10.15= .1.3.6.1.2.1.25.2.3.1.6.5= .1.3.6.1.2.1.2.2.1.7.15= .1.3.6.1.4.1.334.72.1.1.4.5.0= .1.3.6.1.2.1.25.2.3.1.6.1= .1.3.6.1.2.1.2.2.1.8.14= .1.3.6.1.4.1.334.72.1.1.5.5.0= .1.3.6.1.4.1.334.72.1.1.4.2.0= .1.3.6.1.4.1.334.72.1.1.5.4.0= .1.3.6.1.2.1.2.2.1.16.16= .1.3.6.1.2.1.2.2.1.16.15= .1.3.6.1.4.1.334.72.1.1.4.19.0= .1.3.6.1.2.1.2.2.1.16.11= .1.3.6.1.2.1.2.2.1.8.15= .1.3.6.1.4.1.334.72.1.1.4.18.0= .1.3.6.1.2.1.2.2.1.7.11= .1.3.6.1.4.1.334.72.1.1.4.4.0= .1.3.6.1.2.1.25.2.3.1.6.6= .1.3.6.1.4.1.334.72.1.1.5.3.0= }} |
Comment by Cristian Mammoli [ 2014 Apr 16 ] |
OK, I upgraded again to 2.2.3 with DebugLevel=4; a ~10 minute log resulted in more than 400 MB of data (30 MB gzipped). The pcap file was created this way:

tcpdump -s0 -n 'port 161 and (host sg300-1 or host patton2291)' -w zabbix223.pcap

sg300-1 = 192.168.0.251 (Cisco SMB switch, SNMPv2) |
Comment by Aleksandrs Saveljevs [ 2014 Apr 16 ] |
Cristian, the traffic dump would be enough for now. Please give me some time to check it. |
Comment by Aleksandrs Saveljevs [ 2014 Apr 16 ] |
Our understanding of the expected device behavior when the specified OID is not present on the device is summarized in one of the source code comments:

    ...
    else if (STAT_SUCCESS == status && SNMP_ERR_NOSUCHNAME == response->errstat &&
            0 != response->errindex && ITEM_TYPE_SNMPv1 == items[0].type)
    {
        /* If a request PDU contains a bad variable, the specified behavior is different between SNMPv1 and */
        /* later versions. In SNMPv1, the whole PDU is rejected and "response->errindex" is set to indicate */
        /* the bad variable. In SNMPv2 and later, the SNMP agent processes the PDU by filling values for the */
        /* known variables and marking unknown variables individually in the variable binding list. So if we */
        /* get this error with SNMPv1, we fix the PDU by removing the bad variable and retry the request. */
        ...

This seems to be confirmed by http://tools.ietf.org/html/rfc1157#section-4.1.2 (SNMPv1) and http://tools.ietf.org/html/rfc3416#section-4.2.1 (SNMPv2). Devices we have locally seem to conform to the described behavior. However, the behavior of Lotus described by Tobias is unexpected: there, an SNMPv2 agent behaves like an SNMPv1 agent. Whether or not that is standards-conformant, Zabbix should be able to monitor such agents, and it seems to be as simple as removing the SNMPv1 check above. That should fix the Lotus SNMPv2 case.

In Cristian's case, Zabbix seems to work properly; at least I have not noticed any irregularities in Zabbix operation based on the traffic. Cristian, based on your log file and the attached traffic dump, could you please show an OID that you would expect not to become unsupported, but which actually did? |
Comment by Cristian Mammoli [ 2014 Apr 16 ] |
During the dump, all these items went unsupported:

[patton2291.apra.it:IF-MIB-ifInOctets.[1]]
[sg300-1.apra.it:IF-MIB-ifInErrors.[54]]

To obtain the OID, just replace the separator after IF-MIB with "::", then use snmptranslate. Example:

Of course, all these items work if I do a manual query with snmpget:

snmpget -v 2c -c public sg300-1 IF-MIB::ifInErrors.54
snmpget -v 1 -c public patton2291 IF-MIB::ifSpeed.1 |
Comment by Tobias Wigand [ 2014 Apr 16 ] |
Many thanks, Aleksandrs! I took a look at the Lotus SNMP implementation guide and found out that its daemon only does SNMPv1. Windows, however, does SNMPv2 and acts as a proxy for the Lotus requests; I had assumed that it would somehow convert the Lotus SNMPv1 traffic while relaying it. That was wrong, I guess. |
Comment by Aleksandrs Saveljevs [ 2014 Apr 16 ] |
Thank you, Tobias! We shall ignore the proxying issue for now then, but will keep it in mind. |
Comment by Aleksandrs Saveljevs [ 2014 Apr 16 ] |
Cristian, according to the traffic dump, Zabbix's behavior is as expected. For instance, see the first packet where the device returns a "noSuchName" error for Patton: It says that OID .1.3.6.1.2.1.2.2.1.5.2 (which corresponds to item "patton2291.apra.it:IF-MIB-ifSpeed.[2]") is not found. The packets that follow say the same about the other OIDs in the list. Similarly, see the following packet for Cisco: It says, for instance, that OIDs .1.3.6.1.2.1.2.2.1.14.54 and .1.3.6.1.2.1.2.2.1.14.55 (items "sg300-1.apra.it:IF-MIB-ifInErrors.[54]" and "sg300-1.apra.it:IF-MIB-ifInErrors.[55]") are not found. So, according to the traffic, Zabbix behaves as expected. |
Comment by Cristian Mammoli [ 2014 Apr 16 ] |
So the SNMP implementations on these devices are probably to blame. The problem is that this does not happen with 2.2.2. I can live without interface traffic information on a voice gateway, but not on a switch. |
Comment by Aleksandrs Saveljevs [ 2014 Apr 17 ] |
It seems that there is nothing to fix on Zabbix side. Closing for now. |
Comment by Ray [ 2014 May 07 ] |
I have the same problem; everything worked fine with the previous version of Zabbix. I monitor 4 SNMP OIDs (softswitch MERA MVTS3G), and the "unsupported" state rotates in a cycle: first one OID is supported and the others are not, later the second one is supported and the others are not, etc., approximately every 5 minutes. All other devices are fine. I think this is a bug in Zabbix: everything was fine for many months, and the problems started exactly after the update. My environment: Ubuntu 12.04, MySQL, Zabbix from the official repository. |
Comment by Ray [ 2014 May 07 ] |
It seems that some devices cannot return data if it is requested via "get request" with multiple values, but can answer "get-next-request" with a single value. Have you changed the request type in 2.2.3? |
Comment by Aleksandrs Saveljevs [ 2014 May 07 ] |
Ray, it seems that you have reported your problem separately in |
Comment by Strategist [ 2015 Oct 21 ] |
I have this problem now on Zabbix 2.2.10; before upgrading from 2.0.14 to 2.2.10, everything worked fine. |
Comment by Andrei Gushchin (Inactive) [ 2015 Nov 04 ] |
According to the RFC, Zabbix's behaviour is not quite correct:

If both the error-status field and the error-index field of the Response-PDU are non-zero, then the value of the error-index field is the index of the variable binding (in the variable-binding list of the corresponding request) for which the request failed. The first variable binding in a request's variable-binding list is index one, the second is index two, etc.

A compliant SNMPv2 entity acting in a manager role must be able to properly receive and handle a Response-PDU with an error-status field equal to `noSuchName', `badValue', or `readOnly'. (See Section 3.1.2 of [8].)

[8] is https://tools.ietf.org/html/rfc2576

zalex_ua: currently (2016-05-17) the 2nd paragraph is written a bit differently:

A compliant SNMP entity supporting a command generator application must be able to properly receive and handle a Response-PDU with an error-status field equal to "noSuchName", "badValue", or "readOnly". (See sections 1.3 and 4.3 of [RFC2576].) |
Comment by Oleksii Zagorskyi [ 2016 May 18 ] |
Issue summary/description updated to reflect the actual problem that was figured out. |
Comment by Oleksii Zagorskyi [ 2016 May 19 ] |
Let me summarize the issue, repeating some parts already said above.

Some details, to be more clear: the SNMP_NOSUCHOBJECT and SNMP_NOSUCHINSTANCE exceptions are related to SNMPv2 only and appear not in the "packet header", but for particular var binds in the response:

The problem is that the Zabbix code correctly processes "error-status: noSuchName" only for the SNMPv1 protocol:

    else if (STAT_SUCCESS == status && SNMP_ERR_NOSUCHNAME == response->errstat &&
            0 != response->errindex && ITEM_TYPE_SNMPv1 == items[0].type)

I support asaveljevs' idea that simply removing the "ITEM_TYPE_SNMPv1 == items[0].type" condition should nicely resolve the current issue. |
Comment by Aleksandrs Saveljevs [ 2016 May 24 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-8096 . |
Comment by Andris Zeila [ 2016 May 27 ] |
Successfully tested |
Comment by Aleksandrs Saveljevs [ 2016 May 31 ] |
Fixed in pre-2.2.14rc1 r60421, pre-3.0.4rc1 r60422, pre-3.1.0 (trunk) r60423. |