[ZBX-5943] Zabbix treats SNMP noSuchName as "host unavailable" Created: 2012 Dec 05  Updated: 2017 May 30  Resolved: 2014 Feb 12

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Incident report Priority: Critical
Reporter: Will Lowe Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: pollers, snmp
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Zabbix 1.8.7, CentOS 6.3 (final), x86_64


Issue Links:
Duplicate
duplicates ZBX-4284 Possible wrong host (agent!,snmp?) di... Closed

 Description   

I'm using an older SNMP device which only supports SNMP v1. I have a number of similar devices, and I've created a Zabbix template containing a bunch of items with specific OIDs. One device is missing a few of these OIDs because it has a different number of interfaces/ports/whatever.

Whenever Zabbix polls one of the missing OIDs, it gets a response from the device's SNMP agent saying "I don't have that OID". Here's a tcpdump from the Zabbix server:

[will@zabbix2 ~]$ sudo tcpdump -ni eth0 ip host apl2-sensor1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:02:38.086350 IP 10.30.0.19.60270 > 10.30.0.94.snmp:  GetRequest(35)  .1.3.6.1.4.1.3854.1.2.2.1.16.1.3.0
14:02:38.092946 IP 10.30.0.94.snmp > 10.30.0.19.60270:  GetResponse(40)  .1.3.6.1.4.1.3854.1.2.2.1.16.1.3.0=73
14:02:52.809796 IP 10.30.0.19.51029 > 10.30.0.94.snmp:  GetRequest(36)  .1.3.6.1.4.1.318.1.1.10.3.14.1.1.3.4
14:02:52.815242 IP 10.30.0.94.snmp > 10.30.0.19.51029:  GetResponse(40)  noSuchName@1 .1.3.6.1.4.1.318.1.1.10.3.14.1.1.3.4=
14:02:53.814905 IP 10.30.0.19.51029 > 10.30.0.94.snmp:  GetRequest(36)  .1.3.6.1.4.1.318.1.1.10.3.14.1.1.3.4

This response makes Zabbix decide that the host is unreachable, and it stops polling all other OIDs on that host for a while. As soon as it polls an OID that exists, it re-enables the host.
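
The capture above can be checked mechanically. The following is a minimal Python sketch, not part of Zabbix or tcpdump (`pair_requests` is a hypothetical helper), that pairs each GetRequest with the next GetResponse sent back to the same client port. On these five lines it reports a value for the first OID, noSuchName for the second, and no reply within the excerpt for the repeated request at 14:02:53; the excerpt may simply have been cut off at that point.

```python
import re

# The five tcpdump lines from the description, copied verbatim.
CAPTURE = """\
14:02:38.086350 IP 10.30.0.19.60270 > 10.30.0.94.snmp:  GetRequest(35)  .1.3.6.1.4.1.3854.1.2.2.1.16.1.3.0
14:02:38.092946 IP 10.30.0.94.snmp > 10.30.0.19.60270:  GetResponse(40)  .1.3.6.1.4.1.3854.1.2.2.1.16.1.3.0=73
14:02:52.809796 IP 10.30.0.19.51029 > 10.30.0.94.snmp:  GetRequest(36)  .1.3.6.1.4.1.318.1.1.10.3.14.1.1.3.4
14:02:52.815242 IP 10.30.0.94.snmp > 10.30.0.19.51029:  GetResponse(40)  noSuchName@1 .1.3.6.1.4.1.318.1.1.10.3.14.1.1.3.4=
14:02:53.814905 IP 10.30.0.19.51029 > 10.30.0.94.snmp:  GetRequest(36)  .1.3.6.1.4.1.318.1.1.10.3.14.1.1.3.4
"""

# Groups: timestamp, source endpoint, destination endpoint, PDU type, payload.
LINE_RE = re.compile(
    r"^(\S+) IP (\S+) > (\S+):\s+(GetRequest|GetResponse)\(\d+\)\s+(.*)$")


def pair_requests(capture):
    """Pair each GetRequest with the next GetResponse sent back to the same
    client endpoint. Returns (request_time, oid, reply) tuples, where reply
    is None if no response to that request appears in the capture."""
    pending = {}   # client endpoint -> (request_time, oid)
    results = []
    for line in capture.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        stamp, src, dst, pdu, payload = m.groups()
        if pdu == "GetRequest":
            if src in pending:               # earlier request never answered
                results.append((*pending[src], None))
            pending[src] = (stamp, payload.strip())
        else:
            request = pending.pop(dst, None)
            if request is not None:
                results.append((*request, payload.strip()))
    for request in pending.values():         # still unanswered at end of capture
        results.append((*request, None))
    return results


for stamp, oid, reply in pair_requests(CAPTURE):
    print(stamp, oid, "->", reply if reply is not None else "NO RESPONSE IN CAPTURE")
```

The same function can be fed a longer capture (for example the 3-minute one requested later in this thread) to see whether every unanswered request lines up with a "network error" entry in zabbix_server.log.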

Here's the corresponding Zabbix server log:

[will@zabbix2 ~]$ grep '20121204:1402.*apl2-sensor1' /var/log/zabbix/zabbix_server.log
   994:20121204:140224.904 Disabling SNMP host [apl2-sensor1.oak]
   995:20121204:140238.093 Enabling SNMP host [apl2-sensor1.oak]
   979:20121204:140258.842 SNMP Host [apl2-sensor1.oak]: first network error, wait for 15 seconds

It seems that Zabbix is interpreting the "no such OID" response as "server down". Instead, it should mark that item as not supported and continue polling all other items.
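
The behaviour asked for here can be written down as a small decision function. This is a minimal sketch, not Zabbix's poller code: `classify` and `Outcome` are hypothetical names, and only the numeric error-status values come from the SNMPv1 specification (RFC 1157). It illustrates the argument that a GetResponse carrying noSuchName shows the agent is reachable, so in this sketch only a missing response counts against the host.

```python
from enum import Enum
from typing import Optional

# SNMPv1 error-status values (RFC 1157).
SNMPV1_ERROR_STATUS = {
    0: "noError",
    1: "tooBig",
    2: "noSuchName",
    3: "badValue",
    4: "readOnly",
    5: "genErr",
}


class Outcome(Enum):
    VALUE = "store the value"
    ITEM_NOT_SUPPORTED = "mark this item not supported, keep polling the host"
    NETWORK_ERROR = "count as a network error against the host"


def classify(error_status: Optional[int]) -> Outcome:
    """Hypothetical poller decision for a single SNMPv1 GET.

    error_status is the error-status field of the GetResponse, or None if
    no GetResponse arrived before the timeout.
    """
    if error_status is None:
        # In this sketch, only a missing reply is treated as a
        # reachability problem.
        return Outcome.NETWORK_ERROR
    if error_status == 0:
        return Outcome.VALUE
    # Any other error-status is treated as item-level: the agent answered,
    # so the host is evidently reachable.
    return Outcome.ITEM_NOT_SUPPORTED


for status in (0, 2, None):
    name = SNMPV1_ERROR_STATUS.get(status, "timeout")
    print(f"{name:>10} -> {classify(status).value}")
```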



 Comments   
Comment by Oleksii Zagorskyi [ 2012 Dec 05 ]

Heh, looks like this is a duplicate of ZBX-4284.

Thank you for the very good description!

Which UnavailableDelay, UnreachableDelay and UnreachablePeriod values are you using in zabbix_server.conf?
How many SNMP items does this host have, and what is their update interval?

To be sure, please upgrade the server to the latest 1.8.15, repeat the experiment, and provide the updated lines from zabbix_server.log, because of the server improvements described at http://www.zabbix.com/documentation/1.8/manual/about/what_s_new_1.8.9#zabbix_server_improvements

Comment by Will Lowe [ 2012 Dec 06 ]

Looks like only one of those settings is changed from the default:

[will@zabbix2 ~]$ sudo grep -E '^(UnavailableDelay|UnreachableDelay|UnreachablePeriod)' /etc/zabbix/zabbix_server.conf
UnavailableDelay=5

Also, I'm not sure this is a duplicate of the other ticket.
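
Since only UnavailableDelay is set explicitly, the other two parameters fall back to their defaults. The following Python sketch resolves the values in effect; `effective_settings` is a hypothetical helper, it assumes `Name=value` lines with no spaces (as in the grep output above), and the defaults are the ones listed in the Zabbix 1.8/2.x zabbix_server.conf documentation. With UnreachableDelay at its default of 15 seconds, the "wait for 15 seconds" in the server log line above appears consistent with it.

```python
import re

# Documented defaults of these zabbix_server.conf parameters (1.8 and 2.x manuals).
DEFAULTS = {"UnavailableDelay": 60, "UnreachableDelay": 15, "UnreachablePeriod": 45}


def effective_settings(conf_text: str) -> dict:
    """Return the value in effect for each parameter: the explicit
    'Name=value' line if present, otherwise the documented default."""
    settings = dict(DEFAULTS)
    for line in conf_text.splitlines():
        m = re.match(r"^(\w+)=(\d+)\s*$", line)
        if m and m.group(1) in settings:
            settings[m.group(1)] = int(m.group(2))
    return settings


# The only matching line in the grep output above.
print(effective_settings("UnavailableDelay=5\n"))
```

To check a real server, pass it the contents of /etc/zabbix/zabbix_server.conf instead of the literal string.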

Comment by Will Lowe [ 2012 Dec 06 ]

This host has 6 SNMP v1 items, although probably only two of them are valid.

The update interval is 30s for 5 of them (one of which doesn't return noSuchName) and 60s for the other (which also doesn't return noSuchName). This is likely a configuration error on our part and I'll set them all to 60s.

Comment by Oleksii Zagorskyi [ 2012 Dec 06 ]

Will, please do not change the issue's status.

Comment by Will Lowe [ 2012 Dec 06 ]

Oops, sorry, must've clicked the wrong button.

Not sure why that button is available to me.

Comment by Oleksii Zagorskyi [ 2012 Dec 06 ]

Yes, this issue is not an exact duplicate of ZBX-4284; that's why it is still open.
But it is closely connected to ZBX-4284.

Please change the update interval of all items to 60 seconds, upgrade the server binary, and repeat the experiment, capturing 3 minutes of tcpdump output and zabbix_server.log.

Comment by richlv [ 2012 Dec 06 ]

BTW, why was UnavailableDelay changed?

Comment by Will Lowe [ 2012 Dec 06 ]

I changed it after I discovered this problem. I was hoping it would let Zabbix figure out that the server was "back up" sooner.

Comment by Aleksandrs Saveljevs [ 2014 Feb 12 ]

Cannot reproduce with the reported 1.8.7, latest 1.8, latest 2.0, and latest 2.2.

Here is the tcpdump for the query:

13:22:33.189277 IP (tos 0x0, ttl 64, id 55580, offset 0, flags [DF], proto UDP (17), length 74)
    192.168.1.2.46374 > 192.168.1.1.161: [bad udp cksum 0x87b6 -> 0x0c2a!]  { SNMPv1 { GetRequest(31) R=1523844148  .1.3.6.1.2.1.2.2.1.10.10000 } }
13:22:33.190115 IP (tos 0x0, ttl 64, id 18615, offset 0, flags [none], proto UDP (17), length 76)
    192.168.1.1.161 > 192.168.1.2.46374: [udp sum ok]  { SNMPv1 { GetResponse(31) R=1523844148  noSuchName@1 .1.3.6.1.2.1.2.2.1.10.10000= } }

The server correctly writes the following into the log:

  3494:20140212:132104.167 Item [router:interfaces.ifTable.ifEntry.ifInOctets.10000] became not supported: SNMP error [(noSuchName) There is no such variable name in this MIB.]

Please reopen if the problem is still reproducible.

Comment by Aleksandrs Saveljevs [ 2014 Feb 12 ]

Note that in the description, the tcpdump timestamps do not correspond to the Zabbix server log entries:

 
[will@zabbix2 ~]$ sudo tcpdump -ni eth0 ip host apl2-sensor1 
...
14:02:52.809796 IP 10.30.0.19.51029 > 10.30.0.94.snmp: GetRequest(36) .1.3.6.1.4.1.318.1.1.10.3.14.1.1.3.4 
14:02:52.815242 IP 10.30.0.94.snmp > 10.30.0.19.51029: GetResponse(40) noSuchName@1 .1.3.6.1.4.1.318.1.1.10.3.14.1.1.3.4= 
14:02:53.814905 IP 10.30.0.19.51029 > 10.30.0.94.snmp: GetRequest(36) .1.3.6.1.4.1.318.1.1.10.3.14.1.1.3.4 
...
 
[will@zabbix2 ~]$ grep '20121204:1402.*apl2-sensor1' /var/log/zabbix/zabbix_server.log 
   994:20121204:140224.904 Disabling SNMP host [apl2-sensor1.oak] 
   995:20121204:140238.093 Enabling SNMP host [apl2-sensor1.oak] 
   979:20121204:140258.842 SNMP Host [apl2-sensor1.oak]: first network error, wait for 15 seconds 

The "noSuchName" packet is at 14:02:52 and another request packet follows it, but the network error is logged at 14:02:58.
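
The gap can be quantified. This is a small Python sketch, assuming both timestamps come from the same clock (both commands were run on zabbix2) and that the capture is from 2012-12-04, the date in the grep pattern; `log_time` and `capture_time` are hypothetical helpers. It puts the network error about 6.03 s after the noSuchName response and about 5.03 s after the repeated request.

```python
from datetime import datetime

DATE = "20121204"  # date taken from the grep pattern in the description


def log_time(entry: str) -> datetime:
    """Parse the 'PID:YYYYMMDD:HHMMSS.mmm' prefix of a zabbix_server.log line."""
    _pid, date, rest = entry.strip().split(":", 2)
    clock = rest.split()[0]                      # e.g. "140258.842"
    return datetime.strptime(date + clock, "%Y%m%d%H%M%S.%f")


def capture_time(stamp: str) -> datetime:
    """Attach the assumed date to a tcpdump 'HH:MM:SS.ffffff' timestamp."""
    return datetime.strptime(DATE + " " + stamp, "%Y%m%d %H:%M:%S.%f")


no_such_name = capture_time("14:02:52.815242")   # noSuchName GetResponse
repeat = capture_time("14:02:53.814905")         # repeated GetRequest
net_error = log_time("979:20121204:140258.842 SNMP Host [apl2-sensor1.oak]: "
                     "first network error, wait for 15 seconds")

print("noSuchName -> network error:", (net_error - no_such_name).total_seconds(), "s")
print("repeat     -> network error:", (net_error - repeat).total_seconds(), "s")
```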

Generated at Fri Apr 26 10:13:49 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.