[ZBX-8982] Some SNMP counters do not work Created: 2014 Nov 02  Updated: 2017 May 30  Resolved: 2014 Nov 13

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 2.2.7
Fix Version/s: 2.2.8rc1, 2.4.3rc1, 2.5.0

Type: Incident report Priority: Blocker
Reporter: Nick Babinskiy Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: snmp, validation
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS release 5.11 (Final)
2.6.18-398.el5.centos.plusxen #1 SMP Fri Sep 19 16:42:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux
Zabbix 2.2.7 (revision 50148)


Attachments: SNMP.dump (File), incorrect-oid.png (PNG File), server.log (Text File), zabbix_server.conf (Text File)
Issue Links:
Duplicate

 Description   

After upgrading Zabbix server from 2.2.2 to 2.2.7, some counters on some devices became not supported.
The log contains entries like these:

2757:20141102:082032.381 SNMP response from host "cc-03.xxxx.xx" does not contain variable bindings in the requested order
2829:20141102:082032.791 item "cc-03.xxx.xx:.1.3.6.1.4.1.17095.1.3.1.1.4.4" became not supported: Invalid SNMP response: variable bindings out of order

I found issue ZBXNEXT-2301 and noticed the EnableSNMPBulkRequests configuration file parameter there.
But setting EnableSNMPBulkRequests=0 on my server had no effect.
I had to roll back to 2.2.2.



 Comments   
Comment by richlv [ 2014 Nov 02 ]

Just in case, did you restart the server after setting EnableSNMPBulkRequests=0?

Comment by Nick Babinskiy [ 2014 Nov 02 ]

Yes, of course.

Comment by Aleksandrs Saveljevs [ 2014 Nov 03 ]

If the error is reliably reproducible, could you please attach a tcpdump capture of the traffic between Zabbix server and this SNMP device, together with the relevant part of the log?

Comment by Nick Babinskiy [ 2014 Nov 03 ]

SNMP.dump is the result of the command tcpdump -s 1500 -npi eth0 -w SNMP.dump host 192.168.3.106 and port 161
server.log is a minimally filtered Zabbix server log
zabbix_server.conf is my Zabbix server config file

Comment by Aleksandrs Saveljevs [ 2014 Nov 03 ]

Attached tcpdump contains the following:

It can be seen that the device is queried for one OID, but returns the same OID with ".0" appended. That does not seem to be how a self-respecting SNMP device should behave.

You can probably fix it by appending ".0" to the queried OID yourself.
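
The mismatch described above can be illustrated with a minimal sketch. It is not Zabbix code: the function names `parse_oid` and `classify_response` are hypothetical, and the OID pair is taken from the captures posted in this issue.

```python
from typing import Tuple


def parse_oid(oid: str) -> Tuple[int, ...]:
    """Turn a dotted OID such as ".1.3.6.1" into a tuple of integers."""
    return tuple(int(part) for part in oid.strip(".").split("."))


def classify_response(requested: str, received: str) -> str:
    """Compare the OID that was queried with the OID that came back."""
    req, got = parse_oid(requested), parse_oid(received)
    if got == req:
        return "match"
    if got == req + (0,):
        # The device answered with the queried OID plus a trailing ".0".
        return "extra .0 appended"
    return "mismatch"


print(classify_response(".1.3.6.1.4.1.17095.1.3.1.1.4.6",
                        ".1.3.6.1.4.1.17095.1.3.1.1.4.6.0"))
# prints "extra .0 appended"
```

A strict validator would treat anything other than "match" as an invalid response, which is consistent with the "variable bindings out of order" error in the log.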

Comment by Nick Babinskiy [ 2014 Nov 03 ]

I appended .0 to some OIDs and saw these lines in the tcpdump output:

10:44:05.522700 IP 192.168.3.112.60068 > 192.168.3.106.snmp:  GetRequest(35)  .1.3.6.1.4.1.17095.1.3.1.1.4.7.0
10:44:05.530812 IP 192.168.3.106.snmp > 192.168.3.112.60068:  GetResponse(40)  .1.3.6.1.4.1.17095.1.3.1.1.4.7.0.0=[noSuchObject]
....
10:52:45.272375 IP 192.168.3.112.50559 > 192.168.3.106.snmp:  GetRequest(35)  .1.3.6.1.4.1.17095.1.3.1.1.4.6.0
10:52:45.280569 IP 192.168.3.106.snmp > 192.168.3.112.50559:  GetResponse(40)  .1.3.6.1.4.1.17095.1.3.1.1.4.6.0.0=[noSuchObject]

and the counters are still in the "not supported" state.

Comment by Aleksandrs Saveljevs [ 2014 Nov 03 ]

What kind of device are you monitoring? Do you know why it tries to append ".0" itself?

Comment by Nick Babinskiy [ 2014 Nov 03 ]

These are self-manufactured devices based on the Microchip PIC18F97J60.

I have also captured the SNMP traffic with Zabbix 2.2.2:

11:13:45.555253 IP 192.168.3.112.36379 > 192.168.3.106.snmp:  GetRequest(34)  .1.3.6.1.4.1.17095.1.3.1.1.4.6
11:13:45.563656 IP 192.168.3.106.snmp > 192.168.3.112.36379:  GetResponse(41)  .1.3.6.1.4.1.17095.1.3.1.1.4.6.0=197
11:13:48.913211 IP 192.168.3.112.41267 > 192.168.3.106.snmp:  GetRequest(34)  .1.3.6.1.4.1.17095.1.3.1.1.2.5
11:13:48.920437 IP 192.168.3.106.snmp > 192.168.3.112.41267:  GetResponse(40)  .1.3.6.1.4.1.17095.1.3.1.1.2.5.0=1
11:13:50.018579 IP 192.168.3.112.60738 > 192.168.3.106.snmp:  GetRequest(34)  .1.3.6.1.4.1.17095.1.3.1.1.2.6
11:13:50.026054 IP 192.168.3.106.snmp > 192.168.3.112.60738:  GetResponse(40)  .1.3.6.1.4.1.17095.1.3.1.1.2.6.0=1
11:13:50.109829 IP 192.168.3.112.37738 > 192.168.3.106.snmp:  GetRequest(34)  .1.3.6.1.4.1.17095.1.3.1.1.2.7
11:13:50.117469 IP 192.168.3.106.snmp > 192.168.3.112.37738:  GetResponse(40)  .1.3.6.1.4.1.17095.1.3.1.1.2.7.0=1
11:13:51.179862 IP 192.168.3.112.45811 > 192.168.3.106.snmp:  GetRequest(34)  .1.3.6.1.4.1.17095.1.3.1.1.3.2
11:13:51.187999 IP 192.168.3.106.snmp > 192.168.3.112.45811:  GetResponse(40)  .1.3.6.1.4.1.17095.1.3.1.1.3.2.0=0

.0 is also appended there, and Zabbix 2.2.2 handles it correctly.
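
For checking a longer capture, here is a rough sketch that scans tcpdump text output in the format shown above and reports responses whose OID differs from the requested one. The regular expression and the function name `find_mismatches` are assumptions made for illustration. Requests and responses are paired by the manager's address and port, because the SNMP request-id is not visible in this output.

```python
import re

# Matches lines like:
# 11:13:45.555253 IP 192.168.3.112.36379 > 192.168.3.106.snmp:  GetRequest(34)  .1.3...
LINE_RE = re.compile(
    r"^\S+ IP (?P<src>\S+) > (?P<dst>\S+):\s+"
    r"(?P<pdu>GetRequest|GetResponse)\(\d+\)\s+"
    r"(?P<oid>[\d.]+)(?:=(?P<value>.*))?$"
)


def find_mismatches(capture: str):
    """Return (requested OID, received OID, value) for each mismatched pair."""
    pending = {}  # manager endpoint (address.port) -> OID it requested
    mismatches = []
    for line in capture.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # separator lines such as "...." are skipped
        if m["pdu"] == "GetRequest":
            pending[m["src"]] = m["oid"]
        else:
            requested = pending.pop(m["dst"], None)
            if requested is not None and requested != m["oid"]:
                mismatches.append((requested, m["oid"], m["value"]))
    return mismatches


sample = """\
11:13:45.555253 IP 192.168.3.112.36379 > 192.168.3.106.snmp:  GetRequest(34)  .1.3.6.1.4.1.17095.1.3.1.1.4.6
11:13:45.563656 IP 192.168.3.106.snmp > 192.168.3.112.36379:  GetResponse(41)  .1.3.6.1.4.1.17095.1.3.1.1.4.6.0=197
"""
for requested, received, value in find_mismatches(sample):
    print(f"sent {requested}, received {received} (value {value})")
```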

Comment by Aleksandrs Saveljevs [ 2014 Nov 03 ]

If the device is self-manufactured, do you know the reason for appending ".0", and would it be possible to make the device standard-conformant?

Comment by Aleksandrs Saveljevs [ 2014 Nov 04 ]

Validation of SNMP responses was introduced in ZBX-8621 in Zabbix 2.2.7. It was mostly meant for detecting cases where SNMP bulk does not work properly. With validation in place for simple, non-bulk SNMP responses, there is no way to monitor devices such as the above. So it was decided to validate SNMP responses only for bulk requests.

Comment by Aleksandrs Saveljevs [ 2014 Nov 04 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-8982 .

Comment by richlv [ 2014 Nov 05 ]

(1) should we maybe log a warning when we get an incorrect response in non-bulk mode ?

asaveljevs Probably not, because that would spam the log permanently (as in the case above), with no ability to fix it.

<richlv> we do have some warnings (value cache related, for example) that are throttled to one per 5 min. maybe here once per item during server/proxy startup is doable? it would be nice to inform users about this in some way, especially as we have the validation in place anyway.
if not, feel free to close this subissue

asaveljevs It was decided that for single-variable requests we shall log such bad responses under DebugLevel=4 and accept the values, while for bulk requests we shall log them under DebugLevel=3 and return "not supported". Example of a log message:

 24958:20141111:113756.390 SNMP response from host "firewall" contains variable bindings that do not match the request: sent ".1.3.6.1.2.1.2.2.1.16.9", received ".1.3.6.1.2.1.2.2.1.16.9.0"

RESOLVED

asaveljevs According to (3), for bulk requests we shall now log a warning, but retry with half the number of variables, proceeding with the general retrying strategy. RESOLVED.

wiper CLOSED
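
The outcome of subissue (1) can be summarized in a small sketch. It is not the Zabbix implementation: the names `Action` and `on_mismatched_binding` are hypothetical, and Python's logging levels stand in for DebugLevel=4 (debug) and DebugLevel=3 (warning).

```python
import logging
from enum import Enum
from typing import Tuple


class Action(Enum):
    ACCEPT = "accept the value"
    RETRY_SMALLER = "retry with half the number of variables"


def on_mismatched_binding(num_vars: int) -> Tuple[Action, int]:
    """Decide what to do when a response OID does not match the request."""
    if num_vars == 1:
        # Single-variable request: log at debug level and accept the value,
        # so devices that append ".0" can still be monitored.
        return Action.ACCEPT, logging.DEBUG
    # Bulk request: log a warning and retry with fewer variables.
    return Action.RETRY_SMALLER, logging.WARNING


for n in (1, 16):
    action, level = on_mismatched_binding(n)
    print(n, action.value, logging.getLevelName(level))
```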

Comment by Andris Zeila [ 2014 Nov 10 ]

(2) The bulk request check before validation should be reverted. And maybe we should skip SNMP response validation for single value responses even with bulk enabled?

asaveljevs RESOLVED in r50536 and r50538.

wiper CLOSED

Comment by Andris Zeila [ 2014 Nov 12 ]

(3) If SNMP response validation fails, the request is not retried and the min_fail value is not updated in the configuration cache. This means the next bulk request will attempt to retrieve the same number of values. If a device always returns incorrect data (out of order, or too few or too many values) when N values are requested, the SNMP poller will keep requesting N values (and failing).

asaveljevs RESOLVED in r50567.

wiper CLOSED
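
To illustrate why remembering the failing size matters, here is a simplified sketch of a halving retry. It does not reproduce the actual Zabbix retrying strategy: `fetch_with_halving` and the `send` callback are hypothetical, and `min_fail` here simply records the smallest request size seen to fail.

```python
from typing import Callable, List, Optional, Tuple


def fetch_with_halving(
    oids: List[str],
    send: Callable[[List[str]], Optional[list]],
    min_fail: Optional[int] = None,
) -> Tuple[Optional[list], int, Optional[int]]:
    """Request as many OIDs as allowed; halve the batch on each failure.

    `send` returns the values, or None if response validation failed.
    Returns (values, batch size used, updated min_fail).
    """
    # Start below the smallest size known to fail, if there is one.
    size = len(oids) if min_fail is None else min(len(oids), min_fail - 1)
    size = max(size, 1)
    while True:
        values = send(oids[:size])
        if values is not None:
            return values, size, min_fail
        # Remember the failure so later polls do not repeat this size.
        min_fail = size if min_fail is None else min(min_fail, size)
        if size == 1:
            return None, size, min_fail
        size //= 2


def flaky_device(batch: List[str]) -> Optional[list]:
    """Simulated device that answers correctly only for up to 3 OIDs."""
    return [0] * len(batch) if len(batch) <= 3 else None


oids = [f".1.3.6.1.2.1.2.2.1.16.{i}" for i in range(1, 9)]
values, size, min_fail = fetch_with_halving(oids, flaky_device)
print(size, min_fail)  # prints "2 4"
values, size, min_fail = fetch_with_halving(oids, flaky_device, min_fail)
print(size, min_fail)  # prints "3 4"
```

Without the `min_fail` update, every poll would start again at 8 variables and fail in the same way, which is the behavior described in (3).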

Comment by Andris Zeila [ 2014 Nov 12 ]

Successfully tested

Comment by Aleksandrs Saveljevs [ 2014 Nov 13 ]

Fixed in pre-2.2.8 r50589, pre-2.4.3 r50590, and pre-2.5.0 (trunk) r50591.

Comment by Aleksandrs Saveljevs [ 2014 Nov 13 ]

(4) Documented at the following locations:

Note that some of these changes document what ZBX-8621 should have documented earlier.

wiper CLOSED

Generated at Fri Mar 29 02:25:50 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.