[ZBX-8982] Some SNMP counters do not work Created: 2014 Nov 02 Updated: 2017 May 30 Resolved: 2014 Nov 13 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Proxy (P), Server (S) |
Affects Version/s: | 2.2.7 |
Fix Version/s: | 2.2.8rc1, 2.4.3rc1, 2.5.0 |
Type: | Incident report | Priority: | Blocker |
Reporter: | Nick Babinskiy | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | snmp, validation | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
CentOS release 5.11 (Final) |
Attachments: | SNMP.dump incorrect-oid.png server.log zabbix_server.conf | ||||
Issue Links: |
|
Description |
After upgrading Zabbix server from 2.2.2 to 2.2.7, some counters on some devices became not supported:
2757:20141102:082032.381 SNMP response from host "cc-03.xxxx.xx" does not contain variable bindings in the requested order
2829:20141102:082032.791 item "cc-03.xxx.xx:.1.3.6.1.4.1.17095.1.3.1.1.4.4" became not supported: Invalid SNMP response: variable bindings out of order
I found |
Comments |
Comment by richlv [ 2014 Nov 02 ] |
Just in case: did you restart the server after setting EnableSNMPBulkRequests=0? |
Comment by Nick Babinskiy [ 2014 Nov 02 ] |
Yes, of course. |
Comment by Aleksandrs Saveljevs [ 2014 Nov 03 ] |
If the error is reliably reproducible, could you please attach tcpdump between Zabbix server and this SNMP device, together with the relevant part of the log? |
Comment by Nick Babinskiy [ 2014 Nov 03 ] |
SNMP.dump is the result of the following command:
tcpdump -s 1500 -npi eth0 -w SNMP.dump host 192.168.3.106 and port 161 |
Comment by Aleksandrs Saveljevs [ 2014 Nov 03 ] |
Attached tcpdump contains the following:
It can be seen that the device is queried for one OID but returns the same OID with ".0" appended. That does not seem to be how a self-respecting SNMP device should behave. You can probably fix it by appending ".0" to the queried OID yourself. |
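The check described above can be sketched roughly as follows. This is a minimal illustration in Python, not Zabbix's actual code (which is written in C); the function names `parse_oid` and `bindings_match` are hypothetical.

```python
def parse_oid(text):
    """Turn a dotted OID string such as '.1.3.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def bindings_match(requested, received):
    """Return True if the response carries exactly the requested OIDs, in order."""
    if len(requested) != len(received):
        return False
    return all(parse_oid(sent) == parse_oid(got)
               for sent, got in zip(requested, received))
```

With the OIDs from this report, `bindings_match([".1.3.6.1.4.1.17095.1.3.1.1.4.6"], [".1.3.6.1.4.1.17095.1.3.1.1.4.6.0"])` is false, because the device's answer has one extra sub-identifier.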
Comment by Nick Babinskiy [ 2014 Nov 03 ] |
I appended .0 to some OIDs, and I saw these lines in the tcpdump output:
10:44:05.522700 IP 192.168.3.112.60068 > 192.168.3.106.snmp: GetRequest(35) .1.3.6.1.4.1.17095.1.3.1.1.4.7.0
10:44:05.530812 IP 192.168.3.106.snmp > 192.168.3.112.60068: GetResponse(40) .1.3.6.1.4.1.17095.1.3.1.1.4.7.0.0=[noSuchObject]
....
10:52:45.272375 IP 192.168.3.112.50559 > 192.168.3.106.snmp: GetRequest(35) .1.3.6.1.4.1.17095.1.3.1.1.4.6.0
10:52:45.280569 IP 192.168.3.106.snmp > 192.168.3.112.50559: GetResponse(40) .1.3.6.1.4.1.17095.1.3.1.1.4.6.0.0=[noSuchObject]
The counters are still in the "not supported" state. |
Comment by Aleksandrs Saveljevs [ 2014 Nov 03 ] |
What kind of device are you monitoring? Do you know why it tries to append ".0" itself? |
Comment by Nick Babinskiy [ 2014 Nov 03 ] |
These are self-manufactured devices based on the Microchip PIC18F97J60. I have also seen the following SNMP traffic with Zabbix 2.2.2:
11:13:45.555253 IP 192.168.3.112.36379 > 192.168.3.106.snmp: GetRequest(34) .1.3.6.1.4.1.17095.1.3.1.1.4.6
11:13:45.563656 IP 192.168.3.106.snmp > 192.168.3.112.36379: GetResponse(41) .1.3.6.1.4.1.17095.1.3.1.1.4.6.0=197
11:13:48.913211 IP 192.168.3.112.41267 > 192.168.3.106.snmp: GetRequest(34) .1.3.6.1.4.1.17095.1.3.1.1.2.5
11:13:48.920437 IP 192.168.3.106.snmp > 192.168.3.112.41267: GetResponse(40) .1.3.6.1.4.1.17095.1.3.1.1.2.5.0=1
11:13:50.018579 IP 192.168.3.112.60738 > 192.168.3.106.snmp: GetRequest(34) .1.3.6.1.4.1.17095.1.3.1.1.2.6
11:13:50.026054 IP 192.168.3.106.snmp > 192.168.3.112.60738: GetResponse(40) .1.3.6.1.4.1.17095.1.3.1.1.2.6.0=1
11:13:50.109829 IP 192.168.3.112.37738 > 192.168.3.106.snmp: GetRequest(34) .1.3.6.1.4.1.17095.1.3.1.1.2.7
11:13:50.117469 IP 192.168.3.106.snmp > 192.168.3.112.37738: GetResponse(40) .1.3.6.1.4.1.17095.1.3.1.1.2.7.0=1
11:13:51.179862 IP 192.168.3.112.45811 > 192.168.3.106.snmp: GetRequest(34) .1.3.6.1.4.1.17095.1.3.1.1.3.2
11:13:51.187999 IP 192.168.3.106.snmp > 192.168.3.112.45811: GetResponse(40) .1.3.6.1.4.1.17095.1.3.1.1.3.2.0=0
Here .0 is also appended, and Zabbix 2.2.2 handles it correctly. |
Comment by Aleksandrs Saveljevs [ 2014 Nov 03 ] |
If the device is self-manufactured, do you know the reason for appending ".0", and would it be possible to make the device standard-conformant? |
Comment by Aleksandrs Saveljevs [ 2014 Nov 04 ] |
Validation of SNMP responses was introduced in |
Comment by Aleksandrs Saveljevs [ 2014 Nov 04 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-8982 . |
Comment by richlv [ 2014 Nov 05 ] |
(1) Should we maybe log a warning when we get an incorrect response in non-bulk mode?
asaveljevs Probably not, because that would spam the log permanently (as in the case above), with no ability to fix it.
<richlv> We do have some warnings (value cache related, for example) that are throttled to one per 5 min. Maybe logging once per item during server/proxy startup is doable here? It would be nice to inform users about this in some way, especially as we have the validation in place anyway.
asaveljevs It was decided that for single-variable requests we shall log such bad responses under DebugLevel=4 and accept the values, while for bulk requests we shall log them under DebugLevel=3 and return "not supported". Example of a log message:
24958:20141111:113756.390 SNMP response from host "firewall" contains variable bindings that do not match the request: sent ".1.3.6.1.2.1.2.2.1.16.9", received ".1.3.6.1.2.1.2.2.1.16.9.0"
RESOLVED
asaveljevs According to (3), for bulk requests we shall now log a warning, but retry with half the number of variables, proceeding with the general retrying strategy. RESOLVED.
wiper CLOSED |
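The policy agreed in (1) could be sketched as below. This is an illustrative Python sketch only, not the Zabbix implementation: the function name `decide` and its return strings are hypothetical, and Python's `logging.debug`/`logging.warning` merely stand in for messages logged under DebugLevel=4 and DebugLevel=3.

```python
import logging

log = logging.getLogger("snmp_validation_sketch")


def decide(host, requested, received):
    """Return "accept" or "retry_with_half" for one SNMP response.

    requested and received are lists of OID strings in wire order.
    """
    if requested == received:
        return "accept"
    message = (
        'SNMP response from host "%s" contains variable bindings that do not '
        'match the request: sent "%s", received "%s"'
        % (host, ", ".join(requested), ", ".join(received))
    )
    if len(requested) == 1:
        # Single-variable request: log quietly and still accept the value.
        log.debug(message)
        return "accept"
    # Bulk request: log a warning and let the caller retry with fewer variables.
    log.warning(message)
    return "retry_with_half"
```

The single-variable branch is what lets a device like the one in this report keep working: its lone ".0"-suffixed answer is accepted, and the mismatch is only visible at debug level.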
Comment by Andris Zeila [ 2014 Nov 10 ] |
(2) The bulk request check before validation should be reverted. And maybe we should skip SNMP response validation for single value responses even with bulk enabled?
asaveljevs RESOLVED in r50536 and r50538.
wiper CLOSED |
Comment by Andris Zeila [ 2014 Nov 12 ] |
(3) If SNMP response validation fails, the request is not retried and the min_fail value is not updated in the configuration cache. This means the next bulk request will attempt to retrieve the same number of values. If a device always returns incorrect data (out of order, too few, or too many values) when N values are requested, the SNMP poller will keep requesting N values and keep failing.
asaveljevs RESOLVED in r50567.
wiper CLOSED |
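The retry behaviour requested in (3) might look roughly like the following Python sketch. It is a simplified assumption about the strategy, not the code from r50567: the `poll` function, the `fetch` callback, and the `state` dictionary are hypothetical, and `state["min_fail"]` is taken here to mean the smallest batch size known to fail.

```python
def poll(oids, fetch, state):
    """Fetch values for oids in batches, halving the batch size on bad responses.

    fetch(batch) returns a list of (oid, value) pairs.
    state["min_fail"] remembers the smallest batch size known to fail, so the
    next call starts below it instead of repeating the same failing request.
    """
    results = {}
    pending = list(oids)
    size = max(1, min(len(pending), state["min_fail"] - 1))
    while pending:
        batch = pending[:size]
        response = fetch(batch)
        if [oid for oid, _ in response] == batch:
            results.update(response)
            pending = pending[len(batch):]
        elif len(batch) > 1:
            # Bad bulk response: remember the failing size and retry with half.
            state["min_fail"] = min(state["min_fail"], len(batch))
            size = max(1, len(batch) // 2)
        else:
            # Single variable: accept the value under the requested OID.
            results[batch[0]] = response[0][1] if response else None
            pending = pending[1:]
    return results
```

Because `min_fail` is updated on failure, a later call with the same `state` starts with a smaller batch rather than requesting the same N values again.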
Comment by Andris Zeila [ 2014 Nov 12 ] |
Successfully tested |
Comment by Aleksandrs Saveljevs [ 2014 Nov 13 ] |
Fixed in pre-2.2.8 r50589, pre-2.4.3 r50590, and pre-2.5.0 (trunk) r50591. |
Comment by Aleksandrs Saveljevs [ 2014 Nov 13 ] |
(4) Documented at the following locations:
Note that some of these changes document what wiper CLOSED |