[ZBX-19115] Failed SNMP LLD causes gaps in data collection of unrelated items Created: 2021 Mar 11  Updated: 2023 Oct 20  Resolved: 2023 Oct 20

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 5.2.4
Fix Version/s: None

Type: Problem report
Priority: Trivial
Reporter: Dave E Martin
Assignee: Edgars Melveris
Resolution: Incomplete
Votes: 0
Labels: collector, items, lld, snmp, timeout
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

server:
Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-65-generic x86_64)

DB (separate host):
PostgreSQL 11.3 (Ubuntu 11.3-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit


Attachments: PNG File zabbix failed snmp lld causes gaps in other data 20210311-1.png     PNG File zabbix failed snmp lld causes gaps in other data 20210311-2-1.png     PNG File zabbix failed snmp lld causes gaps in other data, disabled at 1154 20210311-3.png    

 Description   

An SNMP low-level discovery rule that fails appears to block data collection of other unrelated SNMP data items, and also keeps trying to discover, ignoring the configured discovery interval.

 

Based on the logs, it appears that a non-existent OID error is treated the same as a timeout error.

 

Verifying that these two conditions are in fact distinguishable:

 

snmpwalk -v2c -cxxxx 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID
snmpget -v2c -cxxxx 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID
snmpget -v2c -cxxxx 172.18.1.44 1.3.6.1.2.1.14.10.1.1
Timeout: No Response from 172.18.1.44.
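
The distinction shown in the commands above can also be checked mechanically. This is a minimal sketch, assuming the Net-SNMP command-line tools print the same messages as in the transcript. The function name `classify_snmp_output` and the category labels are hypothetical, chosen only for illustration, and this is not how Zabbix itself classifies errors.

```python
def classify_snmp_output(text: str) -> str:
    """Classify Net-SNMP CLI output using the message strings seen in this report."""
    if "Timeout: No Response from" in text:
        # The device did not answer at all: a network-level failure.
        return "timeout"
    if "No Such Object available on this agent at this OID" in text:
        # The device answered, but it does not implement the requested OID.
        return "no_such_object"
    return "other"


samples = [
    "iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID",
    "Timeout: No Response from 172.18.1.44.",
]
for line in samples:
    print(classify_snmp_output(line), "<-", line)
```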
 

 

The attached graph of several unrelated items did not have gaps in it before adding the discovery rule.

 

 

3032035:20210311:110158.177 resuming SNMP agent checks on host "agg-1.gp1": connection restored
3032040:20210311:110218.664 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
3032017:20210311:110220.263 SNMP agent item "ospfNbrIpAddr" on host "agg-1.voon" failed: first network error, wait for 15 seconds
3032024:20210311:110220.695 SNMP agent item "ospfNbrIpAddr" on host "agg-2.voon" failed: first network error, wait for 15 seconds
3032031:20210311:110223.089 SNMP agent item "ospfAreaId" on host "agg-2.gp1" failed: another network error, wait for 15 seconds
3032083:20210311:110235.214 resuming SNMP agent checks on host "agg-1.voon": connection restored
3032083:20210311:110235.218 resuming SNMP agent checks on host "agg-2.voon": connection restored
3032085:20210311:110243.416 temporarily disabling SNMP agent checks on host "agg-2.gp1": host unavailable
3032007:20210311:110253.214 resuming SNMP agent checks on host "agg-1.gp1": connection restored
3031961:20210311:110313.239 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
 

These log entries continue every 1 to 2 minutes or so.
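
To put a number on how often the cycle repeats, the log lines can be parsed and the gaps between consecutive "first network error" entries computed per host. This is a rough sketch that assumes the `PID:YYYYMMDD:HHMMSS.mmm message` layout visible in the log excerpt above. The function name `first_error_intervals` is hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: PID:YYYYMMDD:HHMMSS.mmm message
LOG_LINE = re.compile(r'^\s*\d+:(\d{8}):(\d{6}\.\d{3}) (.*)$')
HOST = re.compile(r'on host "([^"]+)"')


def first_error_intervals(lines):
    """Return host -> seconds between consecutive 'first network error' entries."""
    seen = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or "first network error" not in m.group(3):
            continue
        host = HOST.search(m.group(3))
        if host is None:
            continue
        stamp = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S.%f")
        seen.setdefault(host.group(1), []).append(stamp)
    return {
        h: [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
        for h, ts in seen.items()
    }


sample = [
    '3032040:20210311:110218.664 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds',
    '3032017:20210311:110220.263 SNMP agent item "ospfNbrIpAddr" on host "agg-1.voon" failed: first network error, wait for 15 seconds',
    '3032007:20210311:110253.214 resuming SNMP agent checks on host "agg-1.gp1": connection restored',
    '3031961:20210311:110313.239 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds',
]
print(first_error_intervals(sample))
```

Run against the full server log, this would show whether the retries follow the configured discovery interval or a much shorter cycle.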

 

I disabled the discovery rule at approximately 11:54, and no log entries appear after that time:

3032055:20210311:115056.621 resuming SNMP agent checks on host "agg-1.gp1": connection restored
3032054:20210311:115156.667 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
3032024:20210311:115251.434 temporarily disabling SNMP agent checks on host "agg-1.gp1": host unavailable
3032057:20210311:115411.505 enabling SNMP agent checks on host "agg-1.gp1": host became available
root@zabbix:/opt/zabbix/log# date
Thu 11 Mar 2021 01:22:46 PM MST
root@zabbix:/opt/zabbix/log#

 

I would suggest that Zabbix not treat "no such object" the same as "timeout" (if this is in fact the case, as the log seems to imply), in either LLD or regular data collection. An LLD OID that doesn't exist should just mean that the items are not discovered (or are lost) until the next time the LLD rule runs. A regular item OID that doesn't exist should not stop or suspend collection of other items (but should be logged/reported as appropriate).

 



 Comments   
Comment by Edgars Melveris [ 2021 Apr 01 ]

Hello Dave!
Actually, it does not treat those errors as the same: a "No Such Object available on this agent at this OID" error will make an item or LLD rule unsupported, but the rest of the items will work normally.
An actual timeout, on the other hand, disables monitoring of all items on this host for a while.
What is the timeout setting in your configuration file?
Please repeat the command like this:

time snmpwalk -v2c -cxxxx 172.18.1.4 1.3.6.1.2.1.14.10.1.1

Execute it multiple times.

Comment by Dave E Martin [ 2021 May 05 ]
### Option: Timeout
#	Specifies how long we wait for agent, SNMP device or external check (in seconds).
#
# Mandatory: no
# Range: 1-30
# Default:
# Timeout=3

###Timeout=4
Timeout=20

 
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.161s
user 0m0.018s
sys 0m0.038s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.024s
user 0m0.004s
sys 0m0.011s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.025s
user 0m0.007s
sys 0m0.010s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.029s
user 0m0.013s
sys 0m0.008s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.025s
user 0m0.005s
sys 0m0.013s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.026s
user 0m0.005s
sys 0m0.014s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.024s
user 0m0.008s
sys 0m0.009s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.026s
user 0m0.007s
sys 0m0.012s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.025s
user 0m0.005s
sys 0m0.013s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.025s
user 0m0.009s
sys 0m0.008s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.026s
user 0m0.001s
sys 0m0.017s
@zabbix.voon:~$ time snmpwalk -v2c -cairwired 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID

real 0m0.032s
user 0m0.016s
sys 0m0.008s

 

Comment by Dave E Martin [ 2021 May 06 ]

Upon further investigation, when we upgraded from Zabbix 4.x to 5.x, Zabbix created additional SNMP interfaces on most of our hosts (I presume due to differences in the way SNMP items were defined in various templates that we use).

It appears that, as a result of this conversion, the discovery rules got attached to the wrong SNMP interface.

We have (had) a template that triggers alerts on SNMP responding to either public or private communities, while all of our normal SNMP items use {$SNMP_COMMUNITY}.

After conversion, our hosts ended up with 3 SNMP interfaces (with the 3 different communities), and (looking in the database tables) the discovery items are associated with the "private" interface instead of the "{$SNMP_COMMUNITY}" interface, even though the pre-5.x version of the discovery rule specified "{$SNMP_COMMUNITY}" (as shown in "3 Discovery of SNMP OIDs" in the Zabbix 4.0 documentation).

So apparently the problem is (or was?) in the schema conversion process while upgrading, or so it appears now.

zabbix=# select distinct it.itemid,it.key_,it.interfaceid from items as it where it.hostid=11760 order by interfaceid;
 itemid  |                                key_                                | interfaceid
---------+--------------------------------------------------------------------+-------------
...
 1161436 | ospfAreaId                                                         |        3002
 1161437 | ospfNbrIpAddr                                                      |        3002
...
  269206 | cbgpPeer2AdminStatus[{#CBGPPEER2REMOTEADDR}]                       |       10985
  269208 | ifInErrors[{#IFINDEX}]                                             |       10985
  269209 | ifOperStatus[{#IFINDEX}]                                           |       10985
...

zabbix=# select * from interface as it join interface_snmp as its on its.interfaceid=it.interfaceid where it.interfaceid in (10985,3002);
 interfaceid | hostid | main | type | useip |     ip     | dns | port | interfaceid | version | bulk |     community     | securityname | securitylevel | authpassphrase | privpassphrase | authprotocol | privprotocol | contextname
-------------+--------+------+------+-------+------------+-----+------+-------------+---------+------+-------------------+--------------+---------------+----------------+----------------+--------------+--------------+-------------
        3002 |  11760 |    1 |    2 |     1 | 172.18.2.5 |     | 161  |        3002 |       2 |    0 | private           |              |             0 |                |                |            0 |            0 |
       10985 |  11760 |    0 |    2 |     1 | 172.18.2.5 |     | 161  |       10985 |       2 |    0 | {$SNMP_COMMUNITY} |              |             0 |                |                |            0 |            0 |
(2 rows)
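
The same cross-check can be scripted to list every item bound to an interface whose community is not the expected one. This is a minimal sketch that works on rows copied from the two queries above rather than on a live database. The function name `misbound_items` and the assumption that "{$SNMP_COMMUNITY}" is the only intended community are mine.

```python
EXPECTED_COMMUNITY = "{$SNMP_COMMUNITY}"

# (itemid, key_, interfaceid) rows copied from the items query above.
items = [
    (1161436, "ospfAreaId", 3002),
    (1161437, "ospfNbrIpAddr", 3002),
    (269206, "cbgpPeer2AdminStatus[{#CBGPPEER2REMOTEADDR}]", 10985),
    (269208, "ifInErrors[{#IFINDEX}]", 10985),
    (269209, "ifOperStatus[{#IFINDEX}]", 10985),
]

# interfaceid -> community, copied from the interface_snmp query above.
communities = {3002: "private", 10985: "{$SNMP_COMMUNITY}"}


def misbound_items(items, communities, expected):
    """Return items whose interface uses a community other than `expected`."""
    return [
        (itemid, key, iface, communities.get(iface))
        for itemid, key, iface in items
        if communities.get(iface) != expected
    ]


for row in misbound_items(items, communities, EXPECTED_COMMUNITY):
    print(row)
```

With the rows shown here, only the two OSPF discovery items on interface 3002 are reported.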