[ZBX-19115] Failed SNMP LLD causes gaps in data collection of unrelated items Created: 2021 Mar 11 Updated: 2023 Oct 20 Resolved: 2023 Oct 20 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 5.2.4 |
Fix Version/s: | None |
Type: | Problem report | Priority: | Trivial |
Reporter: | Dave E Martin | Assignee: | Edgars Melveris |
Resolution: | Incomplete | Votes: | 0 |
Labels: | collector, items, lld, snmp, timeout | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
server: DB (separate host): |
Attachments: |
Description |
An SNMP low-level discovery rule that fails appears to block data collection of other unrelated SNMP data items, and also keeps trying to discover, ignoring the configured discovery interval.
Based on the logs, it appears that a non-existent OID error is treated the same as a timeout error.
Verifying that these two conditions are in fact distinguishable:
snmpwalk -v2c -cxxxx 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID
snmpget -v2c -cxxxx 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID
snmpget -v2c -cxxxx 172.18.1.44 1.3.6.1.2.1.14.10.1.1
Timeout: No Response from 172.18.1.44.
The attached graph of several unrelated items did not have gaps in it before adding the discovery rule.
3032035:20210311:110158.177 resuming SNMP agent checks on host "agg-1.gp1": connection restored
3032040:20210311:110218.664 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
3032017:20210311:110220.263 SNMP agent item "ospfNbrIpAddr" on host "agg-1.voon" failed: first network error, wait for 15 seconds
3032024:20210311:110220.695 SNMP agent item "ospfNbrIpAddr" on host "agg-2.voon" failed: first network error, wait for 15 seconds
3032031:20210311:110223.089 SNMP agent item "ospfAreaId" on host "agg-2.gp1" failed: another network error, wait for 15 seconds
3032083:20210311:110235.214 resuming SNMP agent checks on host "agg-1.voon": connection restored
3032083:20210311:110235.218 resuming SNMP agent checks on host "agg-2.voon": connection restored
3032085:20210311:110243.416 temporarily disabling SNMP agent checks on host "agg-2.gp1": host unavailable
3032007:20210311:110253.214 resuming SNMP agent checks on host "agg-1.gp1": connection restored
3031961:20210311:110313.239 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
These log entries continue every 1 to 2 minutes or so.
I disabled the discovery rule at approximately 11:54, and no log entries appear after that time:
3032055:20210311:115056.621 resuming SNMP agent checks on host "agg-1.gp1": connection restored
3032054:20210311:115156.667 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
3032024:20210311:115251.434 temporarily disabling SNMP agent checks on host "agg-1.gp1": host unavailable
3032057:20210311:115411.505 enabling SNMP agent checks on host "agg-1.gp1": host became available
root@zabbix:/opt/zabbix/log# date
Thu 11 Mar 2021 01:22:46 PM MST
root@zabbix:/opt/zabbix/log#
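To check the "every 1 to 2 minutes" cadence against a longer log, the entries can be parsed mechanically. The sketch below assumes the line layout seen in the excerpts above (process ID, date, time with milliseconds, then the message); the function names are hypothetical and this is not a Zabbix tool.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts above: PID:YYYYMMDD:HHMMSS.mmm message
LOG_LINE = re.compile(r"^(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) (?P<msg>.*)$")
HOST = re.compile(r'on host "(?P<host>[^"]+)"')


def parse_line(line):
    """Return (timestamp, host, message), or None if the line does not match."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["date"] + " " + m["time"], "%Y%m%d %H%M%S.%f")
    host = HOST.search(m["msg"])
    return ts, (host["host"] if host else None), m["msg"]


def first_error_gaps(lines, host):
    """Seconds between consecutive 'first network error' entries for one host."""
    stamps = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1] == host and "first network error" in parsed[2]:
            stamps.append(parsed[0])
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

Feeding it the two "first network error" lines for agg-1.gp1 from the first excerpt gives a single gap of a little under a minute.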
I would suggest that Zabbix not treat "no such object" the same as "timeout" (if this is in fact the case, as the log seems to imply) in either LLD, or in regular data collection; and that an LLD OID that doesn't exist should just mean that the item(s) is not discovered (or lost) until the next time the LLD rule should run. And a regular item OID that doesn't exist should not stop or suspend collection of other items (but should log/report as appropriate).
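As an illustration of the distinction being requested, here is a minimal sketch that classifies the two net-snmp messages quoted earlier in this report. It is not how Zabbix handles SNMP responses; the enum, the function names, and the policy in `counts_as_network_error` are assumptions made only for the example.

```python
from enum import Enum


class SnmpOutcome(Enum):
    VALUE = "value"
    NO_SUCH_OBJECT = "no_such_object"
    TIMEOUT = "timeout"


def classify_snmp_output(text: str) -> SnmpOutcome:
    """Classify one line of snmpget/snmpwalk output using the messages quoted above."""
    if text.startswith("Timeout: No Response from"):
        return SnmpOutcome.TIMEOUT
    if "No Such Object available on this agent at this OID" in text:
        return SnmpOutcome.NO_SUCH_OBJECT
    return SnmpOutcome.VALUE


def counts_as_network_error(outcome: SnmpOutcome) -> bool:
    # Hypothetical policy matching the suggestion above: the agent answered
    # in the "no such object" case, so only a timeout reflects on reachability.
    return outcome is SnmpOutcome.TIMEOUT
```

Under this policy a missing OID would leave the item (or the LLD rule) unsupported until its next scheduled run, without suspending other checks on the host.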
|
Comments |
Comment by Edgars Melveris [ 2021 Apr 01 ] |
Hello Dave!
time snmpwalk -v2c -cxxxx 172.18.1.4 1.3.6.1.2.1.14.10.1.1
Execute it multiple times. |
Comment by Dave E Martin [ 2021 May 05 ] |
###Timeout=4
real 0m0.161s
real 0m0.024s
real 0m0.025s
real 0m0.029s
real 0m0.025s
real 0m0.026s
real 0m0.024s
real 0m0.026s
real 0m0.025s
real 0m0.025s
real 0m0.026s
real 0m0.032s
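The timings above can be compared against the timeout directly. This is a small sketch that assumes the `real XmY.YYYs` format printed by the shell's `time` and takes the 4-second value from the `Timeout=4` line; the helper name is hypothetical.

```python
import re

TIMEOUT_SECONDS = 4.0  # taken from the "Timeout=4" line above


def parse_real_times(text):
    """Extract 'real XmY.YYYs' wall-clock timings as seconds."""
    return [int(minutes) * 60 + float(seconds)
            for minutes, seconds in re.findall(r"real\s+(\d+)m([\d.]+)s", text)]


timings = parse_real_times(
    "real 0m0.161s real 0m0.024s real 0m0.025s real 0m0.029s "
    "real 0m0.025s real 0m0.026s real 0m0.024s real 0m0.026s "
    "real 0m0.025s real 0m0.025s real 0m0.026s real 0m0.032s"
)
slowest = max(timings)
```

The slowest of the twelve runs is 0.161 s, far below the timeout, so response time alone does not explain the "network error" entries.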
|
Comment by Dave E Martin [ 2021 May 06 ] |
Upon further investigation, when we upgraded from Zabbix 4.x to 5.x, Zabbix created additional SNMP interfaces on most of our hosts (I presume due to differences in the way SNMP items were defined in various templates that we use). It appears as a result of this conversion the discovery rules got attached to the wrong SNMP interface.
We have (had) a template that triggers alerts on SNMP responding to either public or private communities, while all of our normal SNMP items use {$SNMP_COMMUNITY}. After conversion, our hosts ended up with 3 SNMP interfaces (with the 3 different communities), and (looking in the database tables) the discovery items are associated with the "private" interface instead of the "{$SNMP_COMMUNITY}" interface, even though the pre 5.x version of the discovery rule specified "{$SNMP_COMMUNITY}" (as shown here: 3 Discovery of SNMP OIDs [Zabbix Documentation 4.0])
So apparently the problem is (or was?) in the schema conversion process while upgrading, or so it appears now.
zabbix=# select distinct it.itemid,it.key_,it.interfaceid from items as it where it.hostid=11760 order by interfaceid;
 itemid  |                     key_                     | interfaceid
---------+----------------------------------------------+-------------
 ...
 1161436 | ospfAreaId                                   |        3002
 1161437 | ospfNbrIpAddr                                |        3002
 ...
  269206 | cbgpPeer2AdminStatus[{#CBGPPEER2REMOTEADDR}] |       10985
  269208 | ifInErrors[{#IFINDEX}]                       |       10985
  269209 | ifOperStatus[{#IFINDEX}]                     |       10985
 ...
zabbix=# select * from interface as it join interface_snmp as its on its.interfaceid=it.interfaceid where it.interfaceid in (10985,3002);
 interfaceid | hostid | main | type | useip |     ip     | dns | port | interfaceid | version | bulk |     community     | securityname | securitylevel | authpassphrase | privpassphrase | authprotocol | privprotocol | contextname
-------------+--------+------+------+-------+------------+-----+------+-------------+---------+------+-------------------+--------------+---------------+----------------+----------------+--------------+--------------+-------------
        3002 |  11760 |    1 |    2 |     1 | 172.18.2.5 |     |  161 |        3002 |       2 |    0 | private           |              |             0 |                |                |            0 |            0 |
       10985 |  11760 |    0 |    2 |     1 | 172.18.2.5 |     |  161 |       10985 |       2 |    0 | {$SNMP_COMMUNITY} |              |             0 |                |                |            0 |            0 |
(2 rows) |
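The two result sets can be cross-checked to list items that ended up on an interface with an unexpected community. The sketch below hard-codes the rows shown above for host 11760 (the elided "..." rows are not filled in); the function and variable names are hypothetical, and on a live system the same data would come from the `items`, `interface` and `interface_snmp` tables.

```python
EXPECTED_COMMUNITY = "{$SNMP_COMMUNITY}"

# (itemid, key_, interfaceid) rows copied from the first query's visible output.
ITEMS = [
    (1161436, "ospfAreaId", 3002),
    (1161437, "ospfNbrIpAddr", 3002),
    (269206, "cbgpPeer2AdminStatus[{#CBGPPEER2REMOTEADDR}]", 10985),
    (269208, "ifInErrors[{#IFINDEX}]", 10985),
    (269209, "ifOperStatus[{#IFINDEX}]", 10985),
]

# interfaceid -> community, copied from the second query's output.
INTERFACE_COMMUNITY = {3002: "private", 10985: "{$SNMP_COMMUNITY}"}


def misattached_items(items, interface_community, expected=EXPECTED_COMMUNITY):
    """Return item keys whose interface does not use the expected community."""
    return [key for _itemid, key, interfaceid in items
            if interface_community.get(interfaceid) != expected]


print(misattached_items(ITEMS, INTERFACE_COMMUNITY))
```

For the rows shown, this flags `ospfAreaId` and `ospfNbrIpAddr`, the two keys that appear in the "network error" log entries.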