-
Problem report
-
Resolution: Incomplete
-
Trivial
-
None
-
5.2.4
-
server:
Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-65-generic x86_64)
DB (separate host):
PostgreSQL 11.3 (Ubuntu 11.3-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
A failing SNMP low-level discovery (LLD) rule appears to block data collection for other, unrelated SNMP items. It also keeps retrying discovery, ignoring the configured discovery interval.
Based on the logs, it appears that a non-existent OID error is treated the same as a timeout error.
Verifying that these two conditions are in fact distinguishable:
snmpwalk -v2c -cxxxx 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID
snmpget -v2c -cxxxx 172.18.1.4 1.3.6.1.2.1.14.10.1.1
iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID
snmpget -v2c -cxxxx 172.18.1.44 1.3.6.1.2.1.14.10.1.1
Timeout: No Response from 172.18.1.44.
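To illustrate that the two conditions can be told apart mechanically, here is a minimal Python sketch that classifies the text printed by the Net-SNMP command-line tools. The function name classify_snmp_output and the returned labels are hypothetical, chosen for illustration; this is not how Zabbix itself inspects SNMP responses.

```python
import re


def classify_snmp_output(text: str) -> str:
    """Classify the text printed by snmpget/snmpwalk.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    # A timeout means the agent never answered at all.
    if re.search(r"^Timeout: No Response from", text, re.MULTILINE):
        return "timeout"
    # The agent answered, but the requested OID is not implemented.
    if "No Such Object available on this agent at this OID" in text:
        return "no_such_object"
    # The agent answered, the object exists, but this instance does not.
    if "No Such Instance currently exists at this OID" in text:
        return "no_such_instance"
    return "ok"
```

With the outputs shown above, the response from 172.18.1.4 would be classified as "no_such_object" and the one from 172.18.1.44 as "timeout".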
The attached graph of several unrelated items shows gaps that were not present before the discovery rule was added.
3032035:20210311:110158.177 resuming SNMP agent checks on host "agg-1.gp1": connection restored
3032040:20210311:110218.664 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
3032017:20210311:110220.263 SNMP agent item "ospfNbrIpAddr" on host "agg-1.voon" failed: first network error, wait for 15 seconds
3032024:20210311:110220.695 SNMP agent item "ospfNbrIpAddr" on host "agg-2.voon" failed: first network error, wait for 15 seconds
3032031:20210311:110223.089 SNMP agent item "ospfAreaId" on host "agg-2.gp1" failed: another network error, wait for 15 seconds
3032083:20210311:110235.214 resuming SNMP agent checks on host "agg-1.voon": connection restored
3032083:20210311:110235.218 resuming SNMP agent checks on host "agg-2.voon": connection restored
3032085:20210311:110243.416 temporarily disabling SNMP agent checks on host "agg-2.gp1": host unavailable
3032007:20210311:110253.214 resuming SNMP agent checks on host "agg-1.gp1": connection restored
3031961:20210311:110313.239 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
These log entries continue every 1 to 2 minutes or so.
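The repeat interval can be measured from the server log rather than estimated by eye. The following is a rough Python sketch that assumes the line layout seen in the excerpt above (PID:YYYYMMDD:HHMMSS.mmm followed by the message). The helper names parse_line and error_gaps are hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the log excerpt: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")
HOST_RE = re.compile(r'on host "([^"]+)"')


def parse_line(line):
    """Return (timestamp, host, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    _pid, date, time, millis, message = m.groups()
    ts = datetime.strptime(date + time, "%Y%m%d%H%M%S")
    ts = ts.replace(microsecond=int(millis) * 1000)
    host = HOST_RE.search(message)
    return ts, (host.group(1) if host else None), message


def error_gaps(lines, host):
    """Seconds between consecutive 'first network error' entries for one host."""
    times = []
    for line in lines:
        parsed = parse_line(line)
        if parsed and parsed[1] == host and "first network error" in parsed[2]:
            times.append(parsed[0])
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Calling error_gaps(open("zabbix_server.log"), "agg-1.gp1") would list the gaps in seconds for that host; the file name here is an assumption and should be replaced with the actual log path.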
I disabled the discovery rule at approximately 11:54, and no log entries have appeared since then:
3032055:20210311:115056.621 resuming SNMP agent checks on host "agg-1.gp1": connection restored
3032054:20210311:115156.667 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
3032024:20210311:115251.434 temporarily disabling SNMP agent checks on host "agg-1.gp1": host unavailable
3032057:20210311:115411.505 enabling SNMP agent checks on host "agg-1.gp1": host became available
root@zabbix:/opt/zabbix/log# date
Thu 11 Mar 2021 01:22:46 PM MST
root@zabbix:/opt/zabbix/log#
I would suggest that Zabbix not treat "no such object" the same as "timeout" (if this is in fact the case, as the log seems to imply), either in LLD or in regular data collection. An LLD OID that doesn't exist should just mean that the item(s) are not discovered (or are lost) until the next time the LLD rule is scheduled to run. A regular item OID that doesn't exist should not stop or suspend collection of other items, but should be logged or reported as appropriate.
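The suggested behaviour can be summarised as a small decision table. The Python sketch below is purely illustrative: the SnmpResult and Action names and the next_action function are hypothetical and do not correspond to Zabbix internals.

```python
from enum import Enum


class SnmpResult(Enum):
    OK = "ok"
    NO_SUCH_OBJECT = "no_such_object"  # agent replied, OID not implemented
    TIMEOUT = "timeout"                # no reply from the agent


class Action(Enum):
    STORE_VALUE = "store_value"
    MARK_ITEM_UNSUPPORTED = "mark_item_unsupported"  # affects this item only
    BACK_OFF_HOST = "back_off_host"                  # affects the whole host


def next_action(result: SnmpResult) -> Action:
    """Map a poll result to the handling proposed in this report."""
    if result is SnmpResult.OK:
        return Action.STORE_VALUE
    if result is SnmpResult.NO_SUCH_OBJECT:
        # The host is reachable: report the item (or LLD rule) as failed
        # and keep polling everything else on the normal schedule.
        return Action.MARK_ITEM_UNSUPPORTED
    # Only a genuine network failure should suspend checks for the host.
    return Action.BACK_OFF_HOST
```

The point of the sketch is that only a timeout reaches the host-level branch; a missing OID stays scoped to the single item or discovery rule.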