ZABBIX BUGS AND ISSUES / ZBX-19115

Failed SNMP LLD causes gaps in data collection of unrelated items


    • Type: Problem report
    • Resolution: Incomplete
    • Priority: Trivial
    • None
    • 5.2.4
    • Server (S)
    • server:
      Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-65-generic x86_64)

      DB (separate host):
      PostgreSQL 11.3 (Ubuntu 11.3-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit

      A failing SNMP low-level discovery (LLD) rule appears to block data collection for other, unrelated SNMP items. The rule also keeps retrying discovery, ignoring its configured discovery interval.

       

      Based on the logs, it appears that a non-existent OID error is treated the same as a timeout error.

       

      Verifying that these two conditions are in fact distinguishable:

       

      snmpwalk -v2c -cxxxx 172.18.1.4 1.3.6.1.2.1.14.10.1.1
      iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID
      snmpget -v2c -cxxxx 172.18.1.4 1.3.6.1.2.1.14.10.1.1
      iso.3.6.1.2.1.14.10.1.1 = No Such Object available on this agent at this OID
      snmpget -v2c -cxxxx 172.18.1.44 1.3.6.1.2.1.14.10.1.1
      Timeout: No Response from 172.18.1.44.
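
      To make the distinction concrete, here is a minimal Python sketch that classifies a line of net-snmp command output as either a timeout or a "no such object" answer. It only matches the message texts shown above, plus net-snmp's "No Such Instance" message; the names SnmpOutcome and classify_snmp_output are made up for illustration, and this is not how Zabbix itself processes SNMP responses.

```python
from enum import Enum


class SnmpOutcome(Enum):
    """Hypothetical outcome categories for one line of snmpget/snmpwalk output."""
    VALUE = "value"
    NO_SUCH_OBJECT = "no_such_object"
    NO_SUCH_INSTANCE = "no_such_instance"
    TIMEOUT = "timeout"


def classify_snmp_output(line: str) -> SnmpOutcome:
    """Classify one output line by matching the net-snmp message text."""
    text = line.strip()
    # The agent never answered: a real network-level failure.
    if text.startswith("Timeout: No Response from"):
        return SnmpOutcome.TIMEOUT
    # The agent answered, but does not implement the requested OID.
    if "No Such Object available on this agent at this OID" in text:
        return SnmpOutcome.NO_SUCH_OBJECT
    if "No Such Instance currently exists at this OID" in text:
        return SnmpOutcome.NO_SUCH_INSTANCE
    # Anything else is assumed to be a normal "OID = value" line.
    return SnmpOutcome.VALUE
```

      The point is that a "no such object" reply proves the agent is reachable, whereas a timeout does not, so the two can be told apart at the protocol level.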
       
      

       

      The attached graph shows several unrelated items; none of them had gaps before the discovery rule was added.

       

       

      3032035:20210311:110158.177 resuming SNMP agent checks on host "agg-1.gp1": connection restored
      3032040:20210311:110218.664 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
      3032017:20210311:110220.263 SNMP agent item "ospfNbrIpAddr" on host "agg-1.voon" failed: first network error, wait for 15 seconds
      3032024:20210311:110220.695 SNMP agent item "ospfNbrIpAddr" on host "agg-2.voon" failed: first network error, wait for 15 seconds
      3032031:20210311:110223.089 SNMP agent item "ospfAreaId" on host "agg-2.gp1" failed: another network error, wait for 15 seconds
      3032083:20210311:110235.214 resuming SNMP agent checks on host "agg-1.voon": connection restored
      3032083:20210311:110235.218 resuming SNMP agent checks on host "agg-2.voon": connection restored
      3032085:20210311:110243.416 temporarily disabling SNMP agent checks on host "agg-2.gp1": host unavailable
      3032007:20210311:110253.214 resuming SNMP agent checks on host "agg-1.gp1": connection restored
      3031961:20210311:110313.239 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
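
      For anyone who wants to quantify how often this happens, the following is a rough Python sketch that counts "network error" entries per host in a server log. It assumes the PID:YYYYMMDD:HHMMSS.mmm prefix seen in the lines above; the function name network_errors_per_host and both regular expressions are my own and not part of Zabbix.

```python
import re
from collections import Counter

# Log prefix as seen in the excerpt: PID:YYYYMMDD:HHMMSS.mmm followed by the message.
LOG_LINE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) (?P<message>.*)$"
)
# Host name as quoted in the message text.
HOST = re.compile(r'on host "(?P<host>[^"]+)"')


def network_errors_per_host(lines):
    """Count log entries that mention a network error, grouped by host."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # not a server log line (e.g. a shell prompt)
        message = match.group("message")
        if "network error" not in message:
            continue
        host = HOST.search(message)
        if host is not None:
            counts[host.group("host")] += 1
    return counts
```

      It can be fed an open log file directly, since iterating over a file object yields lines.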
       
      

      These log entries recur roughly every 1 to 2 minutes.

       

      I disabled the discovery rule at approximately 11:54, and no further log entries appeared after that time (checked at 13:22, per the date output below):

      3032055:20210311:115056.621 resuming SNMP agent checks on host "agg-1.gp1": connection restored
      3032054:20210311:115156.667 SNMP agent item "ospfAreaId" on host "agg-1.gp1" failed: first network error, wait for 15 seconds
      3032024:20210311:115251.434 temporarily disabling SNMP agent checks on host "agg-1.gp1": host unavailable
      3032057:20210311:115411.505 enabling SNMP agent checks on host "agg-1.gp1": host became available
      root@zabbix:/opt/zabbix/log# date
      Thu 11 Mar 2021 01:22:46 PM MST
      root@zabbix:/opt/zabbix/log#
      

       

      I would suggest that Zabbix not treat "no such object" the same as "timeout" (if that is in fact what happens, as the log seems to imply), in either LLD or regular data collection. An LLD OID that doesn't exist should simply mean that the item(s) are not discovered (or are lost) until the next time the LLD rule is due to run. Likewise, a regular item OID that doesn't exist should not stop or suspend collection of other items, but should be logged/reported as appropriate.
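
      The behaviour suggested above could look roughly like the following Python sketch. Everything in it is hypothetical: HostState, record_poll_result, the outcome strings and the max_network_errors threshold are illustrative only and do not correspond to Zabbix internals or configuration parameters.

```python
from dataclasses import dataclass, field


@dataclass
class HostState:
    """Hypothetical per-host poller state (not a Zabbix structure)."""
    available: bool = True
    network_errors: int = 0
    unsupported_items: set = field(default_factory=set)


def record_poll_result(host, item_key, outcome, max_network_errors=3):
    """Update host state after one SNMP poll.

    outcome is one of "value", "no_such_object" or "timeout".
    max_network_errors is an assumed threshold for illustration.
    """
    if outcome == "timeout":
        # Only a missing response counts against host availability.
        host.network_errors += 1
        if host.network_errors >= max_network_errors:
            host.available = False
    elif outcome == "no_such_object":
        # The agent replied, so the host is reachable; only this
        # item (or LLD rule) is flagged, other items keep polling.
        host.network_errors = 0
        host.available = True
        host.unsupported_items.add(item_key)
    else:
        host.network_errors = 0
        host.available = True
        host.unsupported_items.discard(item_key)
    return host
```

      With this kind of separation, a missing OID on one rule would be reported against that rule alone, and only genuine timeouts would suspend checks on the host.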

       

            zux Edgars Melveris
            xxiii Dave E Martin