ZABBIX BUGS AND ISSUES / ZBX-19136

Pollers randomly stop querying SNMP

    Details

    • Type: Problem report
    • Status: Open
    • Priority: Trivial
    • Resolution: Unresolved
    • Affects Version/s: 5.2.4
    • Fix Version/s: None
    • Component/s: Proxy (P), Server (S)
    • Labels:
      None
    • Environment:
      Ubuntu 18.04 / 20.04

      Description

      For the past couple of weeks, proxies have randomly stopped polling SNMP-monitored hosts for up to an hour at a time.

      In /etc/zabbix/zabbix_proxy.conf, we have the following settings:

      Timeout=15           
      UnavailableDelay=300 
      UnreachableDelay=15  
      UnreachablePeriod=120
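
For reference, the settings above imply a fairly tight retry schedule. The sketch below is a simplified model and not Zabbix source code: it assumes that after the first network error the host is retried every UnreachableDelay seconds until UnreachablePeriod has elapsed, and every UnavailableDelay seconds after that, with every retry failing. The function names are hypothetical.

```python
# Simplified model of the proxy's availability checks; NOT Zabbix source code.
# Assumption: retries are spaced exactly by the configured delays and all fail.
UNREACHABLE_DELAY = 15    # UnreachableDelay from the report
UNREACHABLE_PERIOD = 120  # UnreachablePeriod from the report
UNAVAILABLE_DELAY = 300   # UnavailableDelay from the report


def expected_retry_offsets(horizon,
                           unreachable_delay=UNREACHABLE_DELAY,
                           unreachable_period=UNREACHABLE_PERIOD,
                           unavailable_delay=UNAVAILABLE_DELAY):
    """Return (offset_seconds, state) pairs for retries after a first error at t=0."""
    retries = []
    t = unreachable_delay
    while t <= horizon:
        if t < unreachable_period:
            # Still inside the unreachable period: short retry interval.
            retries.append((t, "unreachable"))
            t += unreachable_delay
        else:
            # Unreachable period exhausted: host counts as unavailable.
            retries.append((t, "unavailable"))
            t += unavailable_delay
    return retries


def longest_gap(retries):
    """Longest interval in seconds between consecutive checks, starting at t=0."""
    times = [0] + [t for t, _ in retries]
    return max(b - a for a, b in zip(times, times[1:]))


if __name__ == "__main__":
    schedule = expected_retry_offsets(3600)
    print(longest_gap(schedule))  # 300
```

Under this model, the longest silence towards an unavailable host would be 300 seconds, which is the baseline against which the one-hour gap described in this report looks wrong.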
      

      Here is an example in the log (with warning log level) where things work as expected:

      123006.208 SNMP agent item "sensor.temperature" on host "pdu-302" failed: first network error, wait for 15 seconds
      123051.023 SNMP agent item "phase.loadstate[3]" on host "pdu-302" failed: another network error, wait for 15 seconds
      123206.062 temporarily disabling SNMP agent checks on host "pdu-302": host unavailable
      123206.174 enabling SNMP agent checks on host "pdu-302": host became available
      

      SNMP polling failed, the proxy retried shortly afterwards, the host was marked as unavailable, and shortly after that it was marked as available again.

      Here is an example of unexpected behavior (log level 4 enabled after host was marked unavailable):

      45296:20210317:130733.726 SNMP agent item "ilo.temperature[ambient]" on host "usvh016" failed: first network error, wait for 15 seconds
      45679:20210317:130933.643 temporarily disabling SNMP agent checks on host "usvh016": host unavailable
      45777:20210317:140933.116 enabling SNMP agent checks on host "usvh016": host became available
      45383:20210317:141024.397 In get_values_snmp() host:'usvh016' addr:'usvh016-ilo' num:1
      

      Zabbix reports only a single network error, then 2 minutes later marks the host as unavailable, and only starts monitoring again after 1 hour (tcpdump confirmed that there was no SNMP traffic to the host in between). This happens with different hosts each time, and only a couple of times per day.
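
To find other occurrences in a proxy log, the outage length per host can be computed from the "temporarily disabling" and "enabling" messages. This is a minimal sketch, assuming every line starts with the `PID:YYYYMMDD:HHMMSS.mmm` prefix seen in the excerpt above; the function name `outages` and the 300-second threshold (taken from UnavailableDelay) are illustrative choices.

```python
import re
from datetime import datetime

# Assumed line layout: "PID:YYYYMMDD:HHMMSS.mmm message", as in the excerpt above.
LINE_RE = re.compile(
    r'^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3})\s+(?P<msg>.*)$'
)
HOST_RE = re.compile(r'SNMP agent checks on host "(?P<host>[^"]+)"')

# The two relevant lines from the report, used as sample input.
SAMPLE = [
    '45679:20210317:130933.643 temporarily disabling SNMP agent checks on host "usvh016": host unavailable',
    '45777:20210317:140933.116 enabling SNMP agent checks on host "usvh016": host became available',
]


def outages(lines):
    """Return (host, seconds) for each disable/enable pair found in the log lines."""
    disabled_at = {}
    result = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # line does not carry the assumed prefix
        h = HOST_RE.search(m["msg"])
        if not h:
            continue  # not an availability message
        ts = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S.%f")
        host = h["host"]
        if m["msg"].startswith("temporarily disabling"):
            disabled_at[host] = ts
        elif m["msg"].startswith("enabling") and host in disabled_at:
            delta = ts - disabled_at.pop(host)
            result.append((host, delta.total_seconds()))
    return result


if __name__ == "__main__":
    for host, seconds in outages(SAMPLE):
        flag = "SUSPICIOUS" if seconds > 300 else "ok"
        print(f"{host}: {seconds:.3f}s {flag}")
```

For the two lines quoted in this report, the computed outage is just under one hour, far longer than the 300 seconds configured as UnavailableDelay.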

      This behavior seems to have started at some point in Zabbix 5.0.x, and we still see it with Zabbix 5.2.4. Before that, everything was stable. Poller process usage is less than 25% at its peak, and unreachable poller usage is less than 4%. The host has plenty of resources. It happens.

            People

            Assignee:
            ssimonenko Sergey Simonenko
            Reporter:
            kdaudt Kevin Daudt
            Votes:
            0
            Watchers:
            6
