  1. ZABBIX BUGS AND ISSUES
  2. ZBX-19136

Pollers randomly stop querying SNMP


    • Problem report
    • Resolution: Unresolved
    • Trivial
    • None
    • 5.2.4
    • Proxy (P), Server (S)
    • None
    • Ubuntu 18.04 / 20.04

      For the past couple of weeks, proxies have randomly stopped polling SNMP-monitored hosts for up to an hour at a time.

      In /etc/zabbix/zabbix_proxy.conf, we have the following settings:

      Timeout=15           
      UnavailableDelay=300 
      UnreachableDelay=15  
      UnreachablePeriod=120
      

      Here is an example in the log (with warning log level) where things work as expected:

      123006.208 SNMP agent item "sensor.temperature" on host "pdu-302" failed: first network error, wait for 15 seconds
      123051.023 SNMP agent item "phase.loadstate[3]" on host "pdu-302" failed: another network error, wait for 15 seconds
      123206.062 temporarily disabling SNMP agent checks on host "pdu-302": host unavailable
      123206.174 enabling SNMP agent checks on host "pdu-302": host became available
      

      There was an issue getting SNMP data, the proxy retried shortly after, marked the host as unavailable, and shortly afterwards marked it as available again.
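      To make the expected timing explicit, here is a minimal sketch in Python, assuming a simplified reading of the settings: the host is marked unavailable UnreachablePeriod seconds after the first network error, and is then rechecked roughly every UnavailableDelay seconds. The function name `expected_schedule` and the `rechecks` parameter are hypothetical and only used for illustration; real scheduling also depends on poller load and item intervals, and a host can become available sooner, as in the log above.

```python
from datetime import datetime, timedelta

# Values copied from the zabbix_proxy.conf excerpt (all in seconds).
UNAVAILABLE_DELAY = 300
UNREACHABLE_PERIOD = 120


def expected_schedule(first_error, rechecks=3):
    """Return (time marked unavailable, list of expected recheck times).

    Simplified model, not the actual Zabbix scheduler.
    """
    unavailable_at = first_error + timedelta(seconds=UNREACHABLE_PERIOD)
    recheck_times = [
        unavailable_at + timedelta(seconds=UNAVAILABLE_DELAY * i)
        for i in range(1, rechecks + 1)
    ]
    return unavailable_at, recheck_times


# First network error from the pdu-302 log excerpt (the date is not in the log).
first_error = datetime.strptime("12:30:06", "%H:%M:%S")
unavailable_at, recheck_times = expected_schedule(first_error)
print(unavailable_at.time())  # 12:32:06
print([t.time().isoformat() for t in recheck_times])
```

      In this model the host is disabled at 12:32:06, which matches the log, and a recheck is expected within about 5 minutes of that.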

      Here is an example of unexpected behavior (log level 4 enabled after host was marked unavailable):

      45296:20210317:130733.726 SNMP agent item "ilo.temperature[ambient]" on host "usvh016" failed: first network error, wait for 15 seconds
      45679:20210317:130933.643 temporarily disabling SNMP agent checks on host "usvh016": host unavailable
      45777:20210317:140933.116 enabling SNMP agent checks on host "usvh016": host became available
      45383:20210317:141024.397 In get_values_snmp() host:'usvh016' addr:'usvh016-ilo' num:1
      

      Zabbix reports only a single network error, marks the host as unavailable 2 minutes later, and resumes monitoring only after 1 hour (tcpdump confirmed that no SNMP traffic was sent to the host in between). This happens to random hosts (different hosts each time), just a couple of times per day.
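      A rough way to find such occurrences in a proxy log is to pair each "temporarily disabling" line with the next "enabling" line for the same host and flag gaps longer than UnavailableDelay. The following Python sketch assumes the `PID:YYYYMMDD:HHMMSS.mmm message` line format shown in the excerpt above; the function name `find_long_outages` is hypothetical.

```python
import re
from datetime import datetime

UNAVAILABLE_DELAY = 300  # seconds, from the zabbix_proxy.conf excerpt

# Matches lines such as:
# 45679:20210317:130933.643 temporarily disabling SNMP agent checks on host "usvh016": host unavailable
LINE_RE = re.compile(
    r'^\s*\d+:(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) '
    r'(?P<action>temporarily disabling|enabling) '
    r'SNMP agent checks on host "(?P<host>[^"]+)"'
)


def find_long_outages(lines, max_gap=UNAVAILABLE_DELAY):
    """Return (host, gap in seconds) for disable/enable pairs longer than max_gap."""
    disabled_at = {}
    outages = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a disable/enable line
        ts = datetime.strptime(m["date"] + " " + m["time"], "%Y%m%d %H%M%S.%f")
        host = m["host"]
        if m["action"] == "temporarily disabling":
            disabled_at[host] = ts
        elif host in disabled_at:
            gap = (ts - disabled_at.pop(host)).total_seconds()
            if gap > max_gap:
                outages.append((host, gap))
    return outages


sample = [
    '45296:20210317:130733.726 SNMP agent item "ilo.temperature[ambient]" on host "usvh016" failed: first network error, wait for 15 seconds',
    '45679:20210317:130933.643 temporarily disabling SNMP agent checks on host "usvh016": host unavailable',
    '45777:20210317:140933.116 enabling SNMP agent checks on host "usvh016": host became available',
]
for host, gap in find_long_outages(sample):
    print(f"{host}: unavailable for {gap:.0f} s")
```

      For the excerpt above this flags usvh016 with a gap of about one hour, far longer than the configured UnavailableDelay of 300 seconds.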

      This behavior seems to have started at some point in Zabbix 5.0.x, and we still see it with Zabbix 5.2.4. Before that, everything was stable. Poller process usage is less than 25% at its peak, and unreachable poller usage is less than 4%. The host has plenty of resources. It happens.

        1. proxy_stops_monitoring_snmp.png
          56 kB
          Kevin Daudt
        2. usmailext025_snmp_issue.log.gz
          8.00 MB
          Kevin Daudt
        3. usmailext028_server_configuration_cache.png
          28 kB
          Kevin Daudt
        4. usmailext028_server_performance.png
          82 kB
          Kevin Daudt
        5. usmailext028_server_value_cache.png
          28 kB
          Kevin Daudt
        6. usmailext028_snmp_availability.png
          27 kB
          Kevin Daudt
        7. usmailext028_wrk_proxy_dbg.log.gz
          10.41 MB
          Kevin Daudt
        8. usmailext028_wrk_proxy_processes.png
          117 kB
          Kevin Daudt

            zabbix.support Zabbix Support Team
            kdaudt Kevin Daudt