  1. ZABBIX BUGS AND ISSUES
  2. ZBX-16260

ZBXNEXT-4967 may cause problematic behaviour when it can't reach a server

    Details

    • Type: Incident report
    • Status: Closed
    • Priority: Minor
    • Resolution: Duplicate
    • Affects Version/s: 4.2.1
    • Fix Version/s: None
    • Component/s: Agent (G)
    • Labels:

      Description

      The new zabbix_sender behaviour introduced in ZBXNEXT-4967 can cause long timeouts when a server listed in ServerActive is not running or not responding.

      Steps to reproduce:

      1. Add a server to the ServerActive= option in zabbix_agentd.conf that exists but cannot be connected to on port 10051.
      2. Run zabbix_sender -c zabbix_agentd.conf -s test -k test -o 0 -vv
      3. Note the long timeout

      Result:

      $ time zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -s test -k test -o 0 -vv
      zabbix_sender [24249]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 1; total: 1; seconds spent: 0.000050"}]
      Response from "gamezabbix-server.svc.oanda.com:10051": "processed: 0; failed: 1; total: 1; seconds spent: 0.000050"
      zabbix_sender [24250]: Warning: timeout while executing operation
      sent: 1; skipped: 0; total: 1
      
      real 1m0.030s
      user 0m0.007s
      sys 0m0.038s
      
      $ grep -i timeout /etc/zabbix/zabbix_agentd.conf
      ### Option: Timeout
      # Spend no more than Timeout seconds on processing
      # Timeout=3
      Timeout=5
      
      $ grep ServerActive /etc/zabbix/zabbix_agentd.conf
      ### Option: ServerActive
      # Example: ServerActive=127.0.0.1:20051,zabbix.domain,[::1]:30051,::1,[12fc::1]
      # ServerActive=
      ServerActive=gamezabbix-server.svc.oanda.com,zabbix.svc.engi.oanda.com
      
      

      In the example above, I had not yet created a firewall rule to allow access to one of the two servers in my config.

      In these cases, removing the invalid server or allowing access to it (assuming the server is running and port 10051 is open) fixes the issue. However, the behaviour change is large enough that the cause may not be obvious, particularly in my case, where I add the extra ServerActive entry to every host by default even though only some hosts actually need it (Ansible laziness).
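
      One way to find the unreachable entry before running zabbix_sender is to probe each ServerActive entry with a bounded TCP connect. The following is a minimal sketch, not part of Zabbix: the function names are hypothetical, the parsing only covers the entry forms shown in the config comment above, and the default port 10051 is taken from this report. Note that the timeout bounds each connection attempt, not DNS resolution.

```python
import socket

DEFAULT_PORT = 10051  # port used in this report when an entry gives none


def parse_server_active(value):
    """Split a ServerActive value into (host, port) pairs (hypothetical helper)."""
    servers = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith("["):
            # Bracketed IPv6, optionally followed by :port, e.g. [::1]:30051
            host, _, rest = entry[1:].partition("]")
            port = int(rest[1:]) if rest.startswith(":") else DEFAULT_PORT
        elif entry.count(":") == 1:
            # hostname:port or IPv4:port
            host, _, port_text = entry.partition(":")
            port = int(port_text)
        else:
            # Bare hostname, IPv4 address, or unbracketed IPv6 address
            host, port = entry, DEFAULT_PORT
        servers.append((host, port))
    return servers


def is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


def report(value, timeout=5.0):
    """Print one line per ServerActive entry with its reachability."""
    for host, port in parse_server_active(value):
        state = "reachable" if is_reachable(host, port, timeout) else "NOT reachable"
        print(f"{host}:{port} {state}")
```

      Calling report() with the ServerActive value from the agent config prints one line per server, so an entry blocked by a firewall shows up as NOT reachable after roughly the chosen timeout per attempt.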

      Expected:
      Ideally, zabbix_sender should honour the existing Timeout setting for these connections, to avoid the long wait for a timeout.

      The new behaviour of sending to all servers should require a special argument (or there should be an argument to restore the old behaviour of sending to just the first server).
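
      The two requests above can be illustrated with a small sketch. This is not zabbix_sender's implementation; send_to_servers, its first_only switch, and the send_one callable are hypothetical names used only to show the requested logic: every attempt is bounded by the configured timeout, and an option limits delivery to the first server.

```python
def send_to_servers(servers, send_one, timeout=5.0, first_only=False):
    """Try each server in order, bounding every attempt by `timeout`.

    `send_one(server, timeout)` is a hypothetical callable that returns a
    truthy value on success and raises OSError (TimeoutError is a subclass)
    when the server cannot be reached in time.
    """
    # Old behaviour: only the first configured server is used.
    targets = servers[:1] if first_only else servers
    results = {}
    for server in targets:
        try:
            results[server] = bool(send_one(server, timeout))
        except OSError:
            # An unreachable server costs at most `timeout` seconds
            # and does not prevent delivery to the remaining servers.
            results[server] = False
    return results
```

      With Timeout=5 and two servers, the worst case in this sketch is bounded by the sum of the per-server timeouts rather than by a separate, longer limit.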

              People

              • Assignee:
                agavrilovs Aleksandrs Petrovs-Gavrilovs
                Reporter:
                dangelovich David Angelovich
              • Votes:
                0
                Watchers:
                3
