[ZBX-16260] ZBXNEXT-4967 may cause problematic behaviour when it can't hit a server Created: 2019 Jun 13  Updated: 2019 Jun 14  Resolved: 2019 Jun 14

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 4.2.1
Fix Version/s: None

Type: Incident report Priority: Minor
Reporter: David Angelovich Assignee: Aleksandrs Petrovs-Gavrilovs
Resolution: Duplicate Votes: 0
Labels: zabbix_sender
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates ZBXNEXT-284 Would be nice to add timeout paramete... Closed

 Description   

The new zabbix_sender behaviour introduced in ZBXNEXT-4967 can cause long timeouts when a server listed in ServerActive is not running or not responding.

Steps to reproduce:

  1. Add a server to the ServerActive= parameter in zabbix_agentd.conf that exists but can't be connected to on port 10051.
  2. Run zabbix_sender -c zabbix_agentd.conf -s test -k test -o 0 -vv
  3. Note the long timeout

Result:

$ time zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -s test -k test -o 0 -vv
zabbix_sender [24249]: DEBUG: answer [{"response":"success","info":"processed: 0; failed: 1; total: 1; seconds spent: 0.000050"}]
Response from "gamezabbix-server.svc.oanda.com:10051": "processed: 0; failed: 1; total: 1; seconds spent: 0.000050"
zabbix_sender [24250]: Warning: timeout while executing operation
sent: 1; skipped: 0; total: 1

real 1m0.030s
user 0m0.007s
sys 0m0.038s

$ grep -i timeout /etc/zabbix/zabbix_agentd.conf
### Option: Timeout
# Spend no more than Timeout seconds on processing
# Timeout=3
Timeout=5

$ grep ServerActive /etc/zabbix/zabbix_agentd.conf
### Option: ServerActive
# Example: ServerActive=127.0.0.1:20051,zabbix.domain,[::1]:30051,::1,[12fc::1]
# ServerActive=
ServerActive=gamezabbix-server.svc.oanda.com,zabbix.svc.engi.oanda.com

In the example above, I had not yet created a firewall rule to allow access to one of the two servers in my config.

In these cases, removing the invalid server or allowing access to it (assuming the server is running and port 10051 is open) fixes the issue. However, the behaviour change is large enough that the cause may not be obvious, particularly in my case, where I add extra ServerActive entries to every host by default even though only some hosts actually need them (Ansible laziness).
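
Since the cause may not be obvious, it can help to list exactly which servers a given config will be sent to. The following is a minimal sketch, not part of Zabbix: `list_active_servers` is a hypothetical helper name, and it assumes the comma-separated ServerActive format shown in the config excerpt above.

```shell
# Hypothetical helper (not part of Zabbix): print every ServerActive entry
# from an agent config file, one per line, so each can be checked on its own.
list_active_servers() {
    conf=$1
    # Keep only uncommented ServerActive= lines, drop the key,
    # split the comma-separated value, and trim surrounding whitespace.
    grep -E '^[[:space:]]*ServerActive[[:space:]]*=' "$conf" \
        | cut -d= -f2- \
        | tr ',' '\n' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | grep -v '^$'
}
```

Calling list_active_servers /etc/zabbix/zabbix_agentd.conf on the config above should print the two configured hosts, which can then be checked one by one for reachability on port 10051.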

Expected:
Ideally, zabbix_sender should follow the existing Timeout for these connections to prevent the long wait for a timeout.

Alternatively, make the new behaviour of sending to all servers require a special argument (or add an argument that restores the old behaviour of sending to just the first server).



 Comments   
Comment by Aleksandrs Petrovs-Gavrilovs [ 2019 Jun 14 ]

Hello dangelovich,

Thank you for your request. There is actually already a solution for this: please take a look at ZBXNEXT-284, where a separate patch is available in the thread.

A syntax example would look like this:

timeout 2 zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -s test -k test -o 0 -vv

Where 2 in this case is 2 seconds.
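
If the `timeout` prefix in the example above is the GNU coreutils utility (an assumption; the comment does not say), the same idea can be wrapped in a small function that also reports when the deadline was hit. This is only a sketch: `run_with_deadline` is a hypothetical name, and it relies on GNU `timeout` exiting with status 124 when it terminates the command.

```shell
# Hypothetical wrapper: run a command under a hard deadline.
# Assumes GNU coreutils `timeout`, which exits with status 124
# when the command is terminated for exceeding the limit.
run_with_deadline() {
    secs=$1
    shift
    rc=0
    timeout "$secs" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "command exceeded ${secs}s and was terminated" >&2
    fi
    # Pass the original exit status through to the caller.
    return "$rc"
}
```

It could be used as run_with_deadline 2 zabbix_sender -c /etc/zabbix/zabbix_agentd.conf -s test -k test -o 0 -vv. Note that the deadline applies to the whole run, so it cuts the wait short but does not tell you which server was unreachable.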

For now, I will be closing this issue as a duplicate.

Best Regards,
Aleksandrs
