[ZBX-10066] Multiple host availability update issues Created: 2015 Nov 11  Updated: 2017 May 30  Resolved: 2016 Jan 13

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: 3.0.0alpha6

Type: Incident report Priority: Major
Reporter: Andris Zeila Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: availability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

There are multiple issues with how host availability is handled:

  1. The error message is not updated for hosts that are already unavailable. The message may have changed, but the change is ignored if the host is already unavailable (regression).
  2. When the proxy fails to send host availability data to the server, it does not revert its internal host availability cache, so the availability update is lost.
  3. The internal host availability cache is stored per process. With passive proxies, each trapper therefore has its own host availability cache, and the same availability data can be sent to the server multiple times.
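
The three points above can be illustrated with a small sketch. It is not the Zabbix implementation (which is written in C); the names `Availability`, `AvailabilityCache`, `take_pending`, `revert` and `flush` are hypothetical. It assumes one cache shared by all workers, treats a changed error text as a change, and puts a batch back when sending fails.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Availability:
    available: bool
    error: str = ""


class AvailabilityCache:
    """Hypothetical shared cache of availability changes awaiting upload."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = {}  # hostid -> last known Availability
        self._pending = {}  # hostid -> Availability not yet delivered

    def update(self, hostid, state):
        """Record a state; a different error text also counts as a change."""
        with self._lock:
            if self._current.get(hostid) == state:
                return False
            self._current[hostid] = state
            self._pending[hostid] = state
            return True

    def take_pending(self):
        """Hand out the pending changes and clear them."""
        with self._lock:
            batch, self._pending = self._pending, {}
            return batch

    def revert(self, batch):
        """Put an undelivered batch back without overwriting newer changes."""
        with self._lock:
            for hostid, state in batch.items():
                self._pending.setdefault(hostid, state)


def flush(cache, send):
    """Send pending changes; keep them for the next attempt if sending fails."""
    batch = cache.take_pending()
    if not batch:
        return True
    try:
        send(batch)
    except OSError:
        cache.revert(batch)
        return False
    return True
```

Because every worker goes through the same cache object, a change handed out by `take_pending` is not handed out a second time by another worker, which is the duplication described in point 3.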


 Comments   
Comment by Andris Zeila [ 2015 Nov 19 ]

Resolved in svn://svn.zabbix.com/branches/dev/ZBX-10066

Comment by Sandis Neilands (Inactive) [ 2016 Jan 05 ]

Testing

Scenario #1

Zabbix agent (passive checks)

Pre-requisites

  • Zabbix agent (encryption compiled in) with two configuration files:
    • one without encryption (leave TLS parameters commented out);
    • another one with encryption (for example, PSK; see documentation) that does not accept unencrypted connections.
  • Zabbix server:
    • Configure agent.ping item for the particular agent (host) in front-end.
    • Disabled Item retry interval set to 10 seconds, other settings left to defaults.

Scenario

1. Start server.
2. Start agentd with the configuration file that contains encryption, check in the log file that agentd started (encryption settings are OK).
3. Wait for the host's availability to be updated in the front-end. You should get an error message similar to this one: "Received empty response from Zabbix Agent at [127.0.0.1]. Assuming that agent dropped connection because of access permissions."
4. Kill all agentd processes.
5. Wait for the host's availability to be updated again. Error message should look like this: "Get value from agent failed: cannot connect to [[127.0.0.1]:10050]: [111] Connection refused"

Expected results: error message changes in step 5.

Result: PASS.
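
The two failure modes in steps 3 and 5 can also be told apart outside of Zabbix with a small probe. This is a sketch under assumptions: `build_request` and `probe_agent` are hypothetical helpers, and the request framing (`ZBXD`, a flag byte of 1, then an 8-byte little-endian length) is my recollection of the Zabbix protocol header, so verify it against the protocol documentation before relying on it.

```python
import socket
import struct


def build_request(key):
    """Frame an item key with the assumed Zabbix header."""
    data = key.encode("utf-8")
    return b"ZBXD\x01" + struct.pack("<Q", len(data)) + data


def probe_agent(host, port, key="agent.ping", timeout=3.0):
    """Classify the outcome of one unencrypted passive check attempt."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_request(key))
            reply = sock.recv(4096)
    except ConnectionRefusedError:
        # Nothing is listening: the situation after step 4.
        return "connection refused"
    except (ConnectionResetError, BrokenPipeError):
        # The peer dropped the connection without answering.
        return "empty response"
    except OSError as exc:
        return "error: {}".format(exc)
    # An agent that only accepts encrypted connections closes the
    # socket without sending data: the situation in step 3.
    return "got data" if reply else "empty response"


if __name__ == "__main__":
    print(probe_agent("127.0.0.1", 10050))
```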

Zabbix proxy

The same test as above but with either active or passive proxy between agent and server.

Result: PASS for both passive and active proxies.

Scenario #2

Pre-requisites

  • Zabbix agent, active Zabbix proxy, Zabbix server.
  • Passive agent item configured in front-end.
  • Wireshark

Scenario

1. Start Wireshark capture of the interface on which the server is listening.
1.1. Use the following or a similar display filter:

tcp.port==10051 && (data)

2. Start all daemons, check that host is available in front-end.
3. Use a firewall to drop packets incoming to the Zabbix server's trapper port (10051). The following example shows how to do it with iptables on Linux:

sudo iptables -A INPUT -p tcp --dport 10051 -j DROP

4. Kill all agentd processes.
5. Wait for a few minutes.
6. Remove the blocking firewall rule.

sudo iptables -D INPUT -p tcp --dport 10051 -j DROP

Expected results

After step 5:

  • check the proxy's log to see that the host becomes unavailable;
  • check that host's availability doesn't change in front-end (it should change to unknown state only after an hour of silence from proxy).

After step 6:

  • check the new packets in Wireshark: one of them should contain availability data;
  • host's availability is updated in the front-end.

Result: PASS.

Scenario #3

Pre-requisites

  • Zabbix agent, passive Zabbix proxy with two trappers configured, Zabbix server.
  • Passive agent item configured in front-end.
  • Debug logs enabled for server.

Scenario

1. Start monitoring the server's log. Look for lines like the following one.

  7289:20160106:134500.702 In get_data_from_proxy() request:'host availability'

Shortly afterwards a line like this should appear if there are no host availability changes.

  7289:20160106:134500.702 obtained data from proxy "local proxy": [{"data":[]}]

2. Start all daemons, check that host is available in front-end.
3. Block one of the proxy's trappers by opening a TCP connection to it but not sending any data. The trapper should stay stuck for five minutes. You can use the netcat utility for this:

nc 127.0.0.1 10052

4. Disable the agent.
5. Wait for the availability to be updated in front-end.
6. Block the other trapper with the same method as above.
7. Release the first trapper (press CTRL+C in your nc session).

Expected results

In the log after step 7, you should see that the proxy continues sending empty availability data to the server.

Before the correction the proxy would send the host's availability again as if it had been changed again.
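
Watching the debug log by eye is error-prone, so the check can be scripted. The following is a rough sketch, assuming the two log line shapes quoted in step 1 and assuming that the reply for a request is logged by the same process ID; `availability_updates` is a hypothetical helper, not part of Zabbix.

```python
import json
import re

REQUEST = re.compile(
    r"^\s*(\d+):\d+:[\d.]+ In get_data_from_proxy\(\) request:'([^']*)'"
)
REPLY = re.compile(
    r'^\s*(\d+):\d+:[\d.]+ obtained data from proxy "([^"]*)": \[(.*)\]\s*$'
)


def availability_updates(log_lines):
    """Yield (proxy name, data list) for each host availability reply."""
    last_request = {}  # process id -> most recent request type
    for line in log_lines:
        match = REQUEST.match(line)
        if match:
            last_request[match.group(1)] = match.group(2)
            continue
        match = REPLY.match(line)
        if not match or last_request.get(match.group(1)) != "host availability":
            continue
        try:
            payload = json.loads(match.group(3))
        except json.JSONDecodeError:
            continue
        if isinstance(payload, dict):
            yield match.group(2), payload.get("data", [])
```

After step 7, every tuple produced for the proxy should carry an empty list; a non-empty list at that point would indicate that the availability was sent again.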

Result: PASS.

Comment by Sandis Neilands (Inactive) [ 2016 Jan 05 ]

(1) A passive proxy sends empty host availability data. The server protests with the following log message.

18403:20160105:174828.996 invalid host availability data: Can't find pair with name "data"

The immediate cause of this is that we don't check the return value of get_host_availability_data() in send_host_availability(). It fails due to DCget_hosts_availability() not finding any hosts for which availability has changed.
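
The shape of the problem can be shown with a short sketch. It is only an illustration, not the actual C code: `build_availability_payload` and `parse_availability_payload` are hypothetical helpers. The sender always emits a "data" array, even when nothing has changed, and the receiver rejects a payload without one.

```python
import json


def build_availability_payload(changes):
    """Always include the "data" array, even when there are no changes."""
    return json.dumps({"data": list(changes)})


def parse_availability_payload(raw):
    """Reject payloads that lack the "data" array, as the server does."""
    doc = json.loads(raw)
    if not isinstance(doc, dict) or "data" not in doc:
        raise ValueError('invalid host availability data: no "data" array')
    return doc["data"]
```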

wiper RESOLVED in r57439

sandis.neilands CLOSED.

Comment by Sandis Neilands (Inactive) [ 2016 Jan 06 ]

Successfully tested.

Comment by Andris Zeila [ 2016 Jan 07 ]

Released in:

  • pre-3.0.0alpha6 r57458

Comment by Aleksandrs Saveljevs [ 2016 Jan 07 ]

(2) The following compilation warning was introduced:

proxyhosts.c:105:2: warning: implicit declaration of function 'zbx_set_availability_diff_ts' is invalid in C99 [-Wimplicit-function-declaration]
        zbx_set_availability_diff_ts(ts);
        ^
1 warning generated.

wiper Fixed directly in trunk by adding include file.
RESOLVED in r57467

asaveljevs CLOSED

Comment by Andris Zeila [ 2016 Jan 13 ]

(3) After a host is set unavailable, its items are polled with the usual delay, ignoring the unreachable delay configuration parameter.
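
The intended behaviour can be stated as a one-line scheduling rule. This is a simplified sketch with a hypothetical function name; the real scheduler takes more state into account.

```python
def next_check_time(now, item_delay, host_available, unreachable_delay):
    """Use the unreachable delay instead of the item's own delay
    while the host is not available."""
    return now + (item_delay if host_available else unreachable_delay)
```

For example, with an item delay of 60 seconds and an unreachable delay of 15 seconds, an item checked at t=100 is rescheduled for t=160 on an available host and for t=115 on an unavailable one.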

wiper RESOLVED in r57578

sandis.neilands CLOSED.

Comment by Andris Zeila [ 2016 Jan 13 ]

Released in:

  • pre-3.0.0alpha6 r57599