Type: Problem report
Resolution: Won't fix
Priority: Major
Fix Version/s: 4.4 (plan)
Affects Version/s: 3.2.0alpha1
Component/s: Agent (G), Proxy (P), Server (S)
Labels:
- network
- tcp

Sprint:
Sprint 50 (Mar 2019), Sprint 51 (Apr 2019)
Story Points:
0

Here is how agent-to-server/proxy communication looks at the TCP implementation level:

Time	Zabbix agent	client TCP layer	server TCP layer	Zabbix server/proxy
t=0	sets `alarm()` and calls `connect()`	sends SYN, changes connection state to SYN_SENT	connection in LISTEN state	in `accept()` call
t=1s	...	re-sends SYN	...	...
t=3s	...	re-sends SYN	...	...
t=3+s	gets SIGALRM and aborts `connect()`	changes connection state to CLOSED	...	...
t=?	...	...	receives SYN, responds with SYN/ACK, changes connection state to SYN_RECV	...
t=?	...	ignores received SYN/ACK	attempts several SYN/ACK retransmissions and finally (after some time) changes connection state to CLOSED	...

If the round-trip time exceeds 3 seconds (or the first SYN is lost and the RTT exceeds 2 seconds, or the second SYN is lost as well), the server/proxy will never receive the final ACK and will end up with a long-lived "half-open" connection. If the number of active agents is large enough, the connection queue will fill up and make the server completely unreachable.
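The conditions above can be checked with a small model. This is only a sketch, assuming SYNs are sent at t=0, 1 s and 3 s as in the table, that the RTT is constant, and that the handshake succeeds only if a SYN/ACK reaches the agent strictly before the alarm fires. `handshake_completes` and its parameters are hypothetical names for illustration, not Zabbix code.

```python
# SYN send times in seconds, taken from the timeline table above.
SYN_SEND_TIMES = (0.0, 1.0, 3.0)


def handshake_completes(rtt, timeout=3.0, lost=()):
    """Return True if some SYN's SYN/ACK reaches the agent before the alarm.

    rtt     -- round-trip time in seconds (assumed constant)
    timeout -- agent-side alarm() value in seconds (assumed 3 s default)
    lost    -- indexes of SYNs that never reach the server
    """
    for index, sent_at in enumerate(SYN_SEND_TIMES):
        if index in lost:
            continue  # this SYN never arrived, so no SYN/ACK comes back
        if sent_at + rtt < timeout:
            return True  # SYN/ACK arrives while connect() is still waiting
    return False


print(handshake_completes(rtt=0.5))                # healthy network
print(handshake_completes(rtt=3.5))                # RTT over 3 seconds
print(handshake_completes(rtt=2.5, lost={0}))      # first SYN lost, RTT over 2 seconds
print(handshake_completes(rtt=0.5, lost={0, 1}))   # first two SYNs lost
```

Under these assumptions only the first call succeeds; each of the other three corresponds to one of the failure cases listed in the paragraph above.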

The problem is that the default 3-second timeout interacts with the TCP retransmission strategy in a very destructive way. When the "half-open" connection queue is full, incoming SYN packets are simply dropped, which makes it very likely that the third SYN will be the one that gets through. And since the server has virtually no time to respond to it before the agent aborts the connection, recovery is very difficult (if possible at all), even after the network returns to normal.
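To make the "virtually no time" point concrete, the time left for a SYN/ACK to come back can be computed per SYN attempt. This is a rough sketch, assuming the same send times as the timeline (t=0, 1 s, 3 s) and a 3-second agent timeout; `response_slack` is a hypothetical helper name.

```python
def response_slack(syn_send_times=(0.0, 1.0, 3.0), timeout=3.0):
    """Return (send_time, seconds left before the agent's alarm) for each SYN."""
    return [(sent_at, max(0.0, timeout - sent_at)) for sent_at in syn_send_times]


for sent_at, slack in response_slack():
    print(f"SYN sent at t={sent_at:g}s leaves {slack:g}s for the SYN/ACK to arrive")
```

With these values the third SYN has no slack at all, so if it is the one that finally lands in the queue, the agent has already given up by the time the server answers.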

is duplicated by

ZBX-15657 Zabbix agent cannot connect to Zabbix server after update v.4.0.4

Closed

Assignee: Andris Zeila

Reporter: Glebs Ivanovskis (Inactive)

Team: Team A

Votes: 1

Watchers: 8

Created: 2016 May 27 16:21

Updated: 2024 Apr 10 16:56

Resolved: 2019 Apr 04 13:38
