[ZBX-9733] Zabbix agent port was taken by SYSTEM user and running without process on top of it. Created: 2015 Jul 27  Updated: 2017 May 30  Resolved: 2016 Jan 25

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 2.4.1
Fix Version/s: 2.2.12rc1, 2.4.8rc1, 3.0.0beta2

Type: Incident report Priority: Major
Reporter: Natalia Kagan Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: windows
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

zabbix server & proxy 2.4.4 on centos 6, DB: mysql
zabbix agent v2.4.1 on win 2008 R2


Issue Links:
Duplicate

 Description   

We encountered an issue where Zabbix port 10050 was being listened on twice: once by a process running as the SYSTEM user, and once by the agent:

C:\Users\XXXX>netstat -aon | find "10050"
  TCP    0.0.0.0:10050          0.0.0.0:0              LISTENING       16496
  TCP    0.0.0.0:10050          0.0.0.0:0              LISTENING       12600
  TCP    [::]:10050             [::]:0                 LISTENING       12600
  TCP    [::]:10050             [::]:0                 LISTENING       16496

Process 16496 is owned by zabbix_agent, while process 12600 was not in the Task Manager list (which means that the process no longer exists).
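The check described above can be scripted. The following is a minimal Python sketch, not part of the original report, that groups the LISTENING rows of `netstat -aon` output by local endpoint and reports the endpoints owned by more than one PID. The function names are hypothetical, and the column layout is assumed to match the output shown above.

```python
from collections import defaultdict

# Sample copied from the netstat output in this report.
SAMPLE = """\
  TCP    0.0.0.0:10050          0.0.0.0:0              LISTENING       16496
  TCP    0.0.0.0:10050          0.0.0.0:0              LISTENING       12600
  TCP    [::]:10050             [::]:0                 LISTENING       12600
  TCP    [::]:10050             [::]:0                 LISTENING       16496
"""


def listeners_by_endpoint(netstat_output):
    """Map each LISTENING local endpoint to a sorted list of owning PIDs."""
    owners = defaultdict(set)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: protocol, local address, foreign address, state, PID.
        if len(fields) == 5 and fields[0] == "TCP" and fields[3] == "LISTENING":
            owners[fields[1]].add(int(fields[4]))
    return {endpoint: sorted(pids) for endpoint, pids in owners.items()}


def shared_endpoints(netstat_output):
    """Return only the endpoints that more than one PID is listening on."""
    return {
        endpoint: pids
        for endpoint, pids in listeners_by_endpoint(netstat_output).items()
        if len(pids) > 1
    }


if __name__ == "__main__":
    for endpoint, pids in shared_endpoints(SAMPLE).items():
        print(endpoint, pids)
```

Any PID reported here that is missing from the process list is a candidate for the stale listener described in this report.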

This prevents our Zabbix proxy from getting responses to non-active checks from the host: "Get value from agent failed: cannot connect to [[PPP.YYY.XXX.ZZZ]:10050]: [111] Connection refused"

We were not able to kill the process, nor to close the port, since the process did not exist (we tried with TCPView too).

We also found a bunch of TIME_WAIT connections on port 10050, which we terminated with TCPView, but the stale LISTENING entry on the port remained.

Only a reboot solved the issue, but in production we can't reboot around 50 servers that are running critical services.

In addition, we don't see this phenomenon on all Windows machines.

Do you have any idea why it's happening?
Did you encounter this issue before?
Any idea of how to resolve this issue without rebooting the machines?

Thanks,
Natalia



 Comments   
Comment by Aleksandrs Saveljevs [ 2015 Jul 28 ]

Unless you have a reliable way of reproducing the issue, it looks more like a support request, rather than a bug report. For support, https://www.zabbix.org/wiki/Getting_help is a better place.

Comment by Aleksandrs Saveljevs [ 2015 Jul 28 ]

Actually, I have just tried starting two Windows agents with the same configuration file and... it worked and they both listened on the same port. They should not, so let's keep this issue open.

Comment by Oleksii Zagorskyi [ 2015 Aug 20 ]

Something similar was discussed some time ago in ZBX-3479, but there was a bug there which has been fixed.
I'd also look at userparameters, if they are being used.

Comment by Aleksandrs Saveljevs [ 2016 Jan 20 ]

According to the following links, SO_REUSEADDR has different semantics on Windows than on Unix, and it is suggested to use SO_EXCLUSIVEADDRUSE instead:

The only problem is that it seems to require administrative privileges on Windows XP and earlier, but in that case the setsockopt() call will simply fail silently - the agent will still work.
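To illustrate the idea, here is a minimal Python sketch of a listener that asks for exclusive use of its address where the platform offers it. It is not the agent's actual C code. The helper names `make_listener` and `second_bind_is_refused` are hypothetical, and falling back to SO_REUSEADDR on non-Windows systems is an assumption made so that the sketch runs everywhere.

```python
import socket


def make_listener(port, host="127.0.0.1"):
    """Open a listening TCP socket, requesting exclusive address use if possible."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The SO_EXCLUSIVEADDRUSE constant is only defined by Python on Windows.
    exclusive = getattr(socket, "SO_EXCLUSIVEADDRUSE", None)
    try:
        if exclusive is not None:
            # Windows: forbid any other socket from binding to this address.
            sock.setsockopt(socket.SOL_SOCKET, exclusive, 1)
        else:
            # Assumption: elsewhere, keep the usual Unix-style SO_REUSEADDR.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    except OSError:
        # Mirrors the comment above: if the option cannot be set,
        # carry on and listen without it.
        pass
    try:
        sock.bind((host, port))
        sock.listen(5)
    except OSError:
        sock.close()
        raise
    return sock


def second_bind_is_refused():
    """Return True if a second listener on the same port is rejected."""
    first = make_listener(0)  # port 0 lets the OS pick a free port
    port = first.getsockname()[1]
    try:
        second = make_listener(port)
    except OSError:
        return True
    else:
        second.close()
        return False
    finally:
        first.close()


if __name__ == "__main__":
    print("second listener refused:", second_bind_is_refused())
```

With a listener opened this way, a second agent started with the same configuration should fail at bind() instead of silently sharing the port.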

Comment by Aleksandrs Saveljevs [ 2016 Jan 20 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-9733 .

Comment by Andris Zeila [ 2016 Jan 22 ]

Successfully tested

Comment by Aleksandrs Saveljevs [ 2016 Jan 22 ]

Fixed in pre-2.2.12rc1 r57930, pre-2.4.8rc1 r57931, pre-3.0.0beta2 (trunk) r57932.

Comment by Aleksandrs Saveljevs [ 2016 Jan 22 ]

(1) There were conflicts merging into trunk and it took me a while to fix them. Please review r57932, r57934, r57935.

wiper CLOSED
