[ZBX-6790] net.tcp.listen race condition Created: 2013 Jul 15  Updated: 2017 May 30  Resolved: 2014 Jan 17

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 2.0.6
Fix Version/s: 2.0.11rc1, 2.2.2rc1, 2.3.0

Type: Incident report Priority: Minor
Reporter: Doug Dixon Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: falsepositives, patch, trivial
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Amazon Linux



 Description   

We have observed intermittent net.tcp.listen failures for ports that are on 1024 byte boundaries in /proc/net/tcp. We are confident the ports are actually listening at all times because (a) we successfully monitored them by frequently grepping /proc/net/tcp, which reads the entire file in one read, and (b) we can influence which port fails by adding more listening ports (with nc -kl <portnum>) and observing that faults always occur at 1024 byte boundaries.

The net.tcp.listen function NET_TCP_LISTEN reads /proc/net/tcp in chunks of 1024 bytes, but /proc/net/tcp is not guaranteed to be consistent between reads. See http://stackoverflow.com/questions/5713451/is-it-safe-to-parse-a-proc-file for a discussion. It appears there is a race condition whereby the result of net.tcp.listen can be altered by the kernel changing /proc/net/tcp between reads.

Workaround: Instead of net.tcp.listen builtin, use a UserParameter "net.tcp.listen.grep":

UserParameter=net.tcp.listen.grep[*],grep -q $$(printf '%04X.00000000:0000.0A' $1) /proc/net/tcp && echo 1 || echo 0

With both items in place for the same ports, we now only see errors for net.tcp.listen, and not for net.tcp.listen.grep.

Would it be possible for the below function to be changed so as to read /proc/net/tcp* in one go?

int	NET_TCP_LISTEN(const char *cmd, const char *param, unsigned flags, AGENT_RESULT *result)
{
	FILE		*f = NULL;
	char		tmp[MAX_STRING_LEN], pattern[64];
	unsigned short	port;
	zbx_uint64_t	listen = 0;
	int		ret = SYSINFO_RET_FAIL;

	if (num_param(param) > 1)
		return ret;

	if (0 != get_param(param, 1, tmp, sizeof(tmp)))
		return ret;

	if (SUCCEED != is_ushort(tmp, &port))
		return ret;

	if (NULL != (f = fopen("/proc/net/tcp", "r")))
	{
		zbx_snprintf(pattern, sizeof(pattern), "%04X 00000000:0000 0A", (unsigned int)port);

		while (NULL != fgets(tmp, sizeof(tmp), f))
		{
			if (NULL != strstr(tmp, pattern))
			{
				listen = 1;
				break;
			}
		}
		zbx_fclose(f);

		ret = SYSINFO_RET_OK;
	}

	if (0 == listen && NULL != (f = fopen("/proc/net/tcp6", "r")))
	{
		zbx_snprintf(pattern, sizeof(pattern), "%04X 00000000000000000000000000000000:0000 0A", (unsigned int)port);

		while (NULL != fgets(tmp, sizeof(tmp), f))
		{
			if (NULL != strstr(tmp, pattern))
			{
				listen = 1;
				break;
			}
		}
		zbx_fclose(f);

		ret = SYSINFO_RET_OK;
	}

	SET_UI64_RESULT(result, listen);

	return ret;
}


 Comments   
Comment by Hardy Simpson [ 2013 Oct 11 ]

I am trying to solve the same problem, cool explanation!
I find same problem on aix of proc.num, I think it maybe the same problem.

Comment by Andris Zeila [ 2014 Jan 16 ]

Probably the best solution would be to add similar logic to iproute2 ss utility - estimate the /proc/net/tcp(6) length, allocate and set a custom buffer for /proc/net/tcp(6) reading.

Comment by Andris Zeila [ 2014 Jan 17 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-6790

Actually there is no need to read the /proc/net/tcp(6) line by line, so instead of setting buffer for file pointer I opted to read the whole file and search the port substring in it.

Comment by Aleksandrs Saveljevs [ 2014 Jan 20 ]

(1) We should probably read /proc/net/udp(6) in the same way, so I applied the same changes to net.udp.listen[] in r41643. Please take a look.

wiper Good idea. Also with removed EINTR checks we can simplify the proc_read_file() code a little. Please review r41648

asaveljevs Looks good. CLOSED.

Comment by Andris Zeila [ 2014 Jan 20 ]

Released in:
pre-2.0.11rc1 r41660
pre-2.2.2rc1 r41661
pre-2.3.0 r41662

Generated at Thu Apr 25 17:59:35 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.