  1. ZABBIX BUGS AND ISSUES
  2. ZBX-6790

net.tcp.listen race condition

    Details

      Description

      We have observed intermittent net.tcp.listen failures for ports whose entries fall on 1024-byte boundaries in /proc/net/tcp. We are confident the ports are listening at all times, for two reasons: (a) we monitored them successfully by frequently grepping /proc/net/tcp, which reads the entire file in one read, and (b) we can influence which port fails by adding more listening ports (with nc -kl <portnum>), and the faults always occur at 1024-byte boundaries.

      NET_TCP_LISTEN, the function behind net.tcp.listen, reads /proc/net/tcp in 1024-byte chunks, but /proc/net/tcp is not guaranteed to be consistent between reads. See http://stackoverflow.com/questions/5713451/is-it-safe-to-parse-a-proc-file for a discussion. This looks like a race condition: the kernel can change /proc/net/tcp between reads, which alters the result of net.tcp.listen.

      Workaround: instead of the built-in net.tcp.listen, use a UserParameter named "net.tcp.listen.grep":

      UserParameter=net.tcp.listen.grep[*],grep -q $$(printf '%04X.00000000:0000.0A' $1) /proc/net/tcp && echo 1 || echo 0
      

      With both items in place for the same ports, we now only see errors for net.tcp.listen, and not for net.tcp.listen.grep.
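For readers unfamiliar with the /proc/net/tcp layout, the sketch below shows how the search string is derived from a port number: the port as four uppercase hex digits, followed by the remote address 00000000:0000 and the state 0A (LISTEN). The helper names build_listen_pattern and line_is_listener are hypothetical and are not part of Zabbix; the sample lines in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Zabbix code): write into buf the substring that a
 * LISTEN entry for `port` contains in /proc/net/tcp.
 * Returns 0 on success, -1 if buf is too small. */
static int build_listen_pattern(unsigned short port, char *buf, size_t size)
{
	int	n;

	assert(NULL != buf && 0 < size);

	/* local port in hex, remote address 00000000:0000, state 0A (LISTEN) */
	n = snprintf(buf, size, "%04X 00000000:0000 0A", (unsigned int)port);

	return (0 <= n && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper: return 1 if one line of /proc/net/tcp describes a
 * listener on `port`, 0 otherwise. */
static int line_is_listener(const char *line, unsigned short port)
{
	char	pattern[64];

	assert(NULL != line);

	if (0 != build_listen_pattern(port, pattern, sizeof(pattern)))
		return 0;

	return NULL != strstr(line, pattern);
}
```

For port 8080 the pattern is "1F90 00000000:0000 0A", so a line beginning "   0: 00000000:1F90 00000000:0000 0A" matches, while an established connection on the same port (state 01) does not.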

      Would it be possible to change the function below so that it reads /proc/net/tcp* in one go?

      int	NET_TCP_LISTEN(const char *cmd, const char *param, unsigned flags, AGENT_RESULT *result)
      {
      	FILE		*f = NULL;
      	char		tmp[MAX_STRING_LEN], pattern[64];
      	unsigned short	port;
      	zbx_uint64_t	listen = 0;
      	int		ret = SYSINFO_RET_FAIL;
      
      	if (num_param(param) > 1)
      		return ret;
      
      	if (0 != get_param(param, 1, tmp, sizeof(tmp)))
      		return ret;
      
      	if (SUCCEED != is_ushort(tmp, &port))
      		return ret;
      
      	if (NULL != (f = fopen("/proc/net/tcp", "r")))
      	{
      		zbx_snprintf(pattern, sizeof(pattern), "%04X 00000000:0000 0A", (unsigned int)port);
      
      		while (NULL != fgets(tmp, sizeof(tmp), f))
      		{
      			if (NULL != strstr(tmp, pattern))
      			{
      				listen = 1;
      				break;
      			}
      		}
      		zbx_fclose(f);
      
      		ret = SYSINFO_RET_OK;
      	}
      
      	if (0 == listen && NULL != (f = fopen("/proc/net/tcp6", "r")))
      	{
      		zbx_snprintf(pattern, sizeof(pattern), "%04X 00000000000000000000000000000000:0000 0A", (unsigned int)port);
      
      		while (NULL != fgets(tmp, sizeof(tmp), f))
      		{
      			if (NULL != strstr(tmp, pattern))
      			{
      				listen = 1;
      				break;
      			}
      		}
      		zbx_fclose(f);
      
      		ret = SYSINFO_RET_OK;
      	}
      
      	SET_UI64_RESULT(result, listen);
      
      	return ret;
      }
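One possible shape for the requested change, offered only as a rough sketch and not as a tested patch: read the whole file into memory with large read() calls, then search the buffer. The names slurp_file, port_listens_in and SLURP_CHUNK are hypothetical, and the 64 KiB buffer size is an arbitrary assumption. Note that the kernel may still deliver the table across several read() calls, so this reduces the number of reads but is not shown here to guarantee a consistent snapshot.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define SLURP_CHUNK	(64 * 1024)	/* assumed initial buffer size */

/* Hypothetical helper: read all of `path` into one NUL-terminated heap
 * buffer using as few read() calls as possible. Caller frees the result.
 * Returns NULL on error (EINTR is not retried in this sketch). */
static char *slurp_file(const char *path)
{
	size_t	cap = SLURP_CHUNK, len = 0;
	ssize_t	n;
	char	*buf, *tmp;
	int	fd;

	assert(NULL != path);

	if (-1 == (fd = open(path, O_RDONLY)))
		return NULL;

	if (NULL == (buf = malloc(cap + 1)))
	{
		close(fd);
		return NULL;
	}

	while (0 < (n = read(fd, buf + len, cap - len)))
	{
		len += (size_t)n;

		if (len == cap)	/* buffer full: double it and keep reading */
		{
			cap *= 2;

			if (NULL == (tmp = realloc(buf, cap + 1)))
			{
				n = -1;
				break;
			}
			buf = tmp;
		}
	}

	close(fd);

	if (0 > n)
	{
		free(buf);
		return NULL;
	}

	buf[len] = '\0';

	return buf;
}

/* Hypothetical helper: 1 if `port` is listening according to `path`,
 * 0 if not, -1 if the file cannot be read. Set ipv6 for /proc/net/tcp6. */
static int port_listens_in(const char *path, unsigned short port, int ipv6)
{
	char	pattern[64], *buf;
	int	found;

	if (0 != ipv6)
		snprintf(pattern, sizeof(pattern), "%04X 00000000000000000000000000000000:0000 0A", (unsigned int)port);
	else
		snprintf(pattern, sizeof(pattern), "%04X 00000000:0000 0A", (unsigned int)port);

	if (NULL == (buf = slurp_file(path)))
		return -1;

	found = (NULL != strstr(buf, pattern));
	free(buf);

	return found;
}
```

A caller would try port_listens_in("/proc/net/tcp", port, 0) first and fall back to port_listens_in("/proc/net/tcp6", port, 1), mirroring the order of the two loops in NET_TCP_LISTEN.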
      

            People

            Assignee:
            Unassigned
            Reporter:
            Doug Dixon (doug)
            Votes:
            1
            Watchers:
            0
