[ZBX-7046] bug in get_ip_by_socket() Created: 2013 Sep 24  Updated: 2017 May 30  Resolved: 2014 Apr 24

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: API (A), Server (S)
Affects Version/s: 2.0.8
Fix Version/s: 2.0.12rc1, 2.2.4rc1, 2.3.0

Type: Incident report Priority: Major
Reporter: Nikita Kozlov Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: autoregistration, ipv4, ipv6
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

FreeBSD ... 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64


Issue Links:
Duplicate

 Description   

get_ip_by_socket() always return '\0' for ipv4 socket if zabbix_server compiled with IPv6 support. This breaks auto registration of new agents without explicitly defined "ListenIP" in zabbix_agentd.conf.



 Comments   
Comment by Riaan Olivier [ 2013 Dec 17 ]

Found the same issue with the following configuration:

  • FreeBSD chromium.dmz 9.1-RELEASE-p7 FreeBSD 9.1-RELEASE-p7
  • Zabbix Server Version 2.0.10
 86088:20131217:141546.415 Cannot get socket IP address: [42] Protocol not available
 86088:20131217:141549.865 Cannot get socket IP address: [42] Protocol not available
Comment by Aleksandrs Saveljevs [ 2013 Dec 17 ]

Related issue: ZBX-7551.

Comment by Juris Miščenko (Inactive) [ 2014 Jan 16 ]

Fixed in svn://svn.zabbix.com/branches/dev/ZBX-7046

Comment by Aleksandrs Saveljevs [ 2014 Jan 17 ]

(1) Your attempt at the fix adds passing of "serv" and "servlen" arguments to getnameinfo().

Could you please explain the rationale behind this change? Manual page for getnameinfo() for Linux explicitly says that "serv" and "servlen" parameters are optional:

The caller can specify that no hostname (or no service name) is required by providing a NULL host (or serv) argument or a zero hostlen (or servlen) argument. However, at least one of hostname or service name must be requested.

The manual page on FreeBSD does not seem to make a different point either. On the contrary, it omits "serv" and "servlen" in one of the examples.

Finally, I have tested the branch on FreeBSD 8.2. I can confirm that the original problem is there and that the fix did not make a difference.

jurism
After investigating the bug, I have concluded that it stemmed from passing arguments to functions of inconsistent nature under various conditions.

In this particular case, we made a call to the library function getnameinfo(3) which takes in a pointer to a generic socket address structure (of type sockaddr_in or sockaddr_in6), the size of the structure, two buffers for the returned results (the host address and the service port), their length and four octets containing flags for determining the type of information to be returned to the caller.

The way BSD sockets work is that they provide a unified address structure interface via typecasts. Any socket address structure can be typecast to the base structure `struct sockaddr' which contains the most vital socket information to provide functionality (an unsigned short containing the address family of the socket and 14 bytes for the address information). `struct sockaddr' is the smallest structure of the family, with a total of 16 bytes (coincidentally, the same as `struct sockaddr_in'). On the other end of the spectrum is `struct sockaddr_stroage' which is the largest structure, containing 128 bytes of information, which is guaranteed to hold all and any socket information, independent of the socket type or address family. So a lot of functions request that you typecast your socket structures to the base type (struct sockaddr) which is then used to determine further actions to be taken.

The way FreeBSD have implemented their getnameinfo(3) lib call is that they have an external array holding records for every known address family, and the characteristics (for example, the size) of the socket structures that should be used to work with these address families, to which the passed parameters are validated against before anything is done, and in case of inconsistency, an EAI_FAIL is returned (I'm not 100% sure this return value is valid anymore, because while skimming over the FreeBSD libc code, I found that it, along with a few others, were disabled, yet had no replacements, and were still used in the getnameinfo(3) code). This is the characteristics array definition in FreeBSD:

head/lib/libc/net/getnameinfo.c
...
 97 static const struct afd {
 98         int a_af;
 99         size_t a_addrlen;
100         socklen_t a_socklen;
101         int a_off;
102 } afdl [] = {
103 #ifdef INET6
104         {PF_INET6, sizeof(struct in6_addr), sizeof(struct sockaddr_in6),
105                 offsetof(struct sockaddr_in6, sin6_addr)},
106 #endif
107         {PF_INET, sizeof(struct in_addr), sizeof(struct sockaddr_in),
108                 offsetof(struct sockaddr_in, sin_addr)},
109         {0, 0, 0},
110 };
...

The way OpenBSD approach this is that getnameinfo(3) is a wrapper function that does a little typecasting and argument preparation for the actual function that is used to perform the lookup. They don't use an array with predefined values to validate the parameters against. The approach they take is typecasting the passed socket address structure to the appropriate structure based on the address family defined in the first short of the base socket address structure (struct sockaddr) and then calculate the values to be passed to the underlying function on the fly. This approach can be seen here:

OpenBSD-lib-patches/libc/net/getnameinfo.c
...
 97 int
 98 getnameinfo(const struct sockaddr *sa, socklen_t salen,
 99             char *host, size_t hostlen,
100             char *serv, size_t servlen,
101             int flags)
102 {
103     switch (sa->sa_family) {
104 #ifdef HAVE_IPV6
105     case AF_INET6 : {
106         const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
107 
108         return doit (AF_INET6, &sin6->sin6_addr, sizeof(sin6->sin6_addr),
109                      sin6->sin6_port,
110                      host, hostlen,
111                      serv, servlen,
112                      flags);
113     }
114 #endif
115     case AF_INET : {
116         const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
117 
118         return doit (AF_INET, &sin->sin_addr, sizeof(sin->sin_addr),
119                      sin->sin_port,
120                      host, hostlen,
121                      serv, servlen,
122                      flags);
123     }
124     default :
125         return EAI_FAMILY;
126     }
127 }
...

In both of these cases, the length to which the passed size parameter is compared is either sizeof(in_addr) or sizeof(in6_addr) depending on the address family detected in the first two bytes of the passed structure.

The reason why this bug manifested itself was the way we determined the arguments to pass it. Until now, the way we did it was that we used a different structure for holding the socket address information depending on whether IPv6 support was enabled. Here's the code:

src/libs/zbxcomms/comms.c
...
24   #if defined(HAVE_IPV6)
25   #       define ZBX_SOCKADDR struct sockaddr_storage
26   #else
27   #       define ZBX_SOCKADDR struct sockaddr_in
28   #endif
...
1079 char    *get_ip_by_socket(zbx_sock_t *s)
1080 {
1081          ZBX_SOCKADDR    sa;
1082          ZBX_SOCKLEN_T   sz = sizeof(sa);
...

At this point, we have declared a socket structure `sa' of either type `struct sockaddr_in' or `struct sockaddr_storage' if IPv6 is disabled or enabled, respectively.
The variable `sz' holds the size of the address structure, which isn't actually used in the call that broke the functionality. This is completely valid for the call that we use it for because it explicitly specifies that the size provided should be the size of the argument structure passed to it. Unfortunately, due to the confusing nature of the getnameinfo(3) call, it is easy to call it with the wrong parameters leading to this kind of bug.

Here is the call to getnameinfo(3) with a little context:

src/libs/zbxcomms/comms.c
...
1093 #if defined(HAVE_IPV6)
1094         if (0 != getnameinfo((struct sockaddr*)&sa, sizeof(sa), host, sizeof(host), NULL, 0, NI_NUMERICHOST))
1095         {
1096                 error_message = strerror_from_system(zbx_sock_last_error());
1097                 zbx_set_tcp_strerror("connection rejected, getnameinfo() failed: %s", error_message);
1098         }
1099 #else
...

The problem that can be seen here is that `sa' is originally of type `struct sockaddr_storage' (due to this executing only when IPv6 is enabled and thus ZBX_SOCKADDR is defined as `struct sockaddr_storage'). As I mentioned earlier, BSD sockets try to have a unified interface via typecasting and then further addressing the contents of the structure based on the determined address family. The size parameter (2nd) here is meant to validate the passed arguments, but if it is looked at closer, it can also be used to verify whether the system implements correct sizes for the address socket structures. The way it verifies whether the passed size parameter is consistent is by looking at the first parameters `sa_family' value to determine the sockets family and based on that determine the appropriate size that should have been passed. We fail this validation because we pass the size of the structure that we use for storing the address information. To remedy this, in this specific case, the size of the appropriate address structure for the used address family should be passed.

While researching this issue, I concluded that the best practice, in general, when passing arguments to a function that requires a descriptive argument of another argument, would be to pass the size of the appropriate type to which the described parameter is typecast (i.e. if we pass an int typecast to a char, we should pass a size argument of sizeof(char), rather than the actual size of the parameter that is being typecast). This also improves portability and clarifies the code. Also, in essence, when we're typecasting, we're passing a completely different type (hence `typecasting'), which means that the underlying functions can and may address the data in different ways, and passing an incorrect size value is, in my opinion, a disaster waiting to happen. This approach, unfortunately, is not always consistent with socket address structures due to their portability interface and how various functions determine the validity of the values passed to them. This is also one of the reasons sockets have been a stumbling block for novice developers.

After evaluating the way we implemented this functionality previously, I tried to not move away from the original idea of how it was done, although one of my first thoughts was to drop the getnameinfo(3) call completely, because we were already using getpeername(2), which has been around since 4.2BSD and is IPv4 and IPv6 capable, but I wanted to change the code as little as possible. But the further I dug, I realized how this may be a point of confusion for anyone later reviewing the code or debugging it and that it would force them to do the same things I did to nail it down. So it was decided to drop the getnameinfo(3) call altogether and just use a combination of getpeername(2) and inet_ntop(3).

This, this is how our code looks now, after the changes:

src/libs/zbxcomms/comms.c
...
1079 char    *get_ip_by_socket(zbx_sock_t *s)
1080 {
1081         ZBX_SOCKADDR    sa;
1082         ZBX_SOCKLEN_T   sz = sizeof(sa);
1083         static char     host[64];
1084         char            *error_message = NULL;
1085 
1086         if (ZBX_TCP_ERROR == getpeername(s->socket, (struct sockaddr*)&sa, &sz))
1087         {
1088                 error_message = strerror_from_system(zbx_sock_last_error());
1089                 zbx_set_tcp_strerror("connection rejected, getpeername() failed: %s", error_message);
1090                 goto out;
1091         }
1092 
1093         switch (((struct sockaddr *)&sa)->sa_family)
1094         {
1095                 case AF_INET:
1096                         inet_ntop(AF_INET, &(((struct sockaddr_in *)&sa)->sin_addr), host, sizeof(host));
1097                         break;
1098 #if defined(HAVE_IPV6)
1099                 case AF_INET6:
1100                         inet_ntop(AF_INET6, &(((struct sockaddr_in6 *)&sa)->sin6_addr), host, sizeof(host));
1101                         break;
1102 #endif
1103                 default:
1104                         error_message = "Unrecognized address family";
1105         }               
1106 out:    
1107         if (NULL != error_message)
1108         {
1109                 zabbix_log(LOG_LEVEL_WARNING, "Cannot get socket IP address: %s", error_message);
1110                 strscpy(host, "unknown IP");
1111         }       
1112         
1113         return host;
1114 }
...

asaveljevs Only one occurence of getnameinfo() was adjusted, but there is another one that needs it.

asaveljevs Also, the code will not compile on Windows, because there is no inet_ntop() function.

asaveljevs Do you think we could just adjust the old getnameinfo() invocation so that it works? REOPENED.

jurism Added a wrapper for getnameinfo(3) that basically implements the behavior of the OpenBSD version. The parameter list omits the structure size parameter because we don't have to strictly match the interface with our wrappers and the argument would remain unused. RESOLVED.

asaveljevs The branch does not compile. REOPENED.

jurism Added missing condition to switch..case construct. RESOLVED.

wiper CLOSED

Comment by Aleksandrs Saveljevs [ 2014 Feb 17 ]

(2) Comment header for zbx_getnameinfo() contains tabs. Should use spaces only.

asaveljevs Similarly, this `quoting' style should not be used. Either single quotes or double quotes are preferred.

jurism Not relevant to current solution. CLOSED.

Comment by Aleksandrs Saveljevs [ 2014 Feb 17 ]

(3) Could you please explain the reason for checking for the presence of size_t in configure.in? It is a standard type.

jurism We had to define our own type to pass to that specific (getnameinfo) function so not the presence but its actual definition and size was checked. This approach has been dropped in favor of a different solution. CLOSED.

Comment by Aleksandrs Saveljevs [ 2014 Feb 17 ]

(4) It might be better to name parameters in the wrapper function the same way as in the original getnameinfo() call (e.g., "host" and "hostlen" instead of "hbuf" and "hlen").

asaveljevs Why is there a need to cast "sa->sa_family" to "unsigned short"?

jurism Current solution has dropped this approach, thus this isn't relevant anymore. CLOSED.

Comment by Aleksandrs Saveljevs [ 2014 Feb 17 ]

(5) How about a simple #define like the following? The current solution seems too heavyweight.

#define zbx_getnameinfo(sa, host, hostlen, serv, servlen, flags) \
	getnameinfo(sa, (AF_INET == (sa)->sa_family ? sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6)), host, hostlen, serv, servlen, flags)

jurism This approach has been implemented and it has been decided that the warnings produced are not critical to functionality. RESOLVED.

wiper CLOSED

Comment by Andris Zeila [ 2014 Mar 27 ]

Successfully tested

Comment by Juris Miščenko (Inactive) [ 2014 Apr 08 ]

Implemented in 2.0.12rc1 r43804, 2.2.4.rc1 r44191, 2.3.0 (trunk) r44192

Comment by richlv [ 2014 Apr 08 ]

(6) changelog entries have agent and server components marked - didn't this affect proxy, too ?

jurism
Corrected. RESOLVED.

<richlv> thanks, CLOSED

Generated at Sat Apr 20 10:58:01 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.