[ZBX-7046] bug in get_ip_by_socket() Created: 2013 Sep 24 Updated: 2017 May 30 Resolved: 2014 Apr 24 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | API (A), Server (S) |
Affects Version/s: | 2.0.8 |
Fix Version/s: | 2.0.12rc1, 2.2.4rc1, 2.3.0 |
Type: | Incident report | Priority: | Major |
Reporter: | Nikita Kozlov | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 1 |
Labels: | autoregistration, ipv4, ipv6 | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
FreeBSD ... 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10 UTC 2012 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64 |
Issue Links: |
|
Description |
get_ip_by_socket() always return '\0' for ipv4 socket if zabbix_server compiled with IPv6 support. This breaks auto registration of new agents without explicitly defined "ListenIP" in zabbix_agentd.conf. |
Comments |
Comment by Riaan Olivier [ 2013 Dec 17 ] |
Found the same issue with the following configuration:
86088:20131217:141546.415 Cannot get socket IP address: [42] Protocol not available 86088:20131217:141549.865 Cannot get socket IP address: [42] Protocol not available |
Comment by Aleksandrs Saveljevs [ 2013 Dec 17 ] |
Related issue: |
Comment by Juris Miščenko (Inactive) [ 2014 Jan 16 ] |
Fixed in svn://svn.zabbix.com/branches/dev/ZBX-7046 |
Comment by Aleksandrs Saveljevs [ 2014 Jan 17 ] |
(1) Your attempt at the fix adds passing of "serv" and "servlen" arguments to getnameinfo(). Could you please explain the rationale behind this change? Manual page for getnameinfo() for Linux explicitly says that "serv" and "servlen" parameters are optional:
The manual page on FreeBSD does not seem to make a different point either. On the contrary, it omits "serv" and "servlen" in one of the examples. Finally, I have tested the branch on FreeBSD 8.2. I can confirm that the original problem is there and that the fix did not make a difference. jurism In this particular case, we made a call to the library function getnameinfo(3) which takes in a pointer to a generic socket address structure (of type sockaddr_in or sockaddr_in6), the size of the structure, two buffers for the returned results (the host address and the service port), their length and four octets containing flags for determining the type of information to be returned to the caller. The way BSD sockets work is that they provide a unified address structure interface via typecasts. Any socket address structure can be typecast to the base structure `struct sockaddr' which contains the most vital socket information to provide functionality (an unsigned short containing the address family of the socket and 14 bytes for the address information). `struct sockaddr' is the smallest structure of the family, with a total of 16 bytes (coincidentally, the same as `struct sockaddr_in'). On the other end of the spectrum is `struct sockaddr_stroage' which is the largest structure, containing 128 bytes of information, which is guaranteed to hold all and any socket information, independent of the socket type or address family. So a lot of functions request that you typecast your socket structures to the base type (struct sockaddr) which is then used to determine further actions to be taken. The way FreeBSD have implemented their getnameinfo(3) lib call is that they have an external array holding records for every known address family, and the characteristics (for example, the size) of the socket structures that should be used to work with these address families, to which the passed parameters are validated against before anything is done, and in case of inconsistency, an EAI_FAIL is returned (I'm not 100% sure this return value is valid anymore, because while skimming over the FreeBSD libc code, I found that it, along with a few others, were disabled, yet had no replacements, and were still used in the getnameinfo(3) code). This is the characteristics array definition in FreeBSD: head/lib/libc/net/getnameinfo.c ... 97 static const struct afd { 98 int a_af; 99 size_t a_addrlen; 100 socklen_t a_socklen; 101 int a_off; 102 } afdl [] = { 103 #ifdef INET6 104 {PF_INET6, sizeof(struct in6_addr), sizeof(struct sockaddr_in6), 105 offsetof(struct sockaddr_in6, sin6_addr)}, 106 #endif 107 {PF_INET, sizeof(struct in_addr), sizeof(struct sockaddr_in), 108 offsetof(struct sockaddr_in, sin_addr)}, 109 {0, 0, 0}, 110 }; ... The way OpenBSD approach this is that getnameinfo(3) is a wrapper function that does a little typecasting and argument preparation for the actual function that is used to perform the lookup. They don't use an array with predefined values to validate the parameters against. The approach they take is typecasting the passed socket address structure to the appropriate structure based on the address family defined in the first short of the base socket address structure (struct sockaddr) and then calculate the values to be passed to the underlying function on the fly. This approach can be seen here: OpenBSD-lib-patches/libc/net/getnameinfo.c ... 97 int 98 getnameinfo(const struct sockaddr *sa, socklen_t salen, 99 char *host, size_t hostlen, 100 char *serv, size_t servlen, 101 int flags) 102 { 103 switch (sa->sa_family) { 104 #ifdef HAVE_IPV6 105 case AF_INET6 : { 106 const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa; 107 108 return doit (AF_INET6, &sin6->sin6_addr, sizeof(sin6->sin6_addr), 109 sin6->sin6_port, 110 host, hostlen, 111 serv, servlen, 112 flags); 113 } 114 #endif 115 case AF_INET : { 116 const struct sockaddr_in *sin = (const struct sockaddr_in *)sa; 117 118 return doit (AF_INET, &sin->sin_addr, sizeof(sin->sin_addr), 119 sin->sin_port, 120 host, hostlen, 121 serv, servlen, 122 flags); 123 } 124 default : 125 return EAI_FAMILY; 126 } 127 } ... In both of these cases, the length to which the passed size parameter is compared is either sizeof(in_addr) or sizeof(in6_addr) depending on the address family detected in the first two bytes of the passed structure. The reason why this bug manifested itself was the way we determined the arguments to pass it. Until now, the way we did it was that we used a different structure for holding the socket address information depending on whether IPv6 support was enabled. Here's the code: src/libs/zbxcomms/comms.c ... 24 #if defined(HAVE_IPV6) 25 # define ZBX_SOCKADDR struct sockaddr_storage 26 #else 27 # define ZBX_SOCKADDR struct sockaddr_in 28 #endif ... 1079 char *get_ip_by_socket(zbx_sock_t *s) 1080 { 1081 ZBX_SOCKADDR sa; 1082 ZBX_SOCKLEN_T sz = sizeof(sa); ... At this point, we have declared a socket structure `sa' of either type `struct sockaddr_in' or `struct sockaddr_storage' if IPv6 is disabled or enabled, respectively. Here is the call to getnameinfo(3) with a little context: src/libs/zbxcomms/comms.c ... 1093 #if defined(HAVE_IPV6) 1094 if (0 != getnameinfo((struct sockaddr*)&sa, sizeof(sa), host, sizeof(host), NULL, 0, NI_NUMERICHOST)) 1095 { 1096 error_message = strerror_from_system(zbx_sock_last_error()); 1097 zbx_set_tcp_strerror("connection rejected, getnameinfo() failed: %s", error_message); 1098 } 1099 #else ... The problem that can be seen here is that `sa' is originally of type `struct sockaddr_storage' (due to this executing only when IPv6 is enabled and thus ZBX_SOCKADDR is defined as `struct sockaddr_storage'). As I mentioned earlier, BSD sockets try to have a unified interface via typecasting and then further addressing the contents of the structure based on the determined address family. The size parameter (2nd) here is meant to validate the passed arguments, but if it is looked at closer, it can also be used to verify whether the system implements correct sizes for the address socket structures. The way it verifies whether the passed size parameter is consistent is by looking at the first parameters `sa_family' value to determine the sockets family and based on that determine the appropriate size that should have been passed. We fail this validation because we pass the size of the structure that we use for storing the address information. To remedy this, in this specific case, the size of the appropriate address structure for the used address family should be passed. While researching this issue, I concluded that the best practice, in general, when passing arguments to a function that requires a descriptive argument of another argument, would be to pass the size of the appropriate type to which the described parameter is typecast (i.e. if we pass an int typecast to a char, we should pass a size argument of sizeof(char), rather than the actual size of the parameter that is being typecast). This also improves portability and clarifies the code. Also, in essence, when we're typecasting, we're passing a completely different type (hence `typecasting'), which means that the underlying functions can and may address the data in different ways, and passing an incorrect size value is, in my opinion, a disaster waiting to happen. This approach, unfortunately, is not always consistent with socket address structures due to their portability interface and how various functions determine the validity of the values passed to them. This is also one of the reasons sockets have been a stumbling block for novice developers. After evaluating the way we implemented this functionality previously, I tried to not move away from the original idea of how it was done, although one of my first thoughts was to drop the getnameinfo(3) call completely, because we were already using getpeername(2), which has been around since 4.2BSD and is IPv4 and IPv6 capable, but I wanted to change the code as little as possible. But the further I dug, I realized how this may be a point of confusion for anyone later reviewing the code or debugging it and that it would force them to do the same things I did to nail it down. So it was decided to drop the getnameinfo(3) call altogether and just use a combination of getpeername(2) and inet_ntop(3). This, this is how our code looks now, after the changes: src/libs/zbxcomms/comms.c ... 1079 char *get_ip_by_socket(zbx_sock_t *s) 1080 { 1081 ZBX_SOCKADDR sa; 1082 ZBX_SOCKLEN_T sz = sizeof(sa); 1083 static char host[64]; 1084 char *error_message = NULL; 1085 1086 if (ZBX_TCP_ERROR == getpeername(s->socket, (struct sockaddr*)&sa, &sz)) 1087 { 1088 error_message = strerror_from_system(zbx_sock_last_error()); 1089 zbx_set_tcp_strerror("connection rejected, getpeername() failed: %s", error_message); 1090 goto out; 1091 } 1092 1093 switch (((struct sockaddr *)&sa)->sa_family) 1094 { 1095 case AF_INET: 1096 inet_ntop(AF_INET, &(((struct sockaddr_in *)&sa)->sin_addr), host, sizeof(host)); 1097 break; 1098 #if defined(HAVE_IPV6) 1099 case AF_INET6: 1100 inet_ntop(AF_INET6, &(((struct sockaddr_in6 *)&sa)->sin6_addr), host, sizeof(host)); 1101 break; 1102 #endif 1103 default: 1104 error_message = "Unrecognized address family"; 1105 } 1106 out: 1107 if (NULL != error_message) 1108 { 1109 zabbix_log(LOG_LEVEL_WARNING, "Cannot get socket IP address: %s", error_message); 1110 strscpy(host, "unknown IP"); 1111 } 1112 1113 return host; 1114 } ... asaveljevs Only one occurence of getnameinfo() was adjusted, but there is another one that needs it. asaveljevs Also, the code will not compile on Windows, because there is no inet_ntop() function. asaveljevs Do you think we could just adjust the old getnameinfo() invocation so that it works? REOPENED. jurism Added a wrapper for getnameinfo(3) that basically implements the behavior of the OpenBSD version. The parameter list omits the structure size parameter because we don't have to strictly match the interface with our wrappers and the argument would remain unused. RESOLVED. asaveljevs The branch does not compile. REOPENED. jurism Added missing condition to switch..case construct. RESOLVED. wiper CLOSED |
Comment by Aleksandrs Saveljevs [ 2014 Feb 17 ] |
(2) Comment header for zbx_getnameinfo() contains tabs. Should use spaces only. asaveljevs Similarly, this `quoting' style should not be used. Either single quotes or double quotes are preferred. jurism Not relevant to current solution. CLOSED. |
Comment by Aleksandrs Saveljevs [ 2014 Feb 17 ] |
(3) Could you please explain the reason for checking for the presence of size_t in configure.in? It is a standard type. jurism We had to define our own type to pass to that specific (getnameinfo) function so not the presence but its actual definition and size was checked. This approach has been dropped in favor of a different solution. CLOSED. |
Comment by Aleksandrs Saveljevs [ 2014 Feb 17 ] |
(4) It might be better to name parameters in the wrapper function the same way as in the original getnameinfo() call (e.g., "host" and "hostlen" instead of "hbuf" and "hlen"). asaveljevs Why is there a need to cast "sa->sa_family" to "unsigned short"? jurism Current solution has dropped this approach, thus this isn't relevant anymore. CLOSED. |
Comment by Aleksandrs Saveljevs [ 2014 Feb 17 ] |
(5) How about a simple #define like the following? The current solution seems too heavyweight. #define zbx_getnameinfo(sa, host, hostlen, serv, servlen, flags) \ getnameinfo(sa, (AF_INET == (sa)->sa_family ? sizeof(struct sockaddr_in) : sizeof(struct sockaddr_in6)), host, hostlen, serv, servlen, flags) jurism This approach has been implemented and it has been decided that the warnings produced are not critical to functionality. RESOLVED. wiper CLOSED |
Comment by Andris Zeila [ 2014 Mar 27 ] |
Successfully tested |
Comment by Juris Miščenko (Inactive) [ 2014 Apr 08 ] |
Implemented in 2.0.12rc1 r43804, 2.2.4.rc1 r44191, 2.3.0 (trunk) r44192 |
Comment by richlv [ 2014 Apr 08 ] |
(6) changelog entries have agent and server components marked - didn't this affect proxy, too ? jurism <richlv> thanks, CLOSED |