[ZBXNEXT-1002] dns caching by zabbix daemons Created: 2011 Oct 17 Updated: 2025 Feb 20 |
|
Status: | Open |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | Agent (G) |
Affects Version/s: | 1.8.8, 2.0.0 |
Fix Version/s: | None |
Type: | Change Request | Priority: | Minor |
Reporter: | Miguel Di Ciurcio Filho | Assignee: | Unassigned |
Resolution: | Unresolved | Votes: | 28 |
Labels: | cache, dns, gethostbyname, ipv6, performance | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Ubuntu 10.04 |
Description |
I have configured agentd to connect to the Zabbix server by using a hostname instead of an IP. It seems to me that gethostbyname() is called for every item queried.
192.168.1.15 - DNS server in /etc/resolv.conf
tcpdump sample from 192.168.1.13:
17:12:15.075198 IP 192.168.1.15.53 > 192.168.1.13.46016: 25749* 1/2/2 A[|domain]
17:12:15.083781 IP 192.168.1.13.45727 > 192.168.1.15.53: 14928+ A? zabbix.elabsis.com. (36)
17:12:15.084509 IP 192.168.1.15.53 > 192.168.1.13.45727: 14928* 1/2/2 A[|domain]
17:12:15.153713 IP 192.168.1.13.42149 > 192.168.1.15.53: 8538+ A? zabbix.elabsis.com. (36)
17:12:15.154379 IP 192.168.1.15.53 > 192.168.1.13.42149: 8538* 1/2/2 A[|domain]
17:12:15.172920 IP 192.168.1.13.47295 > 192.168.1.15.53: 19772+ A? zabbix.elabsis.com. (36)
17:12:15.173582 IP 192.168.1.15.53 > 192.168.1.13.47295: 19772* 1/2/2 A[|domain]
17:12:16.371257 IP 192.168.1.13.33089 > 192.168.1.15.53: 48187+ A? zabbix.elabsis.com. (36)
17:12:16.371979 IP 192.168.1.15.53 > 192.168.1.13.33089: 48187* 1/2/2 A[|domain]
17:12:16.374865 IP 192.168.1.13.36497 > 192.168.1.15.53: 9540+ A? zabbix.elabsis.com. (36)
17:12:16.375738 IP 192.168.1.15.53 > 192.168.1.13.36497: 9540* 1/2/2 A[|domain]
17:12:16.434436 IP 192.168.1.13.43697 > 192.168.1.15.53: 12163+ A? zabbix.elabsis.com. (36)
17:12:16.435174 IP 192.168.1.15.53 > 192.168.1.13.43697: 12163* 1/2/2 A[|domain]
17:12:17.662556 IP 192.168.1.13.43674 > 192.168.1.15.53: 50570+ A? zabbix.elabsis.com. (36)
17:12:17.663198 IP 192.168.1.15.53 > 192.168.1.13.43674: 50570* 1/2/2 A[|domain]
17:12:17.684247 IP 192.168.1.13.42368 > 192.168.1.15.53: 9102+ A? zabbix.elabsis.com. (36)
17:12:17.684994 IP 192.168.1.15.53 > 192.168.1.13.42368: 9102* 1/2/2 A[|domain]
17:12:17.698499 IP 192.168.1.13.60401 > 192.168.1.15.53: 960+ A? zabbix.elabsis.com. (36)
17:12:17.699178 IP 192.168.1.15.53 > 192.168.1.13.60401: 960* 1/2/2 A[|domain]
17:12:17.727855 IP 192.168.1.13.51960 > 192.168.1.15.53: 8929+ A? zabbix.elabsis.com. (36)
17:12:17.728542 IP 192.168.1.15.53 > 192.168.1.13.51960: 8929* 1/2/2 A[|domain]
17:12:17.751009 IP 192.168.1.13.38707 > 192.168.1.15.53: 10705+ A? zabbix.elabsis.com. (36)
17:12:17.751738 IP 192.168.1.15.53 > 192.168.1.13.38707: 10705* 1/2/2 A[|domain]
17:12:18.916101 IP 192.168.1.13.38377 > 192.168.1.15.53: 33673+ A? zabbix.elabsis.com. (36)
17:12:18.916865 IP 192.168.1.15.53 > 192.168.1.13.38377: 33673* 1/2/2 A[|domain]
17:12:19.976746 IP 192.168.1.13.52731 > 192.168.1.15.53: 37538+ A? zabbix.elabsis.com. (36)
17:12:19.977385 IP 192.168.1.15.53 > 192.168.1.13.52731: 37538* 1/2/2 A[|domain]
17:12:20.007473 IP 192.168.1.13.49112 > 192.168.1.15.53: 1608+ A? zabbix.elabsis.com. (36)
17:12:20.008169 IP 192.168.1.15.53 > 192.168.1.13.49112: 1608* 1/2/2 A[|domain]
17:12:20.054203 IP 192.168.1.13.56993 > 192.168.1.15.53: 36012+ A? zabbix.elabsis.com. (36)
17:12:20.054905 IP 192.168.1.15.53 > 192.168.1.13.56993: 36012* 1/2/2 A[|domain]
17:12:20.109898 IP 192.168.1.13.54451 > 192.168.1.15.53: 13610+ A? zabbix.elabsis.com. (36)
17:12:20.110585 IP 192.168.1.15.53 > 192.168.1.13.54451: 13610* 1/2/2 A[|domain]
17:12:20.194716 IP 192.168.1.13.34243 > 192.168.1.15.53: 14421+ A? zabbix.elabsis.com. (36)
17:12:20.195350 IP 192.168.1.15.53 > 192.168.1.13.34243: 14421* 1/2/2 A[|domain]
17:12:20.198975 IP 192.168.1.13.49746 > 192.168.1.15.53: 37948+ A? zabbix.elabsis.com. (36)
17:12:20.199711 IP 192.168.1.15.53 > 192.168.1.13.49746: 37948* 1/2/2 A[|domain]
17:12:20.271991 IP 192.168.1.13.43933 > 192.168.1.15.53: 60673+ A? zabbix.elabsis.com. (36)
17:12:20.272650 IP 192.168.1.15.53 > 192.168.1.13.43933: 60673* 1/2/2 A[|domain]
After adding a dozen hosts using the hostname of the server, my internal DNS server is being hammered by the same requests over and over. Using the IP address of the Zabbix server does not create any DNS traffic. |
Comments |
Comment by richlv [ 2011 Oct 18 ] |
indeed, zabbix daemons don't do any dns caching, thus ip usage is suggested. alternatively, nscd or similar name server caching daemons could be used to reduce the amount of remote dns queries |
Comment by Miguel Di Ciurcio Filho [ 2011 Oct 18 ] |
Just a notice in case someone tries to use nscd in Debian or Ubuntu: hosts lookups are not cached by default. http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=335476 http://sourceware.org/bugzilla/show_bug.cgi?id=4428 You must edit /etc/nscd.conf to enable it, but it is not that reliable. |
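For reference, the hosts cache in /etc/nscd.conf is controlled by directives along these lines. This is a minimal sketch of a config fragment; the TTL values are illustrative assumptions, not settings recommended anywhere in this thread:

```
# /etc/nscd.conf (fragment): enable caching of host lookups
enable-cache            hosts   yes
# seconds to keep successful and failed lookups (illustrative values)
positive-time-to-live   hosts   600
negative-time-to-live   hosts   20
```

nscd has to re-read its configuration (typically by restarting the service) before the change takes effect.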
Comment by Miguel Di Ciurcio Filho [ 2011 Oct 18 ] |
I have just noticed that zabbix_server has the same issue. When connecting to hosts using a DNS name, there is a name resolution call for every item queried. |
Comment by Cisco Vila [ 2013 Feb 01 ] |
Have there been any solutions to this? We are experiencing the same issue on Enterprise installations. |
Comment by Marc [ 2013 Feb 07 ] |
As mentioned before, use nscd. Caching domain names has no place within ZABBIX applications. |
Comment by Oleksii Zagorskyi [ 2013 Mar 29 ] |
I've just checked the zabbix agent (2.0.5) under Windows XP x32 and Windows 7 x64, capturing packets with wireshark. There are useful built-in Windows commands which are self-descriptive: Just a note - some 3rd party software can optionally? disable the caching. |
Comment by Oleksii Zagorskyi [ 2013 Mar 30 ] |
I've performed some additional experiments under Debian 6.0.7. Every time an agent needs to know the Server's IP, the agent host sends two DNS queries simultaneously, "Type: A (Host address)" and "Type: AAAA (IPv6 address)", and then gets two DNS answers respectively. For active checks: if BufferSend is the default 5 seconds, the agent queries the DNS server 12 times per minute, plus one query for the list of active checks. But it is more critical for passive checks: for every incoming TCP connection from the zabbix server, the agent performs a separate query to the DNS server. |
Comment by Oleksii Zagorskyi [ 2013 Mar 30 ] |
ONLY about the agent: what if it cached only the resolved IPs of the "Server" and "ServerActive" parameters and refreshed them with some "hardcoded|configured|TTL from DNS response" period? That would be maybe 1-5 values (remember the support for multiple servers for active checks) which the agent has to handle, and it would resolve the current issue. I don't see much sense in caching any other IPs possibly used in items. |
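The idea above can be sketched as a tiny fixed-size cache with a refresh period. This is only an illustration under stated assumptions, not Zabbix code: the names `dns_cache`, `dns_cache_lookup`, `resolve_fn` and `stub_resolve` are hypothetical, the refresh period is a single configured value rather than the TTL from the DNS response, and the stand-in resolver returns a fixed documentation-range address where a real agent would call getaddrinfo().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS 5    /* "maybe 1-5 values", as suggested above */
#define HOST_LEN    256
#define ADDR_LEN    64

/* Hypothetical resolver hook: writes a textual IP into out, returns 0 on success. */
typedef int (*resolve_fn)(const char *host, char *out, size_t out_len);

typedef struct {
    int    used;
    char   host[HOST_LEN];
    char   addr[ADDR_LEN];
    time_t expires;          /* entry is stale once now >= expires */
} cache_entry;

typedef struct {
    cache_entry slots[CACHE_SLOTS];
    unsigned    next;        /* slot to overwrite when a new host is added */
    time_t      ttl;         /* refresh period in seconds */
    resolve_fn  resolve;
} dns_cache;

void dns_cache_init(dns_cache *c, time_t ttl, resolve_fn resolve)
{
    assert(c != NULL && resolve != NULL);
    memset(c, 0, sizeof(*c));
    c->ttl = ttl;
    c->resolve = resolve;
}

/* Return the cached address for host, resolving only on a miss or after expiry. */
int dns_cache_lookup(dns_cache *c, const char *host, time_t now,
                     char *out, size_t out_len)
{
    cache_entry *e = NULL;
    char         fresh[ADDR_LEN];
    int          i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (c->slots[i].used && strcmp(c->slots[i].host, host) == 0) {
            e = &c->slots[i];
            break;
        }
    }

    if (e == NULL || now >= e->expires) {
        if (c->resolve(host, fresh, sizeof(fresh)) != 0)
            return -1;                   /* resolver failed */
        if (e == NULL) {                 /* new host: take the next slot */
            e = &c->slots[c->next];
            c->next = (c->next + 1) % CACHE_SLOTS;
        }
        e->used = 1;
        snprintf(e->host, sizeof(e->host), "%s", host);
        snprintf(e->addr, sizeof(e->addr), "%s", fresh);
        e->expires = now + c->ttl;
    }

    snprintf(out, out_len, "%s", e->addr);
    return 0;
}

/* Stand-in resolver for illustration only; it counts calls so the
 * effect of the cache is visible. */
int stub_calls = 0;

int stub_resolve(const char *host, char *out, size_t out_len)
{
    (void)host;
    stub_calls++;
    snprintf(out, out_len, "%s", "192.0.2.10");
    return 0;
}
```

With a 60-second period, two lookups of the same name 30 seconds apart reach the resolver once, and a lookup at or after the expiry time triggers a refresh. The current time is passed in explicitly so the behaviour can be exercised without waiting.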
Comment by Oleksii Zagorskyi [ 2013 Mar 30 ] |
I've just tested agent on a FreeBSD 8.1 x32 host. Behavior is the same - DNS query for every incoming TCP connection from server.
01:50:16.930705 IP 10.20.0.20.20708 > 10.20.0.10.53: 21340+ A? mon2 (33)
01:50:16.930886 IP 10.20.0.10.53 > 10.20.0.20.20708: 21340* 1/0/0 A 10.20.0.32 (49)
01:50:16.930929 IP 10.20.0.20.13973 > 10.20.0.10.53: 21341+ AAAA? mon2 (33)
01:50:16.931051 IP 10.20.0.10.53 > 10.20.0.20.13973: 21341* 0/1/0 (113)
I don't think I need to test something else. |
Comment by Oleksii Zagorskyi [ 2013 Mar 30 ] |
Just additional small test on Debian. |
Comment by Oleksii Zagorskyi [ 2013 Mar 30 ] |
heh, last comment about zabbix sources. There is some difference: with IPv6 support zabbix daemons use the "getaddrinfo()" library call, but without IPv6 they use another call, "gethostbyname()". This is also mentioned in |
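The paired A and AAAA queries seen in the captures are what getaddrinfo() typically produces when the address family is left unspecified. A minimal sketch of the difference follows; the helper name `count_addresses` is hypothetical, and which hints Zabbix actually passes is an assumption not confirmed in this thread:

```c
/* Guarded so that struct addrinfo is declared under strict -std= modes. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Resolve host restricted to family (AF_INET, AF_INET6 or AF_UNSPEC).
 * Returns the number of addresses found, or -1 if resolution fails.
 * With AF_UNSPEC a resolver usually asks for both A and AAAA records;
 * with AF_INET only an A lookup is needed. */
int count_addresses(const char *host, int family, int flags)
{
    struct addrinfo  hints;
    struct addrinfo *res = NULL;
    struct addrinfo *p;
    int              n = 0;

    assert(host != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family   = family;
    hints.ai_socktype = SOCK_STREAM;   /* one result per address, not per socket type */
    hints.ai_flags    = flags;

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL; p = p->ai_next)
        n++;

    freeaddrinfo(res);
    return n;
}
```

Calling `count_addresses("mon2", AF_INET, 0)` instead of `AF_UNSPEC` would be one way to avoid the AAAA query on an IPv4-only network. Passing `AI_NUMERICHOST` in `flags` makes the call accept only literal addresses, so no DNS traffic is generated at all.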
Comment by richlv [ 2014 Jan 14 ] |
if we ever look into implementing this, it should be very well documented and there must be a way to drop this internal dns cache (like we can reload config for server) - lately i've hit some other software doing its own dns caching, and that can be mighty confusing |
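One possible shape for such a "drop the cache" control is a signal that only sets a flag, with the daemon's main loop doing the actual flush. The sketch below is an assumption-laden illustration, not how Zabbix runtime control is implemented: the choice of SIGUSR1 and the names `cached_hosts`, `install_flush_handler` and `flush_if_requested` are all hypothetical.

```c
#include <assert.h>
#include <signal.h>

#define FLUSH_SLOTS 5

typedef struct {
    int  used;
    char host[256];
    char addr[64];
} flush_entry;

flush_entry cached_hosts[FLUSH_SLOTS];        /* hypothetical internal DNS cache */
volatile sig_atomic_t flush_requested = 0;    /* set by the handler, read by the loop */

/* The handler only sets a flag; touching the cache here would not be
 * async-signal-safe. */
void on_flush_signal(int signo)
{
    (void)signo;
    flush_requested = 1;
}

/* Returns 0 on success, -1 on failure. signal() keeps the sketch short;
 * production code would more likely use sigaction(). */
int install_flush_handler(void)
{
    return signal(SIGUSR1, on_flush_signal) == SIG_ERR ? -1 : 0;
}

/* Called from the main loop; returns the number of entries dropped. */
int flush_if_requested(void)
{
    int i, dropped = 0;

    if (!flush_requested)
        return 0;
    flush_requested = 0;

    for (i = 0; i < FLUSH_SLOTS; i++) {
        if (cached_hosts[i].used) {
            cached_hosts[i].used = 0;
            dropped++;
        }
    }
    return dropped;
}
```

An operator could then run `kill -USR1 <pid>` after changing a DNS record, and the next loop iteration would start from an empty cache.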
Comment by Volker Fröhlich [ 2014 Sep 30 ] |
This issue is somewhat connected to |
Comment by MightyDok [ 2015 Aug 07 ] |
The Russian hosting provider RTCOMM enforces DNS rate limits for domains, so with the standard Linux template the zabbix server fails to query the agent, with the error "ZBX_TCP_READ() failed: [4] Interrupted system call". I think we need this feature ASAP. |
Comment by Dmitry [ 2017 Apr 18 ] |
Lab:
Works for me:
The agent config plus a correct 'search' parameter makes the first zabbix agent lookup succeed, so it doesn't try other suffixes (only 2 requests per connection). Another way: configure nscd on each agent host. But nscd may cause a mismatch between the real network state and the nscd cache. |
Comment by amg1127 [ 2017 Oct 21 ] |
I agree with caching the IP addresses of hostnames supplied in the "Server=" and "ServerActive=" parameters (Oleksii Zagorskyi's suggestion above). For example:
The network I manage has a Zabbix deployment consisting of a Zabbix server, some geographically dispersed Zabbix proxies, more than 2700 monitored hosts and more than 400000 monitored items. Until my team implements Zabbix mutual authentication, we are relying on hostnames specified in the "Server=" agentd parameter to provide monitoring access control. However, the Zabbix infrastructure is currently consuming a significant amount of resources on our internal DNS servers, because the absence of a cache produces one name resolution query for every passive metric collection. |
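To put a rough number on that load, the estimate below multiplies the check rate by the lookups per check. The item count comes from the comment above; the 60-second interval and the premise that every item is a passive check are hypothetical assumptions made only for the arithmetic, and the function name is illustrative.

```c
#include <assert.h>

/* Estimated DNS queries per second when every passive check triggers
 * lookups_per_check uncached lookups (1 for A only, 2 for A plus AAAA). */
double dns_queries_per_second(long items, double interval_s, int lookups_per_check)
{
    assert(interval_s > 0.0);
    return (double)items / interval_s * (double)lookups_per_check;
}
```

Under those assumptions, `dns_queries_per_second(400000, 60.0, 1)` comes to roughly 6,700 queries per second, and twice that if each check issues both an A and an AAAA query. A cache with even a short refresh period would reduce this to a handful of queries per agent per period.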
Comment by Vitaly Zhuravlev [ 2018 Nov 30 ] |
Another approach is to use systemd-resolved if you are on systemd |
Comment by MArk [ 2024 May 22 ] |
Despite the benefits of caching systems like NSCD or Systemd-Resolved, it seems that some applications can bypass the host's cache, and Zabbix may be one of them. This is unconfirmed and could not be validated. However, my tests show that it may be true. Therefore, I believe it would be beneficial if Zabbix processes did not bypass the caching systems. |
Comment by Vladislavs Sokurenko [ 2025 Feb 20 ] |
For passive checks and SNMP this should already work after:
diff --git a/src/libs/zbxpoller/async_poller.c b/src/libs/zbxpoller/async_poller.c
index ecd4938d60b..dd0f5936963 100644
--- a/src/libs/zbxpoller/async_poller.c
+++ b/src/libs/zbxpoller/async_poller.c
@@ -556,7 +556,7 @@ static void async_poller_dns_init(zbx_poller_config_t *poller_config, zbx_thread
 {
 	char *timeout;
 #ifdef HAVE_ARES
-	struct ares_options options;
+	struct ares_options options = {0};
 	int optmask, status;
 
 	status = ares_library_init(ARES_LIB_INIT_ALL);
@@ -567,10 +567,11 @@ static void async_poller_dns_init(zbx_poller_config_t *poller_config, zbx_thread
 		exit(EXIT_FAILURE);
 	}
 
-	optmask = ARES_OPT_SOCK_STATE_CB|ARES_OPT_TIMEOUT;
+	optmask = ARES_OPT_SOCK_STATE_CB|ARES_OPT_TIMEOUT|ARES_OPT_QUERY_CACHE;
 	options.sock_state_cb = sock_state_cb;
 	options.sock_state_cb_data = poller_config;
 	options.timeout = poller_args_in->config_comms->config_timeout;
+	options.qcache_max_ttl = 3600;
 
 	status = ares_init_options(&poller_config->channel, &options, optmask);
|