Details

      Description

      I have configured agentd to connect to the Zabbix server by using a hostname instead of an IP. It seems to me that for every item queried, gethostbyname() is called.

      192.168.1.15 - DNS server in /etc/resolv.conf
      192.168.1.13 - Host where the agent is running
      zabbix.elabsis.com - Zabbix server

      tcpdump sample from the 192.168.1.13:

      17:12:15.075198 IP 192.168.1.15.53 > 192.168.1.13.46016: 25749* 1/2/2 A[|domain]
      17:12:15.083781 IP 192.168.1.13.45727 > 192.168.1.15.53: 14928+ A? zabbix.elabsis.com. (36)
      17:12:15.084509 IP 192.168.1.15.53 > 192.168.1.13.45727: 14928* 1/2/2 A[|domain]
      17:12:15.153713 IP 192.168.1.13.42149 > 192.168.1.15.53: 8538+ A? zabbix.elabsis.com. (36)
      17:12:15.154379 IP 192.168.1.15.53 > 192.168.1.13.42149: 8538* 1/2/2 A[|domain]
      17:12:15.172920 IP 192.168.1.13.47295 > 192.168.1.15.53: 19772+ A? zabbix.elabsis.com. (36)
      17:12:15.173582 IP 192.168.1.15.53 > 192.168.1.13.47295: 19772* 1/2/2 A[|domain]
      17:12:16.371257 IP 192.168.1.13.33089 > 192.168.1.15.53: 48187+ A? zabbix.elabsis.com. (36)
      17:12:16.371979 IP 192.168.1.15.53 > 192.168.1.13.33089: 48187* 1/2/2 A[|domain]
      17:12:16.374865 IP 192.168.1.13.36497 > 192.168.1.15.53: 9540+ A? zabbix.elabsis.com. (36)
      17:12:16.375738 IP 192.168.1.15.53 > 192.168.1.13.36497: 9540* 1/2/2 A[|domain]
      17:12:16.434436 IP 192.168.1.13.43697 > 192.168.1.15.53: 12163+ A? zabbix.elabsis.com. (36)
      17:12:16.435174 IP 192.168.1.15.53 > 192.168.1.13.43697: 12163* 1/2/2 A[|domain]
      17:12:17.662556 IP 192.168.1.13.43674 > 192.168.1.15.53: 50570+ A? zabbix.elabsis.com. (36)
      17:12:17.663198 IP 192.168.1.15.53 > 192.168.1.13.43674: 50570* 1/2/2 A[|domain]
      17:12:17.684247 IP 192.168.1.13.42368 > 192.168.1.15.53: 9102+ A? zabbix.elabsis.com. (36)
      17:12:17.684994 IP 192.168.1.15.53 > 192.168.1.13.42368: 9102* 1/2/2 A[|domain]
      17:12:17.698499 IP 192.168.1.13.60401 > 192.168.1.15.53: 960+ A? zabbix.elabsis.com. (36)
      17:12:17.699178 IP 192.168.1.15.53 > 192.168.1.13.60401: 960* 1/2/2 A[|domain]
      17:12:17.727855 IP 192.168.1.13.51960 > 192.168.1.15.53: 8929+ A? zabbix.elabsis.com. (36)
      17:12:17.728542 IP 192.168.1.15.53 > 192.168.1.13.51960: 8929* 1/2/2 A[|domain]
      17:12:17.751009 IP 192.168.1.13.38707 > 192.168.1.15.53: 10705+ A? zabbix.elabsis.com. (36)
      17:12:17.751738 IP 192.168.1.15.53 > 192.168.1.13.38707: 10705* 1/2/2 A[|domain]
      17:12:18.916101 IP 192.168.1.13.38377 > 192.168.1.15.53: 33673+ A? zabbix.elabsis.com. (36)
      17:12:18.916865 IP 192.168.1.15.53 > 192.168.1.13.38377: 33673* 1/2/2 A[|domain]
      17:12:19.976746 IP 192.168.1.13.52731 > 192.168.1.15.53: 37538+ A? zabbix.elabsis.com. (36)
      17:12:19.977385 IP 192.168.1.15.53 > 192.168.1.13.52731: 37538* 1/2/2 A[|domain]
      17:12:20.007473 IP 192.168.1.13.49112 > 192.168.1.15.53: 1608+ A? zabbix.elabsis.com. (36)
      17:12:20.008169 IP 192.168.1.15.53 > 192.168.1.13.49112: 1608* 1/2/2 A[|domain]
      17:12:20.054203 IP 192.168.1.13.56993 > 192.168.1.15.53: 36012+ A? zabbix.elabsis.com. (36)
      17:12:20.054905 IP 192.168.1.15.53 > 192.168.1.13.56993: 36012* 1/2/2 A[|domain]
      17:12:20.109898 IP 192.168.1.13.54451 > 192.168.1.15.53: 13610+ A? zabbix.elabsis.com. (36)
      17:12:20.110585 IP 192.168.1.15.53 > 192.168.1.13.54451: 13610* 1/2/2 A[|domain]
      17:12:20.194716 IP 192.168.1.13.34243 > 192.168.1.15.53: 14421+ A? zabbix.elabsis.com. (36)
      17:12:20.195350 IP 192.168.1.15.53 > 192.168.1.13.34243: 14421* 1/2/2 A[|domain]
      17:12:20.198975 IP 192.168.1.13.49746 > 192.168.1.15.53: 37948+ A? zabbix.elabsis.com. (36)
      17:12:20.199711 IP 192.168.1.15.53 > 192.168.1.13.49746: 37948* 1/2/2 A[|domain]
      17:12:20.271991 IP 192.168.1.13.43933 > 192.168.1.15.53: 60673+ A? zabbix.elabsis.com. (36)
      17:12:20.272650 IP 192.168.1.15.53 > 192.168.1.13.43933: 60673* 1/2/2 A[|domain]
      

      After adding a dozen hosts using the hostname of the server, my internal DNS server is being hammered by the same requests over and over.

      Using the IP address of the Zabbix server does not create any DNS traffic.

        Activity

        richlv added a comment -

        indeed, zabbix daemons don't do any dns caching, thus ip usage is suggested. alternatively, nscd or similar name server caching daemons could be used to reduce the amount of remote dns queries

        Miguel Di Ciurcio Filho added a comment -

        Just a notice in case someone tries to use nscd in Debian or Ubuntu: hosts lookups are not cached by default.

        http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=335476

        http://sourceware.org/bugzilla/show_bug.cgi?id=4428

        You must edit /etc/nscd.conf to enable it, but it is not that reliable.
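
        A quick way to check whether hosts caching is switched on is to look for the `enable-cache hosts yes` directive in the file. The helper below is only a sketch: the function name is made up for illustration, and it assumes that the last matching line wins if the directive appears more than once.

```python
def hosts_cache_enabled(nscd_conf_text):
    """Return True if the text contains an active 'enable-cache hosts yes' line.

    Hypothetical helper; assumes the last 'enable-cache hosts' line wins.
    """
    enabled = False
    for raw_line in nscd_conf_text.splitlines():
        # Drop '#' comments, then split the directive on whitespace.
        fields = raw_line.split("#", 1)[0].split()
        if len(fields) == 3 and fields[0] == "enable-cache" and fields[1] == "hosts":
            enabled = (fields[2] == "yes")
    return enabled


sample = """
# sample fragment, not a complete nscd.conf
enable-cache passwd yes
enable-cache hosts no
"""
print(hosts_cache_enabled(sample))
```

        On a real host it could be called as `hosts_cache_enabled(open("/etc/nscd.conf").read())`.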

        Miguel Di Ciurcio Filho added a comment -

        I have just noticed that zabbix_server has the same issue. When connecting to hosts using a DNS name, there is a name resolution call for every item queried.

        Cisco Vila added a comment -

        Have there been any solutions to this? We are experiencing the same issues for Enterprise installations.

        Marc added a comment -

        As mentioned before, use nscd.
        Name service caching is the job of the operating system, or of services like nscd. Alternatively, one could use a local DNS server for caching.

        Caching domain names has no place within Zabbix applications.

        Oleksiy Zagorskyi added a comment - - edited

        I've just checked the Zabbix agent (2.0.5) under Windows XP x32 and Windows 7 x64, capturing packets with Wireshark.
        As expected, there are no problems: Windows caches resolved DNS names by default.
        This holds both for requesting the list of active checks (ServerActive=DNSname) and for the agent's passive incoming TCP connections (Server=DNSname).

        There are useful built-in Windows commands which are self-descriptive:
        > ipconfig /displaydns
        > ipconfig /flushdns

        Just a note: some 3rd-party software can optionally disable the caching.
        I know of at least one: Kaspersky antivirus offers the option to disable it during installation.

        Oleksiy Zagorskyi added a comment - - edited

        I've performed some additional experiments under Debian 6.0.7
        Linux it0 3.2.0-3-amd64 #1 SMP Mon Jul 23 02:45:17 UTC 2012 x86_64 GNU/Linux
        Agent is ~2.0.5 compiled with IPv6 support. I'm using only IPv4 addresses in the experiments.

        Every time an agent needs to know the server's IP, the agent host sends two DNS queries simultaneously, "Type: A (Host address)" and "Type: AAAA (IPv6 address)", and then gets two DNS answers respectively. (ZBX-4252 also mentions this, with some details.)

        For active checks: if BufferSend is the default 5 seconds, the agent queries the DNS server 12 times per minute, plus one query for the list of active checks.
        This is true regardless of the number of monitored items and their update intervals (I hope you don't overflow the agent's BufferSize here).
        So, not so bad.

        But it is more critical for passive checks. For every incoming TCP connection from the Zabbix server, the agent performs a separate query to the DNS server.
        So you can calculate agent_items * (period / update interval) = number of queries for a period from every agent host to the DNS server.
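
        As a rough worked example of that estimate (a sketch only): the item count and update interval below are made-up numbers, the function names are hypothetical, and the factor of two assumes an IPv6-enabled agent that issues both an A and an AAAA query per lookup, as observed above. The extra query for the list of active checks is left out.

```python
def passive_dns_queries(items, update_interval_s, period_s=60, queries_per_lookup=2):
    """Estimate DNS queries one agent host sends in period_s for passive checks."""
    # Each passive item causes one incoming connection every update_interval_s seconds,
    # and each connection triggers one lookup of the server name.
    connections = items * period_s / update_interval_s
    return connections * queries_per_lookup


def active_dns_queries(buffer_send_s=5, period_s=60, queries_per_lookup=2):
    """Estimate DNS queries for active checks: one lookup per buffer flush."""
    # Independent of the number of items, since values are sent in batches.
    return period_s / buffer_send_s * queries_per_lookup


print(passive_dns_queries(items=50, update_interval_s=30))  # 200.0 per minute
print(active_dns_queries())                                 # 24.0 per minute
```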

        Oleksiy Zagorskyi added a comment - - edited

        ONLY about agent:

        What if the agent cached only the resolved IPs of the "Server" and "ServerActive" parameters and refreshed them with some "hardcoded|configured|TTL from DNS response" period?

        That would be maybe 1-5 values (remember the support for multiple servers for active checks) which the agent has to handle, and it would resolve the current issue.
        It should not be so hard to implement, IMO.

        I don't see much sense in caching any other IPs possibly used in items.
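
        To make the idea concrete, here is a minimal sketch of such a cache in Python. It is not Zabbix code: the class name, the fixed TTL, and the injectable resolver and clock are all assumptions made for illustration, and a real implementation would live in the agent's C sources.

```python
import time


class ResolvedAddressCache:
    """Tiny TTL cache for a handful of resolved server names (hypothetical helper)."""

    def __init__(self, resolve, ttl_seconds=60.0, clock=time.monotonic):
        self._resolve = resolve    # callable: hostname -> address string
        self._ttl = ttl_seconds    # fixed refresh period (assumed, not taken from DNS)
        self._clock = clock
        self._entries = {}         # hostname -> (address, expiry time)

    def lookup(self, hostname):
        now = self._clock()
        entry = self._entries.get(hostname)
        if entry is not None and now < entry[1]:
            return entry[0]        # still fresh: no DNS traffic
        address = self._resolve(hostname)
        self._entries[hostname] = (address, now + self._ttl)
        return address

    def flush(self):
        """Drop all cached entries, e.g. on a configuration reload."""
        self._entries.clear()
```

        With the standard library resolver it could be created as `ResolvedAddressCache(socket.gethostbyname, ttl_seconds=60)`; only the first lookup in each TTL window would then reach the DNS server.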

        Oleksiy Zagorskyi added a comment -

        I've just tested agent on a FreeBSD 8.1 x32 host.
        Agent is also compiled with IPv6 support.

        Behavior is the same: a DNS query for every incoming TCP connection from the server.
        But the order of the queries differs a bit from Linux. Here the IPv4 query-response comes first, then the IPv6 one:

        01:50:16.930705 IP 10.20.0.20.20708 > 10.20.0.10.53: 21340+ A? mon2 (33)
        01:50:16.930886 IP 10.20.0.10.53 > 10.20.0.20.20708: 21340* 1/0/0 A 10.20.0.32 (49)
        01:50:16.930929 IP 10.20.0.20.13973 > 10.20.0.10.53: 21341+ AAAA? mon2 (33)
        01:50:16.931051 IP 10.20.0.10.53 > 10.20.0.20.13973: 21341* 0/1/0 (113)
        

        I don't think I need to test something else.

        Oleksiy Zagorskyi added a comment -

        Just an additional small test on Debian.
        Zabbix server v2.0.5, compiled with and without IPv6 support, for a single passive item check where the host is monitored by DNS name:
        with IPv6 - performs two queries to DNS;
        without IPv6 - performs one query to DNS.

        Oleksiy Zagorskyi added a comment -

        heh, last comment about zabbix sources.

        There is some difference: with IPv6, Zabbix daemons use the "getaddrinfo()" library function, but without IPv6 they use another one, "gethostbyname()".
        See /src/libs/zbxcomms/comms.c lines 282 and 348

        This is also mentioned in ZBX-6326
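
        The effect of the address family can be illustrated with Python's `socket.getaddrinfo()`, which wraps the same C library function. This is only a sketch and not Zabbix code: the `resolve` helper and its `ipv4_only` flag are made up for illustration. With `AF_UNSPEC` the resolver typically asks for both A and AAAA records, while `AF_INET` restricts it to A records.

```python
import socket


def resolve(host, port, ipv4_only=False):
    """Return the unique addresses getaddrinfo() reports for host (hypothetical helper)."""
    family = socket.AF_INET if ipv4_only else socket.AF_UNSPEC
    infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


# A numeric address needs no DNS traffic, so this runs without a network.
print(resolve("127.0.0.1", 10051, ipv4_only=True))
```

        Running it against a real hostname while capturing with tcpdump should show one query type with `ipv4_only=True` and usually two without it.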

        richlv added a comment -

        if we ever look into implementing this, it should be very well documented and there must be a way to drop this internal dns cache (like we can reload config for server) - lately i've hit some other software doing its own dns caching, and that can be mighty confusing

        Volker Fröhlich added a comment -

        This issue is somewhat connected to ZBXNEXT-1862. One thing I find worth mentioning is that different caching solutions offer different feature sets. I remember comparing nscd, dnsmasq and bind at some point. I seem to remember that one would not cache PTR records and another would not cache MX records, which can be relevant on the Zabbix server machine. My opinion matches Marc's: I don't consider it a Zabbix issue.

        MightyDok added a comment - - edited

        The Russian hosting provider RTCOMM has DNS rate limit settings for domains, so with the standard Linux template the Zabbix server fails to query the agent with the error "ZBX_TCP_READ() failed: [4] Interrupted system call". I think we need this feature ASAP.

        Dmitry added a comment - - edited

        Lab:

        • LAN
        • ~100 agents
        • 2 DNS servers (master-slave)
        • 1 zabbix-server 3.0.2 + mysql

        Works for me:
        1. Zabbix server-related overload is solved by nscd
        2. Zabbix client-related overload is solved by:

        • setting a short hostname in Server= and ServerActive= (like Server=zabbix instead of Server=zabbix.mydomain)
        • ensuring that /etc/resolv.conf has the full domain in first place in the 'search' parameter (like 'mydomain.myprovider myprovider', but not 'myprovider mydomain.myprovider')
        • adding 'options rotate' to resolv.conf

        The agent config plus a correct 'search' parameter make the first Zabbix agent lookup succeed, so it doesn't try other suffixes (only 2 requests per connection).
        'options rotate' balances the load between the master and slave DNS servers (it may not affect the Zabbix agent, but it balances other requests).

        Another way: configure nscd on each agent host. But the nscd cache may then lag behind the real network.
        On the Zabbix server, nscd's negative cache with a 20 min TTL increases the load on the unreachable pollers.
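
        Why the short hostname helps can be seen from the order in which a stub resolver tries candidate names. The function below is a simplified model of the search-list behaviour described in resolv.conf(5), with the default ndots of 1; the function name is hypothetical, and real resolvers have further details that are ignored here.

```python
def candidate_names(name, search, ndots=1):
    """List the names a stub resolver would try, in order (simplified model)."""
    if name.endswith("."):
        return [name]                     # already fully qualified
    with_suffixes = [f"{name}.{domain}." for domain in search]
    as_is = [name + "."]
    if name.count(".") >= ndots:
        return as_is + with_suffixes      # enough dots: try the name as-is first
    return with_suffixes + as_is          # too few dots: try the search list first


search = ["mydomain.myprovider", "myprovider"]
print(candidate_names("zabbix", search))
# ['zabbix.mydomain.myprovider.', 'zabbix.myprovider.', 'zabbix.']
print(candidate_names("zabbix.mydomain", search))
# ['zabbix.mydomain.', 'zabbix.mydomain.mydomain.myprovider.', 'zabbix.mydomain.myprovider.']
```

        With Server=zabbix the first candidate is already the real name, while Server=zabbix.mydomain only succeeds on the third attempt, so each connection costs extra failed lookups.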


          People

          • Assignee:
            Unassigned
            Reporter:
            Miguel Di Ciurcio Filho
          • Votes:
            8
            Watchers:
            12

            Dates

            • Created:
              Updated: