  1. ZABBIX BUGS AND ISSUES
  2. ZBX-16392

Zabbix Agent SIGTERM from systemd Every Couple Minutes


    • Incident report
    • Resolution: Won't fix
    • Critical
    • None
    • 4.0.9
    • Agent (G)
    • CentOS Linux release 7.0.1406 (Core)

      Several of our hosts running the Zabbix agent have a problem where the agent shuts down every couple of minutes. As a result, we sporadically get alerts that "Zabbix agent on <server> is unreachable for 5 minutes", only for them to be resolved within 10 seconds. Digging deeper, it appears that systemd is set to restart the agent after 10 seconds, which explains why the "Resolved" alerts arrive 10 seconds later.

      Our configuration is very simple for each agent. See attached zabbix_agentd.conf for one example.

      Resources look ok, so that doesn't appear to be an issue:

      [root@talhal11 ~]# top -b -n 1 | head -n 5
      top - 19:09:27 up 29 days, 36 min, 7 users, load average: 0.05, 0.05, 0.05
      Tasks: 331 total, 1 running, 330 sleeping, 0 stopped, 0 zombie
      %Cpu(s): 0.7 us, 0.1 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
      KiB Mem: 16262856 total, 13566872 used, 2695984 free, 1480 buffers
      KiB Swap: 5242876 total, 0 used, 5242876 free. 8503440 cached Mem

      It appears that the Zabbix agent receives a SIGTERM every few minutes, shuts down, and is then restarted. The attached zabbix_agentd.log (at debug level 5) shows one full cycle of the agent running and then shutting down.

      Here's the output of "ps" showing some running processes when this happens:

      [root@talhal11 zabbix]# ps -aux
      USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
      root 1 0.0 0.0 47656 10280 ? Ss Jun21 17:42 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
      root 2 0.0 0.0 0 0 ? S Jun21 0:00 [kthreadd]
      <snipped for brevity>
      zabbix 19782 0.0 0.0 89640 1312 ? S 19:02 0:00 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
      zabbix 19783 0.0 0.0 89640 1652 ? S 19:02 0:00 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
      zabbix 19784 0.0 0.0 89640 2080 ? S 19:02 0:00 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
      zabbix 19785 0.0 0.0 89640 2080 ? S 19:02 0:00 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
      zabbix 19786 0.0 0.0 89640 2080 ? S 19:02 0:00 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
      zabbix 19787 0.0 0.0 89776 2292 ? S 19:02 0:00 /usr/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
      root 19802 0.0 0.0 145704 1728 pts/3 R+ 19:02 0:00 ps -aux

      The log says the sender_pid is 1, so it appears to be systemd that is sending the SIGTERM (based on above output).
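
      For anyone repeating this check, here is a rough sketch of how the sender could be pulled out of the agent log and matched to a process. It assumes the log records the sender as a "sender_pid:<number>" field, and that the host is Linux with /proc mounted. The function names sender_pids and pid_name are made up for this example.

```shell
# Print the distinct sender PIDs found in "sender_pid:<n>" fields on stdin.
# Assumption: the agent log writes the signal sender in this form.
sender_pids() {
    grep -o 'sender_pid:[0-9][0-9]*' | cut -d: -f2 | sort -un
}

# Print the command name of a PID by reading /proc (Linux only).
pid_name() {
    cat "/proc/$1/comm"
}
```

      Typical use would be something like "sender_pids < /var/log/zabbix/zabbix_agentd.log" (the log path is an assumption, check LogFile in zabbix_agentd.conf), followed by "pid_name 1" to confirm what is running as PID 1.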

      I'm at best an intermediate Linux admin, but the attached zabbix-agent.service file indicates that systemd will only restart the agent if it fails.
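
      To show which settings this refers to, the sketch below prints the restart-related directives from a unit file. The function name restart_settings is hypothetical, and the unit file path in the usage note is an assumption that may differ per distribution.

```shell
# Print the restart-related directives from a systemd unit file.
# Usage: restart_settings /path/to/unit.service
restart_settings() {
    grep -E '^(Restart|RestartSec)=' "$1"
}
```

      For example, "restart_settings /usr/lib/systemd/system/zabbix-agent.service" (assumed path). On a live host, "systemctl show zabbix-agent -p Restart -p RestartUSec" should report the values systemd is actually using, including any drop-in overrides.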

      So I'm at a loss. I'd love to help debug this. If you need any additional data please let me know.

        1. zabbix_agentd.conf
          10 kB
        2. zabbix_agentd.log
          50 kB
        3. zabbix-agent.service
          0.4 kB

            zabbix.support Zabbix Support Team
            kerrar Jason R