Several of our hosts running the Zabbix agent have an issue where the agent shuts down after a couple of minutes. As a result, we sporadically get alerts that "Zabbix agent on <server> is unreachable for 5 minutes", which then resolve within 10 seconds. Digging deeper, it appears that systemd is set to restart the Zabbix agent after 10 seconds, which explains why the "Resolved" alerts arrive 10 seconds later.
Our configuration is very simple for each agent. See attached zabbix_agentd.conf for one example.
Resource usage looks OK, so that doesn't appear to be the cause:
[root@talhal11 ~]# top -b -n 1 | head -n 5
top - 19:09:27 up 29 days, 36 min, 7 users, load average: 0.05, 0.05, 0.05
Tasks: 331 total, 1 running, 330 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 0.1 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 16262856 total, 13566872 used, 2695984 free, 1480 buffers
KiB Swap: 5242876 total, 0 used, 5242876 free. 8503440 cached Mem
It appears that the Zabbix agent receives a SIGTERM every few minutes, shuts down, and is then restarted. The attached zabbix_agentd.log (at debug level 5) shows one full cycle of the agent starting, running, and shutting down.
Here's the output of "ps" showing some running processes when this happens:
[root@talhal11 zabbix]# ps -aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 47656 10280 ? Ss Jun21 17:42 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
root 2 0.0 0.0 0 0 ? S Jun21 0:00 [kthreadd]
<snipped for brevity>
zabbix 19782 0.0 0.0 89640 1312 ? S 19:02 0:00 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
zabbix 19783 0.0 0.0 89640 1652 ? S 19:02 0:00 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
zabbix 19784 0.0 0.0 89640 2080 ? S 19:02 0:00 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
zabbix 19785 0.0 0.0 89640 2080 ? S 19:02 0:00 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
zabbix 19786 0.0 0.0 89640 2080 ? S 19:02 0:00 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
zabbix 19787 0.0 0.0 89776 2292 ? S 19:02 0:00 /usr/sbin/zabbix_agentd: active checks #1 [idle 1 sec]
root 19802 0.0 0.0 145704 1728 pts/3 R+ 19:02 0:00 ps -aux
The log says the sender_pid is 1, so it appears to be systemd that is sending the SIGTERM (PID 1 is systemd in the output above).
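For anyone who wants to repeat that check, here is a rough sketch of how the sender PID can be pulled out of a log line and matched to a process. The sample line is hypothetical and its format is an assumption on my part; substitute a real "Got signal" line from zabbix_agentd.log. The helper name sender_pid_of is made up for this example.

```shell
#!/bin/sh
# Extract the number that follows "sender_pid:" in a single log line.
# Prints nothing if the line has no sender_pid field.
sender_pid_of() {
    printf '%s\n' "$1" | sed -n 's/.*sender_pid:\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical sample line (format assumed, not copied from the attached log).
sample='Got signal [signal:15(SIGTERM),sender_pid:1,sender_uid:0,reason:0]. Exiting ...'

pid=$(sender_pid_of "$sample")
echo "sender_pid=$pid"

# Show which command currently owns that PID, if ps is available.
if command -v ps >/dev/null 2>&1; then
    ps -o pid= -o comm= -p "$pid" || true
fi
```

On the affected host, PID 1 should show up as systemd, matching the ps output above.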
I'm at best an intermediate Linux admin, but the attached zabbix_agent.service file indicates that systemd will only restart the agent if it fails.
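In case it helps others compare their setup, this is roughly how the restart settings can be read out of a unit file. The unit fragment below is a hypothetical stand-in that only mirrors what is described in this post (restart on failure, 10 second delay); it is not a copy of the attached file, and the helper name unit_setting is made up.

```shell
#!/bin/sh
# Print the value of one directive (e.g. Restart) from a systemd unit file.
# $1 = path to the unit file, $2 = directive name
unit_setting() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1"
}

# Hypothetical stand-in for the attached unit file.
unit_file=$(mktemp)
trap 'rm -f "$unit_file"' EXIT
cat > "$unit_file" <<'EOF'
[Service]
Restart=on-failure
RestartSec=10s
EOF

echo "Restart=$(unit_setting "$unit_file" Restart)"
echo "RestartSec=$(unit_setting "$unit_file" RestartSec)"
```

On a live host, `systemctl show -p Restart -p RestartUSec <unit>` should report the values systemd is actually using, which may differ from the file if a drop-in override exists.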
So I'm at a loss. I'd love to help debug this. If you need any additional data please let me know.