Type: Incident report
Resolution: Incomplete
Priority: Major
Fix Version/s: None
Affects Version/s: 2.4.7
Environment: Ubuntu 14.04 64-bit + Zabbix 2.4.7 from official Zabbix repos.
Hi,
we use external scripts for some special application monitoring. These scripts are not very complex: each is a bash script that runs an ssh command against a remote system and does some magic with the output of that ssh command.
Yesterday we noticed that these checks are flapping, but only when Zabbix runs the scripts. When we execute them manually, everything works fine.
So we looked at the differences between the two kinds of execution and found that the ssh command fails because Zabbix passes a "broken" stdin through to the script.
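As a possible workaround until the cause is fixed, an external script can refuse to use the stdin it inherits from the poller. The following is only a minimal sketch: the helper name with_null_stdin and the host and remote command in the usage comment are hypothetical and not taken from our real scripts. For ssh specifically, the -n option has the same effect, since it redirects ssh's stdin from /dev/null.

```shell
#!/bin/sh
# Sketch of a workaround, not the actual monitoring script.
# with_null_stdin is a hypothetical helper name.

# Run the given command with stdin attached to /dev/null, so whatever
# fd 0 the Zabbix poller handed down is never read by the command.
with_null_stdin() {
    "$@" < /dev/null
}

# Hypothetical usage inside an external script (host and remote
# command are placeholders):
#   with_null_stdin ssh -o BatchMode=yes monitor@app01.example.com /usr/local/bin/app_status
```

The wrapper passes the exit status of the wrapped command through unchanged, so the existing error handling in a script keeps working.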
After creating some additional external-script items that tell us which Zabbix process has this problem, we found 7 of 150 poller processes with the same issue. Here are two examples, a correct one and a wrong one:
### right one ###
# ps faux | grep "poller #107"
zabbix 23700 0.0 0.7 23329700 245344 ? S Feb24 5:21 \_ /usr/sbin/zabbix_server: poller #107 [got 0 values in 0.000005 sec, idle 1 sec]
# ls -la /proc/23700/fd/
total 0
dr-x------ 2 root root 0 Feb 24 10:09 .
dr-xr-xr-x 9 zabbix zabbix 0 Feb 24 09:54 ..
lr-x------ 1 root root 64 Feb 24 10:09 0 -> /dev/null
l-wx------ 1 root root 64 Feb 24 10:09 1 -> /var/log/zabbix/zabbix_server.log.1
l-wx------ 1 root root 64 Feb 24 10:09 2 -> /var/log/zabbix/zabbix_server.log.1
l-wx------ 1 root root 64 Feb 24 10:09 3 -> /run/zabbix/zabbix_server.pid
lrwx------ 1 root root 64 Feb 24 10:09 4 -> socket:[4815415]
lrwx------ 1 root root 64 Feb 24 10:09 5 -> socket:[4815416]
lrwx------ 1 root root 64 Feb 24 10:09 6 -> socket:[105276876]
### wrong one ###
# ps faux | grep "poller #108"
zabbix 23701 0.0 0.7 23329736 243472 ? S Feb24 5:19 \_ /usr/sbin/zabbix_server: poller #108 [got 0 values in 0.000189 sec, idle 1 sec]
# ls -la /proc/23701/fd/
total 0
dr-x------ 2 root root 0 Feb 24 10:09 .
dr-xr-xr-x 9 zabbix zabbix 0 Feb 24 09:54 ..
lrwx------ 1 root root 64 Feb 24 10:09 0 -> socket:[291604878]
l-wx------ 1 root root 64 Feb 24 10:09 1 -> /var/log/zabbix/zabbix_server.log.1
l-wx------ 1 root root 64 Feb 24 10:09 2 -> /var/log/zabbix/zabbix_server.log.1
l-wx------ 1 root root 64 Feb 24 10:09 3 -> /run/zabbix/zabbix_server.pid
lrwx------ 1 root root 64 Feb 24 10:09 4 -> socket:[4815415]
lrwx------ 1 root root 64 Feb 24 10:09 5 -> socket:[4815416]
# lsof -n | grep 291604878
zabbix_se 23701 zabbix 0u IPv4 291604878 0t0 TCP 10.72.64.5:37487->10.0.3.40:postgresql (ESTABLISHED)
So it seems that poller #108 opened a new PostgreSQL connection and the new socket ended up on fd 0. But fd 0 should point to /dev/null.
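The manual check above can be repeated for all pollers at once. This is a rough sketch only: the function names classify_fd0 and scan_pollers are made up for illustration, it assumes pgrep is available and that the poller process titles look like the ps output above, and it has to run as root (or as the zabbix user) to read /proc/PID/fd.

```shell
#!/bin/sh
# Sketch: report what fd 0 of every Zabbix poller points to.
# classify_fd0 and scan_pollers are hypothetical helper names.

# Map the symlink target of fd 0 to a short verdict.
classify_fd0() {
    case "$1" in
        /dev/null) echo "ok" ;;          # expected state for a daemon
        socket:*)  echo "socket" ;;      # the broken state seen on poller #108
        "")        echo "unreadable" ;;  # process gone or no permission
        *)         echo "other" ;;
    esac
}

# Print "PID TARGET VERDICT" for each poller process.
# Assumes titles of the form "zabbix_server: poller #N [...]".
scan_pollers() {
    for pid in $(pgrep -f 'zabbix_server: poller'); do
        target=$(readlink "/proc/$pid/fd/0" 2>/dev/null)
        echo "$pid ${target:--} $(classify_fd0 "$target")"
    done
}

# Usage (as root):
#   scan_pollers | grep -v ' ok$'
```

Filtering out the "ok" lines leaves only the pollers whose stdin no longer points to /dev/null.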
We see this behaviour on Zabbix 1.8.x and 2.4.x. An upgrade to 3.0 is planned, but could you please take a look at this? The issue may still exist in 3.0 as well.
Regards,
Marcel