[ZBX-22061] zabbix_agent2 crashes when hitting the open file descriptor limit Created: 2022 Dec 09 Updated: 2024 Apr 10 Resolved: 2023 Jan 23 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 6.0.12, 6.2.6, 6.4.0beta4 |
Fix Version/s: | 6.0.13rc1, 6.2.7rc1, 6.4.0beta6, 6.4 (plan) |
Type: | Problem report | Priority: | Critical |
Reporter: | Edgar Akhmetshin | Assignee: | Eriks Sneiders |
Resolution: | Fixed | Votes: | 2 |
Labels: | systemd |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Environment: | RHEL 8.7 |
Issue Links: |
Team: |
Sprint: | Sprint 96 (Jan 2023) |
Story Points: | 1 |
Description |
Steps to reproduce:
Result:
2022/12/09 11:58:45.245630 failed to read response for plugin PostgreSQL, failed to read type header, EOF
Problem 2: Zabbix Agent just keeps crashing with another weird error:
2022/12/09 12:25:04.260087 failed to clean up after plugins, operation not permitted
And if you try to start it again manually:
# zabbix_agent2 -c /etc/zabbix/zabbix_agent2.conf
Starting Zabbix Agent 2 (6.0.12)
Zabbix Agent2 hostname: [Zabbix server]
Press Ctrl+C to exit.
panic: failed to obtain PID of dead child process: no child processes
goroutine 12 [running]:
main.listenOnPluginFail(0x0, {0xc0001798f0, 0x7})
/tmp/build-rhel-8-x86_64.H7NajgV0/buildroot/BUILD/zabbix-6.0.12/src/go/cmd/zabbix_agent2/external_nix.go:96 +0x168
created by main.initExternalPlugin
/tmp/build-rhel-8-x86_64.H7NajgV0/buildroot/BUILD/zabbix-6.0.12/src/go/cmd/zabbix_agent2/external.go:94 +0x115
The traceback also shows the full build path of the official packages:
/tmp/build-rhel-8-x86_64.H7NajgV0/buildroot/BUILD/zabbix-6.0.12/src/go/cmd/zabbix_agent2/
Expected: No crash when Zabbix Agent (active) is used for the common items of the official template. |
Comments |
Comment by pfoo [ 2022 Dec 19 ] |
I'm experiencing the same issue with the PostgreSQL and MongoDB plugins, even with no PostgreSQL/MongoDB templates configured:
failed to read response for plugin MongoDB, failed to read type header, EOF
failed to read response for plugin PostgreSQL, failed to read type header, EOF
However, when zabbix-agent2 refuses to (re)start, it always shows the same error (even with no plugin error):
monitor zabbix_agent2[3112]: Starting Zabbix Agent 2 (6.0.12)
monitor zabbix_agent2[3112]: Zabbix Agent2 hostname: [monitor]
monitor zabbix_agent2[3112]: Press Ctrl+C to exit.
monitor zabbix_agent2[3112]: panic: failed to obtain PID of dead child process: no child processes
monitor zabbix_agent2[3112]: goroutine 20 [running]:
monitor zabbix_agent2[3112]: main.listenOnPluginFail(0x0?, {0xc00017f970, 0xa})
monitor zabbix_agent2[3112]: /tmp/build-debian-11-x86_64.4aNaMOyj/buildroot/zabbix-6.0.12/debian/tmp.build-sqlite3/src/go/cmd/zabbix_agent2/external_nix.go:96 +0x168
monitor zabbix_agent2[3112]: created by main.initExternalPlugin
monitor zabbix_agent2[3112]: /tmp/build-debian-11-x86_64.4aNaMOyj/buildroot/zabbix-6.0.12/debian/tmp.build-sqlite3/src/go/cmd/zabbix_agent2/external.go:94 +0x110
monitor systemd[1]: zabbix-agent2.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
monitor systemd[1]: zabbix-agent2.service: Failed with result 'exit-code'.
|
Comment by Andrey Tocko (Inactive) [ 2022 Dec 30 ] |
If systemd is not used, user limits are in effect. Count the files currently open by the zabbix user:
lsof -u zabbix | wc -l
Check the open file limit set for the user. The default soft limit of open files for a user is 1024. ulimit is a shell builtin, so it has to be run inside a shell:
sudo -u zabbix sh -c 'ulimit -Sn'
sudo -u zabbix sh -c 'ulimit -Hn'
Most likely the zabbix user has already reached the allowed limit (agentd, server, proxy, java-gw), and starting an additional process hits this limit, which can result in a crash during agent startup. The limits can be raised with entries like these (limits.conf syntax, for example in /etc/security/limits.conf):
zabbix soft nofile 51200
zabbix hard nofile 51200
It is better to adjust the values to the system/kernel, as values beyond the allowed range will be substituted with defaults. |
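A minimal sketch of the check described above, assuming a Linux host with /proc mounted. The nofile limit is enforced per process, and lsof also lists entries that are not descriptors (cwd, memory-mapped files), so comparing the per-process descriptor count with that process's own limit gives a closer picture. The function name fd_usage is made up for illustration, and reading another user's /proc/<pid>/fd needs root or the same user.

```shell
#!/bin/sh
# Print the open-descriptor count and the soft/hard "Max open files"
# limits for one PID. Linux only: relies on /proc/<pid>/fd and
# /proc/<pid>/limits. fd_usage is a hypothetical helper name.
fd_usage() {
    pid="$1"
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # Line format: "Max open files  <soft>  <hard>  files"
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    hard=$(awk '/^Max open files/ {print $5}' "/proc/$pid/limits")
    echo "pid=$pid open=$open soft=$soft hard=$hard"
}

# Example: inspect the current shell; replace $$ with the agent's PID.
fd_usage "$$"
```

If open is close to soft for the agent process, the descriptor limit is the likely cause of the plugin start failures.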
Comment by Edgar Akhmetshin [ 2023 Jan 03 ] |
Please modify the packages to use the default locations for sockets defined in the FHS guideline: https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s18.html |
Comment by Oleksii Zagorskyi [ 2023 Jan 17 ] |
Just FYI, here is the more correct, official way to alter the unit file, which creates an additional override file. First, let's check what we have:
# systemctl cat zabbix-agent2
# /usr/lib/systemd/system/zabbix-agent2.service
[Unit]
Description=Zabbix Agent 2
After=syslog.target
After=network.target
[Service]
Environment="CONFFILE=/etc/zabbix/zabbix_agent2.conf"
EnvironmentFile=-/etc/sysconfig/zabbix-agent2
Type=simple
Restart=on-failure
PIDFile=/run/zabbix/zabbix_agent2.pid
KillMode=control-group
ExecStart=/usr/sbin/zabbix_agent2 -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
User=zabbix
Group=zabbix
[Install]
WantedBy=multi-user.target
Then we execute this command:
# systemctl edit zabbix-agent2
which will open the "vi" editor with empty content, where we should paste these 2 lines:
[Service]
LimitNOFILE=8192
Save the changes and exit. Now let's check the unit file again:
# systemctl cat zabbix-agent2
# /usr/lib/systemd/system/zabbix-agent2.service
[Unit]
Description=Zabbix Agent 2
After=syslog.target
After=network.target
[Service]
Environment="CONFFILE=/etc/zabbix/zabbix_agent2.conf"
EnvironmentFile=-/etc/sysconfig/zabbix-agent2
Type=simple
Restart=on-failure
PIDFile=/run/zabbix/zabbix_agent2.pid
KillMode=control-group
ExecStart=/usr/sbin/zabbix_agent2 -c $CONFFILE
ExecStop=/bin/kill -SIGTERM $MAINPID
RestartSec=10s
User=zabbix
Group=zabbix
[Install]
WantedBy=multi-user.target
# /etc/systemd/system/zabbix-agent2.service.d/override.conf
[Service]
LimitNOFILE=8192
Now at the end of the output we see that the override file has been created (with our 2 lines), and systemd will read it every time it works with the "zabbix-agent2" unit. |
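The override only affects processes started after it exists, so the service has to be restarted before the new limit shows up. A small sketch for confirming that the running agent actually got the value, assuming Linux /proc and a systemd new enough to support `systemctl show --value`; nofile_matches is a hypothetical helper name, not part of any Zabbix or systemd tooling.

```shell
#!/bin/sh
# Return 0 if the soft "Max open files" limit of PID equals EXPECTED.
# nofile_matches is a hypothetical helper; it only reads /proc.
nofile_matches() {
    pid="$1"
    expected="$2"
    # Line format: "Max open files  <soft>  <hard>  files"
    actual=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    [ "$actual" = "$expected" ]
}

# Usage on a systemd host (not run here):
#   systemctl restart zabbix-agent2
#   pid=$(systemctl show zabbix-agent2 -p MainPID --value)
#   nofile_matches "$pid" 8192 && echo "limit applied"
```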
Comment by Juris Lambda [ 2023 Jan 18 ] |
Hey, zalex_ua! Note, though, that override.conf is a local override configuration that the administrator gets to write via systemctl edit ... As we "own" the main systemd service configuration (the package does), we should be declaring the file descriptor limit there. However, both examples, from atocko and zalex_ua, are valid for a system administrator to use for raising the limit themselves, say, in the case of a deployment of a previous version of the package. If done on a small scale or for a single system, I'd follow zalex_ua's approach and let systemd create an override.conf for me. On a larger scale, I'd follow atocko's approach, and probably create an additional configuration file in the service configuration directory (named anything other than override.conf) and deploy that. Just noting this here in case anyone runs into this, can't upgrade the package, and needs a workaround. |
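For the larger-scale case, a rough sketch of writing such a drop-in file non-interactively, assuming the drop-in directory shown in the systemctl cat output above (/etc/systemd/system/zabbix-agent2.service.d/). The function name write_nofile_dropin and the file name 10-nofile.conf are invented for illustration; the optional third argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Write a systemd drop-in that sets LimitNOFILE for one unit.
# Arguments: unit name, limit, optional systemd config root.
# write_nofile_dropin and 10-nofile.conf are hypothetical names.
write_nofile_dropin() {
    unit="$1"
    limit="$2"
    root="${3:-/etc/systemd/system}"
    dir="$root/$unit.service.d"
    mkdir -p "$dir" || return 1
    # Any *.conf file in this directory is merged into the unit.
    printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/10-nofile.conf"
}

# Usage as root on the target host (not run here):
#   write_nofile_dropin zabbix-agent2 8192
#   systemctl daemon-reload
#   systemctl restart zabbix-agent2
```

Unlike systemctl edit, writing the file by hand does not reload systemd, hence the explicit daemon-reload in the usage notes.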
Comment by Eriks Sneiders [ 2023 Jan 20 ] |
Fixed in: Zabbix agent 2
Zabbix PostgreSQL plugin
|