-
Incident report
-
Resolution: Duplicate
-
Trivial
-
None
-
2.4.4, 2.4.5
-
Sprint 72 (Jan 2021)
Every once in a while, running "zabbix_server -R config_cache_reload" causes the server to hang indefinitely, after which no monitoring occurs. We have been seeing this every 1 or 2 days. The last time it happened, I generated core dumps of all processes and extracted a stack trace from each; they are attached as a single file. It should be fairly easy to reproduce this by repeatedly issuing a config_cache_reload.
Analyzing the stack traces, it quickly became clear what was happening: the config syncer process (pid 26620) was deadlocked because the USR1 signal handler makes illegal library calls. Looking at the code, the whole signal handler is very problematic. In general, only a few things are allowed in a signal handler, and calling functions like semop() and fopen() is definitely not valid, because neither is async-signal-safe. Even setting the variable "sleep_remains" from the signal handler is problematic, because that variable is not declared volatile and is not of type sig_atomic_t. All in all, you might want to consider using a pipe instead of signals, or possibly just copying the flags to a (volatile sig_atomic_t) variable and issuing a sem_post() (one of the few async-signal-safe functions, on Linux at least) in the signal handler to wake up another process that can then properly process the flags.
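To make the sem_post() suggestion concrete, here is a minimal sketch of what I have in mind. It is not taken from the Zabbix sources: all names (pending_flags, wakeup, reload_notify_*) are hypothetical, and it assumes the flags are delivered in si_value.sival_int via sigqueue(). The handler only stores the flags and posts a semaphore; everything else runs in normal context.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical names; nothing here is taken from the Zabbix code base. */
static volatile sig_atomic_t pending_flags; /* written only by the handler */
static sem_t wakeup;                        /* posted by the handler */

/* The handler does exactly two things: store the flags in a
 * volatile sig_atomic_t and call sem_post(), which is async-signal-safe. */
static void on_usr1(int signo, siginfo_t *info, void *context)
{
    int saved_errno = errno; /* sem_post() may change errno */

    (void)signo;
    (void)context;
    /* Assumption: the sender passes the flags through sigqueue(). */
    pending_flags = info->si_value.sival_int;
    sem_post(&wakeup);
    errno = saved_errno;
}

/* Set up the semaphore and install the handler; returns 0 on success. */
int reload_notify_init(void)
{
    struct sigaction sa;

    if (sem_init(&wakeup, 0, 0) != 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_usr1;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Sender side: queue SIGUSR1 carrying the flags to the target process. */
int reload_notify_send(pid_t pid, int flags)
{
    union sigval value;

    value.sival_int = flags;
    return sigqueue(pid, SIGUSR1, value);
}

/* Normal (non-signal) context: block until the handler posts, then return
 * the most recent flags. The caller may now use semop(), fopen(), logging
 * and so on. Returns -1 if sem_wait() fails for a reason other than EINTR. */
int reload_notify_wait(void)
{
    while (sem_wait(&wakeup) != 0) {
        if (errno != EINTR)
            return -1;
    }
    return (int)pending_flags;
}
```

A worker would call reload_notify_init() once and then loop on reload_notify_wait(). Note that the semaphore counts every signal, but only the latest flags value is kept, so if several requests arrive before the worker wakes up, earlier flag values are overwritten. On older glibc versions this may need to be linked with -pthread.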
duplicates: ZBX-8761 Potential lockup if signal is received during message logging (Closed)