ZABBIX BUGS AND ISSUES
ZBX-9538

Server hang when issuing a config_cache_reload


    • Type: Incident report
    • Resolution: Duplicate
    • Priority: Trivial
    • None
    • 2.4.4, 2.4.5
    • Server (S)
    • Sprint 72 (Jan 2021)

      Every once in a while, running "zabbix_server -R config_cache_reload" causes the server to hang indefinitely; no monitoring occurs after that. We have been seeing this every 1 or 2 days. The last time it happened, I generated core dumps of all processes and extracted the stack traces from them; they are attached as a single file. It should be fairly easy to reproduce this by repeatedly issuing a config_cache_reload.

      Analyzing the stack traces, it quickly became clear what was happening: the config syncer process (pid 26620) was deadlocked because the USR1 signal handler makes library calls that are not async-signal-safe. Looking at the code, the whole signal handler is very problematic: in general, only very few things are allowed in a signal handler, and calling functions like semop() and fopen() is definitely not valid. Even setting the variable "sleep_remains" from the signal handler is problematic, because that variable is not marked volatile and is not of type sig_atomic_t. All in all, you might want to consider using a pipe instead of signals, or possibly just copying the flags to a (volatile sig_atomic_t) variable and calling sem_post() (one of the few async-signal-safe functions, on Linux at least) in the signal handler to wake up another process that can then properly process the flags.
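      To illustrate the kind of handler suggested above, here is a minimal sketch. It is not taken from the Zabbix sources, and all names in it (usr1_handler, notify_init, notify_wait, REQ_CONFIG_CACHE_RELOAD) are hypothetical. It uses a variant of the idea: the handler does nothing except write() one byte describing the request into a pipe, and the real work happens later in normal context. write() is on the POSIX list of async-signal-safe functions, as is sem_post(), so a semaphore-based version would look much the same.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define REQ_CONFIG_CACHE_RELOAD 1   /* hypothetical request code */

static int wake_pipe[2] = { -1, -1 };   /* [0] = read end, [1] = write end */

/* Signal handler: only an async-signal-safe call (write) plus errno save/restore. */
static void usr1_handler(int sig)
{
    int saved_errno = errno;
    unsigned char req = REQ_CONFIG_CACHE_RELOAD;
    ssize_t rc;

    (void)sig;
    /* Write end is non-blocking: if the pipe is full the request is dropped
     * instead of blocking inside the handler. */
    rc = write(wake_pipe[1], &req, 1);
    (void)rc;
    errno = saved_errno;
}

/* Create the pipe and install the handler. Returns 0 on success, -1 on error. */
int notify_init(void)
{
    struct sigaction sa;
    int fl;

    if (pipe(wake_pipe) != 0)
        return -1;

    fl = fcntl(wake_pipe[1], F_GETFL);
    if (fl == -1 || fcntl(wake_pipe[1], F_SETFL, fl | O_NONBLOCK) == -1)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = usr1_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Block until a request arrives. Returns the request code, or -1 on error.
 * Runs in normal context, so the caller may then use semop(), fopen(), etc. */
int notify_wait(void)
{
    unsigned char req;
    ssize_t n;

    do {
        n = read(wake_pipe[0], &req, 1);
    } while (n == -1 && errno == EINTR);

    return n == 1 ? (int)req : -1;
}
```

      A process that also needs to sleep with a timeout could wait on the read end of the pipe with select() or poll() instead of calling read() directly, which would remove the need to modify a variable like "sleep_remains" from the handler.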

            vso Vladislavs Sokurenko
            roadrunner A B
            Team A