  1. ZABBIX BUGS AND ISSUES
  2. ZBX-9538

Server hang when issuing a config_cache_reload


Details

    • Type: Incident report
    • Status: Closed
    • Priority: Trivial
    • Resolution: Duplicate
    • Affects Version/s: 2.4.4, 2.4.5
    • Fix Version/s: None
    • Component/s: Server (S)

    Description

      Every once in a while, running "zabbix_server -R config_cache_reload" causes the server to hang indefinitely, and no monitoring takes place from that point on. We have been seeing this every 1 or 2 days. The last time it happened, I generated core dumps of all processes and extracted a stack trace from each. I've combined them into a single file and attached it. It should be fairly easy to reproduce this by just repeatedly issuing a config_cache_reload.

      Analyzing the stack traces, it quickly became clear what was happening: the config syncer process (pid 26620) was deadlocked because the USR1 signal handler makes illegal library calls. Looking at the code, the whole signal handler is very problematic: in general, there are only very few things one is allowed to do in a signal handler, and calling functions like semop() and fopen() is definitely not among them, since neither is async-signal-safe. Even setting the variable "sleep_remains" from the signal handler is problematic, because that variable is not marked volatile and is not of type sig_atomic_t. All in all, you might want to consider using a pipe instead of signals, or possibly just copying the flags to a (volatile sig_atomic_t) variable and calling sem_post() (one of the few async-signal-safe functions, on Linux at least) in the signal handler to wake up another process that can then properly process the flags.
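
      To make the second suggestion concrete, here is a minimal sketch of a handler that only stores the flags and calls sem_post(). It is not taken from the Zabbix sources: the names pending_flags, wakeup_sem, on_usr1, install_usr1_handler, wait_for_request and demo_roundtrip are all hypothetical, and the sketch assumes that the flags arrive as the integer payload of a signal sent with sigqueue(). It also assumes Linux and, for simplicity, wakes the main loop of the same process rather than a separate one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical names: none of these identifiers come from Zabbix. */
static volatile sig_atomic_t pending_flags = 0; /* written only by the handler */
static sem_t wakeup_sem;                        /* posted by the handler */

/* The handler stores the flags and posts the semaphore, nothing else.
 * No locks, no stdio, no memory allocation. */
static void on_usr1(int sig, siginfo_t *info, void *ctx)
{
    int saved_errno = errno; /* sem_post() may modify errno */

    (void)sig;
    (void)ctx;
    /* Assumption: the sender passed the flags via sigqueue(). */
    pending_flags = (sig_atomic_t)info->si_value.sival_int;
    sem_post(&wakeup_sem);   /* async-signal-safe per POSIX */
    errno = saved_errno;
}

/* Initialise the semaphore and install the handler; call once at startup. */
int install_usr1_handler(void)
{
    struct sigaction sa;

    if (sem_init(&wakeup_sem, 0, 0) != 0)
        return -1;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_usr1;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Runs in normal (non-signal) context: block until a request arrives,
 * then return its flags. Unsafe work such as semop() or fopen() belongs
 * in the caller of this function, not in the handler. */
int wait_for_request(void)
{
    while (sem_wait(&wakeup_sem) == -1 && errno == EINTR)
        ; /* interrupted by the signal itself: retry */
    return (int)pending_flags;
}

/* Demo only: send a request to ourselves and read the flags back. */
int demo_roundtrip(int flags)
{
    union sigval v;

    v.sival_int = flags;
    if (install_usr1_handler() != 0)
        return -1;
    if (sigqueue(getpid(), SIGUSR1, v) != 0)
        return -1;
    return wait_for_request();
}
```

      With this split, everything that is not async-signal-safe runs after wait_for_request() returns, in ordinary context. To wake a different process, as suggested above, the semaphore would have to live in shared memory and be initialised with a non-zero pshared argument. A single sig_atomic_t also holds only the most recent request, so two signals arriving back to back would need a queue or a pipe instead.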

            People

              Assignee: vso Vladislavs Sokurenko
              Reporter: roadrunner A B
              Votes: 2
              Watchers: 5
