ZABBIX BUGS AND ISSUES / ZBX-11816

Deadlock If Crashed With Log Mutex Locked


      There are several bug reports of Zabbix hanging; ZBX-11635 and ZBX-11758 are two of them, but there could be more.

      They all hang because a process that uses the Zabbix log function locks the log mutex and then crashes during fprintf. The signal handler for the same process then tries to log again, but the mutex is already locked by that process, and all other processes also wait for the mutex to be freed.
      This can cause data loss.

      Furthermore, this issue is dangerous because Zabbix simply hangs and the user doesn't understand what is happening; there is no log entry, which must be very frustrating.
      Also, the issue will usually go unnoticed on our development machines because, for example, on Ubuntu Linux, if we try to

      printf("%s", NULL);
      

      the result is that (null) is printed.

      However, on Solaris 10 (SPARC T4-2) it will crash on the null pointer dereference.
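Passing a null pointer for %s is undefined behavior in C, which is why the two platforms can differ. A minimal sketch of guarding a log call against this, assuming hypothetical helpers str_or_null and format_item that are not part of Zabbix:

```c
#include <stdio.h>

/* Hypothetical helper (not a Zabbix function): substitute a visible
 * placeholder for NULL so that "%s" never receives a null pointer,
 * whatever the platform's libc would do with one. */
static const char *str_or_null(const char *s)
{
    return NULL != s ? s : "(null)";
}

/* Hypothetical log formatter: writes "item: <name>" into buf and
 * returns the value returned by snprintf(). */
static int format_item(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "item: %s", str_or_null(name));
}
```

With such a guard, a NULL name is logged as a placeholder on every platform instead of depending on libc behavior.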

      Furthermore, it's possible that not all logging is enabled during testing, or that an invalid pointer is passed to the log function, and the problem goes unnoticed.

      This can be fixed by making the mutex reentrant; a patch illustrating the idea is attached.

      More info:
      If the mutex were reentrant, we would see a crash instead of a hang. In short: process 1 locks the mutex and, while logging, tries to printf to the file and crashes. The signal handler is then launched for process 1 and tries to lock the mutex again, but it is already locked, so the handler waits for an unlock even though its own process is the one holding the lock. No unlock will ever happen, because the holder was interrupted by the signal and is now itself blocked trying to lock again. From then on, every process that wants to log something waits for an unlock by process 1 that will never occur. This deadlock is easy to spot: no matter how you try to kill Zabbix, you can't get any log output from it anymore.
      That's why I have suggested fixing this by allowing the mutex to be reentrant, as in the attached patch; this would avoid the hang and the potential loss of data.

      Note:
      The attached patch is only for Linux-like operating systems that use fork.

            Assignee: Unassigned
            Reporter: Vladislavs Sokurenko (vso)