[ZBX-11816] Deadlock If Crashed With Log Mutex Locked Created: 2017 Feb 15  Updated: 2019 Dec 10

Status: Open
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G), Proxy (P), Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Incident report Priority: Trivial
Reporter: Vladislavs Sokurenko Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: deadlock, hang, logging, semaphores
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File reentrant_mutex_fix.diff    
Story Points: 1

 Description   

There are several bug reports of Zabbix hanging; ZBX-11635 and ZBX-11758 are two of them, but there could be more.

They all hang for the same reason: a process calls the Zabbix log function, which locks the log mutex, and then crashes during fprintf. The signal handler of that same process tries to log again, but the mutex is already locked by that process, so the handler blocks forever, and all other processes also wait for the mutex to be released.
This can cause data loss.

Furthermore, this issue is dangerous because Zabbix simply hangs and the user does not understand what is happening: there is no log entry, which must be very frustrating.
Also, the issue will usually go unnoticed on our development machines because, for example, on Ubuntu Linux, if we call

printf("%s", NULL);

The result is that (null) is printed.

However, on Solaris 10 (SPARC T4-2) the same call crashes with a NULL pointer dereference.
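
The platform difference above comes from the fact that passing NULL for %s is undefined behavior in C: glibc happens to print (null), while other C libraries may dereference the pointer. A minimal defensive sketch follows; safe_str() and log_value() are hypothetical names used only for illustration and are not part of Zabbix.

```c
#include <stdio.h>

/* Hypothetical helper: replace a NULL string with a printable placeholder
 * so that "%s" never receives a NULL pointer. */
const char *safe_str(const char *s)
{
    return s != NULL ? s : "(null)";
}

/* Hypothetical usage: format a name/value pair without relying on how the
 * C library treats NULL. Returns what fprintf() returns. */
int log_value(FILE *out, const char *name, const char *value)
{
    return fprintf(out, "%s=%s\n", safe_str(name), safe_str(value));
}
```

With such a wrapper, log_value(stderr, "host", NULL) prints a placeholder on every platform instead of crashing on some of them.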

Furthermore, it is possible that not all logging is enabled during testing, or that an invalid pointer is passed to the log function and goes unnoticed.

This can be fixed by making the mutex reentrant; a patch illustrating the idea is attached.

More info:
If the mutex were reentrant, we would see a crash instead of a hang. In short: process 1 locks the mutex and its log call tries to printf to the file; it crashes; the signal handler is launched in process 1 and tries to lock the mutex again, but it is already locked, so the handler waits for someone to unlock it, while it is itself the holder. No unlock will ever happen, because the holder was interrupted by the fatal signal and is now blocked on the same lock. Now everyone who wants to log something is waiting for an unlock by process 1 that will never occur. This deadlock is easy to recognize: no matter how you try to kill Zabbix, you cannot get any log output from it anymore.
That is why I suggest fixing this by making the mutex reentrant, as in the attached patch; this would avoid the hang and the potential loss of data.
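
The reentrant idea described above can be sketched as follows. This is not the attached patch, only an illustration under stated assumptions: the lock is a System V semaphore, each process keeps its own nesting counter, and all names (reentrant_lock_t, reentrant_lock(), and so on) are hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Linux requires the caller to define the semctl() argument union;
 * a distinct name avoids clashing with systems that already define it. */
union semun_arg { int val; struct semid_ds *buf; unsigned short *array; };

typedef struct {
    int sem_id;                   /* semaphore shared between processes */
    volatile sig_atomic_t depth;  /* process-local nesting count */
} reentrant_lock_t;

int reentrant_lock_init(reentrant_lock_t *l)
{
    union semun_arg arg;

    l->depth = 0;
    l->sem_id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (l->sem_id == -1)
        return -1;
    arg.val = 1;                  /* 1 means "unlocked" */
    return semctl(l->sem_id, 0, SETVAL, arg) == -1 ? -1 : 0;
}

int reentrant_lock(reentrant_lock_t *l)
{
    struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = SEM_UNDO };

    if (l->depth > 0) {           /* this process already holds the lock */
        l->depth++;
        return 0;
    }
    while (semop(l->sem_id, &op, 1) == -1) {
        if (errno != EINTR)
            return -1;
    }
    /* A signal arriving before this assignment can still deadlock. */
    l->depth = 1;
    return 0;
}

int reentrant_unlock(reentrant_lock_t *l)
{
    struct sembuf op = { .sem_num = 0, .sem_op = 1, .sem_flg = SEM_UNDO };

    if (l->depth == 0)
        return -1;                /* not locked by this process */
    if (l->depth > 1) {
        l->depth--;
        return 0;
    }
    if (semop(l->sem_id, &op, 1) == -1)
        return -1;
    l->depth = 0;
    return 0;
}

int reentrant_lock_destroy(reentrant_lock_t *l)
{
    return semctl(l->sem_id, 0, IPC_RMID) == -1 ? -1 : 0;
}
```

With this scheme a signal handler that logs while its own process holds the lock only increments the counter, so the crash is reported instead of hanging. The sketch narrows the problem but does not close it completely: a signal delivered between semop() and the counter update would still block, and a process forked while holding the lock would inherit a stale counter.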

Note:
The attached patch is only for Linux-like operating systems that use fork.



 Comments   
Comment by Glebs Ivanovskis (Inactive) [ 2017 Feb 15 ]

An alternative option is to use zbx_error() in the fatal signal handler. It does not lock the log file.
vso Yes, it's possible to investigate other solutions. If zbx_error() does not lock the log file, then where does it write the log?

glebs.ivanovskis Check the __zbx_zbx_error() function. It simply prints to stderr. Normally we redirect stderr to the same place as logging. __zbx_zabbix_log() is much more complicated: it rotates log files when necessary, manages concurrent output, adds very precise timestamps to messages, etc.

vso This should be investigated if the task gets assigned; however, I don't see why it would be acceptable to write to a file from multiple processes without a lock.
