[ZBX-4151] server crash: memory corruption Created: 2011 Sep 16  Updated: 2017 May 30  Resolved: 2011 Oct 06

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: 1.9.7 (beta)

Type: Incident report Priority: Blocker
Reporter: richlv Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

trunk rev 21658


Attachments: objdump_-DSswx_zabbix_server_crash.bz2, server_crash_mallox_2.txt, zabbix_server_crash.log, zabbix_server_dev_branch_corrupted_double-linked_list.log.bz2, zabbix_server_dev_branch_corrupted_double-linked_list_2.log.bz2
Issue Links:
Duplicate
is duplicated by ZBX-4266 Zabbix server process crash Closed

Comments
Comment by Aleksandrs Saveljevs [ 2011 Sep 22 ]

Rich, we have tried investigating this memory issue and ZBX-4133 by running Zabbix server under Valgrind. However, no problems were found in our environment and, as noted in ZBX-4133, this issue is difficult to analyze, because the heap could have been corrupted in any part of the Zabbix server code. Please reopen if the problem occurs again.

Comment by richlv [ 2011 Sep 30 ]

it just did

Comment by richlv [ 2011 Sep 30 ]

and it looks like in both cases the killed process is the timer

Comment by Aleksandrs Saveljevs [ 2011 Oct 03 ]

Seems to crash right after start. Can you reliably reproduce the problem?

Comment by richlv [ 2011 Oct 04 ]

not really, it seems to happen every now and then. i could run a start/stop cycle and see how often it happens - would that help at all?

maybe some debugging output could be added to the server that i could run with?

Comment by Aleksandrs Saveljevs [ 2011 Oct 05 ]

There are two things we can try doing: (a) running Zabbix server under Valgrind and (b) running Zabbix server with additional debugging output.

I propose we start with (b). To that end, could you please try running Zabbix server from svn://svn.zabbix.com/branches/dev/ZBX-4151 ? It adds debugging output to the memory allocation routines so that we can find out which allocated buffer is closest to the corrupted part of memory. You can probably keep DebugLevel=3, but if you could run it with DebugLevel=4, that would be nice, too.
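The kind of allocation logging described above could look roughly like the following C sketch. It is not the code from the dev branch: `debug_malloc`, `DEBUG_MALLOC`, `alloc_record_t` and `nearest_alloc_below` are hypothetical names, the record table has a fixed size, and freed blocks are not removed from it. The idea is only that each allocation is logged with its address, size and call site, so that the buffer starting closest below a corrupted address can be identified afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_RECORDS 1024    /* assumption: a fixed-size table is enough for a sketch */

/* One logged allocation (hypothetical structure, not from Zabbix). */
typedef struct
{
    void        *ptr;
    size_t      size;
    const char  *file;
    int         line;
}
alloc_record_t;

static alloc_record_t   records[MAX_RECORDS];
static size_t           records_num = 0;

/* malloc() wrapper that records and logs where each buffer came from. */
void    *debug_malloc(const char *file, int line, size_t size)
{
    void    *ptr = malloc(size);

    if (NULL != ptr && records_num < MAX_RECORDS)
    {
        records[records_num].ptr = ptr;
        records[records_num].size = size;
        records[records_num].file = file;
        records[records_num].line = line;
        records_num++;

        fprintf(stderr, "%s:%d: allocated %lu bytes at %p\n",
                file, line, (unsigned long)size, ptr);
    }

    return ptr;
}

#define DEBUG_MALLOC(size)  debug_malloc(__FILE__, __LINE__, (size))

/* Return the recorded allocation that starts at or closest below addr, */
/* or NULL if no recorded allocation starts at or below that address.   */
const alloc_record_t    *nearest_alloc_below(const void *addr)
{
    const alloc_record_t    *best = NULL;
    uintptr_t               a = (uintptr_t)addr;
    size_t                  i;

    for (i = 0; i < records_num; i++)
    {
        uintptr_t   p = (uintptr_t)records[i].ptr;

        if (p <= a && (NULL == best || p > (uintptr_t)best->ptr))
            best = &records[i];
    }

    return best;
}
```

Given the address reported in a glibc heap corruption message, `nearest_alloc_below()` points at the buffer most likely to have been overrun, together with the source line that allocated it.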

Comment by richlv [ 2011 Oct 05 ]

i created a script to repeatedly start/stop the server. after running it, the server crashed on the first try...
although this time the error was "corrupted double-linked list"

the log of that start/crash session at DebugLevel=4 is attached (zabbix_server_dev_branch_corrupted_double-linked_list.log.bz2)

Comment by richlv [ 2011 Oct 05 ]

zabbix_server_dev_branch_corrupted_double-linked_list_2.log.bz2 is from another crash right after startup.

additionally, in this case one zabbix_server process did not terminate upon kill -15. stracing it reveals that it got stuck on:

futex(0xb7356380, FUTEX_WAIT_PRIVATE, 2, NULL

Comment by Aleksandrs Saveljevs [ 2011 Oct 06 ]

Thanks, that was useful! The fix is available in the development branch svn://svn.zabbix.com/branches/dev/ZBX-4151 .

The problem was that the buffer for time-based triggers was allocated for 0 triggers, then the configuration cache was synced, and then a non-zero number of triggers was processed, which resulted in corrupted memory.
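The failure pattern described above, and one way to avoid it, can be sketched in C as follows. This is an illustration under stated assumptions, not the actual change made in r22185: `trigger_t`, `trigger_buffer_t`, `buffer_reserve` and `buffer_fill` are hypothetical names. The point is that the buffer is sized from the trigger count known at the moment of use, not from a count taken before the configuration cache was synced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a time-based trigger record. */
typedef struct
{
    unsigned long   triggerid;
}
trigger_t;

/* Hypothetical growable buffer; not the actual Zabbix structure. */
typedef struct
{
    trigger_t   *data;
    size_t      alloc;  /* number of elements the buffer can hold */
}
trigger_buffer_t;

/* Make sure the buffer can hold at least `needed` elements.  */
/* Returns 0 on success, -1 if memory could not be allocated. */
int buffer_reserve(trigger_buffer_t *buf, size_t needed)
{
    trigger_t   *p;

    assert(NULL != buf);

    if (needed <= buf->alloc)
        return 0;

    /* guard against overflow in the size computation */
    if (needed > SIZE_MAX / sizeof(trigger_t))
        return -1;

    if (NULL == (p = realloc(buf->data, needed * sizeof(trigger_t))))
        return -1;

    buf->data = p;
    buf->alloc = needed;

    return 0;
}

/* Copy `count` trigger ids into the buffer. The capacity is checked  */
/* against the current count here, so a buffer that was sized for 0   */
/* triggers before a sync is grown instead of being written past.     */
int buffer_fill(trigger_buffer_t *buf, const unsigned long *ids, size_t count)
{
    size_t  i;

    if (0 != buffer_reserve(buf, count))
        return -1;

    for (i = 0; i < count; i++)
        buf->data[i].triggerid = ids[i];

    return 0;
}
```

In the buggy ordering, the equivalent of `buffer_reserve()` ran once while the count was still 0 and the fill loop later ran with a larger count, writing past the end of the allocation and damaging the allocator's bookkeeping.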

Comment by Aleksandrs Saveljevs [ 2011 Oct 06 ]

Fixed in pre-1.9.7 in r22185.
