[#ZBX-6303] deadlocks in ORA-00060 in new zabbix 2.0.5 with Oracle g11r2 RAC

[ZBX-6303] deadlocks in ORA-00060 in new zabbix 2.0.5 with Oracle g11r2 RAC Created: 2013 Feb 22 Updated: 2017 May 30 Resolved: 2013 Mar 28
Status:	Closed
Project:	ZABBIX BUGS AND ISSUES
Component/s:	Server (S)
Affects Version/s:	2.0.5
Fix Version/s:	None

Type:	Incident report
Priority:	Blocker
Reporter:	Olgierd Wolodkiewicz
Assignee:	Unassigned
Resolution:	Duplicate
Votes:
Labels:	crash, oracle
Remaining Estimate:	Not Specified
Time Spent:	Not Specified
Original Estimate:	Not Specified
Environment:	RHEL + 2-node Oracle 11gR2 RAC

Attachments:	Zabbix-ORA-00060.log, oracle_deadlock.log, trace.tkprof, zabbix_ora_32204.trc
Issue Links:	Duplicate
	is duplicated by	~~ZBX-5225~~	Frequent "Lock wait timeout exceeded;...	Closed

Description

After upgrading to version 2.0.5 we get:
6700:20130222:031443.782 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource
6218:20130222:031448.614 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource
6918:20130222:031451.613 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource
6122:20130222:031455.817 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource
6671:20130222:031531.993 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource
7423:20130222:031535.581 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource

These errors cause write cache buffer usage to grow, and zabbix_server eventually crashes after a while (1-4 hours).
I have run many tests and the result is always the same: with zabbix_server 2.0.4 the environment works fine, but with 2.0.5 the server dies.

The fix for ~~ZBX-5920~~ (prefetching 2 MB of data for Oracle SQL selects) seems to work, but I think it contains a change that causes the errors above.

Comments

Comment by Andris Zeila [ 2013 Feb 27 ]

Can you attach the oracle trace file? There should be more information about resources/transactions involved.

I don't think setting the prefetch buffer would cause deadlocks (unless you have tested version 2.0.5 with the ~~ZBX-5920~~ fix removed and it worked fine).

Actually, after looking more closely at the quoted errors, the messages seem to be truncated: for Z3005 errors the failed SQL statement should be printed in [] after the error message. Could you attach the original log (or create a new one with DebugLevel=4, if possible)?
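The point that the failed statement should follow the error message in [] gives a quick way to check whether a log is truncated. A minimal Python sketch, assuming the line layout `PID:YYYYMMDD:HHMMSS.mmm [Z3005] query failed: [code] message [sql]` inferred from the quoted lines and this comment (the exact format is an assumption, and `parse_z3005` is a hypothetical helper, not part of Zabbix):

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, inferred from the quoted log lines (not taken from Zabbix sources):
#   PID:YYYYMMDD:HHMMSS.mmm [Z3005] query failed: [code] message [sql]
Z3005_RE = re.compile(
    r"^(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) "
    r"\[Z3005\] query failed: \[(?P<code>-?\d+)\] "
    r"(?P<message>.*?)(?: \[(?P<sql>.*)\])?$"
)


class QueryFailure(NamedTuple):
    pid: int
    timestamp: str
    code: int
    message: str
    sql: Optional[str]  # None means the failed statement is missing from the line


def parse_z3005(line: str) -> Optional[QueryFailure]:
    """Parse one Z3005 log line; return None if the line does not match."""
    m = Z3005_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return QueryFailure(
        pid=int(m["pid"]),
        timestamp=f"{m['date']} {m['time']}",
        code=int(m["code"]),
        message=m["message"],
        sql=m["sql"],
    )


if __name__ == "__main__":
    sample = ("6700:20130222:031443.782 [Z3005] query failed: [-1] "
              "ORA-00060: deadlock detected while waiting for resource")
    print(parse_z3005(sample).sql)  # None
```

Applied to the lines quoted in the description, every call returns `sql=None`, which is consistent with the messages having been truncated.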

Comment by Ryan Rupp [ 2013 Mar 23 ]

I'm seeing this as well on 2.0.5 (although I haven't tested against earlier versions) when batch updates are made to the items table; I've attached a snippet of the log. I didn't see anything else particularly relevant, but if the full trace would be useful, let me know. I don't currently have any logs from the DB side.

Comment by Andris Zeila [ 2013 Mar 25 ]

If you could get a full trace with DebugLevel=4, it might be useful.

There is also a possibility of similar deadlocks occurring between the frontend and the server (~~ZBX-2494~~); I'm not sure whether that might be your case.

Comment by zabbixforme [ 2013 Mar 26 ]

Hi. I have the same problem with Oracle 11.2.0.3 and Zabbix 2.0.5. Server crash log attached.

Comment by zabbixforme [ 2013 Mar 26 ]

Oracle trace file attached

Comment by zabbixforme [ 2013 Mar 26 ]

Second Oracle trace file attached.

Comment by Andris Zeila [ 2013 Mar 26 ]

Thanks for the logs. Apparently there was a circular deadlock between history and LLD updates. It should be fixed by the LLD item update optimizations in 2.2 (~~ZBX-5225~~).

Regarding the crash: there is no DBget_seq_maxid() function in the Zabbix server sources, so my guess is that it comes from a third-party patch to improve ID generation on Oracle (which again should be improved, if not fixed, in 2.2).
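ORA-00060 is raised when two sessions each hold a row lock the other is waiting for, which is the circular wait described above. A common general mitigation (not necessarily what ~~ZBX-5225~~ implemented) is to make every writer lock rows in one global order, for example ascending itemid, so no cycle can form. The sketch below models this with Python thread locks; `RowLockTable` and `run_concurrently` are hypothetical names, and the lock timeout only stands in for the database's own wait handling.

```python
import threading
from typing import Iterable, List, Optional


class RowLockTable:
    """Hypothetical in-memory stand-in for row locks on a table such as items."""

    def __init__(self, row_ids: Iterable[int]) -> None:
        self._locks = {rid: threading.Lock() for rid in row_ids}

    def update_rows(self, row_ids: Iterable[int], timeout: float = 5.0) -> bool:
        """Lock the given rows in ascending id order, 'update' them, then release.

        Returns False if a lock is not obtained within `timeout` seconds,
        roughly where a database would give up on the wait.
        """
        held: List[int] = []
        try:
            # One global order for every writer: this is what prevents a cycle.
            for rid in sorted(set(row_ids)):
                if not self._locks[rid].acquire(timeout=timeout):
                    return False
                held.append(rid)
            # ... the real UPDATE statements would run here ...
            return True
        finally:
            for rid in reversed(held):
                self._locks[rid].release()


def run_concurrently(table: RowLockTable, batches: List[List[int]]) -> List[Optional[bool]]:
    """Run each batch in its own thread and collect the per-batch results."""
    results: List[Optional[bool]] = [None] * len(batches)

    def worker(index: int, rows: List[int]) -> None:
        results[index] = table.update_rows(rows)

    threads = [threading.Thread(target=worker, args=(i, b)) for i, b in enumerate(batches)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    table = RowLockTable(range(1, 11))
    # A "history" writer and an "LLD" writer touching the same rows in opposite order.
    print(run_concurrently(table, [[3, 1, 2, 10], [10, 2, 3, 1]]))  # [True, True]
```

Both writers touch rows 1, 2, 3 and 10 in opposite orders, yet both succeed because each sorts its batch before locking. Without the sort, the two batches could each grab one row and wait on the other, which is the pattern Oracle breaks by raising ORA-00060 in one of the sessions.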

Comment by zabbixforme [ 2013 Mar 26 ]

Thanks for the reply, Andris Zeila. I have examined the code in the linked issue (~~ZBX-5225~~). The problem is that I do not get the message "Lock wait timeout exceeded; try restarting transaction".

Yes, we replaced DBget_seq_maxid() and use the query "select eventid_sequence.nextval from dual" to improve event ID generation.
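The change described here swaps a max-ID lookup for an Oracle sequence (`select eventid_sequence.nextval from dual`). What a sequence guarantees is that concurrent sessions each receive a distinct value without reading the events table, whereas reading MAX(eventid)+1 lets two sessions compute the same value before either commits. The Python sketch below models only that guarantee in-process; `Sequence` and `allocate_ids` are hypothetical illustrations, not Zabbix or Oracle code.

```python
import itertools
import threading
from typing import List


class Sequence:
    """Hypothetical in-process model of an Oracle sequence's NEXTVAL (not Oracle code)."""

    def __init__(self, start: int = 1, increment: int = 1) -> None:
        self._counter = itertools.count(start, increment)
        self._lock = threading.Lock()

    def nextval(self) -> int:
        # Each call is atomic under the lock, so no two callers see the same value.
        with self._lock:
            return next(self._counter)


def allocate_ids(seq: Sequence, workers: int = 4, per_worker: int = 1000) -> List[int]:
    """Have several threads draw IDs concurrently and return all of them."""
    ids: List[int] = []
    ids_lock = threading.Lock()

    def worker() -> None:
        local = [seq.nextval() for _ in range(per_worker)]
        with ids_lock:
            ids.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


if __name__ == "__main__":
    ids = allocate_ids(Sequence(), workers=4, per_worker=1000)
    print(len(ids), len(set(ids)))  # 4000 4000
```

On a RAC cluster a cached sequence still hands out unique values, but unless it is created with the ORDER option, values drawn on different nodes are not guaranteed to be issued in chronological order.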

Comment by Andris Zeila [ 2013 Mar 27 ]

That's true, but it should also help with LLD deadlocks. Still, we found another potential source of LLD-related deadlocks, so I'm reopening the issue to fix it.

Comment by Olgierd Wolodkiewicz [ 2013 Mar 27 ]

I have no access to the DB logs, so I can't provide them.
With DebugLevel=4 the application is very slow and takes 1 hour just to read the configuration from the DB, which makes it useless.
I modified (reduced) the size of prefetch_memory and recompiled the application, but it didn't help, even when I set it to 1 byte. I think the problem is somewhere else (in other Zabbix modifications for Oracle).

Comment by Alexander Vladishev [ 2013 Mar 28 ]

Fixed under ~~ZBX-5225~~. I'm closing the issue.

Generated at Sat May 31 03:24:38 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.
