[ZBX-6303] ORA-00060 deadlocks in new Zabbix 2.0.5 with Oracle 11gR2 RAC
Created: 2013 Feb 22  Updated: 2017 May 30  Resolved: 2013 Mar 28

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 2.0.5
Fix Version/s: None

Type: Incident report Priority: Blocker
Reporter: Olgierd Wolodkiewicz Assignee: Unassigned
Resolution: Duplicate Votes: 2
Labels: crash, oracle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

RHEL + 2-node Oracle 11gR2 RAC


Attachments: Text File Zabbix-ORA-00060.log     File oracle_deadlock.log     File trace.tkprof     File zabbix_ora_32204.trc    
Issue Links:
Duplicate
is duplicated by ZBX-5225 Frequent "Lock wait timeout exceeded;... Closed

 Description   

After upgrading to the new 2.0.5 version we get:
6700:20130222:031443.782 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource
6218:20130222:031448.614 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource
6918:20130222:031451.613 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource
6122:20130222:031455.817 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource
6671:20130222:031531.993 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource
7423:20130222:031535.581 [Z3005] query failed: [-1] ORA-00060: deadlock detected while waiting for resource

This causes usage of the write cache buffer to grow and eventually crashes zabbix_server after a while (1-4 h).
I ran many tests and the result is always the same. When I run the environment with zabbix_server 2.0.4 it is OK, but with 2.0.5 it dies.

The fix for ZBX-5920 (added prefetching of 2 MB of data for Oracle SQL selects) seems to work, but I think that fix contains a modification which causes the errors above.



 Comments   
Comment by Andris Zeila [ 2013 Feb 27 ]

Can you attach the oracle trace file? There should be more information about resources/transactions involved.

I don't think setting the prefetch buffer would cause deadlocks (unless you have tested version 2.0.5 with the ZBX-5920 fix removed and it worked fine).

Actually, after looking more closely at the quoted errors, the messages seem to be truncated. For Z3005 errors the failed SQL statement should be printed in [] after the error message. Is it possible to attach the original log (or create a new one with DebugLevel 4)?
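To check a log for the truncation described above, a minimal Python sketch along these lines could be used. The line layout is inferred only from the entries quoted in this report; the function names `parse_z3005` and `truncated_count` are hypothetical, and the sketch assumes the statement, when present, is on the same line as the error.

```python
import re

# Layout inferred from the lines quoted in this report (an assumption):
#   <pid>:<YYYYMMDD>:<HHMMSS.mmm> [Z3005] query failed: [<code>] <message> [<sql>]
Z3005_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) "
    r"\[Z3005\] query failed: \[(?P<code>-?\d+)\] (?P<rest>.*)$"
)


def parse_z3005(line):
    """Return a dict for a Z3005 line, or None for any other line."""
    m = Z3005_RE.match(line)
    if m is None:
        return None
    rest = m.group("rest").rstrip()
    # Treat a trailing "[...]" group as the failed SQL statement.
    idx = rest.find(" [")
    if idx != -1 and rest.endswith("]"):
        message, sql = rest[:idx], rest[idx + 2:-1]
    else:
        message, sql = rest, None  # statement missing: entry looks truncated
    return {
        "pid": int(m.group("pid")),
        "date": m.group("date"),
        "time": m.group("time"),
        "code": int(m.group("code")),
        "message": message,
        "sql": sql,
    }


def truncated_count(lines):
    """Count Z3005 entries that carry no SQL statement."""
    records = (parse_z3005(line) for line in lines)
    return sum(1 for r in records if r is not None and r["sql"] is None)
```

Run against the six lines quoted in the description, every entry would be counted as truncated, since none of them ends with a bracketed statement.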

Comment by Ryan Rupp [ 2013 Mar 23 ]

I'm seeing this as well on 2.0.5 (although I haven't tested against earlier versions) when batch updates to the items table are made; I've attached a snippet of the log. I didn't see anything else really relevant, but if the full trace is useful, let me know. I don't currently have any logs from the DB side.

Comment by Andris Zeila [ 2013 Mar 25 ]

If you could get a full trace with DebugLevel 4, it might be useful.

There is also a possibility of similar deadlocks happening between the frontend and the server (ZBX-2494); not sure if that might be your case.

Comment by zabbixforme [ 2013 Mar 26 ]

Hi. I have the same problem on Oracle 11.2.0.3 and Zabbix 2.0.5. Server crash log attached.

Comment by zabbixforme [ 2013 Mar 26 ]

Oracle trace file attached

Comment by zabbixforme [ 2013 Mar 26 ]

Second Oracle trace file attached.

Comment by Andris Zeila [ 2013 Mar 26 ]

Thanks for the logs. Apparently there was a circular deadlock between history and LLD updates. It should be fixed with the LLD item update optimizations in 2.2 (ZBX-5225).

Regarding the crash: there is no DBget_seq_maxid() function in the Zabbix server sources, so my guess is that it was some third-party patch to improve ID number generation on Oracle (which again should be improved, if not fixed, in 2.2).
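As a generic illustration of what "circular" means here (not Zabbix or Oracle code), a deadlock can be pictured as a cycle in a wait-for graph, where each blocked session points at the session holding the lock it needs. The minimal Python sketch below finds such a cycle; the session labels `history_syncer` and `lld_worker` are hypothetical names chosen for this example.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of sessions, or None.

    waits_for maps each blocked session to the session that holds the
    lock it is waiting for (a blocked session waits on exactly one lock).
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of waiters until it ends or revisits a session.
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            # The chain came back to an earlier session: that tail is a cycle.
            return seen[seen.index(node):]
    return None


# Hypothetical example: each session holds a row the other one wants.
graph = {
    "history_syncer": "lld_worker",
    "lld_worker": "history_syncer",
}
cycle = find_cycle(graph)
```

In this example `cycle` contains both sessions, which is the situation in which the database has to abort one of the waiting statements with a deadlock error such as ORA-00060.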

Comment by zabbixforme [ 2013 Mar 26 ]

Thanks for the reply, Andris Zeila. I have examined the code linked in ZBX-5225. The problem is that I do not get the message "Lock wait timeout exceeded; try restarting transaction".

Yes, we replaced DBget_seq_maxid() and use the logic "select eventid_sequence.nextval from dual" to improve event ID number generation.

Comment by Andris Zeila [ 2013 Mar 27 ]

That's true, but it should also help with LLD deadlocks. Still, we found another potential source of LLD-related deadlocks, so reopening to fix it.

Comment by Olgierd Wolodkiewicz [ 2013 Mar 27 ]

I have no access to the DB logs, so I can't provide them.
With DebugLevel=4 the application is very slow and takes 1 hour just to read the configuration from the DB, which makes it useless.
I reduced the size of prefetch_memory and recompiled the application: it didn't help even when I set it to 1 byte. I think the problem is somewhere else (other Zabbix modifications for Oracle).

Comment by Alexander Vladishev [ 2013 Mar 28 ]

Fixed under ZBX-5225. I'm closing the issue.
