[ZBX-4661] server crash when Oracle database is not available Created: 2012 Feb 15 Updated: 2017 May 30 Resolved: 2016 Nov 30 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 1.8.8 |
Fix Version/s: | 2.0.20rc1, 2.2.16rc1, 3.0.6rc1, 3.2.2rc1, 3.4.0alpha1 |
Type: | Incident report | Priority: | Blocker |
Reporter: | Oleksii Zagorskyi | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | crash, oracle, webmonitoring | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
RHEL 5.4 + Oracle11GR2 + Zabbix 1.8.8 |
Attachments: | zabbix_server.log | ||||||||||||
Issue Links: |
|
Description |
Early today, when the database went down for a backup, all zabbix_server processes went down too. zabbix_server.log is attached. Zabbix_server: bb05b. The problem seems to be in the web checks; see zabbix_server.log and PID 1608.
Comments |
Comment by Oleksii Zagorskyi [ 2012 Feb 15 ] |
A very similar, but apparently different, issue is
Comment by Glebs Ivanovskis (Inactive) [ 2016 Jul 04 ] |
This crash may still be present in the current trunk. Imagine we lose the connection to the database during one of the "inner" DBselect() calls in process_httptests():

    int	process_httptests(int httppoller_num, int now)
    {
        ...
        result = DBselect(...);

        while (NULL != (row = DBfetch(result)))
        {
            /* very big and complicated loop with more DBselect()'s in it */
        }
        ...
        DBfree_result(result);	/* <--- double free of the statement handle */
        ...
    }

We will DBclose() the connection and attempt to DBconnect() several times. In zbx_db_close() we free all handles. According to the Oracle documentation, when a parent handle is freed, its child handles are freed automatically, including the statement handle associated with the "outer" DBselect(). The final DBfree_result(result) then frees that statement handle a second time.
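The double free described above can be reproduced without an Oracle client by modelling handles as a tree in which freeing a parent also frees its children. This is a minimal, hypothetical simulation: mock_handle_t, mock_handle_alloc(), mock_handle_free() and replay_lost_connection() are invented for illustration and are not Zabbix or OCI functions, and a per-handle counter stands in for what would really be an undefined-behaviour second OCIHandleFree().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an OCI handle: freeing a parent also "frees"
 * every child, as the Oracle documentation describes. Not the real OCI API. */
typedef struct mock_handle
{
	struct mock_handle	*parent;
	int			free_count;	/* how many times this handle was released */
}
mock_handle_t;

#define MAX_HANDLES	8

static mock_handle_t	handles[MAX_HANDLES];
static size_t		handles_num;

static mock_handle_t	*mock_handle_alloc(mock_handle_t *parent)
{
	mock_handle_t	*h;

	assert(handles_num < MAX_HANDLES);
	h = &handles[handles_num++];
	h->parent = parent;
	h->free_count = 0;

	return h;
}

/* releases h and, recursively, every handle whose parent is h */
static void	mock_handle_free(mock_handle_t *h)
{
	size_t	i;

	h->free_count++;

	for (i = 0; i < handles_num; i++)
	{
		if (handles[i].parent == h)
			mock_handle_free(&handles[i]);
	}
}

/* Replays the process_httptests() scenario and returns how many times the
 * "outer" statement handle ends up being released. */
static int	replay_lost_connection(void)
{
	mock_handle_t	*env, *outer_stmt, *inner_stmt;

	handles_num = 0;
	env = mock_handle_alloc(NULL);		/* environment handle */
	outer_stmt = mock_handle_alloc(env);	/* "outer" DBselect() */
	inner_stmt = mock_handle_alloc(env);	/* "inner" DBselect() that fails */
	(void)inner_stmt;

	mock_handle_free(env);		/* zbx_db_close(): children go with the parent */
	mock_handle_free(outer_stmt);	/* DBfree_result(result) after the loop */

	return outer_stmt->free_count;
}
```

replay_lost_connection() returns 2: the outer statement handle is released once implicitly with the environment handle and once more by the final free.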
Comment by Oleksii Zagorskyi [ 2016 Oct 13 ] |
Another case, maybe related. The Zabbix server log contains many messages like:

    25027:20161012:224729.206 [Z3005] query failed: [-1] ORA-03113: end-of-file on communication channel
    Process ID: 12833
    Session ID: 801 Serial number: 16322 [update hosts set lastaccess=1476325922 where hostid=13852]
    25027:20161012:224729.206 slow query: 926.648444 sec, "update hosts set lastaccess=1476325922 where hostid=13852"

Here is a copy-paste showing how the server was stopped for an unknown reason:

    24719:20161012:224730.444 [Z3005] query failed: [-1] ORA-03113: end-of-file on communication channel
    Process ID: 12841
    Session ID: 524 Serial number: 57807 [select i.itemid,f.functionid,f.function,f.parameter,t.triggerid from hosts h,items i,functions f,triggers t where h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=t.triggerid and h.status in (0,1) and t.flags<>2]
    24719:20161012:224730.444 slow query: 931.764048 sec, "select i.itemid,f.functionid,f.function,f.parameter,t.triggerid from hosts h,items i,functions f,triggers t where h.hostid=i.hostid and i.itemid=f.itemid and f.triggerid=t.triggerid and h.status in (0,1) and t.flags<>2"
    24719:20161012:224730.558 [Z3006] fetch failed: [100] OCI_NODATA
    24719:20161012:224730.558 no records in table 'config'
    24716:20161012:224730.600 One child process died (PID:24719,exitcode/signal:9). Exiting ...
    24716:20161012:224734.441 syncing history data...
    24716:20161012:224736.709 syncing history data done
    24716:20161012:224736.709 syncing trends data...
    24716:20161012:224915.770 slow query: 7.964045 sec, "select distinct itemid from trends_uint where <trim>
    24716:20161012:224921.529 syncing trends data done
    24716:20161012:224921.531 Zabbix Server stopped. Zabbix 3.0.3 (revision 60173).
Comment by Vladislavs Sokurenko [ 2016 Nov 02 ] |
(1) Incorrect deallocation order: currently the environment handle is not deallocated last, which could potentially cause problems. Note that this happens on every database close, not only when the database is unavailable. From the documentation, "Terminating the Application":
An OCI application should perform the following steps before it terminates:
1. Delete the user session by calling OCISessionEnd() for each session.
2. Delete access to the data sources by calling OCIServerDetach() for each source.
3. Explicitly deallocate all handles by calling OCIHandleFree() for each handle.
4. Delete the environment handle, which deallocates all other handles associated with it.
vso RESOLVED in r63497:r63502
wiper: In the zbx_db_close() function, freeing results before the Oracle handles would seem more logical, although it doesn't change anything.
wiper CLOSED
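The rule behind the required order, that no handle may be freed after a handle that owns it, can be checked mechanically. The sketch below assumes that every handle is allocated with the environment handle as its parent, so freeing the environment releases everything else; the handle ids and freed_after_parent() are hypothetical, and the model covers only the freeing order, not OCISessionEnd() or OCIServerDetach().

```c
#include <assert.h>

/* Hypothetical handle ids; each handle is assumed to be a child of the
 * environment handle, so freeing H_ENV implicitly frees all the others. */
enum { H_ENV, H_ERR, H_SRV, H_SVC, H_SESSION, H_STMT, H_COUNT };

static const int	parent_of[H_COUNT] = {
	-1,	/* H_ENV has no parent */
	H_ENV,	/* H_ERR */
	H_ENV,	/* H_SRV */
	H_ENV,	/* H_SVC */
	H_ENV,	/* H_SESSION */
	H_ENV	/* H_STMT */
};

/* Returns 1 if a handle in order[] is freed after one of its ancestors,
 * i.e. it was already released implicitly and this free is a double free. */
static int	freed_after_parent(const int *order, int n)
{
	int	freed[H_COUNT] = {0}, i, p;

	for (i = 0; i < n; i++)
	{
		assert(order[i] >= 0 && order[i] < H_COUNT);

		for (p = parent_of[order[i]]; -1 != p; p = parent_of[p])
		{
			if (freed[p])
				return 1;
		}

		freed[order[i]] = 1;
	}

	return 0;
}
```

Freeing children first and the environment handle last yields 0; freeing the environment handle while a statement handle is still pending yields 1.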
Comment by Vladislavs Sokurenko [ 2016 Nov 02 ] |
(2) When selects are performed, a handle is opened for each query, but the order of deallocation is not guaranteed. On a database connection failure, the environment handle may be deleted before its child handles are.
wiper: As the order of results is not important, it would be better to use zbx_vector_ptr_remove_noorder() instead of zbx_vector_ptr_remove().
wiper CLOSED
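The difference between the two removal functions can be illustrated with a minimal pointer vector. This is a hypothetical reimplementation written for this example, not Zabbix's source; it assumes zbx_vector_ptr_remove() shifts later elements left while zbx_vector_ptr_remove_noorder() moves the last element into the freed slot. When results are tracked only so they can all be released on close, their order carries no meaning, so the O(1) unordered removal is sufficient.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal pointer vector in the spirit of zbx_vector_ptr_t (hypothetical). */
typedef struct
{
	void	**values;
	int	values_num;
}
ptr_vector_t;

/* order-preserving removal: shifts every later element left, O(n) */
static void	ptr_vector_remove(ptr_vector_t *v, int index)
{
	int	i;

	assert(index >= 0 && index < v->values_num);

	for (i = index; i < v->values_num - 1; i++)
		v->values[i] = v->values[i + 1];

	v->values_num--;
}

/* unordered removal: moves the last element into the hole, O(1) */
static void	ptr_vector_remove_noorder(ptr_vector_t *v, int index)
{
	assert(index >= 0 && index < v->values_num);

	v->values_num--;
	v->values[index] = v->values[v->values_num];
}
```

Removing index 1 from {a, b, c, d} gives {a, c, d} with the ordered version and {a, d, c} with the unordered one.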
Comment by Vladislavs Sokurenko [ 2016 Nov 03 ] |
Fixed in: Note: |
Comment by Andris Zeila [ 2016 Nov 21 ] |
(3) When closing the database connection, it would be better to free all results (OCI_DBfree_result()), not only the OCIStmt handles. This will require setting stmthp to NULL after freeing it in the OCI_DBfree_result() function.
vso RESOLVED in r63903
wiper CLOSED
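The idea in (3) can be sketched as follows: on close, every open result releases its statement handle and resets stmthp to NULL, so the caller's later free becomes a no-op for that handle. The names sketch_select(), sketch_release_stmt(), sketch_db_close() and sketch_free_result() are hypothetical, and malloc()/free() stand in for OCIHandleAlloc()/OCIHandleFree(); this is an illustration of the pattern, not the committed fix.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct
{
	void	*stmthp;	/* stands in for the OCIStmt handle */
}
sketch_result_t;

#define MAX_RESULTS	4

static sketch_result_t	*open_results[MAX_RESULTS];
static int		open_results_num;
static int		stmt_frees;	/* how many statement handles were released */

static sketch_result_t	*sketch_select(void)
{
	sketch_result_t	*result = malloc(sizeof(sketch_result_t));

	assert(NULL != result && open_results_num < MAX_RESULTS);
	result->stmthp = malloc(1);	/* placeholder for OCIHandleAlloc() */
	open_results[open_results_num++] = result;

	return result;
}

/* releases the statement handle at most once: stmthp is reset to NULL */
static void	sketch_release_stmt(sketch_result_t *result)
{
	if (NULL == result->stmthp)
		return;

	free(result->stmthp);	/* placeholder for OCIHandleFree() */
	result->stmthp = NULL;
	stmt_frees++;
}

/* on connection close, release the statement handle of every open result */
static void	sketch_db_close(void)
{
	int	i;

	for (i = 0; i < open_results_num; i++)
		sketch_release_stmt(open_results[i]);

	open_results_num = 0;
}

/* final DBfree_result(): unregister if still open, release, free the struct */
static void	sketch_free_result(sketch_result_t *result)
{
	int	i;

	for (i = 0; i < open_results_num; i++)
	{
		if (open_results[i] == result)
		{
			open_results[i] = open_results[--open_results_num];
			break;
		}
	}

	sketch_release_stmt(result);
	free(result);
}
```

After sketch_db_close(), freeing the "outer" result no longer touches its statement handle a second time, because stmthp is already NULL.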
Comment by Andris Zeila [ 2016 Nov 21 ] |
(4) Not related to this development, but we could fix it while we are already fixing Oracle-related code: in the zbx_db_fetch() function, column processing can be skipped if OCIStmtFetch2() didn't return OCI_SUCCESS.
vso RESOLVED in r63910
wiper CLOSED
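A minimal sketch of the check in (4): the row is built only when the fetch reports success, so end-of-data or an error leaves the column buffers untouched. The status constants, sketch_cursor_t and sketch_fetch() are hypothetical stand-ins for OCI_SUCCESS/OCI_NO_DATA/OCI_ERROR, OCIStmtFetch2() and the Zabbix result structure; a real implementation would also log errors separately from the normal end-of-data case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes standing in for OCI_SUCCESS, OCI_NO_DATA, OCI_ERROR */
enum { SKETCH_SUCCESS = 0, SKETCH_NO_DATA = 100, SKETCH_ERROR = -1 };

typedef struct
{
	const int	*statuses;		/* status each simulated fetch returns */
	int		statuses_num;
	int		fetch_num;		/* fetches performed so far */
	const char	*columns[2];		/* data the driver placed in define buffers */
	const char	*row[2];		/* row handed back to the caller */
	int		columns_processed;	/* how many times columns were converted */
}
sketch_cursor_t;

/* Returns the next row, or NULL when the fetch did not succeed. */
static const char	**sketch_fetch(sketch_cursor_t *cur)
{
	int	status, i;

	assert(cur->fetch_num < cur->statuses_num);
	status = cur->statuses[cur->fetch_num++];	/* placeholder for OCIStmtFetch2() */

	/* end of data or an error: skip column processing entirely */
	if (SKETCH_SUCCESS != status)
		return NULL;

	for (i = 0; i < 2; i++)
		cur->row[i] = cur->columns[i];

	cur->columns_processed++;

	return cur->row;
}
```

With statuses {SKETCH_SUCCESS, SKETCH_NO_DATA}, the first call returns a row and the second returns NULL without processing any columns.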
Comment by Andris Zeila [ 2016 Nov 23 ] |
Successfully tested |
Comment by Vladislavs Sokurenko [ 2016 Nov 23 ] |
Fixed conflicts in the development branch:
wiper: Looks good
Comment by Vladislavs Sokurenko [ 2016 Nov 25 ] |
Fixed in: |