Incident report
Resolution: Fixed
Major
2.0.9
Zabbix 2.0.9, Oracle 11.2.0.3.0
First, I'm not sure this is a bug, but it looks suspicious to me.
Here is a filtered part of the Zabbix server log for one unreachable poller:
18124:20131223:114039.429 server #44 started unreachable poller #2
...
18124:20131225:010545.671 resuming Zabbix agent checks on host [pi]: connection restored
18124:20131225:011029.750 resuming Zabbix agent checks on host [ED]: connection restored
18124:20131225:011520.813 resuming Zabbix agent checks on host [ho]: connection restored
18124:20131225:011752.942 resuming Zabbix agent checks on host [pi]: connection restored
...
18124:20131225:014252.957 resuming Zabbix agent checks on host [sc]: connection restored
18124:20131225:014252.973 [Z3005] query failed: [-1] ORA-02396: exceeded maximum idle time, please connect again [update hosts set errors_from=0,disable_until=0,error='' where hostid=100100000010297]
18124:20131225:014458.687 [Z3005] query failed: [-1] ORA-01012: not logged on
Process ID: 61408842
Session ID: 2094 Serial number: 1297 [update hosts set error='Get value from agent failed: cannot connect to [[10.10.10.10]:10050]: [4] Interrupted system call',disable_until=1387929148 where hostid=100100000011062]
Another unreachable poller process shows identical behavior.
As we can see, unreachable poller #2 periodically does some work (connection restored), and sometimes logs other network errors.
So it presumably makes periodic calls to the database. But after about 25 minutes of inactivity, a single ORA-02396 appears, and every subsequent query fails with ORA-01012.
I assume the server cannot operate normally in this state.
I also found many of the same ORA-01012 errors at the end of an old log file (from previous runs of the Zabbix server), generated by a different process type: the trapper.
The ORA-01012 error appears to be related to the IDLE_TIME limit described here: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6010.htm
Of course, ~25 minutes is a very short limit for "exceeded maximum idle time" (the DB server is not under my control), but is it acceptable that a Zabbix process cannot recover from such a condition?