  1. ZABBIX BUGS AND ISSUES
  2. ZBX-7587

Zabbix server does not re-login to Oracle DB backend after ORA-02396 error

    Details

    • Type: Incident report
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.9
    • Fix Version/s: 2.0.11rc1, 2.2.2rc1, 2.3.0
    • Component/s: Server (S)
    • Labels:
    • Environment:
      Zabbix 2.0.9, Oracle 11.2.0.3.0

      Description

      First, I'm not sure this is a bug, but it looks suspicious to me.

      Here is a filtered excerpt of the Zabbix server log for one unreachable poller process:

      18124:20131223:114039.429 server #44 started unreachable poller #2
      ...
      18124:20131225:010545.671 resuming Zabbix agent checks on host [pi]: connection restored
      18124:20131225:011029.750 resuming Zabbix agent checks on host [ED]: connection restored
      18124:20131225:011520.813 resuming Zabbix agent checks on host [ho]: connection restored
      18124:20131225:011752.942 resuming Zabbix agent checks on host [pi]: connection restored

      ...
      18124:20131225:014252.957 resuming Zabbix agent checks on host [sc]: connection restored
      18124:20131225:014252.973 [Z3005] query failed: [-1] ORA-02396: exceeded maximum idle time, please connect again [update hosts set errors_from=0,disable_until=0,error='' where hostid=100100000010297]
      18124:20131225:014458.687 [Z3005] query failed: [-1] ORA-01012: not logged on
      Process ID: 61408842
      Session ID: 2094 Serial number: 1297 [update hosts set error='Get value from agent failed: cannot connect to [[10.10.10.10]:10050]: [4] Interrupted system call',disable_until=1387929148 where hostid=100100000011062]

      There is also another unreachable poller process which shows identical behavior.

      As we can see, unreachable poller #2 periodically performed some activity ("connection restored"), and sometimes it logged "another network error" messages.
      Logically, this means it periodically makes calls to the DB. But after a period of inactivity (about 25 minutes) we see a single ORA-02396 error, and all subsequent activity ends with an ORA-01012 error.
      I guess the server cannot operate normally in this state.
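      The "about 25 minutes" figure can be checked from the log timestamps. The snippet below is a minimal sketch, assuming the two lines around the second "..." are the last activity before the idle period and the first activity after it (the log is filtered, so this is an approximation). The helper name parse_ts is hypothetical.

```python
from datetime import datetime

def parse_ts(line):
    """Parse the 'PID:YYYYMMDD:HHMMSS.mmm' prefix of a Zabbix server log line."""
    _pid, date, time = line.split(" ", 1)[0].split(":")
    return datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")

# Two lines copied from the log excerpt above.
last_before = "18124:20131225:011752.942 resuming Zabbix agent checks on host [pi]: connection restored"
first_after = "18124:20131225:014252.957 resuming Zabbix agent checks on host [sc]: connection restored"

gap = parse_ts(first_after) - parse_ts(last_before)
print(gap)  # 0:25:00.015000
```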

      I also found many of the same ORA-01012 errors at the end of an old log file (from previous runs of the Zabbix server), generated by a different Zabbix process type: trapper.

      It looks like the ORA-01012 error is related to the "IDLE_TIME" limit described here: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6010.htm

      Of course, ~25 minutes for "exceeded maximum idle time" is too short (the DB server is not under my control), but is it OK that a Zabbix process cannot recover from such a condition?
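      To illustrate the kind of recovery being asked about, here is a minimal sketch in Python. It is not Zabbix code (the server is written in C) and not the actual fix; FakeDb, DbError and execute_with_relogin are hypothetical names, and the fake backend only imitates the error sequence seen in the log above. The idea is to treat ORA-02396 and ORA-01012 as "session lost", log in again, and retry the statement.

```python
# Error prefixes treated as "session lost" (both taken from the log above).
SESSION_LOST = ("ORA-02396", "ORA-01012")

class DbError(Exception):
    """Hypothetical error type carrying the Oracle error text."""

class FakeDb:
    """Hypothetical stand-in for the DB backend: it kills the session once."""

    def __init__(self):
        self.logins = 0
        self.idle_kill_pending = True
        self.session_alive = False

    def connect(self):
        self.logins += 1
        self.session_alive = True

    def execute(self, sql):
        if self.idle_kill_pending:
            # First query after the idle period: the session gets terminated.
            self.idle_kill_pending = False
            self.session_alive = False
            raise DbError("ORA-02396: exceeded maximum idle time, please connect again")
        if not self.session_alive:
            # Every later query on the dead session fails like this.
            raise DbError("ORA-01012: not logged on")
        return "ok"

def execute_with_relogin(db, sql, max_attempts=3):
    """Run sql; on a session-lost error, log in again and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return db.execute(sql)
        except DbError as err:
            if not str(err).startswith(SESSION_LOST) or attempt == max_attempts:
                raise  # unrelated error, or out of attempts
            db.connect()

db = FakeDb()
db.connect()
result = execute_with_relogin(db, "update hosts set errors_from=0 where hostid=1")
print(result, db.logins)  # ok 2
```

      Without the re-login step, every query after the first failure would keep returning ORA-01012, which matches the behavior in the log. A real implementation would also need to consider whether retrying a statement is safe inside an open transaction.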

            People

            • Assignee:
              Unassigned
              Reporter:
              zalex_ua Oleksiy Zagorskyi