ZABBIX BUGS AND ISSUES
ZBX-7587

Zabbix server does not log back in to the Oracle DB backend after an ORA-02396 error

    Details

    • Type: Incident report
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.9
    • Fix Version/s: 2.0.11rc1, 2.2.2rc1, 2.3.0
    • Component/s: Server (S)
    • Labels:
    • Environment:
      Zabbix 2.0.9, Oracle 11.2.0.3.0

      Description

      First, I'm not sure this is a bug, but it looks suspicious to me.

      Here is a filtered excerpt of the Zabbix server log for an unreachable poller process:

      18124:20131223:114039.429 server #44 started unreachable poller #2
      ...
      18124:20131225:010545.671 resuming Zabbix agent checks on host [pi]: connection restored
      18124:20131225:011029.750 resuming Zabbix agent checks on host [ED]: connection restored
      18124:20131225:011520.813 resuming Zabbix agent checks on host [ho]: connection restored
      18124:20131225:011752.942 resuming Zabbix agent checks on host [pi]: connection restored

      ...
      18124:20131225:014252.957 resuming Zabbix agent checks on host [sc]: connection restored
      18124:20131225:014252.973 [Z3005] query failed: [-1] ORA-02396: exceeded maximum idle time, please connect again [update hosts set errors_from=0,disable_until=0,error='' where hostid=100100000010297]
      18124:20131225:014458.687 [Z3005] query failed: [-1] ORA-01012: not logged on
      Process ID: 61408842
      Session ID: 2094 Serial number: 1297 [update hosts set error='Get value from agent failed: cannot connect to [[10.10.10.10]:10050]: [4] Interrupted system call',disable_until=1387929148 where hostid=100100000011062]

      There is also another unreachable poller process which shows identical behavior.

      As we can see, unreachable poller #2 periodically performs some activity ("connection restored") and sometimes logs other network errors.
      Logically, it must be making periodic calls to the DB. But after a period of inactivity (25 minutes) we see a single ORA-02396 error, and all subsequent activity fails with ORA-01012.
      I guess the server cannot operate normally in this state.

      I also found many of the same ORA-01012 errors at the end of an old log file (from previous runs of the Zabbix server), generated by a different Zabbix process type: trapper.

      It looks like the ORA-01012 error is related to the "IDLE_TIME" limit described here: http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_6010.htm

      Of course, ~25 minutes is too short a limit for "exceeded maximum idle time" (the DB server is not under my control), but is it OK that a Zabbix process cannot recover once this happens?
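
The pattern described above (one ORA-02396 followed by repeated ORA-01012 errors from the same process) can be checked for in a server log with a short script. The following is a minimal Python sketch, assuming the `PID:YYYYMMDD:HHMMSS.mmm` line prefix seen in the excerpt; the function name `summarize_session_loss` is hypothetical and not part of Zabbix.

```python
import re

# Log prefix as it appears in the excerpt: PID:YYYYMMDD:HHMMSS.mmm message
_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")


def summarize_session_loss(lines):
    """Per PID, record the first ORA-02396 timestamp and count later ORA-01012 errors."""
    summary = {}
    for line in lines:
        m = _LINE.match(line)
        if not m:
            # Continuation lines such as "Process ID: ..." carry no prefix.
            continue
        pid, date, time, msg = m.groups()
        if "ORA-02396" in msg:
            # Keep only the first idle-timeout error seen for this process.
            summary.setdefault(
                pid, {"idle_timeout_at": f"{date}:{time}", "not_logged_on": 0}
            )
        elif "ORA-01012" in msg and pid in summary:
            summary[pid]["not_logged_on"] += 1
    return summary
```

A process whose entry shows a growing `not_logged_on` count after its idle timeout is one that never re-established its session.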

        Activity

        Andris Zeila added a comment -

        Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-7587

        Andris Zeila added a comment - - edited

        To test this fix, the idle_time limit must be set for the Oracle DB user (the value is in minutes):

        CREATE PROFILE <profile> LIMIT idle_time 1;
        ALTER USER <user> PROFILE <profile>;
        ALTER SYSTEM SET resource_limit = TRUE;
        
        Oleksiy Zagorskyi added a comment - - edited

        Is it correct that Zabbix server processes will now reconnect and log in to Oracle again if Oracle responds with these errors?

        Andris Zeila: Yes, it will now correctly handle the ORA-02396 and ORA-01012 errors. The problem was that after receiving those errors, querying the server status still returned OK. Now the status is forced to DOWN, and Zabbix will reconnect on the next select/execute/prepare/bind attempt.
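
The behavior described in this reply can be illustrated with a small model. This is only a sketch in Python, not the actual C change made in the Zabbix server: the names `ReconnectingDb`, `FakeOracleError`, and `SESSION_LOST_CODES` are hypothetical, and no real Oracle driver is involved. It assumes the driver error text contains the ORA code, as in the log excerpt.

```python
import re

# ORA codes that this issue reports as meaning the session is gone.
SESSION_LOST_CODES = {"ORA-02396", "ORA-01012"}

_ORA_CODE = re.compile(r"ORA-\d{5}")


class FakeOracleError(Exception):
    """Stand-in for a driver error; its text carries the raw ORA message."""


class ReconnectingDb:
    """Hypothetical wrapper: forces the status to DOWN on session-loss errors
    and reconnects lazily on the next execute() call."""

    def __init__(self, connect):
        # connect is a callable returning an object with an execute(sql) method.
        self._connect = connect
        self._conn = connect()
        self.status = "OK"

    def execute(self, sql):
        if self.status == "DOWN":
            # Reconnect on the next attempt rather than at error time.
            self._conn = self._connect()
            self.status = "OK"
        try:
            return self._conn.execute(sql)
        except FakeOracleError as err:
            match = _ORA_CODE.search(str(err))
            if match and match.group(0) in SESSION_LOST_CODES:
                # Do not trust a separate status query: mark the link as DOWN.
                self.status = "DOWN"
            raise
```

In this model the failing statement still raises, so the caller decides whether to retry it; only the following call triggers the reconnect.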

        Aleksandrs Saveljevs added a comment - - edited

        (1) There was a warning about converting an int to a pointer of a different size, so I have fixed that in r41600. Please take a look.

        I have also fixed a few typos and style issues. In particular, our conventions suggest parentheses around the whole ternary expression (see https://www.zabbix.org/wiki/C_coding_guidelines#Conditional_statements). RESOLVED.

        Andris Zeila: Thanks, CLOSED

        Andris Zeila added a comment -

        Released in:
        pre-2.0.11rc1 r41610
        pre-2.2.2rc1 r41611
        pre-2.3.0 r41612


          People

          • Assignee: Unassigned
          • Reporter: Oleksiy Zagorskyi
          • Votes: 0
          • Watchers: 1
