  1. ZABBIX BUGS AND ISSUES
  2. ZBX-23685

HA manager stops and Zabbix server is down when a query to DB fails

Details

    • Team A
    • Sprint 107 (Dec 2023)
    • 1

    Description

      Expected:
      In Zabbix 5.0, a DB failover completed without bringing Zabbix server down. We expect the same behavior in later versions (the problem was observed on Zabbix 6.0.23).

      Issue:
      Zabbix server, running against a redundant DB, stopped during the DB failover process.

      During the failover process, the DB returns an error because it cannot process queries, for example:

      query failed: [0] PGRES_FATAL_ERROR:ERROR: cannot execute INSERT in a read-only transaction
      

      It seems that the failed query was not retried, which caused HA manager to stop and Zabbix server to go down.
      Quoted from zabbix_server.log:

      [begin;]
      959418:20231105:202447.031 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: cannot execute INSERT in a read-only transaction
      [insert into history (itemid,clock,ns,value) values (23266,1699183486,832177762,0),(45526,1699183486,628699336,34.729413000000001);
      ]
      959398:20231105:202447.358 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: cannot execute SELECT FOR UPDATE in a read-only transaction
      [select ha_nodeid,name,status,lastaccess,address,port,ha_sessionid from ha_node order by ha_nodeid for update]
      959398:20231105:202447.358 HA manager has been paused
      959397:20231105:202447.358 HA manager error: database error
      959398:20231105:202447.401 HA manager has been stopped
      959397:20231105:202447.403 Zabbix Server stopped. Zabbix 6.0.23 (revision 315e9acac58).
       

      When a DB fails over in an Active/Standby configuration, there may be a period during which the DB cannot be updated.
      If HA manager stops on an error like the one above, an Active/Standby DB configuration effectively cannot be used with Zabbix server.
      I think we need to make sure that HA manager does not stop in such situations, or provide documentation that explains this behavior to our customers.
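
      To illustrate the kind of handling being requested, here is a minimal sketch, not Zabbix code. It assumes the error can be recognized by PostgreSQL SQLSTATE 25006 (read_only_sql_transaction); QueryError, run_with_retry and their parameters are hypothetical names used only for this example.

```python
import time

READ_ONLY_SQLSTATE = "25006"  # PostgreSQL: read_only_sql_transaction


class QueryError(Exception):
    """Hypothetical DB error carrying a PostgreSQL SQLSTATE code."""

    def __init__(self, sqlstate, message):
        super().__init__(message)
        self.sqlstate = sqlstate


def run_with_retry(execute, attempts=5, delay=1.0, sleep=time.sleep):
    """Call execute(); retry while the DB reports a read-only transaction.

    Any other error, or running out of attempts, is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return execute()
        except QueryError as err:
            if err.sqlstate != READ_ONLY_SQLSTATE or attempt == attempts:
                raise
            sleep(delay)  # give the failover time to promote a writable node
```

      With handling along these lines, a query that fails only while the DB is read-only would be retried instead of being treated as a fatal database error. The attempt count and delay here are placeholders, not recommended values.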

      Can you please improve this error handling?
      Please see attached log file as well.

      Regards,

          People

            vso Vladislavs Sokurenko
            shirai Sayaka Hirai