ZABBIX BUGS AND ISSUES / ZBX-23685

HA manager stops and Zabbix server is down when a query to DB fails


    • Sprint 107 (Dec 2023)

      Expected:
      In Zabbix 5.0, a DB failover could be executed without causing Zabbix server to go down. We expect behavior consistent with Zabbix 5.0.

      Issue:
      A Zabbix server working with a redundant DB stopped during the DB failover process.

      During the failover process, the DB returns an error because it cannot process queries.
      For example:

      query failed: [0] PGRES_FATAL_ERROR:ERROR: cannot execute INSERT in a read-only transaction
      

      It seems that the failed query was not retried, which caused HA manager to stop and Zabbix server to go down.
      Quoted from zabbix_server.log:

      [begin;]
      959418:20231105:202447.031 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: cannot execute INSERT in a read-only transaction
      [insert into history (itemid,clock,ns,value) values (23266,1699183486,832177762,0),(45526,1699183486,628699336,34.729413000000001);
      ]
      959398:20231105:202447.358 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: cannot execute SELECT FOR UPDATE in a read-only transaction
      [select ha_nodeid,name,status,lastaccess,address,port,ha_sessionid from ha_node order by ha_nodeid for update]
      959398:20231105:202447.358 HA manager has been paused
      959397:20231105:202447.358 HA manager error: database error
      959398:20231105:202447.401 HA manager has been stopped
      959397:20231105:202447.403 Zabbix Server stopped. Zabbix 6.0.23 (revision 315e9acac58).
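
      To make the sequence of events in the excerpt easier to follow, here is a minimal Python sketch that splits such log lines into process ID, timestamp and message. The line layout (PID:YYYYMMDD:HHMMSS.mmm message) is inferred from the excerpt above. The names parse_line and events are illustrative and not part of Zabbix.

```python
import re
from datetime import datetime

# Assumed layout of a zabbix_server.log entry: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(
    r"^\s*(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) (?P<msg>.*)$"
)

def parse_line(line):
    """Return (pid, timestamp, message), or None for continuation lines
    such as the quoted SQL statement that follows a failed query."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["date"] + m["time"], "%Y%m%d%H%M%S.%f")
    return int(m["pid"]), ts, m["msg"]

def events(lines):
    """Keep only lines that parse as log entries."""
    return [e for e in map(parse_line, lines) if e is not None]

# Lines copied verbatim from the excerpt above.
SAMPLE = """\
959398:20231105:202447.358 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: cannot execute SELECT FOR UPDATE in a read-only transaction
[select ha_nodeid,name,status,lastaccess,address,port,ha_sessionid from ha_node order by ha_nodeid for update]
959398:20231105:202447.358 HA manager has been paused
959397:20231105:202447.358 HA manager error: database error
959398:20231105:202447.401 HA manager has been stopped
959397:20231105:202447.403 Zabbix Server stopped. Zabbix 6.0.23 (revision 315e9acac58).
"""

for pid, ts, msg in events(SAMPLE.splitlines()):
    print(pid, ts.time(), msg)
```

      Grouping the entries by process ID shows that about 45 ms pass between the failed SELECT FOR UPDATE and the server stopping.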
       

      When a DB fails over in an active/standby configuration, there may be a period during which the DB cannot be updated.
      If HA manager stops on an error like the one above, an active/standby DB configuration is effectively unusable with Zabbix server.
      I think we need to make sure that HA manager does not stop in such situations, or provide documentation that explains this behavior to our customers.
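
      As an illustration of the requested behavior, here is a minimal Python sketch of retrying a query that fails with a read-only transaction error. This is not Zabbix code: QueryFailed, is_read_only_error and run_with_retry are hypothetical names, and the number of attempts and the delay are arbitrary assumptions. The error is recognized by the message text shown in the log above.

```python
import time

class QueryFailed(Exception):
    """Hypothetical stand-in for the error a DB client raises on a failed query."""

def is_read_only_error(exc):
    # Matches the message seen in the log, for example
    # "cannot execute INSERT in a read-only transaction".
    return "in a read-only transaction" in str(exc)

def run_with_retry(execute, sql, attempts=5, delay=1.0, sleep=time.sleep):
    """Call execute(sql), retrying while the DB reports a read-only transaction.

    Other errors, and a read-only error on the last attempt, are re-raised
    so the caller can still decide to stop.
    """
    for attempt in range(1, attempts + 1):
        try:
            return execute(sql)
        except QueryFailed as exc:
            if not is_read_only_error(exc) or attempt == attempts:
                raise
            sleep(delay)  # give the failover time to finish before retrying
```

      With this approach a short failover window only delays the query, while a DB that stays read-only for longer than attempts * delay still surfaces as an error.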

      Can you please improve this error handling?
      Please see the attached log file as well.

      Regards,

            vso Vladislavs Sokurenko
            shirai Sayaka Hirai
            Team A