- Problem report
- Resolution: Fixed
- Trivial
- 6.0.23
- None
- Sprint 107 (Dec 2023)
- 1
Expected:
In Zabbix 5.0, a DB failover completed without bringing Zabbix server down. We expect Zabbix 6.0 to behave consistently with that.
Issue:
Zabbix server, running against a redundant DB, stopped during the DB failover process.
During the failover process, the DB returns an error because it cannot process queries.
Example:
query failed: [0] PGRES_FATAL_ERROR:ERROR: cannot execute INSERT in a read-only transaction
It seems that the failed query was not retried, which caused HA manager to stop and Zabbix server to go down.
Quoted from zabbix_server.log:
[begin;]
959418:20231105:202447.031 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: cannot execute INSERT in a read-only transaction [insert into history (itemid,clock,ns,value) values (23266,1699183486,832177762,0),(45526,1699183486,628699336,34.729413000000001); ]
959398:20231105:202447.358 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: cannot execute SELECT FOR UPDATE in a read-only transaction [select ha_nodeid,name,status,lastaccess,address,port,ha_sessionid from ha_node order by ha_nodeid for update]
959398:20231105:202447.358 HA manager has been paused
959397:20231105:202447.358 HA manager error: database error
959398:20231105:202447.401 HA manager has been stopped
959397:20231105:202447.403 Zabbix Server stopped. Zabbix 6.0.23 (revision 315e9acac58).
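For anyone reading the attached log, a small script can pull out the timeline. This is only a sketch: it assumes the usual `PID:YYYYMMDD:HHMMSS.mmm message` prefix seen in the excerpt above, and the names `LOG_LINE` and `parse_line` are made up for illustration. It measures the time between "HA manager has been paused" and "Zabbix Server stopped" in the quoted lines.

```python
import re
from datetime import datetime, timedelta

# Log prefix as seen in the excerpt: PID:YYYYMMDD:HHMMSS.mmm, then the message.
LOG_LINE = re.compile(r"^(\d+):(\d{8}):(\d{6})\.(\d{3}) (.*)$")

def parse_line(line):
    """Return (pid, timestamp, message), or None if the line has no log prefix."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    pid, date, clock, millis, message = m.groups()
    ts = datetime.strptime(date + clock, "%Y%m%d%H%M%S")
    ts = ts.replace(microsecond=int(millis) * 1000)
    return int(pid), ts, message

# Lines copied from the log excerpt in this report.
sample = [
    "959398:20231105:202447.358 HA manager has been paused",
    "959397:20231105:202447.358 HA manager error: database error",
    "959398:20231105:202447.401 HA manager has been stopped",
    "959397:20231105:202447.403 Zabbix Server stopped. Zabbix 6.0.23 (revision 315e9acac58).",
]

events = [parse_line(line) for line in sample]
paused_at = events[0][1]
stopped_at = events[3][1]

# Integer milliseconds between the pause and the server stop.
elapsed_ms = (stopped_at - paused_at) // timedelta(milliseconds=1)
print(f"server stopped {elapsed_ms} ms after HA manager paused")
```

In the quoted excerpt the gap is 45 ms, so the server went down almost immediately after the first failed HA query, with no visible waiting period.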
When a DB fails over in an active/standby configuration, there may be a period during which the DB cannot be updated.
If HA manager stops on an error like the one above, an active/standby DB configuration effectively cannot be used.
I think we need to make sure that HA manager does not stop in such situations, or provide documentation that explains this behavior to our customer.
Could you please improve this error handling?
Please see the attached log file as well.
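To illustrate the kind of handling I have in mind, here is a minimal sketch of retrying a query while the DB is read-only. It is not how Zabbix server is implemented. `ReadOnlyTransactionError` and `execute_with_retry` are hypothetical names, and the failover is simulated with a fake query function. In PostgreSQL this error is reported with SQLSTATE 25006 (`read_only_sql_transaction`), which a real client could check for.

```python
import time

class ReadOnlyTransactionError(Exception):
    """Hypothetical stand-in for a driver error carrying SQLSTATE 25006."""

def execute_with_retry(run_query, attempts=5, delay=1.0, sleep=time.sleep):
    """Call run_query(); on a read-only error wait and retry, up to `attempts` tries."""
    for attempt in range(1, attempts + 1):
        try:
            return run_query()
        except ReadOnlyTransactionError:
            if attempt == attempts:
                raise  # the DB stayed read-only for the whole retry window
            sleep(delay)

# Simulated failover: the first two calls hit a read-only DB, the third succeeds.
calls = {"n": 0}

def fake_insert():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise ReadOnlyTransactionError(
            "cannot execute INSERT in a read-only transaction"
        )
    return "ok"

waits = []  # record the delays instead of actually sleeping
result = execute_with_retry(fake_insert, delay=0.5, sleep=waits.append)
print(result, calls["n"], waits)
```

The retry window (`attempts` times `delay`) would need to be at least as long as the expected failover time. If the DB is still read-only after that, the error is raised and the caller can stop as it does today.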
Regards,
- causes ZBX-24514: ZBX-23685 makes impossible to stop the Server node running under PCS (Corosync/Pacemaker) with RO database (Closed)