-
Type: Problem report
-
Resolution: Unresolved
-
Priority: Critical
-
Affects Version/s: 7.0.22, 7.4.6, 8.0.0alpha1
-
Component/s: Server (S)
-
None
-
Environment: MariaDB 11.4, Zabbix 7.4
-
S26-W04/05, S26-W06/07
-
3
Steps to reproduce:
One of two Zabbix HA nodes intermittently hits a deadlock while running an HA status update SQL query against its MariaDB Galera cluster node. The HA manager then attempts a rollback with no open transaction (txn_level = 0) and crashes with an assertion failure.
16479:20260102:090104.652 [Z3005] query failed: [1213] Deadlock found when trying to get lock; try restarting transaction [commit;]
16479:20260102:090104.657 slow query: 4.154537 sec, "commit;"
16518:20260102:090104.663 slow query: 6.919641 sec, "commit;"
16509:20260102:090104.667 slow query: 6.922353 sec, "delete from history_uint where itemid=1640442 and clock<1764662190"
16519:20260102:090104.672 slow query: 6.920088 sec, "commit;"
16521:20260102:090104.675 slow query: 6.903546 sec, "commit;"
16520:20260102:090104.682 slow query: 6.899709 sec, "commit;"
16612:20260102:090104.684 slow query: 6.634920 sec, "commit;"
16524:20260102:090104.695 slow query: 5.219587 sec, "commit;"
16666:20260102:090104.700 slow query: 4.756140 sec, "commit;"
16508:20260102:090104.703 slow query: 6.920938 sec, "commit;"
16479:20260102:090104.706 ERROR: rollback without transaction. Please report it to Zabbix Team.
16479:20260102:090104.710 === Backtrace: ===
16479:20260102:090104.736 12: /usr/sbin/zabbix_server: ha manager(zbx_backtrace+0x3b) [0x564c3211c44b]
16479:20260102:090104.739 11: /usr/sbin/zabbix_server: ha manager(zbx_dbconn_rollback+0xf3) [0x564c32108493]
16479:20260102:090104.744 10: /usr/sbin/zabbix_server: ha manager(+0x252f04) [0x564c31f90f04]
16479:20260102:090104.750 9: /usr/sbin/zabbix_server: ha manager(ha_manager_thread+0xe39) [0x564c31f933a9]
16479:20260102:090104.754 8: /usr/sbin/zabbix_server: ha manager(zbx_thread_start+0x27) [0x564c32049bb7]
16479:20260102:090104.760 7: /usr/sbin/zabbix_server: ha manager(zbx_ha_start+0x61) [0x564c31f94e01]
16479:20260102:090104.765 6: /usr/sbin/zabbix_server: ha manager(+0xd8992) [0x564c31e16992]
16479:20260102:090104.770 5: /usr/sbin/zabbix_server: ha manager(MAIN_ZABBIX_ENTRY+0x10ff) [0x564c31e31b4f]
16479:20260102:090104.775 4: /usr/sbin/zabbix_server: ha manager(zbx_daemon_start+0x19d) [0x564c3211c08d]
16479:20260102:090104.780 3: /usr/sbin/zabbix_server: ha manager(main+0x371) [0x564c31e27761]
16479:20260102:090104.786 2: /lib64/libc.so.6(+0x40e6c) [0x7fb458040e6c]
16479:20260102:090104.791 1: /lib64/libc.so.6(__libc_start_main+0x87) [0x7fb458040f35]
16479:20260102:090104.796 0: /usr/sbin/zabbix_server: ha manager(_start+0x2a) [0x564c31e2e22a]
zabbix_server: ha manager: dbconn.c:1067: dbconn_rollback: Assertion `0' failed.
The deadlock, followed by a rollback attempt with no open transaction, points to a bug in the HA manager, possibly related to Galera’s optimistic locking, as explained in this link:
https://www.percona.com/blog/percona-xtradb-cluster-multi-node-writing-and-unexpected-deadlocks/
It seems that the Zabbix HA manager does not handle these aborted transactions correctly, which leads to a crash or undefined behavior. The expected behavior is that it simply retries the transaction, the same way other processes handle temporary disconnects or query failures. We tried increasing wsrep_slave_threads on Galera to reduce temporary flow control under load, but it did not improve the situation.
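To illustrate the expected behavior, here is a minimal sketch of retrying on error 1213 while guarding the rollback with a transaction-level check. This is not Zabbix code: `FakeConnection`, `DbError` and `run_in_transaction` are hypothetical names, the connection is a stand-in that simulates a commit failing with a deadlock, and the SQL string is a placeholder. It assumes that the server has already rolled the transaction back by the time the deadlock error is returned on commit, so the client must not roll back a second time.

```python
import time

ER_LOCK_DEADLOCK = 1213  # MariaDB error code seen in the log above


class DbError(Exception):
    """Hypothetical error type carrying the server error code."""

    def __init__(self, code, message):
        super().__init__(f"[{code}] {message}")
        self.code = code


class FakeConnection:
    """Stand-in connection whose first `deadlocks` commits fail with 1213."""

    def __init__(self, deadlocks):
        self.deadlocks = deadlocks
        self.txn_level = 0
        self.committed = []
        self._pending = []

    def begin(self):
        self.txn_level = 1
        self._pending = []

    def execute(self, sql):
        self._pending.append(sql)

    def commit(self):
        # Whether the commit succeeds or is aborted by the server,
        # no transaction remains open afterwards.
        self.txn_level = 0
        if self.deadlocks > 0:
            self.deadlocks -= 1
            self._pending = []
            raise DbError(ER_LOCK_DEADLOCK, "Deadlock found when trying to get lock")
        self.committed.extend(self._pending)
        self._pending = []

    def rollback(self):
        # Mirrors the assertion that fires in the report.
        if self.txn_level == 0:
            raise AssertionError("rollback without transaction")
        self.txn_level = 0
        self._pending = []


def run_in_transaction(conn, work, max_attempts=5, delay=0.0):
    """Run work(conn) in a transaction, retrying the whole unit on deadlock.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        conn.begin()
        try:
            work(conn)
            conn.commit()
            return attempt
        except DbError as err:
            # Roll back only if a transaction is still open; a failed
            # commit has already ended it.
            if conn.txn_level > 0:
                conn.rollback()
            if err.code != ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            time.sleep(delay)


if __name__ == "__main__":
    conn = FakeConnection(deadlocks=2)
    attempts = run_in_transaction(conn, lambda c: c.execute("UPDATE example SET x=1"))
    print(f"committed after {attempts} attempt(s)")
```

The two points the sketch makes are that the rollback is skipped when txn_level is already 0, and that the whole unit of work is replayed from the start of the transaction rather than only the failed commit.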