-
Problem report
-
Resolution: Fixed
-
Major
-
6.4.15
-
None
-
Ubuntu 20.04
Galera Cluster with MariaDB 10.5
-
S24-W24/25
-
1
When Native HA is used in combination with Galera Cluster, the HA manager crashes during a failover and stops running. The node that should become active keeps restarting its processes.
After some time the node starts in normal mode. This can happen after 1-2 iterations or only after 20-30 minutes; I could not find any pattern.
The problem seems to be related to a large volume of configuration: I was not able to reproduce it in a test environment with a new Zabbix database, but it occurs every time on a server with about 5 million records in the items table.
The problem does not depend on the number of nodes in the cluster. Even if the cluster has only one node and both servers use it as a database directly (without ProxySQL or similar), the problem persists.
Steps to reproduce:
- Start both nodes
- Stop the active node
- Observe the standby node going into a loop:
346080:20240531:130404.728 starting HA manager
346080:20240531:130404.850 HA manager started in standby mode
346079:20240531:130404.850 "StandBy" node started in "standby" mode
...
346391:20240531:130725.204 server #294 started [trigger housekeeper #1]
346079:20240531:130725.204 "StandBy" node switched to "standby" mode
346393:20240531:130725.644 starting HA manager
346393:20240531:130725.644 HA manager started in standby mode
346079:20240531:130739.735 "StandBy" node switched to "active" mode
346079:20240531:130739.740 server #0 started [main process]
346395:20240531:130739.741 server #1 started [service manager #1]
...
346709:20240531:130900.371 server #295 started [odbc poller #1]
346079:20240531:130900.371 "StandBy" node switched to "standby" mode
346710:20240531:130900.819 starting HA manager
346710:20240531:130900.820 HA manager started in standby mode
346079:20240531:130914.910 "StandBy" node switched to "active" mode
346079:20240531:130914.916 server #0 started [main process]
346711:20240531:130914.916 server #1 started [service manager #1]
...
325719:20240531:101215.912 "StandBy" node switched to "standby" mode
326036:20240531:101215.912 server #295 started [odbc poller #1]
326037:20240531:101216.385 starting HA manager
326037:20240531:101216.385 HA manager started in standby mode
325719:20240531:101230.476 "StandBy" node switched to "active" mode
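To quantify the loop in a log like the one above, a small script can extract the mode changes and measure how long the node stayed active before falling back to standby. This is only a sketch for triage, not part of Zabbix. It assumes the line shape seen in the excerpt (`PID:YYYYMMDD:HHMMSS.mmm message`) and the message wording `node switched to "<mode>" mode`. The names `mode_changes`, `active_periods` and `SAMPLE` are hypothetical helpers introduced here.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the log excerpt: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r'^\s*(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$')
# Matches the node-level messages only, not 'HA manager started in standby mode'
SWITCH_RE = re.compile(r'node (?:started in|switched to) "(\w+)" mode')


def mode_changes(lines):
    """Return a list of (timestamp, mode) for each node mode change in the log lines."""
    changes = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip elided ('...') or unrelated lines
        _pid, date, time, message = m.groups()
        s = SWITCH_RE.search(message)
        if s:
            ts = datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")
            changes.append((ts, s.group(1)))
    return changes


def active_periods(changes):
    """Return the duration in seconds of each active period that ended in a fallback to standby."""
    periods = []
    active_since = None
    for ts, mode in changes:
        if mode == "active":
            active_since = ts
        elif mode == "standby" and active_since is not None:
            periods.append((ts - active_since).total_seconds())
            active_since = None
    return periods


# A few lines copied from the report, used as sample input
SAMPLE = '''\
346079:20240531:130404.850 "StandBy" node started in "standby" mode
346393:20240531:130725.644 HA manager started in standby mode
346079:20240531:130739.735 "StandBy" node switched to "active" mode
346079:20240531:130900.371 "StandBy" node switched to "standby" mode
346079:20240531:130914.910 "StandBy" node switched to "active" mode
'''.splitlines()

if __name__ == "__main__":
    for seconds in active_periods(mode_changes(SAMPLE)):
        print(f"active for {seconds:.3f} s before falling back to standby")
```

On the sample lines, the single completed active period lasts about 80.6 seconds (13:07:39.735 to 13:09:00.371). Running the same functions over a full server log would show whether the fallback interval is roughly constant, which may help narrow down a timeout as the trigger.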
- is duplicated by
-
ZBX-24648 HA mode fails to run standalone after upgrading to 7.0
- Closed