Type: Problem report
Resolution: Fixed
Priority: Major
Fix Version/s: 6.0.31rc1, 6.4.16rc1, 7.0.1rc1, 7.2.0alpha1
Affects Version/s: 6.4.15
Component/s: Server (S)
Labels: None
Environment: Ubuntu 20.04, Galera Cluster with MariaDB 10.5
Sprint: S24-W24/25
Story Points: 1

When native HA is used together with a Galera Cluster, the HA manager crashes during failover and stops running. The node that should become active keeps restarting its processes.

After some time the node starts normally. This can take as few as 1-2 iterations or as long as 20-30 minutes; I could not find what it depends on.

The problem seems to be related to a large configuration: I could not reproduce it in a test environment with a new Zabbix database, but it occurs every time on a server with about 5 million records in the items table.

The problem does not depend on the number of nodes in the cluster. Even if the cluster has only one node and both servers connect to it directly as their database (without ProxySQL or similar), the problem persists.

Steps to reproduce:

1. Start both nodes
2. Stop the active node
3. Observe the standby node going into a loop:

346080:20240531:130404.728 starting HA manager
346080:20240531:130404.850 HA manager started in standby mode
346079:20240531:130404.850 "StandBy" node started in "standby" mode
...
346391:20240531:130725.204 server #294 started [trigger housekeeper #1]
346079:20240531:130725.204 "StandBy" node switched to "standby" mode
346393:20240531:130725.644 starting HA manager
346393:20240531:130725.644 HA manager started in standby mode
346079:20240531:130739.735 "StandBy" node switched to "active" mode
346079:20240531:130739.740 server #0 started [main process]
346395:20240531:130739.741 server #1 started [service manager #1] 
...
346709:20240531:130900.371 server #295 started [odbc poller #1]
346079:20240531:130900.371 "StandBy" node switched to "standby" mode
346710:20240531:130900.819 starting HA manager
346710:20240531:130900.820 HA manager started in standby mode
346079:20240531:130914.910 "StandBy" node switched to "active" mode
346079:20240531:130914.916 server #0 started [main process]
346711:20240531:130914.916 server #1 started [service manager #1] 
...
325719:20240531:101215.912 "StandBy" node switched to "standby" mode
326036:20240531:101215.912 server #295 started [odbc poller #1]
326037:20240531:101216.385 starting HA manager
326037:20240531:101216.385 HA manager started in standby mode
325719:20240531:101230.476 "StandBy" node switched to "active" mode
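
To quantify the loop in a log like the one above, the mode switches can be extracted and the time spent in "active" mode before each fallback measured. The following is only a minimal sketch: it assumes the `PID:YYYYMMDD:HHMMSS.mmm message` line layout seen in this excerpt, and the function names `parse_switches` and `active_periods` are hypothetical helpers, not part of Zabbix.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt above: PID:YYYYMMDD:HHMMSS.mmm message
LINE_RE = re.compile(r'^\s*(\d+):(\d{8}):(\d{6}\.\d{3})\s+(.*)$')
SWITCH_RE = re.compile(r'node switched to "(\w+)" mode')


def parse_switches(lines):
    """Return a list of (timestamp, mode) for every 'switched to' log line."""
    switches = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip "..." markers and anything that is not a log line
        _pid, date, time, msg = m.groups()
        s = SWITCH_RE.search(msg)
        if s:
            ts = datetime.strptime(date + time, "%Y%m%d%H%M%S.%f")
            switches.append((ts, s.group(1)))
    return switches


def active_periods(switches):
    """Return seconds spent in 'active' mode before each fallback to 'standby'."""
    periods = []
    active_since = None
    for ts, mode in switches:
        if mode == "active":
            active_since = ts
        elif mode == "standby" and active_since is not None:
            periods.append((ts - active_since).total_seconds())
            active_since = None
    return periods


# Lines copied from the log excerpt in this report.
SAMPLE = """\
346079:20240531:130725.204 "StandBy" node switched to "standby" mode
346393:20240531:130725.644 starting HA manager
346079:20240531:130739.735 "StandBy" node switched to "active" mode
346709:20240531:130900.371 server #295 started [odbc poller #1]
346079:20240531:130900.371 "StandBy" node switched to "standby" mode
346079:20240531:130914.910 "StandBy" node switched to "active" mode
"""

if __name__ == "__main__":
    for seconds in active_periods(parse_switches(SAMPLE.splitlines())):
        print(f"active for {seconds:.3f} s before falling back to standby")
```

In the sample lines, the node stays active from 13:07:39.735 to 13:09:00.371, roughly 80 seconds, before dropping back to standby. Running the same helpers over a full server log would show whether that interval is stable across iterations.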

Is duplicated by: ZBX-24648 HA mode fails to run standalone after upgrading to 7.0 (Closed)

Assignee: Andris Zeila
Reporter: Maksym Buz
Team: Team A
Votes: 0
Watchers: 8
Created: 2024 Jun 03 09:49
Updated: 2024 Dec 27 16:27
Resolved: 2024 Jun 08 11:05
