[ZBX-21613] Zabbix HA cluster standby node can start database upgrade Created: 2022 Sep 08  Updated: 2024 Apr 10  Resolved: 2023 May 21

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 6.2.2
Fix Version/s: 6.0.18rc1, 7.0 (plan)

Type: Problem report Priority: Major
Reporter: Kaspars Mednis Assignee: Vladislavs Sokurenko
Resolution: Fixed Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Team: Team A
Sprint: Sprint 99 (Apr 2023), Sprint 100 (May 2023)
Story Points: 1

 Description   

Steps to reproduce:

  1. Install two Zabbix server HA nodes with Zabbix server 6.0
  2. Upgrade Zabbix to 6.2 on the standby node
  3. Watch log files

Result:
The standby node upgrades the Zabbix DB schema, because the database version check runs before the HA manager starts:

  1978:20220908:070025.835 Starting Zabbix Server. Zabbix 6.2.2 (revision 35455866073).
  1978:20220908:070025.835 ******************************
  1978:20220908:070025.835 using configuration file: /etc/zabbix/zabbix_server.conf
  1978:20220908:070025.961 current database version (mandatory/optional): 06000000/06000004
  1978:20220908:070025.961 required mandatory version: 06020000
  1978:20220908:070025.961 starting automatic database upgrade
  1978:20220908:070025.978 completed 1% of database upgrade
  1978:20220908:070026.064 completed 2% of database upgrade
  1978:20220908:070026.112 completed 3% of database upgrade
  ........
  1978:20220908:070039.160 completed 97% of database upgrade
  1978:20220908:070039.425 completed 98% of database upgrade
  1978:20220908:070039.443 completed 100% of database upgrade
  1978:20220908:070039.445 database upgrade fully completed
  2058:20220908:070039.486 starting HA manager
  2058:20220908:070039.526 HA manager started in standby mode
  1978:20220908:070039.526 "zbx_node_2" node started in "standby" mode
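
The ordering problem in this log can be checked mechanically. Below is a minimal sketch, assuming the `<pid>:<date>:<time> <message>` line layout seen above; the function name `upgrade_ran_before_ha` is hypothetical and not part of Zabbix.

```python
import re

# Zabbix server log line: "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>"
LOG_LINE = re.compile(r"^\s*(\d+):(\d{8}):(\d{6}\.\d{3})\s+(.*)$")


def upgrade_ran_before_ha(log_text):
    """Return True if a database upgrade started before the HA manager did."""
    upgrade_at = None
    ha_at = None
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw)
        if not match:
            continue  # skip continuation lines and elided sections
        _pid, date, time, message = match.groups()
        stamp = date + time  # fixed width, so string order is chronological
        if upgrade_at is None and message.startswith("starting automatic database upgrade"):
            upgrade_at = stamp
        elif ha_at is None and message.startswith("starting HA manager"):
            ha_at = stamp
    if upgrade_at is None:
        return False
    # An upgrade with no HA manager start at all also counts as "before".
    return ha_at is None or upgrade_at < ha_at


SAMPLE = """\
  1978:20220908:070025.961 required mandatory version: 06020000
  1978:20220908:070025.961 starting automatic database upgrade
  1978:20220908:070039.445 database upgrade fully completed
  2058:20220908:070039.486 starting HA manager
  2058:20220908:070039.526 HA manager started in standby mode
"""

print(upgrade_ran_before_ha(SAMPLE))  # True
```

Run against a standby node's log, this only reports the symptom described in this ticket; it does not prevent the upgrade.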

 Active node continues to work with incompatible DB schema and does not even crash:

  7980:20220908:070026.691 [Z3005] query failed: [1412] Table definition has changed, please retry transaction [update triggers set state=1,error='Cannot evaluate function last(/Zabbix server/zabbix.nodes.status[cl7sp6fau0001fpqs4u4nctrw],#2): not enough data.' where triggerid=22963;
]
  7963:20220908:070128.948 [Z3005] query failed: [1054] Unknown column 'lastaccess' in 'field list' [select hostid,proxy_hostid,host,ipmi_authtype,ipmi_privilege,ipmi_username,ipmi_password,maintenance_status,maintenance_type,maintenance_from,status,name,lastaccess,tls_connect,tls_accept,tls_issuer,tls_subject,tls_psk_identity,tls_psk,proxy_address,auto_compress,maintenanceid from hosts where status in (0,1,5,6) and flags<>2]
zabbix_server [7963]: ERROR [file and function: <dbconfig.c,DCsync_configuration>, revision:c7c3044a4a2, line:6687] Something impossible has just happened.
  7963:20220908:070128.948 === Backtrace: ===
  7963:20220908:070128.950 9: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](zbx_backtrace+0x3f) [0x55993573d8c0]
  7963:20220908:070128.950 8: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](DCsync_configuration+0x310f) [0x5599356d3778]
  7963:20220908:070128.950 7: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](dbconfig_thread+0x31e) [0x55993556ab0d]
  7963:20220908:070128.950 6: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](zbx_thread_start+0x37) [0x55993574d9b3]
  7963:20220908:070128.950 5: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](+0x66924) [0x559935558924]
  7963:20220908:070128.950 4: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](MAIN_ZABBIX_ENTRY+0xa8c) [0x559935559d8e]
  7963:20220908:070128.950 3: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](daemon_start+0x384) [0x55993573d4df]
  7963:20220908:070128.950 2: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](main+0x33a) [0x5599355581ce]
  7963:20220908:070128.950 1: /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fb7575a5493]
  7963:20220908:070128.950 0: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](_start+0x2e) [0x559935556f1e]

Expected:
Standby node will not upgrade DB schema, or at least Zabbix server will stop if incompatible DB schema is detected



 Comments   
Comment by Kaspars Mednis [ 2022 Sep 08 ]

Confirmed

Comment by Vladislavs Sokurenko [ 2022 Sep 12 ]

It would be nice to improve the upgrade patches so that the first patch checks whether there are any active nodes and does not upgrade in that case.
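
A rough illustration of such a check, not the actual fix: the sketch below assumes the HA node registry is a table named `ha_node` with `name` and `status` columns and that status 3 means "active", which should be verified against the real schema. The node name `zbx_node_1`, the function names, and the in-memory SQLite table are stand-ins for demonstration only.

```python
import sqlite3

# Assumption: status codes as understood for the ha_node table in Zabbix 6.0
# (0 standby, 1 stopped, 2 unavailable, 3 active). Verify against your schema.
HA_STATUS_ACTIVE = 3


def other_active_nodes(conn, own_name):
    """Return names of nodes other than own_name that are marked active."""
    rows = conn.execute(
        "select name from ha_node where status = ? and name <> ? order by name",
        (HA_STATUS_ACTIVE, own_name),
    ).fetchall()
    return [name for (name,) in rows]


def may_upgrade(conn, own_name):
    """Hypothetical guard: allow the upgrade only if no other node is active."""
    return not other_active_nodes(conn, own_name)


# Demo against an in-memory stand-in for the real table (reduced columns).
conn = sqlite3.connect(":memory:")
conn.execute("create table ha_node (name text, status integer)")
conn.executemany(
    "insert into ha_node values (?, ?)",
    [("zbx_node_1", 3), ("zbx_node_2", 0)],  # zbx_node_1 is a made-up active node
)
print(may_upgrade(conn, "zbx_node_2"))  # False: zbx_node_1 is still active
```

A real check would also have to decide how to treat nodes whose status is stale, which this sketch ignores.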

Comment by Lauri Rood [ 2023 Feb 15 ]

The same behavior occurs in version 6.0 with minor version upgrades.

 

1928460:20230215:125943.213 Zabbix Server stopped. Zabbix 6.0.9 (revision 64721203c07).
  1814:20230215:130158.841 Starting Zabbix Server. Zabbix 6.0.13 (revision fdfa8cef9ce).
  1814:20230215:130158.844 ****** Enabled features ******
  1814:20230215:130158.844 SNMP monitoring:           YES
  1814:20230215:130158.844 IPMI monitoring:           YES
  1814:20230215:130158.844 Web monitoring:            YES
  1814:20230215:130158.844 VMware monitoring:         YES
  1814:20230215:130158.844 SMTP authentication:       YES
  1814:20230215:130158.844 ODBC:                      YES
  1814:20230215:130158.844 SSH support:               YES
  1814:20230215:130158.844 IPv6 support:              YES
  1814:20230215:130158.844 TLS support:               YES
  1814:20230215:130158.844 ******************************
  1814:20230215:130158.844 using configuration file: zabbix_server.conf
  1814:20230215:130159.060 current database version (mandatory/optional): 06000000/06000006
  1814:20230215:130159.060 required mandatory version: 06000000
  1814:20230215:130159.060 optional patches were found
  1814:20230215:130159.060 starting automatic database upgrade
  1814:20230215:130159.084 completed 8% of database upgrade
  1814:20230215:130159.104 completed 16% of database upgrade
  1814:20230215:130159.118 completed 25% of database upgrade
  1814:20230215:130159.174 completed 33% of database upgrade
  1814:20230215:130159.197 completed 41% of database upgrade
  1814:20230215:130159.204 completed 50% of database upgrade
  1814:20230215:130159.217 completed 58% of database upgrade
  1814:20230215:130159.247 completed 66% of database upgrade
  1814:20230215:130200.861 completed 75% of database upgrade
  1814:20230215:130200.909 completed 83% of database upgrade
  1814:20230215:130200.936 completed 91% of database upgrade
  1814:20230215:130200.958 completed 100% of database upgrade
  1814:20230215:130200.958 database upgrade fully completed
  2613:20230215:130201.289 starting HA manager
  2613:20230215:130201.484 HA manager started in standby mode
  1814:20230215:130201.484 "server 2" node started in "standby" mode
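
The two logs in this ticket differ in one respect: the first raises the required mandatory version (06000000 to 06020000), while this one only applies optional patches. A small sketch for telling the cases apart from the log text, assuming the message wording shown above; `classify_upgrade` is a hypothetical helper, not a Zabbix tool.

```python
import re

CURRENT = re.compile(r"current database version \(mandatory/optional\): (\d{8})/(\d{8})")
REQUIRED = re.compile(r"required mandatory version: (\d{8})")


def classify_upgrade(log_text):
    """Return 'mandatory', 'optional' or 'none' for one server start-up log."""
    current = CURRENT.search(log_text)
    required = REQUIRED.search(log_text)
    if not current or not required:
        raise ValueError("version lines not found")
    # Versions are fixed-width, zero-padded digits, so integer comparison works.
    if int(required.group(1)) > int(current.group(1)):
        return "mandatory"
    if "optional patches were found" in log_text:
        return "optional"
    return "none"


MAJOR = """\
current database version (mandatory/optional): 06000000/06000004
required mandatory version: 06020000
"""
MINOR = """\
current database version (mandatory/optional): 06000000/06000006
required mandatory version: 06000000
optional patches were found
"""

print(classify_upgrade(MAJOR))  # mandatory
print(classify_upgrade(MINOR))  # optional
```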

Comment by Vladislavs Sokurenko [ 2023 Apr 25 ]

Thank you for your report. It is fixed under ZBX-22310.

Comment by Lauri Rood [ 2023 Apr 25 ]

So no backporting to 6.0? ZBX-22310 only mentions 6.2, 6.4 and 7.0... 

Comment by Vladislavs Sokurenko [ 2023 Apr 25 ]

This was about upgrades from 6.0 to 6.2, 6.4 and 7.0, but you are right that minor upgrades should also be improved for 6.0.

Comment by Vladislavs Sokurenko [ 2023 May 18 ]

Fixed in:

Comment by Lauri Rood [ 2024 Jan 09 ]

@Vladislavs Sokurenko

Not fixed...   (or reintroduced).

Just patched 6.0.23 to 6.0.25.

Logs from the standby host:

 

2749190:20240109:140508.427 HA manager has been stopped
  1520:20240109:140508.429 Zabbix Server stopped. Zabbix 6.0.23 (revision 315e9acac58).
  1683:20240109:140640.702 Starting Zabbix Server. Zabbix 6.0.25 (revision 1706b11e866).
  1683:20240109:140640.705 ****** Enabled features ******
  1683:20240109:140640.705 SNMP monitoring:           YES
  1683:20240109:140640.705 IPMI monitoring:           YES
  1683:20240109:140640.705 Web monitoring:            YES
  1683:20240109:140640.705 VMware monitoring:         YES
  1683:20240109:140640.705 SMTP authentication:       YES
  1683:20240109:140640.705 ODBC:                      YES
  1683:20240109:140640.705 SSH support:               YES
  1683:20240109:140640.705 IPv6 support:              YES
  1683:20240109:140640.705 TLS support:               YES
  1683:20240109:140640.705 ******************************
  1683:20240109:140640.705 using configuration file: zabbix_server.conf
  1683:20240109:140640.888 current database version (mandatory/optional): 06000000/06000043
  1683:20240109:140640.888 required mandatory version: 06000000
  1683:20240109:140640.888 optional patches were found
  1683:20240109:140640.888 starting automatic database upgrade
  1683:20240109:140641.357 completed 100% of database upgrade
  1683:20240109:140641.357 database upgrade fully completed
  2629:20240109:140641.552 starting HA manager
  2629:20240109:140641.639 HA manager started in standby mode
  1683:20240109:140641.640 "<server2>" node started in "standby" mode 

As far as I can see, the standby host performed the DB upgrade here while the main server was still running. It was done before the HA manager even started, so there was no chance for it to even consider not doing the upgrade.

 

Comment by Vladislavs Sokurenko [ 2024 Jan 09 ]

This fixes the database upgrade breaking when more than one Zabbix server is started at the same time on one database, with or without high availability. It does not introduce any limits.

Comment by Lauri Rood [ 2024 Jan 10 ]

Can you elaborate... ? What broken DB upgrade? 

What I read in ticket description:

Standby node upgrades Zabbix DB schema, because database version check is initialized before HA manager starts:
Expected: 
Standby node will not upgrade DB schema, or at least Zabbix server will stop if incompatible DB schema is detected

This is marked as "closed" now... Resolution: fixed

How is it fixed, if it still does it?

 

Comment by Vladislavs Sokurenko [ 2024 Jan 18 ]

The ticket description mentions a major upgrade from 6.0 to 6.2. A minor upgrade, such as yours from 6.0.23 to 6.0.25, is still allowed without requiring that the node be changed to standalone. What has been improved is that if 2 upgrades are accidentally launched at the same time, the database should not break, because the node that is upgrading the database blocks the other node.

Comment by Lauri Rood [ 2024 Jan 18 ]

All right. If you guarantee that you will not make breaking (DB) changes in minor upgrades (within one version), then it's probably fine.

But to me it still looks like a potential disaster if the standby host performs DB upgrades while the primary is running.

Comment by Vladislavs Sokurenko [ 2024 Jan 18 ]

No breaking changes are introduced in minor versions. This fix handles the situation where an upgrade is accidentally started on 2 nodes at the same time, but it is still better to stop the nodes and update them one at a time. It also handles the situation where an upgrade is accidentally started on 2 standalone servers sharing the same database. The database should no longer get broken, but it is still better to avoid such accidents where possible.
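
The idea that the node upgrading the database blocks the other one can be illustrated with a database-level write lock. This is only a sketch of the general technique, not how Zabbix implements it: SQLite is used purely as a stand-in, the node names and `try_begin_upgrade` are hypothetical, and here the second node fails fast rather than waiting.

```python
import os
import sqlite3
import tempfile


def try_begin_upgrade(conn):
    """Try to take the write lock; return False if another node holds it."""
    try:
        conn.execute("BEGIN IMMEDIATE")
        return True
    except sqlite3.OperationalError:
        return False


path = os.path.join(tempfile.mkdtemp(), "demo.db")
# isolation_level=None: transactions are controlled explicitly.
# timeout=0: fail immediately instead of waiting for the lock.
node_a = sqlite3.connect(path, isolation_level=None, timeout=0)
node_b = sqlite3.connect(path, isolation_level=None, timeout=0)
node_a.execute("create table dbversion_demo (mandatory integer, optional integer)")

a_started = try_begin_upgrade(node_a)  # node_a takes the lock
b_started = try_begin_upgrade(node_b)  # refused while node_a holds the lock
node_a.execute("COMMIT")               # node_a finishes its upgrade
b_retry = try_begin_upgrade(node_b)    # now node_b can proceed
node_b.execute("COMMIT")

print(a_started, b_started, b_retry)   # True False True
```

After acquiring the lock, the second node would re-read the schema version and typically find nothing left to do.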
