[ZBX-21613] Zabbix HA cluster standby node can start database upgrade Created: 2022 Sep 08 Updated: 2024 Apr 10 Resolved: 2023 May 21 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 6.2.2 |
Fix Version/s: | 6.0.18rc1, 7.0 (plan) |
Type: | Problem report | Priority: | Major |
Reporter: | Kaspars Mednis | Assignee: | Vladislavs Sokurenko |
Resolution: | Fixed | Votes: | 2 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Team: | |
Sprint: | Sprint 99 (Apr 2023), Sprint 100 (May 2023) |
Story Points: | 1 |
Description |
Steps to reproduce:
Result:
1978:20220908:070025.835 Starting Zabbix Server. Zabbix 6.2.2 (revision 35455866073).
1978:20220908:070025.835 ******************************
1978:20220908:070025.835 using configuration file: /etc/zabbix/zabbix_server.conf
1978:20220908:070025.961 current database version (mandatory/optional): 06000000/06000004
1978:20220908:070025.961 required mandatory version: 06020000
1978:20220908:070025.961 starting automatic database upgrade
1978:20220908:070025.978 completed 1% of database upgrade
1978:20220908:070026.064 completed 2% of database upgrade
1978:20220908:070026.112 completed 3% of database upgrade
........
1978:20220908:070039.160 completed 97% of database upgrade
1978:20220908:070039.425 completed 98% of database upgrade
1978:20220908:070039.443 completed 100% of database upgrade
1978:20220908:070039.445 database upgrade fully completed
2058:20220908:070039.486 starting HA manager
2058:20220908:070039.526 HA manager started in standby mode
1978:20220908:070039.526 "zbx_node_2" node started in "standby" mode

The active node continues to work with the incompatible DB schema and does not even crash:
7980:20220908:070026.691 [Z3005] query failed: [1412] Table definition has changed, please retry transaction [update triggers set state=1,error='Cannot evaluate function last(/Zabbix server/zabbix.nodes.status[cl7sp6fau0001fpqs4u4nctrw],#2): not enough data.' where triggerid=22963; ]
7963:20220908:070128.948 [Z3005] query failed: [1054] Unknown column 'lastaccess' in 'field list' [select hostid,proxy_hostid,host,ipmi_authtype,ipmi_privilege,ipmi_username,ipmi_password,maintenance_status,maintenance_type,maintenance_from,status,name,lastaccess,tls_connect,tls_accept,tls_issuer,tls_subject,tls_psk_identity,tls_psk,proxy_address,auto_compress,maintenanceid from hosts where status in (0,1,5,6) and flags<>2]
zabbix_server [7963]: ERROR [file and function: <dbconfig.c,DCsync_configuration>, revision:c7c3044a4a2, line:6687] Something impossible has just happened.
7963:20220908:070128.948 === Backtrace: ===
7963:20220908:070128.950 9: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](zbx_backtrace+0x3f) [0x55993573d8c0]
7963:20220908:070128.950 8: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](DCsync_configuration+0x310f) [0x5599356d3778]
7963:20220908:070128.950 7: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](dbconfig_thread+0x31e) [0x55993556ab0d]
7963:20220908:070128.950 6: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](zbx_thread_start+0x37) [0x55993574d9b3]
7963:20220908:070128.950 5: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](+0x66924) [0x559935558924]
7963:20220908:070128.950 4: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](MAIN_ZABBIX_ENTRY+0xa8c) [0x559935559d8e]
7963:20220908:070128.950 3: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](daemon_start+0x384) [0x55993573d4df]
7963:20220908:070128.950 2: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](main+0x33a) [0x5599355581ce]
7963:20220908:070128.950 1: /lib64/libc.so.6(__libc_start_main+0xf3) [0x7fb7575a5493]
7963:20220908:070128.950 0: /usr/sbin/zabbix_server: configuration syncer [synced configuration in 0.247249 sec, syncing configuration](_start+0x2e) [0x559935556f1e]

Expected:
Standby node will not upgrade DB schema, or at least Zabbix server will stop if incompatible DB schema is detected. |
Comments |
Comment by Kaspars Mednis [ 2022 Sep 08 ] |
Confirmed |
Comment by Vladislavs Sokurenko [ 2022 Sep 12 ] |
It would be nice to improve the upgrade patches so that the first patch checks whether any nodes are active and skips the upgrade in that case. |
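A minimal Python sketch of the check suggested above, written as a pure function so it runs without a database. It assumes the HA node list is available as rows with a name, a status and a last-access timestamp (modeled loosely on the ha_node table), that status 3 means active and 0 means standby, and that a node whose heartbeat is older than the failover delay (60 seconds here) no longer counts. HaNode, active_peer and may_upgrade are hypothetical names, not Zabbix functions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumed status codes, modeled on the ha_node.status column:
# 0 = standby, 3 = active. Treat these values as an assumption.
NODE_STANDBY = 0
NODE_ACTIVE = 3


@dataclass
class HaNode:
    name: str
    status: int
    lastaccess: int  # Unix time of the node's last heartbeat


def active_peer(nodes: Sequence[HaNode], self_name: str, now: int,
                failover_delay: int = 60) -> Optional[str]:
    """Return the name of another node that still looks active, or None.

    A node counts as active only if it is marked active and its heartbeat
    is younger than the failover delay, so stale rows are ignored.
    """
    for node in nodes:
        if node.name == self_name:
            continue
        if node.status == NODE_ACTIVE and now - node.lastaccess < failover_delay:
            return node.name
    return None


def may_upgrade(nodes: Sequence[HaNode], self_name: str, now: int,
                failover_delay: int = 60) -> bool:
    """Refuse to start the schema upgrade while another node is active."""
    peer = active_peer(nodes, self_name, now, failover_delay)
    if peer is not None:
        print(f'cannot upgrade database: node "{peer}" is still active')
        return False
    return True


if __name__ == "__main__":
    nodes = [
        HaNode("zbx_node_1", NODE_ACTIVE, lastaccess=1000),  # hypothetical peer
        HaNode("zbx_node_2", NODE_STANDBY, lastaccess=1000),
    ]
    print(may_upgrade(nodes, "zbx_node_2", now=1010))  # False: peer heartbeat is 10 s old
    print(may_upgrade(nodes, "zbx_node_2", now=1100))  # True: peer row is stale
```

Ignoring stale rows matters in this sketch because a crashed active node leaves its "active" status behind; without the time check, a standby could never upgrade after such a crash.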
Comment by Lauri Rood [ 2023 Feb 15 ] |
The same behavior occurs in version 6.0 with minor version upgrades:
1928460:20230215:125943.213 Zabbix Server stopped. Zabbix 6.0.9 (revision 64721203c07).
1814:20230215:130158.841 Starting Zabbix Server. Zabbix 6.0.13 (revision fdfa8cef9ce).
1814:20230215:130158.844 ****** Enabled features ******
1814:20230215:130158.844 SNMP monitoring: YES
1814:20230215:130158.844 IPMI monitoring: YES
1814:20230215:130158.844 Web monitoring: YES
1814:20230215:130158.844 VMware monitoring: YES
1814:20230215:130158.844 SMTP authentication: YES
1814:20230215:130158.844 ODBC: YES
1814:20230215:130158.844 SSH support: YES
1814:20230215:130158.844 IPv6 support: YES
1814:20230215:130158.844 TLS support: YES
1814:20230215:130158.844 ******************************
1814:20230215:130158.844 using configuration file: zabbix_server.conf
1814:20230215:130159.060 current database version (mandatory/optional): 06000000/06000006
1814:20230215:130159.060 required mandatory version: 06000000
1814:20230215:130159.060 optional patches were found
1814:20230215:130159.060 starting automatic database upgrade
1814:20230215:130159.084 completed 8% of database upgrade
1814:20230215:130159.104 completed 16% of database upgrade
1814:20230215:130159.118 completed 25% of database upgrade
1814:20230215:130159.174 completed 33% of database upgrade
1814:20230215:130159.197 completed 41% of database upgrade
1814:20230215:130159.204 completed 50% of database upgrade
1814:20230215:130159.217 completed 58% of database upgrade
1814:20230215:130159.247 completed 66% of database upgrade
1814:20230215:130200.861 completed 75% of database upgrade
1814:20230215:130200.909 completed 83% of database upgrade
1814:20230215:130200.936 completed 91% of database upgrade
1814:20230215:130200.958 completed 100% of database upgrade
1814:20230215:130200.958 database upgrade fully completed
2613:20230215:130201.289 starting HA manager
2613:20230215:130201.484 HA manager started in standby mode
1814:20230215:130201.484 "server 2" node started in "standby" mode |
Comment by Vladislavs Sokurenko [ 2023 Apr 25 ] |
Thank you for your report. It is fixed under |
Comment by Lauri Rood [ 2023 Apr 25 ] |
So no backporting to 6.0? |
Comment by Vladislavs Sokurenko [ 2023 Apr 25 ] |
This was about upgrades from 6.0 to 6.2, 6.4 and 7.0, but you are right that minor upgrades in 6.0 should also be improved. |
Comment by Vladislavs Sokurenko [ 2023 May 18 ] |
Fixed in:
|
Comment by Lauri Rood [ 2024 Jan 09 ] |
@Vladislavs Sokurenko Not fixed... I just patched 6.0.23 to 6.0.25. Logs from the standby host:
2749190:20240109:140508.427 HA manager has been stopped
1520:20240109:140508.429 Zabbix Server stopped. Zabbix 6.0.23 (revision 315e9acac58).
1683:20240109:140640.702 Starting Zabbix Server. Zabbix 6.0.25 (revision 1706b11e866).
1683:20240109:140640.705 ****** Enabled features ******
1683:20240109:140640.705 SNMP monitoring: YES
1683:20240109:140640.705 IPMI monitoring: YES
1683:20240109:140640.705 Web monitoring: YES
1683:20240109:140640.705 VMware monitoring: YES
1683:20240109:140640.705 SMTP authentication: YES
1683:20240109:140640.705 ODBC: YES
1683:20240109:140640.705 SSH support: YES
1683:20240109:140640.705 IPv6 support: YES
1683:20240109:140640.705 TLS support: YES
1683:20240109:140640.705 ******************************
1683:20240109:140640.705 using configuration file: zabbix_server.conf
1683:20240109:140640.888 current database version (mandatory/optional): 06000000/06000043
1683:20240109:140640.888 required mandatory version: 06000000
1683:20240109:140640.888 optional patches were found
1683:20240109:140640.888 starting automatic database upgrade
1683:20240109:140641.357 completed 100% of database upgrade
1683:20240109:140641.357 database upgrade fully completed
2629:20240109:140641.552 starting HA manager
2629:20240109:140641.639 HA manager started in standby mode
1683:20240109:140641.640 "<server2>" node started in "standby" mode

As far as I can see, the standby host performed the DB upgrade here while the main server was still running. It was done before the HA manager even started, so the node had no way to decide against doing the upgrade.
|
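The ordering pointed out above can be checked mechanically in a node's log. A small Python sketch, assuming log lines follow the <pid>:<YYYYMMDD>:<HHMMSS.mmm> <message> layout seen in this issue and matching on the literal messages "database upgrade fully completed" and "HA manager started"; upgraded_before_ha is a hypothetical helper, not part of Zabbix.

```python
import re
from typing import Iterable

# Assumed line layout: "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>".
LINE_RE = re.compile(r"^(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)$")


def upgraded_before_ha(lines: Iterable[str]) -> bool:
    """True if 'database upgrade fully completed' appears before the
    first 'HA manager started' message in a server log."""
    upgrade_at = ha_at = None
    for index, line in enumerate(lines):
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not log entries
        message = match.group(4)
        if upgrade_at is None and "database upgrade fully completed" in message:
            upgrade_at = index
        if ha_at is None and "HA manager started" in message:
            ha_at = index
    return upgrade_at is not None and ha_at is not None and upgrade_at < ha_at


if __name__ == "__main__":
    log = [
        "1683:20240109:140640.888 starting automatic database upgrade",
        "1683:20240109:140641.357 database upgrade fully completed",
        "2629:20240109:140641.552 starting HA manager",
        "2629:20240109:140641.639 HA manager started in standby mode",
    ]
    print(upgraded_before_ha(log))  # True
```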
Comment by Vladislavs Sokurenko [ 2024 Jan 09 ] |
This fixes a broken database upgrade when more than one Zabbix server is started at the same time on the same database, with or without high availability; it does not introduce any limits. |
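The comment does not say how concurrent upgrades are kept from breaking the database. A minimal Python sketch, assuming the upgrade holds an exclusive lock (for example, a lock on the database version record) for its whole duration and re-reads the version only after acquiring it: a second server started at the same moment waits, then finds no pending patches. FakeDatabase, upgrade and run_concurrently are hypothetical names, and threading.Lock stands in for the database-side lock.

```python
import threading
from typing import Dict, List, Tuple


class FakeDatabase:
    """Toy stand-in for a shared database. The lock models an exclusive
    lock held for the whole upgrade (an assumption, not Zabbix code)."""

    def __init__(self, version: int, patches: List[int]) -> None:
        self.version = version
        self.patches = patches                     # ordered target versions
        self.applied: List[Tuple[str, int]] = []   # (node, patch) history
        self.lock = threading.Lock()


def upgrade(db: FakeDatabase, node_name: str) -> int:
    """Apply pending patches; return how many this node applied."""
    with db.lock:
        # Read the version only once the lock is held: a node that waited
        # here sees the other node's finished upgrade and applies nothing.
        pending = [p for p in db.patches if p > db.version]
        for patch in pending:
            db.applied.append((node_name, patch))
            db.version = patch
        return len(pending)


def run_concurrently(db: FakeDatabase, names: List[str]) -> Dict[str, int]:
    """Start one upgrade thread per server name, as if launched at once."""
    results: Dict[str, int] = {}

    def worker(name: str) -> None:
        results[name] = upgrade(db, name)

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    db = FakeDatabase(6000000, [6000001, 6000002, 6000003])
    print(run_concurrently(db, ["server 1", "server 2"]))
    print(db.version, len(db.applied))  # each patch is applied exactly once
```

Which server wins the lock is not deterministic; the point of the sketch is only that one of them applies all patches and the other applies none.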
Comment by Lauri Rood [ 2024 Jan 10 ] |
Can you elaborate? What broken DB upgrade? What I read in the ticket description:
Standby node upgrades Zabbix DB schema, because database version check is initialized before HA manager starts:
Expected:
Standby node will not upgrade DB schema, or at least Zabbix server will stop if incompatible DB schema is detected
This is marked as "closed" now, with Resolution: Fixed. How is it fixed if it still happens?
|
Comment by Vladislavs Sokurenko [ 2024 Jan 18 ] |
The ticket description mentions a major upgrade from 6.0 to 6.2. A minor upgrade, such as from 6.0.23 to 6.0.25 in your case, is still allowed without requiring the node to be switched to standalone mode. What has been improved is that if two upgrades are accidentally launched at the same time, the database should not break, because the node that upgrades the database blocks the other node. |
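The major/minor distinction can be read from the version lines in the logs of this issue. A Python sketch, assuming the eight-digit values encode a two-digit major, two-digit minor and four-digit patch number (so 06020000 is 6.2, patch 0), that a pending mandatory version means a major upgrade, and that "optional patches were found" means a minor one; split_version and classify_upgrade are hypothetical helpers, not Zabbix code.

```python
from typing import Tuple


def split_version(value: str) -> Tuple[int, int, int]:
    """Split an 8-digit DB version string such as "06020000" into
    (major, minor, patch). The 2/2/4 digit layout is an assumption."""
    if len(value) != 8 or not value.isdigit():
        raise ValueError(f"unexpected version format: {value!r}")
    return int(value[:2]), int(value[2:4]), int(value[4:])


def classify_upgrade(current_mandatory: str, required_mandatory: str,
                     optional_patches_found: bool) -> str:
    """Classify what a starting server would do, from its log values."""
    if split_version(current_mandatory) < split_version(required_mandatory):
        # Mandatory patches pending: the schema changes in a way older
        # binaries still using the database may not handle (major upgrade).
        return "major"
    if optional_patches_found:
        # Only optional patches pending (minor upgrade within one branch).
        return "minor"
    return "none"


if __name__ == "__main__":
    # Values copied from the logs in this issue.
    print(classify_upgrade("06000000", "06020000", False))  # major (6.2.2 log)
    print(classify_upgrade("06000000", "06000000", True))   # minor (6.0.25 log)
```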
Comment by Lauri Rood [ 2024 Jan 18 ] |
All right. But to me it still looks like a potential disaster if the standby host performs upgrades in the DB while the primary is running. |
Comment by Vladislavs Sokurenko [ 2024 Jan 18 ] |
No breaking changes are introduced in minor versions. This fix handles the situation where an upgrade is accidentally started on 2 nodes at the same time, but it is better to stop the nodes and upgrade them one at a time. It also handles the situation where an upgrade is accidentally started on 2 standalone servers sharing the same database. The database should not get broken now; however, it is still better to avoid such accidents where possible. |