[ZBX-21159] Zabbix HA Manager crashing Created: 2022 Jun 02 Updated: 2024 Apr 10 Resolved: 2022 Jul 11 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 6.0.5 |
Fix Version/s: | 6.0.7rc1, 6.2.1rc1, 6.4.0alpha1, 6.4 (plan) |
Type: | Problem report | Priority: | Critical |
Reporter: | Chris Bateson | Assignee: | Andris Zeila |
Resolution: | Fixed | Votes: | 1 |
Labels: | HA, crash, highavailability, selinux | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
RHEL 8.6 |
Issue Links: |
|
||||
Team: | |||||
Sprint: | Sprint 90 (Jul 2022) | ||||
Story Points: | 1 |
Description |
I have my system set to auto-update (I know, mistake).
Config:
LogFile=/var/log/zabbix/zabbix_server.log
LogFileSize=0
DebugLevel=4
PidFile=/var/run/zabbix/zabbix_server.pid
SocketDir=/var/run/zabbix
DBName=zabbix
DBUser=zabbix
DBPassword=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
StartPollers=25
StartPollersUnreachable=100
SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
CacheSize=64M
Timeout=20
LogSlowQueries=3000
StatsAllowedIP=127.0.0.1
Steps to reproduce:
Result:
8823:20220602:094806.426 Starting Zabbix Server. Zabbix 6.0.5 (revision 8da3e1f8419).
8823:20220602:094806.426 ****** Enabled features ******
8823:20220602:094806.426 SNMP monitoring: YES
8823:20220602:094806.426 IPMI monitoring: YES
8823:20220602:094806.427 Web monitoring: YES
8823:20220602:094806.427 VMware monitoring: YES
8823:20220602:094806.427 SMTP authentication: YES
8823:20220602:094806.427 ODBC: YES
8823:20220602:094806.427 SSH support: YES
8823:20220602:094806.427 IPv6 support: YES
8823:20220602:094806.427 TLS support: YES
8823:20220602:094806.427 ******************************
8823:20220602:094806.427 using configuration file: /etc/zabbix/zabbix_server.conf
8823:20220602:094806.427 In zbx_load_modules()
8823:20220602:094806.427 End of zbx_load_modules():SUCCEED
8823:20220602:094806.427 In zbx_ipc_service_start() service:rtc
8823:20220602:094806.427 In zbx_ipc_socket_open()
8823:20220602:094806.427 End of zbx_ipc_socket_open():FAIL
8823:20220602:094806.427 End of zbx_ipc_service_start():SUCCEED
8823:20220602:094806.427 In zbx_db_get_database_type()
8823:20220602:094806.427 In DBconnect() flag:0
8823:20220602:094806.432 End of DBconnect():0
8823:20220602:094806.432 query [txnlev:0] [select userid from users limit 1]
8823:20220602:094806.432 there is at least 1 record in "users" table
8823:20220602:094806.432 End of zbx_db_get_database_type():ZBX_DB_SERVER
8823:20220602:094806.432 In init_database_cache()
8823:20220602:094806.432 In zbx_mem_create() param:'HistoryCacheSize' size:16777216
8823:20220602:094806.432 valid user addresses: [0x7f0437693170, 0x7f0438692ff0] total size: 16776832
8823:20220602:094806.432 End of zbx_mem_create()
8823:20220602:094806.432 In zbx_mem_create() param:'HistoryIndexCacheSize' size:4194304
8823:20220602:094806.432 valid user addresses: [0x7f0437293180, 0x7f0437692ff0] total size: 4193904
8823:20220602:094806.432 End of zbx_mem_create()
8823:20220602:094806.432 In init_trend_cache()
8823:20220602:094806.432 In zbx_mem_required_size() size:0 chunks_num:1 descr:'trend cache' param:'TrendCacheSize'
8823:20220602:094806.432 End of zbx_mem_required_size() size:422
8823:20220602:094806.432 In zbx_mem_create() param:'TrendCacheSize' size:4194304
8823:20220602:094806.432 valid user addresses: [0x7f0436e93170, 0x7f0437292ff0] total size: 4193920
8823:20220602:094806.432 End of zbx_mem_create()
8823:20220602:094806.432 End of init_trend_cache()
8823:20220602:094806.432 End of init_database_cache()
8823:20220602:094806.432 In DBconnect() flag:0
8823:20220602:094806.434 End of DBconnect():0
8823:20220602:094806.434 query [txnlev:0] [select default_character_set_name,default_collation_name from information_schema.SCHEMATA where schema_name='zabbix']
8823:20220602:094806.434 query [txnlev:0] [select count(*) from information_schema.`COLUMNS` where table_schema='zabbix' and data_type in ('text','varchar','longtext') and (character_set_name not in ('utf8','utf8mb3','utf8mb4') or collation_name not in ('utf8_bin','utf8mb3_bin','utf8mb4_bin'))]
8823:20220602:094806.444 In DBconnect() flag:0
8823:20220602:094806.444 End of DBconnect():0
8823:20220602:094806.444 In zbx_dbms_version_info_extract()
8823:20220602:094806.444 End of zbx_dbms_version_info_extract() version:80026
8823:20220602:094806.444 In DBcheck_version()
8823:20220602:094806.444 In DBconnect() flag:0
8823:20220602:094806.445 End of DBconnect():0
8823:20220602:094806.445 query [txnlev:0] [show tables like 'dbversion']
8823:20220602:094806.446 query [txnlev:0] [select mandatory,optional from dbversion]
8823:20220602:094806.446 current database version (mandatory/optional): 06000000/06000002
8823:20220602:094806.447 required mandatory version: 06000000
8823:20220602:094806.447 End of DBcheck_version():SUCCEED
8823:20220602:094806.447 In DBconnect() flag:0
8823:20220602:094806.448 End of DBconnect():0
8823:20220602:094806.448 query [txnlev:0] [show columns from config like 'dbversion_status']
8823:20220602:094806.450 query [txnlev:0] [show index from history where key_name='PRIMARY']
8823:20220602:094806.451 In DBflush_version_requirements()
8823:20220602:094806.451 query without transaction detected
8823:20220602:094806.451 query [txnlev:0] [update config set dbversion_status='[{"database":"MySQL","current_version":"8.00.26","min_version":"5.07.28","max_version":"8.00.x","history_pk":1,"min_supported_version":"8.00.0","flag":0}]']
8823:20220602:094806.452 End of DBflush_version_requirements()
8823:20220602:094806.452 In DBcheck_double_type()
8823:20220602:094806.452 In DBconnect() flag:0
8823:20220602:094806.453 End of DBconnect():0
8823:20220602:094806.453 query [txnlev:0] [select count(*) from information_schema.columns where table_schema='zabbix' and column_type='double' and ((lower(table_name)='trends' and (lower(column_name) in ('value_min', 'value_avg', 'value_max'))) or (lower(table_name)='history' and lower(column_name)='value'))]
8823:20220602:094806.454 End of DBcheck_double_type()
8823:20220602:094806.454 In DBconnect() flag:0
8823:20220602:094806.455 End of DBconnect():0
8823:20220602:094806.455 query [txnlev:0] [select configid,instanceid from config order by configid]
8823:20220602:094806.456 In zbx_ha_start()
8823:20220602:094806.456 In zbx_ipc_service_recv() timeout:1.000
8824:20220602:094806.456 zbx_setproctitle() title:'ha manager'
8824:20220602:094806.456 starting HA manager
8824:20220602:094806.457 In zbx_ipc_service_start() service:haservice
8824:20220602:094806.457 In zbx_ipc_socket_open()
8824:20220602:094806.457 End of zbx_ipc_socket_open():FAIL
8824:20220602:094806.457 End of zbx_ipc_service_start():SUCCEED
8824:20220602:094806.457 In zbx_ipc_async_socket_open()
8824:20220602:094806.457 In zbx_ipc_socket_open()
8823:20220602:094807.457 End of zbx_ipc_service_recv():2
8823:20220602:094807.458 In zbx_ipc_service_recv() timeout:1.000
8823:20220602:094808.459 End of zbx_ipc_service_recv():2
8823:20220602:094808.459 In zbx_ipc_service_recv() timeout:1.000
8823:20220602:094809.460 End of zbx_ipc_service_recv():2
8823:20220602:094809.460 In zbx_ipc_service_recv() timeout:1.000
8823:20220602:094810.461 End of zbx_ipc_service_recv():2
8823:20220602:094810.461 In zbx_ipc_service_recv() timeout:1.000
8823:20220602:094811.461 End of zbx_ipc_service_recv():2
8823:20220602:094811.461 In zbx_ipc_service_recv() timeout:1.000
8823:20220602:094812.462 End of zbx_ipc_service_recv():2
8823:20220602:094812.462 In zbx_ipc_service_recv() timeout:1.000
8823:20220602:094813.464 End of zbx_ipc_service_recv():2
8823:20220602:094813.464 In zbx_ipc_service_recv() timeout:1.000
8823:20220602:094814.465 End of zbx_ipc_service_recv():2
8823:20220602:094814.465 In zbx_ipc_service_recv() timeout:1.000
8823:20220602:094815.465 End of zbx_ipc_service_recv():2
8823:20220602:094815.465 In zbx_ipc_service_recv() timeout:1.000
8823:20220602:094816.466 End of zbx_ipc_service_recv():2
8823:20220602:094816.467 One child process died (PID:8824,exitcode/signal:9). Exiting ...
8823:20220602:094816.467 End of zbx_ha_start():FAIL
8823:20220602:094816.467 cannot start HA manager: timeout while waiting for HA manager registration
Expected: |
Comments |
Comment by Christoph Schmocker [ 2022 Jun 07 ] |
The same issue here. After update from 6.0.4 to 6.0.5, the HA manager is starting without a config.
32787:20220607:085316.471 Starting Zabbix Server. Zabbix 6.0.5 (revision 8da3e1f8419).
32787:20220607:085316.471 ****** Enabled features ******
32787:20220607:085316.471 SNMP monitoring: YES
32787:20220607:085316.471 IPMI monitoring: YES
32787:20220607:085316.471 Web monitoring: YES
32787:20220607:085316.471 VMware monitoring: YES
32787:20220607:085316.471 SMTP authentication: YES
32787:20220607:085316.471 ODBC: YES
32787:20220607:085316.471 SSH support: YES
32787:20220607:085316.471 IPv6 support: YES
32787:20220607:085316.471 TLS support: YES
32787:20220607:085316.471 ******************************
32787:20220607:085316.471 using configuration file: /etc/zabbix/zabbix_server.conf
32787:20220607:085316.588 current database version (mandatory/optional): 06000000/06000002
32787:20220607:085316.588 required mandatory version: 06000000
32787:20220607:085316.625 database could be upgraded to use primary keys in history tables
32788:20220607:085316.666 starting HA manager
32787:20220607:085326.678 One child process died (PID:32788,exitcode/signal:9). Exiting ...
32787:20220607:085326.678 cannot start HA manager: timeout while waiting for HA manager registration |
Comment by Chung Yun Loo [ 2022 Jun 09 ] |
I ran into the same issue on a CentOS 8 Stream server. While reviewing the SELinux audit messages, I found the following log entries:
UID="zabbix" GID="zabbix" EUID="zabbix" SUID="zabbix" FSUID="zabbix" EGID="zabbix" SGID="zabbix" FSGID="zabbix"
type=AVC msg=audit(1654803948.194:241440): avc: denied { connectto } for pid=18382 comm="zabbix_server" path="/run/zabbix/zabbix_server_rtc.sock" scontext=system_u:system_r:zabbix_t:s0 tcontext=system_u:system_r:zabbix_t:s0 tclass=unix_stream_socket permissive=0
type=SYSCALL msg=audit(1654803948.194:241440): arch=c000003e syscall=42 success=no exit=-13 a0=e a1=7ffdb4ab0900 a2=6e a3=2 items=0 ppid=18381 pid=18382 auid=4294967295 uid=993 gid=990 euid=993 suid=993 fsuid=993 egid=990 sgid=990 fsgid=990 tty=(none) ses=4294967295 comm="zabbix_server" exe="/usr/sbin/zabbix_server_mysql" subj=system_u:system_r:zabbix_t:s0 key=(null)ARCH=x86_64 SYSCALL=connect AUID="unset" UID="zabbix" GID="zabbix" EUID="zabbix" SUID="zabbix" FSUID="zabbix" EGID="zabbix" SGID="zabbix" FSGID="zabbix"
type=SERVICE_STOP msg=audit(1654803948.290:241441): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=zabbix-server comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'UID="root" AUID="unset"
It's a SELinux access violation when zabbix server tries to open the socket file /run/zabbix/zabbix_server_rtc.sock.
Passing the SELinux audit message through audit2why returned the following result:
[root@zabbix ~]# audit2why < selinux.log
type=AVC msg=audit(1654803948.194:241440): avc: denied { connectto } for pid=18382 comm="zabbix_server" path="/run/zabbix/zabbix_server_rtc.sock" scontext=system_u:system_r:zabbix_t:s0 tcontext=system_u:system_r:zabbix_t:s0 tclass=unix_stream_socket permissive=0
Was caused by:
The boolean daemons_enable_cluster_mode was set incorrectly.
Description:
Allow daemons to enable cluster mode
Allow access by executing:
# setsebool -P daemons_enable_cluster_mode 1
So, as root, issue the following command to allow clustering in SELinux...
setsebool -P daemons_enable_cluster_mode 1
or using the "on" and "off" keywords:
setsebool -P daemons_enable_cluster_mode on
(The -P option writes the policy change to disk so it persists between reboots.)
Two additional SELinux policy changes are required to allow Apache HTTP Server to initiate network connections and connect to zabbix server, otherwise there's a connection error on the Zabbix dashboard:
setsebool -P httpd_can_network_connect 1
setsebool -P httpd_can_connect_zabbix 1
Might also need to restart zabbix server depending on your particular setup:
systemctl restart zabbix-server
To see the entire list of SELinux boolean values:
getsebool -a
It looks like the problem is from the upgraded SELinux policy packages. From my server's /var/log/dnf.log:
2022-06-09T01:24:45-0500 DEBUG Upgraded: selinux-policy-3.14.3-99.el8.noarch
2022-06-09T01:24:45-0500 DEBUG Upgraded: selinux-policy-targeted-3.14.3-99.el8.noarch
2022-06-09T01:24:45-0500 DEBUG Upgraded: zabbix-selinux-policy-6.0.5-1.el8.x86_64 |
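The three booleans named in the comment above can be checked before changing anything. This is a minimal sketch, assuming `getsebool -a` prints one `name --> on|off` line per boolean; the helper name `list_disabled_booleans` is hypothetical and not part of Zabbix or the SELinux tools.

```shell
#!/bin/sh
# Hypothetical helper (not a Zabbix or SELinux command): reads
# `getsebool -a` style output on stdin and prints the names of the
# requested booleans that are currently "off".
list_disabled_booleans() {
    awk -v wanted="$*" '
        BEGIN {
            # Build a lookup table from the space-separated argument list.
            n = split(wanted, w, " ")
            for (i = 1; i <= n; i++) want[w[i]] = 1
        }
        # Expected line shape (assumption): "<name> --> on|off"
        ($1 in want) && $2 == "-->" && $3 == "off" { print $1 }
    '
}

# Possible use on a live system, as root (shown only, not executed here):
#   getsebool -a |
#     list_disabled_booleans daemons_enable_cluster_mode \
#         httpd_can_network_connect httpd_can_connect_zabbix |
#     while read -r name; do setsebool -P "$name" 1; done
```

Keeping the parsing separate from the `setsebool` call means the list of booleans to change can be reviewed first, which may be preferable on a host where SELinux is enforcing.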
Comment by Chris Bateson [ 2022 Jun 10 ] |
You are correct, it looks like it was SELinux related. To be honest, I completely forgot I had enabled that, but figured I'd go check after reviewing your comment. Sure enough, I have it set to enforcing. |
Comment by Jurijs Klopovskis [ 2022 Jun 29 ] |
Released updated zabbix-selinux-policy-6.0.6-2 package for RHEL 7 & 8.
A buggy %postun scriptlet was added in 6.0.5. It purged the installed zabbix_policy not only during package deinstallation, as it should, but also during an update.
If you have the buggy 6.0.5-1 or 6.0.6-1 package installed, then a direct update to 6.0.6-2 will not work, because the old buggy package will still purge zabbix_policy. You must first uninstall the old zabbix-selinux-policy package and then install the new one.
# dnf remove zabbix-selinux-policy
# dnf clean all
# dnf install zabbix-selinux-policy
During deinstallation of the buggy package, you may see the following message:
libsemanage.semanage_direct_remove_key: Unable to remove module zabbix_policy at priority 400. (No such file or directory).
semodule: Failed!
That's OK.
Upgrade from 6.0.4 and older should be OK. |
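The upgrade path in the comment above depends on which package build is installed. The sketch below encodes that decision, assuming the version-release string is obtained with `rpm -q --qf '%{VERSION}-%{RELEASE}\n' zabbix-selinux-policy`. The function name `policy_upgrade_action` is hypothetical, and only the 6.0.5-1 and 6.0.6-1 builds named in the comment are treated as buggy.

```shell
#!/bin/sh
# Hypothetical helper: maps an installed zabbix-selinux-policy
# version-release string (e.g. "6.0.5-1.el8") to the upgrade path
# described in the comment above.
policy_upgrade_action() {
    case "$1" in
        # Builds whose %postun scriptlet purges zabbix_policy on update.
        6.0.5-1|6.0.5-1.*|6.0.6-1|6.0.6-1.*)
            echo "remove-then-install" ;;
        # Everything else (6.0.4 and older, 6.0.6-2 and newer).
        *)
            echo "update" ;;
    esac
}

# Possible use on a live system, as root (shown only, not executed here):
#   ver=$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' zabbix-selinux-policy)
#   if [ "$(policy_upgrade_action "$ver")" = "remove-then-install" ]; then
#       dnf remove zabbix-selinux-policy && dnf clean all &&
#           dnf install zabbix-selinux-policy
#   fi
#   semodule -l | grep zabbix_policy   # check that the module is loaded
```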
Comment by Andris Zeila [ 2022 Jul 11 ] |
Released
|