Type: Problem report
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 7.0.22, 7.4.6, 8.0.0alpha1
Component/s: Server (S)
Labels:
- HA
- failover
- sigkill

Sprint:
Support backlog
Story Points:
1

Action

Killing zabbix_server main process.

Expected

The Zabbix server fails over to the standby node.

Observed

The Zabbix server does not respond to runtime commands and does not fail over to the secondary node. At least some data continues to arrive (it was not tested for how long).

Steps to reproduce:

Installed two clean Ubuntu servers with the latest patches:
192.168.196.23 ha1.local ha1
192.168.196.24 ha2.local ha2

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 24.04.3 LTS
Release:        24.04
Codename:       noble

Standard Zabbix 7.0.22 installation on Ubuntu, with the frontend and database on ha1 for testing.

Zabbix 7.0.22 is installed on both servers.
ha1 packages

zabbix-agent/zabbix,now 1:7.0.22-1+ubuntu24.04 amd64 [installed]
zabbix-frontend-php/zabbix,now 1:7.0.22-1+ubuntu24.04 all [installed]
zabbix-nginx-conf/zabbix,now 1:7.0.22-1+ubuntu24.04 all [installed]
zabbix-release/zabbix,zabbix,now 1:7.0-2+ubuntu24.04 all [installed]
zabbix-server-mysql/zabbix,now 1:7.0.22-1+ubuntu24.04 amd64 [installed]
zabbix-sql-scripts/zabbix,now 1:7.0.22-1+ubuntu24.04 all [installed]

ha2 packages

zabbix-agent/zabbix,now 1:7.0.22-1+ubuntu24.04 amd64 [installed]
zabbix-release/zabbix,zabbix,now 1:7.0-2+ubuntu24.04 all [installed]
zabbix-server-mysql/zabbix,now 1:7.0.22-1+ubuntu24.04 amd64 [installed]

Both servers are configured in HA mode:
ha1

HANodeName=ha1
NodeAddress=192.168.196.23:10051

ha2

HANodeName=ha2
NodeAddress=192.168.196.24:10051

Processes on ha1

systemd-+-ModemManager---3*[{ModemManager}]
        |-agetty
        |-cron
        |-dbus-daemon
        |-mariadbd---79*[{mariadbd}]
        |-multipathd---6*[{multipathd}]
        |-nginx---7*[nginx]
        |-php-fpm8.3---9*[php-fpm8.3]
        |-polkitd---3*[{polkitd}]
        |-rsyslogd---3*[{rsyslogd}]
        |-snmpd
        |-sshd---sshd---sshd---bash---tmux: client
        |-systemd-+-(sd-pam)
        |         `-dbus-daemon
        |-systemd-journal
        |-systemd-logind
        |-systemd-network
        |-systemd-resolve
        |-systemd-timesyn---{systemd-timesyn}
        |-systemd-udevd
        |-tmux: server---bash---sudo---sudo---su---bash---pstree
        |-udisksd---5*[{udisksd}]
        |-unattended-upgr---{unattended-upgr}
        `-zabbix_server-+-45*[zabbix_server]
                        |-zabbix_server---16*[{zabbix_server}]
                        |-zabbix_server---5*[{zabbix_server}]
                        `-4*[zabbix_server---{zabbix_server}]

Processes on ha2

systemd-+-ModemManager---3*[{ModemManager}]
        |-agetty
        |-cron
        |-dbus-daemon
        |-multipathd---6*[{multipathd}]
        |-polkitd---3*[{polkitd}]
        |-rsyslogd---3*[{rsyslogd}]
        |-snmpd
        |-sshd---sshd---sshd---bash---tmux: client
        |-systemd-+-(sd-pam)
        |         `-dbus-daemon
        |-systemd-journal
        |-systemd-logind
        |-systemd-network
        |-systemd-resolve
        |-systemd-timesyn---{systemd-timesyn}
        |-systemd-udevd
        |-tmux: server---bash---sudo---sudo---su---bash---pstree
        |-udisksd---5*[{udisksd}]
        |-unattended-upgr---{unattended-upgr}
        `-zabbix_server---zabbix_server

The Zabbix server reports a working HA cluster with a failover delay of 60 seconds:

root@ha1:~# zabbix_server -R ha_status
Failover delay: 60 seconds
Cluster status:
   #  ID                        Name                      Address                        Status      Last Access
   1. cmlfdgn8f0001b8819kmay2tp ha1                       192.168.196.23:10051           active      1s
   2. cmlfdgub50001le82yj2ulgwr ha2                       192.168.196.24:10051           standby     5s
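
The cluster table above can be reduced to a single answer ("which node is active?") when repeating the test. The following is a minimal sketch, not part of the original report: `active_node` is a hypothetical helper name, and it assumes the column layout shown above and node names without spaces.

```shell
# Print the name of every node whose Status column is "active".
# Reads the output of `zabbix_server -R ha_status` on stdin.
# Assumed columns: "#." ID Name Address Status "Last Access".
active_node() {
    awk '$1 ~ /^[0-9]+\.$/ && $5 == "active" { print $3 }'
}

# Usage (on the node that currently answers runtime commands):
#   zabbix_server -R ha_status | active_node
```

An empty result means that no row reported `active`, or that the command did not return a cluster table at all.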

Find and kill the main process of the active Zabbix server:

root@ha1:~# ps -ef|grep "zabbix_server -c"
zabbix      1992       1  0 14:51 ?        00:00:00 /usr/sbin/zabbix_server -c /etc/zabbix/zabbix_server.conf
root        2110    1291  0 14:54 pts/2    00:00:00 grep --color=auto zabbix_server -c
root@ha1:~# kill -9 1992

The remaining zabbix_server child processes are now reparented directly to systemd:

systemd-+-ModemManager---3*[{ModemManager}]
        |-agetty
        |-cron
        |-dbus-daemon
        |-mariadbd---79*[{mariadbd}]
        |-multipathd---6*[{multipathd}]
        |-nginx---7*[nginx]
        |-php-fpm8.3---10*[php-fpm8.3]
        |-polkitd---3*[{polkitd}]
        |-rsyslogd---3*[{rsyslogd}]
        |-snmpd
        |-sshd---sshd---sshd---bash---tmux: client
        |-systemd-+-(sd-pam)
        |         `-dbus-daemon
        |-systemd-journal
        |-systemd-logind
        |-systemd-network
        |-systemd-resolve
        |-systemd-timesyn---{systemd-timesyn}
        |-systemd-udevd
        |-tmux: server---bash---sudo---sudo---su---bash---pstree
        |-udisksd---5*[{udisksd}]
        |-unattended-upgr---{unattended-upgr}
        |-45*[zabbix_server]
        |-zabbix_server---16*[{zabbix_server}]
        |-zabbix_server---5*[{zabbix_server}]
        `-4*[zabbix_server---{zabbix_server}]
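
The reparenting visible in the tree above can also be checked without `pstree`. The sketch below is an illustration only: `orphaned_zabbix_pids` is a hypothetical helper, and it assumes that the orphaned children are adopted by PID 1, as the tree suggests. On a healthy server only the main process has PID 1 as its parent, so more than one result indicates that the children were orphaned.

```shell
# List PIDs of zabbix_server processes whose parent is PID 1.
# Reads "PID PPID COMMAND" lines on stdin, e.g. from:
#   ps -e -o pid=,ppid=,comm=
orphaned_zabbix_pids() {
    awk '$2 == 1 && $3 == "zabbix_server" { print $1 }'
}

# Usage:
#   ps -e -o pid=,ppid=,comm= | orphaned_zabbix_pids
```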

The waiting period starts. The Zabbix server does not switch to the secondary node, which remains in standby. Runtime commands on ha1 time out:

root@ha1:~# date
Tue Feb 10 02:55:06 PM UTC 2026
root@ha1:~# zabbix_server -R ha_status
zabbix_server [2125]: Cannot perform runtime control command: Timeout while waiting for response

After three minutes of waiting, nothing has changed:

root@ha1:~# date
Tue Feb 10 02:58:06 PM UTC 2026
root@ha1:~# zabbix_server -R ha_status
zabbix_server [2145]: Cannot perform runtime control command: Timeout while waiting for response

Nothing has changed on the secondary node either:

root@ha2:~# date
Tue Feb 10 03:00:14 PM UTC 2026
root@ha2:~# zabbix_server -R ha_status
Runtime commands can be executed only in active mode
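
The two responses above (a timeout on ha1 and the "only in active mode" refusal on ha2) can be told apart in a script when the check is repeated over a longer waiting period. This is a rough sketch: `classify_ha_status` and its state names (`hung`, `standby`, `active`, `unknown`) are made up for illustration, and the matched strings are simply the messages quoted in this report.

```shell
# Map the text printed by `zabbix_server -R ha_status` to a short state.
# $1: the complete output of the command (stdout and stderr combined).
classify_ha_status() {
    case "$1" in
        *"Timeout while waiting for response"*)
            echo hung ;;      # runtime control did not answer (ha1 above)
        *"Runtime commands can be executed only in active mode"*)
            echo standby ;;   # node is running but not active (ha2 above)
        *"Cluster status:"*)
            echo active ;;    # node answered with the cluster table
        *)
            echo unknown ;;
    esac
}

# Usage: log one state per minute for five minutes.
#   for i in 1 2 3 4 5; do
#       printf '%s %s\n' "$(date -u +%T)" \
#           "$(classify_ha_status "$(zabbix_server -R ha_status 2>&1)")"
#       sleep 60
#   done
```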

Meanwhile, the Zabbix server on ha1 keeps writing messages to its log:

2001:20260210:150158.744 cannot send history syncer notification
2001:20260210:150216.760 cannot write to IPC socket: Broken pipe
2001:20260210:150216.760 cannot send history syncer notification

The service does not respond to a restart; the remaining processes can only be killed.
After killing the processes on ha1 and starting the Zabbix server on ha1 again, the cluster switched to the second node:

root@ha2:~# zabbix_server -R ha_status
Failover delay: 60 seconds
Cluster status:
   #  ID                        Name                      Address                        Status      Last Access
   1. cmlfdgn8f0001b8819kmay2tp ha1                       192.168.196.23:10051           standby     3s
   2. cmlfdgub50001le82yj2ulgwr ha2                       192.168.196.24:10051           active      2s

The server log file contains the message:

 857:20260210:150823.947 "ha2" node switched to "active" mode
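
To compare the moment of the switch with the time of the kill, the switch events can be pulled out of the log. The following sketch assumes the log line format shown above (`PID:YYYYMMDD:HHMMSS.mmm message`); `ha_switch_events` is a hypothetical helper, and the log path in the usage line is an assumption about the package default.

```shell
# Print "DATE TIME NODE MODE" for every HA mode switch found on stdin.
# Matches lines such as:
#   857:20260210:150823.947 "ha2" node switched to "active" mode
ha_switch_events() {
    sed -n 's/^ *[0-9]*:\([0-9]\{8\}\):\([0-9]\{6\}\)\.[0-9]* "\([^"]*\)" node switched to "\([^"]*\)" mode.*$/\1 \2 \3 \4/p'
}

# Usage (path assumed, adjust to the LogFile setting in zabbix_server.conf):
#   ha_switch_events < /var/log/zabbix/zabbix_server.log
```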

Assignee: Zabbix Development Team
Reporter: Guntis Liepins
Votes: 1
Watchers: 5

Created: 2026 Feb 10 17:45
Updated: 2026 Apr 23 15:11

Estimated:

Not Specified

Remaining:

Not Specified
