
Type: Problem report
Resolution: Unresolved
Priority: Trivial
Fix Version/s: None
Affects Version/s: 8.0.0beta1
Component/s: Server (S)
Labels:
None

🐞 Zabbix Server transient mass “not supported (database error)” state during MySQL restart

Summary:
During a short MySQL restart, Zabbix Server (8.0.0 beta1) temporarily marks a large number of items and discovery rules as “not supported (database error)”, even though database connectivity is restored automatically within seconds and the server fully recovers without restart.

This leads to unnecessary monitoring noise and temporary instability in item/discovery state consistency during normal database maintenance operations.

Environment:

Zabbix Server: 8.0.0 beta1
OS: Ubuntu Server 26.04 (systemd)
Database: MySQL 8.4.x
DB connection: local UNIX socket (/var/run/mysqld/mysqld.sock)
Deployment: single-node
Service manager: systemd

Steps to reproduce:
1. Start Zabbix Server under normal load
2. Restart MySQL:
systemctl restart mysql
3. Observe Zabbix Server behavior during DB outage and recovery

Actual behavior:

Immediate DB loss detected:
[Z3001] connection to database 'zabbix' failed
Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock'
database is down: reconnecting in 10 seconds

During DB downtime:
multiple items and discovery rules become “not supported (database error)”

After MySQL recovery:
database connection re-established
all items and discovery rules return to supported state automatically

No Zabbix Server restart is triggered; recovery is automatic

Expected behavior:
Short DB unavailability (such as MySQL restart) should not cause mass transitions of items/discovery rules into “not supported” state. A short grace period or suppression of transient DB failure propagation is expected.

Impact:

No data loss
No Zabbix server crash
No manual intervention required
However:
- temporary large-scale monitoring noise
- state flapping across many items
- reduced clarity during maintenance windows

Frequency:

100% reproducible during MySQL restart
Window: ~10–20 seconds
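
The outage window can be estimated from the server log rather than by eye. The following is a minimal sketch, not part of the original test procedure. It assumes the usual Zabbix log prefix of the form PID:YYYYMMDD:HHMMSS.mmm and that the reconnect is logged with the text "database connection re-established"; both should be checked against the actual log. The function name outage_windows is hypothetical.

```python
import re
from datetime import datetime

# Assumed log prefix: "<pid>:<YYYYMMDD>:<HHMMSS.mmm> <message>"
LINE_RE = re.compile(r"^\s*\d+:(\d{8}):(\d{6}\.\d{3})\s+(.*)$")

# Marker texts; the second one is an assumption about the exact log wording.
DOWN_MARK = "database is down"
UP_MARK = "database connection re-established"


def outage_windows(lines):
    """Return the duration in seconds of each DB outage found in the log lines.

    An outage starts at the first "down" line seen while no outage is open
    and ends at the next "up" line. Lines that do not match the assumed
    prefix are skipped.
    """
    windows = []
    down_since = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1) + match.group(2),
                                  "%Y%m%d%H%M%S.%f")
        message = match.group(3)
        if DOWN_MARK in message and down_since is None:
            down_since = stamp
        elif UP_MARK in message and down_since is not None:
            windows.append((stamp - down_since).total_seconds())
            down_since = None
    return windows
```

Passing the lines of the server log file (for example via open(path) on whatever LogFile points to) yields one duration per outage. Because several server processes log the same messages, the result is only an approximation of the window.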

Severity:
Medium (operational noise / observability instability, not functional failure)

──────────────────────────────────────────────
Systemd configuration verification (mitigation test)
──────────────────────────────────────────────

During investigation, systemd configuration was validated to ensure the issue is not caused by service dependency propagation or restart ordering.

Final tested systemd unit used during reproduction:

[Unit]
Description=Zabbix Server
After=network.target mysql.service mysqld.service mariadb.service
Wants=mysql.service

[Service]
Environment="CONFFILE=/etc/zabbix/zabbix_server.conf"
EnvironmentFile=-/etc/default/zabbix-server

Type=forking
PIDFile=/run/zabbix/zabbix_server.pid

ExecStart=/usr/sbin/zabbix_server -c $CONFFILE
ExecStop=/bin/sh -c '[ -n "$MAINPID" ] && kill -TERM "$MAINPID"'

Restart=on-failure
RestartSec=10s

TimeoutStartSec=infinity
TimeoutStopSec=infinity

KillMode=control-group

LimitNOFILE=65536:1048576

StartLimitIntervalSec=30s
StartLimitBurst=5

Result of systemd verification:

Zabbix Server does NOT stop when MySQL restarts
No StopPropagatedFrom or dependency cascade behavior is involved
No systemd-triggered restart occurs
DB outage handling is fully internal to Zabbix process logic
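
The "no stop, no restart" observation can be checked by comparing systemd properties of the Zabbix unit before and after the MySQL restart. This is a sketch only, and not necessarily how the check was done during the investigation. It assumes the unit is named zabbix-server and relies on the systemd properties MainPID, NRestarts and ActiveEnterTimestampMonotonic; the helper names are hypothetical.

```python
import subprocess

# systemd properties that change if the unit was stopped or restarted
PROPS = ("MainPID", "NRestarts", "ActiveEnterTimestampMonotonic")


def parse_show(text):
    """Parse KEY=VALUE output of `systemctl show` into a dict."""
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)


def snapshot(unit="zabbix-server"):
    """Read the selected properties of a unit (unit name is an assumption)."""
    cmd = ["systemctl", "show", unit] + [f"--property={p}" for p in PROPS]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_show(result.stdout)


def unchanged(before, after):
    """True if none of the tracked properties differ between two snapshots."""
    return all(before.get(p) == after.get(p) for p in PROPS)
```

Taking snapshot() once before "systemctl restart mysql" and once after recovery, unchanged(before, after) returning True is consistent with the server process having stayed up throughout.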

Conclusion:
The issue is independent of systemd configuration. It is caused by Zabbix internal database reconnection handling, which propagates short DB outages into mass “not supported” state transitions.

Suggested improvement:

Introduce a configurable DB outage grace period (e.g. 5–10 seconds)
Suppress transient unsupported state transitions during short DB outages
Improve DB reconnect state handling to reduce monitoring noise during maintenance windows
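
To illustrate the intended grace-period behavior, here is a small sketch of the decision logic. It is purely hypothetical and does not reflect Zabbix internals: the class DbOutageGate, its methods and the grace_seconds parameter are invented for illustration, and timestamps are plain seconds supplied by the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DbOutageGate:
    """Hypothetical gate that delays "not supported" transitions.

    State changes caused by DB errors are suppressed until the outage
    has lasted at least grace_seconds.
    """
    grace_seconds: float = 10.0
    down_since: Optional[float] = None

    def on_db_error(self, now: float) -> None:
        # Remember only the start of the outage, not every later error.
        if self.down_since is None:
            self.down_since = now

    def on_db_ok(self) -> None:
        # Connection is back: close the outage.
        self.down_since = None

    def should_mark_unsupported(self, now: float) -> bool:
        # Propagate the error only once the grace period has elapsed.
        if self.down_since is None:
            return False
        return (now - self.down_since) >= self.grace_seconds
```

With grace_seconds set above the observed outage window, a MySQL restart of the kind described in this report would never reach the point where items are marked as not supported, while a longer outage still would.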
