- Type: Problem report
- Resolution: Duplicate
- Priority: Trivial
- Affects Version/s: 7.0.21
- Component/s: Proxy (P), Server (S)
In our environment, we have 10 proxies serving approximately 40,000 hosts. These proxies are grouped into a proxy group, where each proxy handles around 4,000 hosts and over 1 million items.
We intentionally avoid frequent polling to reduce load, so the average Required vps value remains around 1600.
The environment is generally stable: both VM resource usage and the internal Zabbix proxy process load stay around 20–30%.
All hosts behind the proxy group are mostly uniform and share the same item set. The majority of items come from templates linked to all hosts, and there are very few unique or exceptional hosts with custom items.
Problem
We previously experienced configuration refresh problems after changing item configuration (more details in ticket https://support.zabbix.com/browse/ZBX-26732).
After upgrading the entire environment to 7.0.21, the original problem was resolved, but a new (similar) issue appeared.
We noticed a very long configuration resynchronization time in the following situation:
- detaching a proxy from its proxy group
- the proxy being offline for longer than the failover period, so it is marked “unavailable”
When the proxy becomes available again, the configuration syncer remains at 80–100% load for a very long time (measured in hours), and the proxy does not start collecting data.
The only workaround we have found so far is to:
- detach the proxy from the proxy group
- detach all hosts from this proxy (GUI must show 0 NVPS)
- restart the proxy
- wait until the proxy processes the configuration (several minutes)
- reattach the proxy to the proxy group
After reviewing zabbix-proxy and PostgreSQL logs, we noticed that the configuration syncer has significant problems removing old configuration, especially in the item_rtdata table, after a proxy is detached from a proxy group.
The DELETE query looks very similar to the one involved in the earlier issue (ZBX-26732):
delete from item_rtdata where (itemid in (ITEMID x 1000) or itemid in (ITEMID x 1000) or ...)
This DELETE operation runs extremely slowly, and on top of that it repeatedly collides with concurrent UPDATE queries on the same rows, resulting in deadlocks:
2025-12-03 10:00:02.894 UTC [24585] zabbix@zabbix_proxy ERROR: deadlock detected
2025-12-03 10:00:02.894 UTC [24585] zabbix@zabbix_proxy DETAIL: Process 24585 waits for ShareLock on transaction 947797861; blocked by process 24583.
Process 24583 waits for ShareLock on transaction 947796829; blocked by process 24585.
Process 24585: delete from item_rtdata where (itemid in (....................
Process 24583: update item_rtdata set lastlogsize=10606934,mtime=1764755866 where itemid=33410972;
update item_rtdata set lastlogsize=10547081,mtime=1764755871 where itemid=33410973;
update item_rtdata set lastlogsize=10575986,mtime=1764755871 where itemid=33410974;
..............................
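For context, this matches the classic lock-order inversion on PostgreSQL row locks. A minimal two-session sketch (not Zabbix code; the itemids and values are taken from the log above) of how the bulk DELETE and the per-item UPDATEs can end up waiting on each other:
-- Session A: the configuration syncer's bulk DELETE, which locks the
-- matching rows in whatever order the executor visits them.
BEGIN;
DELETE FROM item_rtdata WHERE itemid IN (33410972, 33410973 /* ...thousands more... */);
-- Session B: per-item UPDATEs of lastlogsize/mtime, locking rows one by
-- one in the order the values arrive.
BEGIN;
UPDATE item_rtdata SET lastlogsize = 10547081, mtime = 1764755871 WHERE itemid = 33410973;
UPDATE item_rtdata SET lastlogsize = 10606934, mtime = 1764755866 WHERE itemid = 33410972;
-- If session A already holds the row lock on 33410972 and waits for
-- 33410973, while session B holds 33410973 and waits for 33410972,
-- PostgreSQL detects the cycle and aborts one transaction with
-- "deadlock detected"; in the log above the victim is the DELETE.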
Each such deadlock aborts the DELETE, which produces the following error on the proxy side:
24385:20251203:100003.058 End of zbx_proxyconfig_process()
24385:20251203:100003.058 cannot process received configuration data from server at "<ZABBIX DNS>": cannot remove old objects from table "item_rtdata"
24385:20251203:100003.152 End of process_configuration_sync()
24385:20251203:100003.153 zbx_setproctitle() title:'configuration syncer [synced config 4680782 bytes in 53.426632 sec, idle 10 sec]'
This cycle repeats indefinitely: the proxy is unable to remove the old configuration from item_rtdata because the deadlocks keep occurring.
The workaround above resolves the issue because, with no items assigned to the proxy, there are no UPDATEs to item_rtdata, so the DELETE can finally complete.
Questions
- Can this DELETE query be optimized in the same way as the SELECT query in ticket ZBX-26732? For example:
  - batching DELETE operations in groups of 1000 item IDs, or
  - performing the DELETE per item (similar to how the UPDATE is executed).
- Does Zabbix use SELECT ... FOR UPDATE when updating or deleting rows to acquire row-level locks proactively? Could this mechanism also be applied here to avoid deadlocks between the DELETE and UPDATE operations? (A rough sketch of both ideas follows below.)
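To make the suggestions concrete, here is a rough sketch of both ideas in plain PostgreSQL SQL (the batch size, the ORDER BY, and the exact statements are our assumptions, not current Zabbix behaviour):
-- Idea 1: delete in bounded batches (e.g. up to 1000 itemids per
-- statement, committed per batch) instead of one OR-chained DELETE, so
-- each statement holds its row locks only briefly.
DELETE FROM item_rtdata WHERE itemid IN (33410972, 33410973 /* ...up to ~1000 ids... */);
-- Idea 2: lock the target rows first in a deterministic order, then
-- delete them. If every writer locks item_rtdata rows in ascending
-- itemid order, the lock-order inversion behind the deadlock cannot
-- occur.
BEGIN;
SELECT itemid
  FROM item_rtdata
 WHERE itemid IN (33410972, 33410973 /* ...batch of ids... */)
 ORDER BY itemid
   FOR UPDATE;
DELETE FROM item_rtdata
 WHERE itemid IN (33410972, 33410973 /* ...same batch... */);
COMMIT;
Either shorter lock hold times or a consistent locking order between the configuration syncer and the processes updating lastlogsize/mtime should allow the DELETE to complete instead of being chosen as the deadlock victim.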
Related to:
- ZBX-26769 Remove slow "or" conditions from "in" statements (Closed)