- Type: Problem report
- Resolution: Unresolved
- Priority: Critical
- Affects Version/s: 7.0.22
- Component/s: Proxy (P), Server (S)
- Environment: 10x Proxy 7.0.22: 4 vCPU, 16 GB RAM, PostgreSQL 16.5
Description
In our environment, 10 proxies serve approximately 40,000 hosts. Each proxy handles around 4,000 hosts and over 1 million items. Hosts are largely uniform and share the same item set, with very few exceptions.
After updating the environment to Zabbix 7.0.21, we began seeing a new problem with deleting old items from the item_rtdata table (described in https://support.zabbix.com/browse/ZBX-26732). Even after upgrading further to 7.0.22, the issue persists in the following form:
Previous DELETE query (very slow):
delete from item_rtdata where (itemid in (ITEMID x 1000) or itemid in (ITEMID x 1000) or ...)
Current DELETE query (single IN clause with all items):
delete from item_rtdata where itemid in (ITEMID x all items to delete)
Although the current query avoids the overhead of constructing many OR-ed IN clauses, it still runs for a long time and continues to generate locks and deadlocks in PostgreSQL, preventing the proxy from processing configuration updates.
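A chunked variant of the current query would shorten how long each statement holds its row locks and let the proxy commit between chunks. A minimal sketch of what we are asking for (the chunk size of 1000 mirrors the frontend's CHUNK_SIZE; the function name is illustrative, not existing Zabbix code):

```python
def chunked_delete_statements(itemids, chunk_size=1000):
    """Build one DELETE per chunk of ids, so each statement locks fewer
    rows and can be committed independently of the others."""
    itemids = list(itemids)
    statements = []
    for i in range(0, len(itemids), chunk_size):
        ids = ",".join(str(itemid) for itemid in itemids[i:i + chunk_size])
        statements.append(f"delete from item_rtdata where itemid in ({ids})")
    return statements

# Example: 2500 ids produce three statements (1000 + 1000 + 500 ids).
statements = chunked_delete_statements(range(1, 2501))
```

Committing after each chunk would also let concurrent lastlogsize/mtime updates interleave instead of queueing behind one multi-minute DELETE.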
Problem
The proxy experiences this problem when it:
- is detached from its proxy group;
- remains unavailable for longer than the failover period;
- becomes available again.
Previously, the proxy could remain unavailable for hours due to DELETE deadlocks in item_rtdata. In the current version (7.0.22), there is a high chance that the proxy recovers on its own, but it can still be offline for tens of minutes because of the same DELETE and lock contention issues.
Sample deadlock from PostgreSQL logs:
ERROR: deadlock detected
DETAIL: Process 24585 waits for ShareLock on transaction 947797861; blocked by process 24583.
Process 24583 waits for ShareLock on transaction 947796829; blocked by process 24585.
Process 24585: delete from item_rtdata where (itemid in (....................)
Process 24583: update item_rtdata set lastlogsize=10606934,mtime=1764755866 where itemid=33410972;
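The log above shows the classic opposite-order lock acquisition: the DELETE and a concurrent UPDATE transaction each hold a row lock the other one needs. A standard mitigation (our suggestion, not something Zabbix currently does as far as we can tell) is to acquire locks in a deterministic order, for example by deleting ids in sorted chunks so every DELETE walks the itemid index in the same direction:

```python
def ordered_delete_chunks(itemids, chunk_size=1000):
    """Sort ids first, so every chunked DELETE touches rows in the same
    ascending-itemid order; transactions that lock rows in one consistent
    order cannot deadlock against each other on those rows."""
    ordered = sorted(itemids)
    return [ordered[i:i + chunk_size]
            for i in range(0, len(ordered), chunk_size)]

chunks = ordered_delete_chunks([33410972, 5, 1000, 7], chunk_size=2)
```

Each chunk is internally sorted and the chunks ascend overall, so two concurrent deleters (or a deleter and an ordered updater) always queue rather than deadlock.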
Proxy logs show errors such as:
cannot process received configuration data from server at "<ZABBIX DNS>": cannot remove old objects from table "item_rtdata"
The only reliable workaround is to detach all hosts from the proxy so that no items exist to update, allowing DELETE to complete without interference.
Questions / Requests
- Could the DELETE query on item_rtdata be refactored to use batched deletions in chunks of 1000 items, similar to the chunked deletion the web UI already uses?
This is how it looks in the DB.php file — the fragment responsible for deleting objects by ID:
while ($row = DBfetch($resource)) {
    $chunkids[] = $row[$pk_field_name];

    if (count($chunkids) == self::CHUNK_SIZE) {
        self::deleteByIdField($table_name, $pk_field_name, $chunkids);
        $chunkids = [];
    }
}
- Does Zabbix use SELECT ... FOR UPDATE when updating or deleting rows to acquire row-level locks proactively?
Could this mechanism also be applied here to avoid deadlocks on DELETE and UPDATE operations?
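To sketch what we mean: lock the target rows first with SELECT ... FOR UPDATE in a deterministic order, then delete them within the same transaction, so the DELETE never has to wait mid-scan for locks held by concurrent lastlogsize/mtime updates. The table and column names come from this report; the helper itself is hypothetical, shown only to make the two-statement pattern concrete:

```python
def locking_delete_sql(itemids):
    """Build a two-statement sequence meant to run inside one transaction:
    first take row locks in ascending-itemid order, then delete the rows."""
    ids = ",".join(str(itemid) for itemid in sorted(itemids))
    lock = (f"select itemid from item_rtdata "
            f"where itemid in ({ids}) order by itemid for update")
    delete = f"delete from item_rtdata where itemid in ({ids})"
    return [lock, delete]

lock_sql, delete_sql = locking_delete_sql([42, 7])
```

With all row locks taken up front and in a fixed order, a concurrent UPDATE either runs entirely before the lock statement or waits until the transaction commits, which removes the wait cycle seen in the deadlock log.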