Incident report
Resolution: Unresolved
Trivial
We are running a large-scale Zabbix environment with thousands of devices and have been encountering a recurring issue when modifying templates linked to a large number of hosts.
The worst case is a single template linked to nearly 55,000 hosts. When we update that template, the changes must be propagated to all of those hosts. The propagation is synchronous, so the frontend either waits for the operation to complete or times out.
As a test, we tried increasing the frontend timeout to around five minutes. However, even with this extended (and already impractical) timeout, the issue persisted. We have implemented some workarounds to reduce the impact, but these have added complexity to the platform and made updates more difficult to manage.
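For a sense of scale, here is a rough back-of-the-envelope calculation using the two figures above (about 55,000 hosts and a timeout of about five minutes). It assumes the propagation cost grows linearly with the number of linked hosts, which is an assumption for illustration and not something measured; the function name is hypothetical.

```python
def per_host_budget_ms(hosts: int, timeout_s: float) -> float:
    """Average time available per host, in milliseconds, if the whole
    propagation must finish within the timeout (assumes linear scaling)."""
    return timeout_s * 1000.0 / hosts


# Figures taken from the report: ~55,000 hosts, ~5 minute timeout.
budget = per_host_budget_ms(55_000, 5 * 60)
print(f"{budget:.2f} ms per host")  # prints "5.45 ms per host"
```

Under that assumption, every host would need all of its inherited entities updated in roughly 5 ms on average, database round trips included, for the request to finish in time.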
We understand that database performance and IOPS (MySQL in our case) are likely key contributing factors.
An asynchronous process for propagating template changes could be a significant improvement. However, we recognize that implementing such functionality in the Zabbix frontend would be quite complex.
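To make the request more concrete, below is a minimal sketch of the general idea: the caller enqueues the change and gets a job ID back immediately, while a background worker applies the change to the linked hosts in batches and records progress. This is plain Python for illustration only, not Zabbix code and not a proposal for how the frontend or server should implement it. `PropagationQueue`, `apply_batch`, and the batch size of 500 are all hypothetical names and values.

```python
import queue
import threading
import uuid
from typing import Callable, Dict, Iterator, List


def chunked(items: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class PropagationQueue:
    """Hypothetical background queue for template-change propagation."""

    def __init__(self, apply_batch: Callable[[int, List[int]], None],
                 batch_size: int = 500) -> None:
        self._apply_batch = apply_batch  # stand-in for the real per-batch update
        self._batch_size = batch_size
        self._jobs: "queue.Queue" = queue.Queue()
        self._status: Dict[str, dict] = {}
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, template_id: int, host_ids: List[int]) -> str:
        """Enqueue a change and return at once with a job ID."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._status[job_id] = {"done": 0, "total": len(host_ids),
                                    "state": "queued"}
        self._jobs.put((job_id, template_id, list(host_ids)))
        return job_id

    def status(self, job_id: str) -> dict:
        """Return a snapshot of the job's progress."""
        with self._lock:
            return dict(self._status[job_id])

    def wait(self) -> None:
        """Block until every queued job has been processed."""
        self._jobs.join()

    def _set_state(self, job_id: str, state: str) -> None:
        with self._lock:
            self._status[job_id]["state"] = state

    def _run(self) -> None:
        while True:
            job_id, template_id, host_ids = self._jobs.get()
            try:
                self._set_state(job_id, "running")
                for batch in chunked(host_ids, self._batch_size):
                    self._apply_batch(template_id, batch)
                    with self._lock:
                        self._status[job_id]["done"] += len(batch)
                self._set_state(job_id, "finished")
            except Exception:
                self._set_state(job_id, "failed")
            finally:
                self._jobs.task_done()


# Usage: the "update" here only records batch sizes instead of touching a database.
applied: List[int] = []
pq = PropagationQueue(lambda tid, batch: applied.append(len(batch)))
job = pq.submit(template_id=1, host_ids=list(range(55_000)))
pq.wait()
print(pq.status(job))  # {'done': 55000, 'total': 55000, 'state': 'finished'}
```

With this shape, the save request would return as soon as `submit` does, and the user interface could poll `status` to show progress instead of holding the request open until a timeout.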