Steps to visibly reproduce the issue
- Setup: Zabbix server, active proxy, agent
- Configure the agent to be monitored by the active proxy. Ideally, use a test setup where the proxy does not monitor any other hosts.
- Open two parallel SSH sessions to the proxy host. In one of them, continuously display the number of values in the proxy buffer that have not yet been sent to the Zabbix server.
- Command to display the values.
- Import the attached Zabbix agent load test template into the server.
- Link the template to the host monitored by the proxy.
- Reload the proxy configuration cache by running zabbix_proxy -R config_cache_reload
- Watch the number of values not yet sent to the Zabbix server. Normally it should be 0, but occasionally a few values may appear in the list.
- Stop communication between the proxy and the server, e.g. by closing the firewall port (or, alternatively, by stopping the Zabbix server).
- Watch as the number of unsent values on the proxy rapidly increases.
- Let it collect data for around 5 minutes so that there is a backlog to process.
- Meanwhile, unlink and clear the load test template from the monitored test host.
- Link some other small template, e.g. Template App OS Linux
- Wait until the Zabbix server reloads its configuration cache (every 60 seconds by default).
- Re-enable communication between the proxy and the server.
- Watch as the number of unsent values on the proxy rapidly decreases while the items are still in the proxy configuration cache. The proxy clears the buffer by sending 1000 ids multiple times per second, even though the items no longer exist on the server.
- With DebugLevel=4, the proxy log shows that it sends full chunks of collected data.
- Reload the proxy configuration cache by running zabbix_proxy -R config_cache_reload. At this moment the proxy learns from the server that the load test items are no longer in its configuration.
- Watch as the number of unsent values on the proxy now decreases by only 1000 ids every second. The proxy scans 1000 ids but sends only the values of those items, among the 1000 ids, that still remain in the configuration cache.
- The actual bug: if at least one of the 1000 scanned ids belongs to an item that has been removed from the cache, the proxy data sender sends only the values of the items found in the cache and then sleeps for 1 second. As a result the buffer may clear very slowly even though many more values are waiting in it.
- There is no way to speed up clearing of the backlog without losing the collected history. To clear it at the cost of that history, the proxy must be stopped and the following queries must be executed in the proxy database:
- After the proxy is started and its configuration cache is reloaded, the number of unsent values is 0 again.
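
The slowdown described in the steps above can be illustrated with a small model. This is only a sketch of the behavior as observed in this report, not the proxy's actual code: the function name simulate_drain and its parameters are made up for illustration, and the batch size of 1000 and the 1-second sleep are taken from the observations above.

```python
from typing import Sequence, Set, Tuple

BATCH_SIZE = 1000     # ids scanned per pass, as observed in this report
SLEEP_SECONDS = 1.0   # pause after a partial batch, as observed in this report


def simulate_drain(
    buffer_item_ids: Sequence[int],
    cached_item_ids: Set[int],
    batch_size: int = BATCH_SIZE,
    sleep_seconds: float = SLEEP_SECONDS,
) -> Tuple[int, int, float]:
    """Model of the data sender loop (hypothetical, for illustration only).

    buffer_item_ids holds one item id per buffered value, in buffer order.
    Returns (passes, values_sent, seconds_slept) needed to empty the buffer.
    """
    position = 0
    passes = 0
    values_sent = 0
    seconds_slept = 0.0
    while position < len(buffer_item_ids):
        # Scan the next batch of buffered rows.
        scanned = buffer_item_ids[position:position + batch_size]
        # Only values whose item is still in the configuration cache are sent.
        sendable = [item_id for item_id in scanned if item_id in cached_item_ids]
        values_sent += len(sendable)
        position += len(scanned)
        passes += 1
        # A batch that is not full is treated as "nothing more to send",
        # so the sender sleeps even though the buffer still holds data.
        if len(sendable) < batch_size:
            seconds_slept += sleep_seconds
    return passes, values_sent, seconds_slept


# 300,000 buffered values spread evenly over 100 load test items.
backlog = [i % 100 for i in range(300_000)]

# All items still cached: every batch is full, no sleeping.
print(simulate_drain(backlog, set(range(100))))     # (300, 300000, 0.0)

# A single item removed from the cache: every batch is partial.
print(simulate_drain(backlog, set(range(1, 100))))  # (300, 297000, 300.0)
```

In this model, removing just one of the 100 items turns a drain with no sleeps into one that spends 300 seconds sleeping, which matches the "1000 ids every second" rate seen after the configuration cache reload.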