For the past couple of weeks, our proxies have randomly stopped monitoring SNMP hosts for up to an hour at a time.
In /etc/zabbix/zabbix_proxy.conf, we have the following settings:
Here is an example in the log (with warning log level) where things work as expected:
There was an issue getting SNMP data, the proxy tried again shortly after, the host was marked as unavailable, and shortly after that it was marked as available again.
Here is an example of unexpected behavior (log level 4 enabled after host was marked unavailable):
Zabbix only reports a single issue, then 2 minutes later marks the host as unavailable straight away, and only starts monitoring it again after 1 hour (tcpdump confirmed there was no SNMP traffic to the host in between). This happens to random hosts (different hosts each time) and only a couple of times per day.
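To find every occurrence of this pattern in a proxy log, something like the following rough Python sketch could help. It pairs each "disabling" message with the next "enabling" message for the same host and reports the gaps that exceed a threshold. The log line shape (`PID:YYYYMMDD:HHMMSS.mmm message`) and the exact message wording are assumptions based on what Zabbix 5.x proxies usually log, so the regular expressions may need adjusting. The function name `find_long_outages` and the 30-minute threshold are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumed log line shape: "PID:YYYYMMDD:HHMMSS.mmm message".
LINE_RE = re.compile(r'^\s*\d+:(\d{8}):(\d{6})\.\d{3}\s+(.*)$')
# Assumed message wording; adjust to match the actual proxy log.
DISABLE_RE = re.compile(r'temporarily disabling SNMP agent checks on host "([^"]+)"')
ENABLE_RE = re.compile(r'enabling SNMP agent checks on host "([^"]+)"')


def find_long_outages(lines, threshold=timedelta(minutes=30)):
    """Return (host, disabled_at, enabled_at) for outages of at least `threshold`."""
    disabled_at = {}  # host -> time it was first marked unavailable
    outages = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that do not look like log entries
        ts = datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")
        msg = m.group(3)
        d = DISABLE_RE.search(msg)
        if d:
            # Keep the earliest timestamp if the message repeats.
            disabled_at.setdefault(d.group(1), ts)
            continue
        e = ENABLE_RE.search(msg)
        if e and e.group(1) in disabled_at:
            start = disabled_at.pop(e.group(1))
            if ts - start >= threshold:
                outages.append((e.group(1), start, ts))
    return outages
```

It can be fed an open file handle for the proxy log (for example `find_long_outages(open(path))`, where `path` is wherever the proxy writes its log). Short, expected outages like the first example are filtered out by the threshold.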
This behavior seems to have started randomly somewhere in Zabbix 5.0.x, and we still have it with Zabbix 5.2.4. Before that, everything was stable. Poller process usage is less than 25% at its peak, and unreachable pollers are at less than 4%. The host has plenty of resources. It happens.