- Incident report
- Resolution: Unresolved
- Trivial
- None
- 6.4.2
- None
When using the "Kubernetes nodes by HTTP" template (https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/kubernetes_http/kubernetes_nodes_http), the item "Kubernetes: Get nodes" is expected to run at least once per minute. Some of the triggers are built like this:
count(/Kubernetes nodes by HTTP/kube.pod.status.phase{#POD},10m, "regexp","^(1|4|5)$")>=9
This trigger fires only if the check returns 9 or more failures within a 10-minute period, and the only way to collect that many values in 10 minutes is for "Kubernetes: Get nodes" to run about once per minute.
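To make the arithmetic concrete, here is a rough sketch in Python (not Zabbix code). The helper names `max_samples` and `can_fire` are hypothetical, and the model assumes one value per poll at a fixed interval, which only approximates how Zabbix actually schedules checks and evaluates the window.

```python
def max_samples(window_s: int, interval_s: int) -> int:
    """Approximate number of values collected in a window at a fixed polling interval."""
    return window_s // interval_s


def can_fire(window_s: int, interval_s: int, threshold: int) -> bool:
    """True if the window can hold at least `threshold` values at all."""
    return max_samples(window_s, interval_s) >= threshold


WINDOW = 10 * 60   # the 10m window from the trigger expression
THRESHOLD = 9      # the ">=9" from the trigger expression

for interval in (60, 120, 300):
    print(interval, max_samples(WINDOW, interval), can_fire(WINDOW, interval, THRESHOLD))
```

Under these assumptions a 1-minute interval yields about 10 values per window, so 9 matches are reachable, while a 2-minute interval yields only about 5 and the trigger can never fire.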
The problem I am facing is that we have some node groups which are very dynamic and spin up new nodes every so often, so I filter them out using "{$KUBE.NODE.FILTER.LABELS}". However, we poll the K8s API so often that there is a race condition between our check for nodes and the K8s API adding the labels to a new node, so occasionally we discover nodes we don't care about.
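The race can be illustrated with a small Python sketch. It assumes an exclusion-style label filter, and the function `passes_filter` and the label `nodegroup: dynamic` are hypothetical; the template's real filter logic lives in its own preprocessing and may behave differently.

```python
def passes_filter(node_labels: dict, excluded: dict) -> bool:
    """Keep a node unless one of its labels matches an excluded key/value pair."""
    return not any(node_labels.get(key) == value for key, value in excluded.items())


# Hypothetical exclusion rule: drop nodes from the dynamic node group.
excluded = {"nodegroup": "dynamic"}

# The same node seen at two moments.
just_registered = {}                    # labels not applied yet
fully_labelled = {"nodegroup": "dynamic"}

print(passes_filter(just_registered, excluded))  # node is kept by mistake
print(passes_filter(fully_labelled, excluded))   # node is filtered out as intended
```

If a poll lands in the gap between node registration and labelling, the node has nothing for the exclusion rule to match, so it is kept. The more frequent the polling, the more likely a poll falls inside that gap.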
I think the templates need to be reworked so that they do not depend on being run every minute.