-
Incident report
-
Resolution: Cannot Reproduce
-
Trivial
-
None
-
6.0.10
-
None
-
Kubernetes
We add and remove items heavily for jobs on our Kubernetes cluster. We are running a modified version of the Kubernetes monitoring template optimized for this.
One of the optimisations we are running is a regex preprocessing step that narrows down the data:
Item 1: All Kubernetes stats in a line-separated format, i.e.
production/x4b-job1/{json data} production/x4b-job2/{json data}
Item 2: Inside an LLD rule, extract the JSON data for that namespace and job using regex
Item 3: Inside the same LLD rule, apply JSONPath or similar to that JSON to get the required value
We have found this to be a high performance solution for this sort of data
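The two preprocessing steps described above can be sketched outside Zabbix roughly as follows. This is only an illustration of the idea, not the template's actual configuration: the sample master value, the function names `extract_job_json` and `extract_metric`, and the exact regex are assumptions, and a plain dictionary lookup stands in for the JSONPath step.

```python
import json
import re

# Hypothetical master-item value in the format described above:
# "<namespace>/<job>/{json}" entries separated by whitespace.
MASTER_VALUE = (
    'production/x4b-job1/{"kube_job_status_failed":0,"kube_job_status_active":1} '
    'production/x4b-job2/{"kube_job_status_failed":2,"kube_job_status_active":0}'
)


def extract_job_json(master_value: str, namespace: str, job: str) -> str:
    """Regex step: return the JSON blob for one namespace/job entry."""
    # Assumes flat JSON objects (no nested braces), as in the sample data.
    prefix = re.escape(f"{namespace}/{job}/")
    match = re.search(r"(?:^|\s)" + prefix + r"(\{[^{}]*\})", master_value)
    if match is None:
        raise LookupError(f"no entry for {namespace}/{job}")
    return match.group(1)


def extract_metric(job_json: str, field: str) -> int:
    """Stand-in for a JSONPath step such as $.<field>."""
    return int(json.loads(job_json)[field])


failed = extract_metric(
    extract_job_json(MASTER_VALUE, "production", "x4b-job2"),
    "kube_job_status_failed",
)
print(failed)  # 2
```

Only after both steps have run is the value a plain integer that a Numeric (unsigned) item can store.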
What we have also found is a bug. Sometimes, on the final item (item 3), we see an attempt to store data before the preprocessor has done its thing. This should be impossible.
Result:
In the logs:
414:20221116:063300.678 item "kubernetes.newark.production.k8s.x4b.net:kube.job.status_failed[production/x4b-kq-serverupdate-1190750]" became not supported: Value of type "string" is not suitable for value type "Numeric (unsigned)". Value "production/x4b-kq-serverupdate-1191082/{"kube_job_annotations":1,"kube_job_labels":1,"kube_job_info":1,"kube_job_created":1668580323,"kube_job_spec_parallelism":1,"kube_job_spec_completions":1,"kube_job_status_succeeded":1,"kube_job_status_failed":0,"kube_job_status_active":0,"kube_job_complete":1,"kube_job_status_start_time":1668580323,"kube_job_status_completion_time":1668580371,"kube_job_owner":1} production/x4b-kq-serverupdate-1191083/{"kube_job_annotations":1,"kube_job_labels":1,"kube_job_info":1,"kube_job_created":1668580323,"kube_job_spec_parallelism":1,"kube_job_spec_completions":1,"kube_job_status_succeeded":0,"kube_job_status_failed":0,"kube_job_status_active":1,"kube_job_status_start_time":1668580323,"kube_job_owner":1} production/x4b-kq-serverupdate-1191084/{"kube_job_annotations":1,"kube_job_labels":1,"kube_job_info":1,"kube_job_created":1668580323,"kube_job_spec_parallelism":1,"kube_job_spec_completions":1,"kube_job_status_succeeded":1,"kube_job_status_failed":0,"kube_job_status_active":0,"kube_job_complete":1,"kube_job_status_start_time":1668580323,"kube_job_status_completion_time":1668580369,"kube_job_owner":1} production/x4b-kq-serverupdate-1191085/{"kube_job_annotations":1,"kube_job_labels":1,"kube_job_info":1,"kube_job_created":1668580323,"kube_job_spec_parallelism":1,"kube_job_spec_completions":1,"kube_job_status_succeeded":0,"kube_job_status_failed":0,"kube_job_status_active":1,"kube_job_status_start_time":1668580323,"kube_job_owner":1} 
production/x4b-kq-serverupdate-1191086/{"kube_job_annotations":1,"kube_job_labels":1,"kube_job_info":1,"kube_job_created":1668580323,"kube_job_spec_parallelism":1,"kube_job_spec_completions":1,"kube_job_status_succeeded":1,"kube_job_status_failed":0,"kube_job_status_active":0,"kube_job_complete":1,"kube_job_status_start_time":1668580323,"kube_job_status_completion_time":1668580373,"kube_job_owner":1}
Expected:
An integer value and no error.
Any error that occurs should have been handled by the preprocessor's failure handling. Since it was not, this tells me the item is being executed before the preprocessing steps are created by LLD, or at least before Zabbix Server (S) sees that they exist.
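The reasoning above can be illustrated with a toy model. It is not Zabbix's internal logic: `store_unsigned`, the step functions, the `on_fail` default, and the error text are all hypothetical. The point is that when the steps exist, a failure falls through to the failure handling, so a raw string can only reach the type check if the step list is empty at execution time.

```python
import json
import re
from typing import Callable, List

Step = Callable[[str], str]


def regex_step(value: str) -> str:
    """Hypothetical step 1: pull out the JSON for one job."""
    match = re.search(r"production/x4b-job1/(\{[^{}]*\})", value)
    if match is None:
        raise ValueError("regex did not match")
    return match.group(1)


def json_step(value: str) -> str:
    """Hypothetical step 2: read one field from the JSON."""
    return str(json.loads(value)["kube_job_status_failed"])


def store_unsigned(raw: str, steps: List[Step], on_fail: str = "0") -> int:
    """Run the steps, apply failure handling, then type-check the result."""
    value = raw
    try:
        for step in steps:
            value = step(value)
    except Exception:
        value = on_fail  # assumed failure handling: substitute a default
    if not re.fullmatch(r"[0-9]+", value):
        raise ValueError("value is not suitable for Numeric (unsigned)")
    return int(value)


RAW = 'production/x4b-job1/{"kube_job_status_failed":0}'

print(store_unsigned(RAW, [regex_step, json_step]))        # 0
print(store_unsigned("garbage", [regex_step, json_step]))  # 0 (failure handled)
try:
    store_unsigned(RAW, [])  # steps not yet attached
except ValueError as exc:
    print("not supported:", exc)
```

In this model the only path to the type error is the empty step list, which matches the symptom in the log: the whole unprocessed master value is what gets rejected.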