[ZBX-21571] Kubernetes: no trigger prototype associated with "Containers ready" condition Created: 2022 Sep 06  Updated: 2024 Apr 10  Resolved: 2023 Jul 07

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Templates (T)
Affects Version/s: 6.2.1
Fix Version/s: 6.0.20rc1, 6.4.5rc1, 7.0.0alpha3, 7.0 (plan)

Type: Problem report Priority: Trivial
Reporter: Julien Le Huludut Assignee: Denis Rasikhov
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File pod_latest_data.png    
Team: Team INT
Sprint: Sprint 102 (Jul 2023)
Story Points: 1

 Description   

Hi there !

We are testing monitoring of our Kubernetes cluster with Zabbix, using the "Kubernetes nodes by HTTP" template, version 6.2.1.

Description:

When a container in a pod keeps failing, Kubernetes restarts it repeatedly until it reaches the "CrashLoopBackOff" state. While a container inside a pod is in this state, no alert is shown in Zabbix.

Expected behaviour:

When any container is in the "CrashLoopBackOff" state, a warning should be triggered by default.

 

Steps to reproduce:

1. Create a faulty container, let's call it Gorbatchev

apiVersion: v1
kind: Pod
metadata:
  name: gorbatchev
  namespace: test-zabbix
spec: 
  containers: 
    - image: "busybox"
      name: gorbatchev
      # This command will cause the container to fail
      args: ["perestroika"]
 

2. Wait until it reaches the CrashLoopBackOff state

NAME             READY   STATUS             RESTARTS   AGE
pod/gorbatchev   0/1     CrashLoopBackOff   7          13m 

Result:
No alert is shown.

Expected:
A trigger prototype should be added to the template to alert when the "Containers ready" condition of any pod is false.
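
To illustrate the condition being asked for, here is a minimal Python sketch that inspects a pod object as returned by the Kubernetes API. It assumes the standard `status.conditions` and `status.containerStatuses` fields. The function names and the sample pod are hypothetical, and this is not how the Zabbix template itself is implemented.

```python
def containers_not_ready(pod):
    """Return True when the pod's ContainersReady condition is "False"."""
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "ContainersReady":
            return cond.get("status") == "False"
    return False  # condition absent: nothing to report


def crash_looping_containers(pod):
    """Return the names of containers waiting with reason CrashLoopBackOff."""
    names = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        if waiting.get("reason") == "CrashLoopBackOff":
            names.append(cs.get("name"))
    return names


# Hypothetical, trimmed-down pod object modelled on the reproduction case.
sample_pod = {
    "metadata": {"name": "gorbatchev", "namespace": "test-zabbix"},
    "status": {
        "conditions": [
            {"type": "ContainersReady", "status": "False"},
        ],
        "containerStatuses": [
            {
                "name": "gorbatchev",
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
            },
        ],
    },
}

print(containers_not_ready(sample_pod))
print(crash_looping_containers(sample_pod))
```

The two checks are kept separate on purpose: a false "Containers ready" condition can have causes other than a crash loop, so the waiting reason is reported on its own.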

 

We were going to create the trigger on our own Zabbix instance, but maybe it should be part of the template by default? In my opinion, any pod in this state is an issue for cluster admins to investigate.

 

Thanks in advance

Julien

 

 



 Comments   
Comment by Denis Rasikhov [ 2023 Jun 28 ]

If some of the containers in a pod are not ready, it does not necessarily mean that the pod is in the CrashLoopBackOff state. Many example Alertmanager rules still use the number of container restarts to detect that state. According to the Kubernetes documentation, containers are restarted with an exponentially increasing back-off delay, capped at 5 minutes. Because of that, the trigger expression will never fire if there is only one container in the pod. If there is more than one container, it will fire during the first couple of minutes after pod creation but will close once the delay has grown. For the trigger to work properly, the thresholds must be adjusted to cover pods with a single container, and the evaluation period must be increased to account for the exponential growth of the back-off delay.
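
The effect of the back-off on a restart-count trigger can be sketched with a small simulation. This is a simplified model, not the kubelet's actual logic: it assumes a container that crashes immediately, a first delay of 10 seconds that doubles after each restart, and the 5-minute cap mentioned above. The 3-minute evaluation window and the function names are hypothetical and chosen only for illustration.

```python
def restart_times(horizon_s, base_delay_s=10.0, max_delay_s=300.0):
    """Return restart timestamps (seconds since pod start) up to horizon_s.

    Simplified model: the container crashes instantly, so the time between
    restarts is just the current back-off delay.
    """
    times = []
    now = 0.0
    delay = base_delay_s
    while True:
        now += delay                      # wait out the back-off, then restart
        if now > horizon_s:
            break
        times.append(now)
        delay = min(delay * 2, max_delay_s)  # double the delay up to the cap
    return times


def restarts_in_window(times, window_end_s, window_s):
    """Count restarts in the half-open interval (window_end_s - window_s, window_end_s]."""
    start = window_end_s - window_s
    return sum(1 for t in times if start < t <= window_end_s)


times = restart_times(horizon_s=1800)  # first 30 minutes of a single container
window = 180                           # hypothetical 3-minute evaluation period

print("restart times:", times)
print("restarts in first window:", restarts_in_window(times, 180, window))
print("restarts in last window: ", restarts_in_window(times, 1800, window))
```

In this model the restart count per window is high right after pod creation and then drops, because restarts end up spaced 5 minutes apart. A threshold tuned to the early burst stops matching later, which is why both the threshold and the evaluation period need adjusting.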

Comment by Denis Rasikhov [ 2023 Jul 02 ]

Fixed in:
