[ZBX-23939] OpenShift/Kubernetes monitoring issue with node/pod discovery Created: 2024 Jan 12  Updated: 2024 Dec 17

Status: Confirmed
Project: ZABBIX BUGS AND ISSUES
Component/s: Templates (T)
Affects Version/s: 6.4.10
Fix Version/s: None

Type: Problem report Priority: Major
Reporter: Rudolf Kastl Assignee: Zabbix Development Team
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Steps to reproduce:

  1. Set up the latest Zabbix (6.4.10)
  2. Set up the Zabbix Kubernetes Helm chart on OpenShift
  3. Set up the latest Kubernetes 6.4 Zabbix templates
  4. Watch node/pod discovery fail (likely because of memory limits)

Result:
169:20240112:182731.040 End of zbx_es_execute():FAIL RangeError: execution timeout
at [anon] (duktape.c:79217) internal
at [anon] (function:257) preventsyield allocated memory: 10662696 max allocated or requested memory: 529088086 max allowed memory: 536870912

Expected:
Have node and pod discovery work

NOTE: Node/pod discovery does work with smaller OpenShift clusters, but it does not scale.

To give you an idea about the amount of data:

curl -ik --header "Authorization: Bearer ${token}" -X GET https://api.ocpserver:6443/api/v1/pods | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 77.4M    0 77.4M    0     0  3293k      0 --:--:--  0:00:24 --:--:-- 7670k
2205714

meaning a total of 77.4 MB of data from the pods API, which corresponds to 2205714 lines

If you need any additional info, please let me know!



 Comments   
Comment by Fernando Ferraz [ 2024 Jan 16 ]

Hi guys, checking the script responsible for gathering node and pod information [1], we think we might be hitting the memory limit in [2][3]. Pod information is gathered from the Kubernetes API in batches of 1000 and stored in the 'result' array [2]. In our case, the information gathered from the pods exceeds the maximum allowed [3], and there is no way to override the memory limit or to apply any annotation and label filters at that point, as the filters are processed only later in the script.

Applying the annotation and label filters while iterating over the 1000-pod batches [5], before the pods are stored in the 'result' array, may help mitigate the issue: we would keep only the pods we really want to track and reduce the memory needed.
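To illustrate the idea, here is a minimal, hypothetical sketch (not the template's actual code): it pages through /api/v1/pods in 1000-item batches via the Zabbix HttpRequest object and applies the filters per batch, so only matching pods are ever accumulated. The apiUrl/token parameters and the matchesFilters() helper are placeholders for whatever the template actually uses.

function getFilteredPods(apiUrl, token, labelFilter) {
    var result = [];
    var continueToken = '';

    do {
        var request = new HttpRequest();
        request.addHeader('Content-Type: application/json');
        request.addHeader('Authorization: Bearer ' + token);

        // Page through the pods endpoint in 1000-item batches.
        var response = request.get(apiUrl + '/api/v1/pods?limit=1000'
            + (continueToken !== '' ? '&continue=' + continueToken : ''));

        if (request.getStatus() !== 200) {
            throw 'Cannot get pods: HTTP status ' + request.getStatus();
        }

        var data = JSON.parse(response);

        // Apply the annotation/label filters per batch, keeping only the pods
        // we actually want to track. matchesFilters() stands in for the filter
        // logic the template currently applies only after everything is stored.
        for (var i = 0; i < data.items.length; i++) {
            if (matchesFilters(data.items[i], labelFilter)) {
                result.push(data.items[i]);
            }
        }

        continueToken = data.metadata['continue'] || '';
    } while (continueToken !== '');

    return result;
}

Whether this maps cleanly onto the real template depends on how the existing filter macros are parsed, but with the per-batch approach only the filtered subset ever lives in the 'result' array instead of every pod in the cluster.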

Perhaps another alternative would be creating an environment variable to override the memory limit instead of having it hard-coded, but maybe there is some good reason to keep it that way and I'm just not seeing it yet.

[1] https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/kubernetes_http/kubernetes_nodes_http/template_kubernetes_nodes.yaml?at=refs%2Fheads%2Frelease%2F6.4#37

[2] https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/kubernetes_http/kubernetes_nodes_http/template_kubernetes_nodes.yaml?at=refs%2Fheads%2Frelease%2F6.4#173

[3] https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/src/libs/zbxembed/embed.c?at=refs%2Ftags%2F6.4.10#31

[4] https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/src/libs/zbxembed/embed.c?at=refs%2Ftags%2F6.4.10#659

[5] https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/kubernetes_http/kubernetes_nodes_http/template_kubernetes_nodes.yaml?at=refs%2Fheads%2Frelease%2F6.4#39


Comment by Rudolf Kastl [ 2024 Feb 01 ]

Additional info:

The large server of my initial post has around 3500 pods total.
Even on a smaller server with ~300-400 pods, the pod discovery itself works (no memory issue), but it results in:

Cannot create item "kube.pod.phase.failed[somepod]": maximum dependent item count reached.
Comment by Igor Gorbach (Inactive) [ 2024 Feb 01 ]

Hello!

Thank you for reporting the issue and for the detailed investigation!

As you can see, the problem is related to the hardcoded limits. You can work around it by defining an environment variable or by changing the source values directly in

https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/src/libs/zbxembed/embed.c?at=refs%2Ftags%2F6.4.10#31

and then building a custom zabbix-proxy image for the Helm chart, but be careful when rolling this out to production.

Regards, Igor

Comment by Rudolf Kastl [ 2024 Feb 01 ]

The same is true for the maximum dependent item count (29999 limit by default).

Igor, do you see any potential implications of raising those limits (the memory one and the dependent item one)?

Comment by Fernando Ferraz [ 2024 May 02 ]

Hi igorbach, wouldn't it make sense to have ZBX_ES_MEMORY_LIMIT as an environment variable? I see the memory limit has been increased over time [1][2].

That would help us to tweak the memory limit to fit our requirements.


[1] https://support.zabbix.com/browse/ZBXNEXT-6386

[2] https://support.zabbix.com/browse/ZBX-22805


Comment by Evgenii Gordymov [ 2024 May 28 ]

Hi sfernand, the size of ZBX_ES_MEMORY_LIMIT is set by security design.

Please create a ZBXNEXT task with full reasoning for why ZBX_ES_MEMORY_LIMIT needs to be added to the configuration as an environment variable.


Comment by Fernando Ferraz [ 2024 Dec 16 ]

Hi,

Changing ZBX_ES_MEMORY_LIMIT wasn't our first option, but considering it has been increased several times in the past, I believe it is worth considering as a feature request. We would also like to know from you whether any other alternatives or workarounds would be possible to prevent the nodes/pods gathering script from reaching these limits, e.g. changing the script itself or splitting it in two, so that we would have one script only for gathering node data and a separate one for pods.

Comment by Fernando Ferraz [ 2024 Dec 17 ]

I've created a feature request to make ZBX_ES_MEMORY_LIMIT configurable through an environment variable: https://support.zabbix.com/browse/ZBXNEXT-9685
