[ZBX-23939] OpenShift/Kubernetes monitoring issue with node/pod discovery Created: 2024 Jan 12 Updated: 2024 Dec 17
Status: | Confirmed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Templates (T) |
Affects Version/s: | 6.4.10 |
Fix Version/s: | None |
Type: | Problem report | Priority: | Major |
Reporter: | Rudolf Kastl | Assignee: | Zabbix Development Team |
Resolution: | Unresolved | Votes: | 1 |
Labels: | None |
Remaining Estimate: | Not Specified |
Time Spent: | Not Specified |
Original Estimate: | Not Specified |
Description
Steps to reproduce:
Result:
Expected:
NOTE: The node/pod discovery does work with smaller OpenShift clusters but does not scale. To give you an idea of the amount of data involved:

    curl -ik --header "Authorization: Bearer ${token}" -X GET https://api.ocpserver:6443/api/v1/pods | wc -l
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 77.4M    0 77.4M    0     0  3293k      0 --:--:--  0:00:24 --:--:-- 7670k
    2205714

That is 77.4 MB in total from the pods API, corresponding to 2205714 lines. If you need any additional info, please let me know!
Comments
Comment by Fernando Ferraz [ 2024 Jan 16 ] |
Hi guys, looking at the script responsible for gathering node and pod information [1], we think we might be hitting the memory limit in [2][3]. Pod information is gathered from the Kubernetes API in batches of 1000 and stored in the 'result' array [2]. In our case, the data gathered from the pods exceeds the maximum allowed [3], and there is no way to override the memory limit or to apply any annotation and label filters early, as the filters are processed only later in the script. Applying the annotation and label filters while iterating over each 1000-pod batch [5], before the pods are stored in the 'result' array, may help mitigate the issue: we would keep only the pods we really want to track and reduce the memory needed. Perhaps another alternative would be creating an environment variable to override the memory limit instead of having it hard-coded, but maybe there is some good reason to keep it that way and I'm just not seeing it yet.
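To illustrate the batch-filtering idea, here is a minimal sketch in the Zabbix JavaScript dialect. It is not the shipped template code: the apiUrl and token parameters and the matchesFilters() helper (standing in for the template's annotation/label filter logic) are assumptions for illustration.

    // Minimal sketch only; matchesFilters() is a hypothetical stand-in for
    // the annotation/label filter logic the real script applies much later.
    function getFilteredPods(apiUrl, token) {
        var result = [],
            continueToken = '';

        do {
            var request = new HttpRequest();
            request.addHeader('Content-Type: application/json');
            request.addHeader('Authorization: Bearer ' + token);

            // Fetch one 1000-pod batch, passing the continue token from the
            // previous response when there is one.
            var response = request.get(apiUrl + '/api/v1/pods?limit=1000' +
                (continueToken ? '&continue=' + encodeURIComponent(continueToken) : ''));

            if (request.getStatus() !== 200) {
                throw 'Failed to get pods from the Kubernetes API';
            }

            var data = JSON.parse(response);

            // Filter each batch immediately, so pods excluded by the filters
            // never accumulate in the 'result' array.
            data.items.forEach(function (pod) {
                if (matchesFilters(pod)) {
                    result.push(pod);
                }
            });

            // The API returns a continue token while more batches remain.
            continueToken = (data.metadata && data.metadata['continue']) || '';
        } while (continueToken !== '');

        return result;
    }

The saving here comes from discarding unwanted pods per batch; the hard memory limit itself is untouched, so a cluster whose matching pods alone exceed the limit would still fail.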
Comment by Rudolf Kastl [ 2024 Feb 01 ] |
Additional info: the large cluster from my initial post has around 3500 pods in total. We are also hitting:
Cannot create item "kube.pod.phase.failed[somepod]": maximum dependent item count reached.
Comment by Igor Gorbach (Inactive) [ 2024 Feb 01 ] |
Hello! Thank you for reporting the issue and for the detailed investigation! As you can see, the problem is related to hard-coded limits. As a workaround you can define the environment variable, or change the source values directly and build a custom zabbix-proxy image for the Helm chart, but be careful when rolling this out to production. Regards, Igor
Comment by Rudolf Kastl [ 2024 Feb 01 ] |
The same is true for the maximum dependent item count (a limit of 29999 by default). Igor, do you see any potential implications in raising those limits, i.e. both the memory limit and the dependent item count?
Comment by Fernando Ferraz [ 2024 May 02 ] |
Hi igorbach, wouldn't it make sense to expose ZBX_ES_MEMORY_LIMIT as an environment variable? I see the memory limit has been increased over time [1][2]. That would let us tweak the memory limit to fit our requirements.
[1] https://support.zabbix.com/browse/ZBXNEXT-6386 [2] https://support.zabbix.com/browse/ZBX-22805
Comment by Evgenii Gordymov [ 2024 May 28 ] |
Hi sfernand, the ZBX_ES_MEMORY_LIMIT size is set this way by security design. Please create a ZBXNEXT task with full reasoning for why ZBX_ES_MEMORY_LIMIT needs to be exposed as a configurable environment variable.
Comment by Fernando Ferraz [ 2024 Dec 16 ] |
Hi, changing ZBX_ES_MEMORY_LIMIT wasn't our first option, but considering it has been increased several times in the past, I believe it is worth considering as a feature request. We would also like to know whether any other alternatives or workarounds could prevent the Nodes/Pods retrieval from reaching these limits, for example changing the script itself and splitting it in two, so that one script gathers only node data and a separate one gathers pod data (see the sketch below).
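For illustration, a minimal sketch of what that split could look like, assuming two independent Script items (the function names, item keys, apiUrl and token are all hypothetical); each script run would then only have to hold one resource type in memory:

    // Hypothetical split into two functions, each backed by its own item,
    // so node and pod data never share a single script's memory budget.
    function getNodes(apiUrl, token) {
        var request = new HttpRequest();
        request.addHeader('Authorization: Bearer ' + token);
        return request.get(apiUrl + '/api/v1/nodes');   // e.g. a kube.nodes.get item
    }

    function getPods(apiUrl, token) {
        var request = new HttpRequest();
        request.addHeader('Authorization: Bearer ' + token);
        return request.get(apiUrl + '/api/v1/pods');    // e.g. a kube.pods.get item
    }

Note this only halves the problem: on a cluster like the one reported, the raw pods response alone (77.4 MB) could still exceed ZBX_ES_MEMORY_LIMIT, which is why making the limit configurable remains relevant.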
Comment by Fernando Ferraz [ 2024 Dec 17 ] |
I've created a feature request to make ZBX_ES_MEMORY_LIMIT configurable through an environment variable: https://support.zabbix.com/browse/ZBXNEXT-9685