Type: Incident report
Resolution: Duplicate
Priority: Trivial
Fix Version/s: None
Affects Version/s: 6.4.5
Component/s: Server (S)
Labels: None

This issue is very similar to:

[ZBX-20590] preprocessing worker utilization

~~ZBX-23012~~ Slow query from LLD worker

NOTE: The description below only covers how my team found this issue in our specific use case; the issue is wider than the specific templates I mention.

We currently use the Kubernetes templates to discover pods and other objects in some of our clusters. These clusters are quite big, and some nodes (call them batch job nodes) are constantly redeployed, so new pods keep being created and discovered. As a result, we have ended up with some hosts that have 28,000+ items.

We then noticed that the LLD queue kept growing and never drained. We checked our config and saw that we have 20+ LLD workers configured, but only 1-2 workers are actually doing work, each pinning a CPU thread at 100%. After some debugging to see which items the LLD workers were actually processing, we found it was the Kubernetes pod-related ones.
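To illustrate the symptom above (many workers configured, only one or two busy), here is a toy model. It is not Zabbix's actual scheduling code. It assumes, purely for illustration, that all work sharing a key (for example a host) is pinned to a single worker; that assumption, the function names, and the host names are hypothetical and not confirmed anywhere in this report.

```python
from collections import Counter


def assign_by_key(task_keys, n_workers):
    """Count tasks per worker when every key is pinned to one worker.

    Keys are assigned to workers round-robin in first-seen order.
    This is an illustrative assumption, not Zabbix behaviour.
    """
    worker_of = {}
    load = Counter()
    for key in task_keys:
        if key not in worker_of:
            worker_of[key] = len(worker_of) % n_workers
        load[worker_of[key]] += 1
    return load


def busy_workers(task_keys, n_workers):
    """Number of workers that received at least one task."""
    return len(assign_by_key(task_keys, n_workers))


# 28,000 discovery values concentrated on two hypothetical hosts.
skewed = ["host-a"] * 20000 + ["host-b"] * 8000
# The same number of values spread over 28,000 distinct keys.
spread = [f"host-{i}" for i in range(28000)]

if __name__ == "__main__":
    print("busy workers (skewed):", busy_workers(skewed, 20))
    print("busy workers (spread):", busy_workers(spread, 20))
```

Under this assumption, adding more workers does not help the skewed case: the two hot keys keep two workers saturated while the other 18 stay idle, which matches the pattern we observed.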

Unfortunately, kube-state-metrics doesn't give us a way to filter pods by host, annotation, node group, etc., so we are working on an internal solution for that.

But the point of this issue is that the LLD workers exhibit the same problem as described in ~~ZBX-20590~~ (preprocessing worker utilization), where the preprocessing workers pin the CPU and the queue grows.

The "quick" fix (again, I am just assuming here) is to NOT rely on a parent item for every discovery and instead perform the discovery separately for each discovery item. That way, a different thread should pick up each job. Yes, it means significantly more calls to the initial API/target (in our case, kube-state-metrics), but in my case I am happy to accept that trade-off to ensure the queue actually gets cleared.

Happy to provide further details/screenshots where needed.

zbx_23012_6_4-1.patch
16 kB
2023 Aug 31 11:38

duplicates

ZBX-23012 Slow query from LLD worker

Closed

Assignee: Edgar Akhmetshin

Reporter: Steve

Votes: 0

Watchers: 2

Created: 2023 Aug 30 04:26

Updated: 2023 Sep 01 10:40

Resolved: 2023 Sep 01 10:40
