Steps to reproduce:
- Use "ClickHouse by HTTP" template for 15 hosts
- Upgrade zabbix server from 5.2 to 6.0.6
- Preprocessing queue grows uncontrollably, and values of unrelated items get stuck in the queue
Hi. I was close to just posting a comment in ZBX-20590, but since I'm not sure whether it is the same issue, I've decided to create a new report.
I performed the 5.2.7 -> 6.0.6 upgrade a couple of days ago and ran into trouble with the preprocessing queue: it started growing constantly.
- Preprocessing queue - growing uncontrollably
- Preprocessing manager utilization - slowly but steadily climbing toward 100%
- Preprocessing workers utilization - I had 10 workers initially, and their utilization did not rise above 40-50%. I did not try lowering the worker count, but I did try increasing it (up to 75-100) - that did not solve the issue.
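For reference, the worker count mentioned above is controlled by the standard StartPreprocessors parameter in zabbix_server.conf; this is a sketch of the change I tried (the exact comment wording is mine, and the server must be restarted for it to take effect):

```
### Option: StartPreprocessors
# Number of pre-forked preprocessing worker instances.
# Raised from 10 to 100 while debugging; it made no difference.
StartPreprocessors=100
```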
- The worst part: it looks like ALL items' values became stuck in the preprocessing queue. I'm not talking only about dependent items, or only about items generated by the ClickHouse template - I'm talking about any existing Zabbix items. Analyzing the output of "zabbix_server -R diaginfo=preprocessing" while the queue was full, the "top values" were "system.uptime" items from completely unrelated hosts, which the "Linux by Zabbix agent" template checks every 30s by default.
I've managed to figure out that if I disable the "ClickHouse by HTTP" template (it was applied to about 15 hosts before and caused no trouble with zabbix-server 5.2), the problem goes away. In the end, I've "solved" the issue by decreasing the master item's update interval in the ClickHouse template from 1 minute to 10 minutes. The preprocessing queue still spikes a bit every 10 minutes, but it drains quickly afterwards.
In ZBX-20590, the author mentions that his preprocessing workers reached 100% utilization - that is not my case. In my case, the preprocessing manager process looks like the bottleneck: it was unable to handle all the incoming data in real time and/or distribute it evenly among the available workers.
I did not check preprocessing workers' utilization per individual process, but I did check their cumulative CPU time ("TIME" in Linux "top" terminology). Only the first couple of worker processes had non-zero values; all the other workers were at zero. I think that's why raising the worker count from 10 to 100 made no difference.
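For reproducing the check above, a quick way to see per-worker CPU time without opening top is a sketch like this (the process title pattern assumes how Zabbix names its pre-forked workers and may vary slightly between versions):

```shell
# List cumulative CPU time (TIME) and command line for all
# zabbix_server preprocessing workers; the bracket trick keeps
# the grep process itself out of the results
ps -eo time,args | grep '[p]reprocessing worker'
```

With the issue present, I'd expect only the first one or two lines to show non-zero TIME values.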
I can provide any additional details (or try to perform any additional tests) you need, except for downgrading back to 5.x.