[ZBXNEXT-4900] Improve preprocessing performance Created: 2018 Dec 04  Updated: 2020 Nov 09

Status: Open
Project: ZABBIX FEATURE REQUESTS
Component/s: Server (S)
Affects Version/s: 4.2.0alpha1
Fix Version/s: None

Type: Change Request Priority: Major
Reporter: Andris Zeila Assignee: Zabbix Development Team
Resolution: Unresolved Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by ZBXNEXT-4914 Multiple preprocessing managers Closed

 Description   

With new features (LLD preprocessing, custom scripts) preprocessing might become a bottleneck. Currently it can push ~160k small values/sec on an i7-7700 CPU. As preprocessed values become larger (especially in the LLD case) and the steps more complex (scripts), the performance drops and might not be enough.

There are two options. One is to rework preprocessing to use worker threads instead of processes; the data exchange load would be significantly reduced, especially with large data. The other is a brute force approach: use multiple preprocessing managers, each with its own set of worker processes, and split items between managers by itemid.
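
For illustration, a minimal sketch of the brute force split (the manager count constant and the function name are hypothetical, not existing Zabbix code); a modulo split keeps all values of one item on the same manager, so per-item ordering is preserved:

#include <stdint.h>

/* hypothetical manager count; in practice this would come from configuration */
#define PREPROC_MANAGER_COUNT	4

/* pick a preprocessing manager for an item; all values of one item always
 * land on the same manager, preserving per-item ordering */
static int	preproc_manager_for_item(uint64_t itemid)
{
	return (int)(itemid % PREPROC_MANAGER_COUNT);
}
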
Some test results from converting processes to threads (a 'trim' preprocessing step was applied to all values):

Value size (bytes)   Values/sec (trunk)   Values/sec (threaded workers using sockets)   Values/sec (threaded workers using queues)
4                    167k                 173k                                          590k
128                  158k                 170k                                          530k
1024                 136k                 148k                                          362k
2048                 124k                 141k                                          268k
4096                 84k                  127k                                          183k
8192                 68k                  99k                                           115k

Threaded workers using sockets

The worker processes were replaced with threads. The old communication protocol (sockets) was kept, but instead of sending the data itself, only references to the data objects were sent. It could be optimized further, but it still gives a rough estimate.
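
A minimal sketch of the idea, assuming a Linux socketpair between manager and worker threads in one process (this is not the actual Zabbix IPC code): only the pointer crosses the socket, the value itself stays in shared process memory.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static int	sv[2];	/* sv[0] - manager end, sv[1] - worker end */

static void	*worker_thread(void *arg)
{
	char	*value;

	/* receive only the pointer-sized reference, not the serialized value */
	if (sizeof(value) == (size_t)read(sv[1], &value, sizeof(value)))
		printf("worker preprocessing value: %s\n", value);

	return NULL;
}

int	main(void)
{
	pthread_t	worker;
	char		*value = strdup("{\"data\":[{\"{#IF}\":\"eth0\"}]}");	/* shared heap object */

	socketpair(AF_UNIX, SOCK_STREAM, 0, sv);
	pthread_create(&worker, NULL, worker_thread, NULL);

	write(sv[0], &value, sizeof(value));	/* manager sends the reference only */

	pthread_join(worker, NULL);
	free(value);
	close(sv[0]);
	close(sv[1]);

	return 0;
}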

Threaded workers using queues

In this test the manager-worker communication was changed to simple mutex-protected queues.
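
A minimal sketch of such a queue, assuming plain pthreads (the struct and function names are illustrative, not the code used in the test); the mutex and condition variable are expected to be initialized with pthread_mutex_init()/pthread_cond_init() before use:

#include <pthread.h>
#include <stdlib.h>

typedef struct queue_item
{
	struct queue_item	*next;
	void			*data;	/* reference to the value to preprocess */
}
queue_item_t;

typedef struct
{
	pthread_mutex_t	lock;
	pthread_cond_t	event;
	queue_item_t	*head;
	queue_item_t	*tail;
}
preproc_queue_t;

/* manager side: append a task and wake one waiting worker */
static void	queue_push(preproc_queue_t *queue, void *data)
{
	queue_item_t	*item = malloc(sizeof(queue_item_t));

	item->data = data;
	item->next = NULL;

	pthread_mutex_lock(&queue->lock);
	if (NULL == queue->tail)
		queue->head = item;
	else
		queue->tail->next = item;
	queue->tail = item;
	pthread_cond_signal(&queue->event);
	pthread_mutex_unlock(&queue->lock);
}

/* worker side: block until a task is available and take it */
static void	*queue_pop(preproc_queue_t *queue)
{
	queue_item_t	*item;
	void		*data;

	pthread_mutex_lock(&queue->lock);
	while (NULL == queue->head)
		pthread_cond_wait(&queue->event, &queue->lock);

	item = queue->head;
	if (NULL == (queue->head = item->next))
		queue->tail = NULL;
	pthread_mutex_unlock(&queue->lock);

	data = item->data;
	free(item);

	return data;
}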

 

Or, in the worst case, we can combine both options.

 Comments   
Comment by Glebs Ivanovskis [ 2018 Dec 06 ]

Network traffic compression was advertised to provide a five-fold improvement in required bandwidth with "no impact on CPU or memory usage". Why can't it be used for sending LLD data to preprocessing workers?

Complexity of preprocessing steps shouldn't be the issue since it only affects preprocessing workers and you can have as many of them as you like.

wiper: I would say there is no noticeable impact on CPU usage. But that's with large data. With smaller data packets it might have a negative effect. Currently we are limited by preprocessing manager performance, so we are looking at how to either reduce or share that load.

I did test pure preprocessing performance, without pushing data into the history cache. With a contested history cache the results would be even worse.

On the other hand, if the compression is done in the data gathering processes (is that what you were thinking?), then the preprocessing manager only has to cache/forward the compressed data, so that should help with performance when large data (LLD) is processed.
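
For illustration only, a sketch of what compressing in a gathering process could look like, assuming zlib (this is not the Zabbix protocol code); the manager would then just forward the returned buffer to a worker:

#include <stdlib.h>
#include <string.h>
#include <zlib.h>

/* returns a malloc'd buffer with the compressed value (size in *out_len),
 * or NULL on failure; the caller forwards the buffer as-is */
static unsigned char	*compress_value(const char *value, size_t *out_len)
{
	uLong		src_len = (uLong)strlen(value);
	uLongf		dest_len = compressBound(src_len);
	unsigned char	*dest = malloc(dest_len);

	if (NULL == dest || Z_OK != compress2(dest, &dest_len, (const Bytef *)value, src_len, Z_BEST_SPEED))
	{
		free(dest);
		return NULL;
	}

	*out_len = dest_len;

	return dest;
}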

cyclone: Honestly speaking, I wasn't thinking that far. But it sounds like a sensible approach to offload the preprocessing manager. And of course you are not obliged to compress everything; small data can go uncompressed. It is a matter of protocol design to allow both types of messages. Since the protocol isn't for public use, you are free to design it the way you like.

As far as I recall, the one and only preprocessing manager was needed to keep the guarantee that the data which came first will be processed by triggers and actions first, regardless of the preprocessing steps it has to go through.

wiper: yes, but that guarantee is lost in the history cache.

cyclone: Well, I meant for items in one group of interdependent triggers... Isn't this guarantee still true if we neglect the fact that internal item processing is prioritized?

Comment by Evren Yurtesen [ 2020 Nov 09 ]

Is there a reason why preprocessing can't be moved to the zabbix-agent side? That way the load can be distributed. Also, with options like threshold/discard, the final transmitted data amount can be considerably smaller.
