With new features (LLD preprocessing, custom scripts) preprocessing might become a bottleneck. Currently it can process ~160k small values/sec on an i7-7700 CPU. As preprocessed values become larger (especially in the LLD case) and steps more complex (scripts), throughput drops and might not be enough.
There are two options. One is to rework preprocessing to use worker threads instead of processes; the data exchange overhead would be significantly reduced, especially with large data. The other is a brute force approach: run multiple preprocessing managers, each with its own set of worker processes, and split items between managers by itemid.
Some test results from converting worker processes to threads (a 'trim' preprocessing step was applied to all values):
|Value size (bytes)|Values/sec (threaded workers using sockets)|Values/sec (threaded workers using queues)|
Threaded workers using sockets
The worker processes were replaced with threads. The old communication protocol (sockets) was kept, but instead of sending the data itself, only references to the data objects were sent. This could be optimized further, but it still gives a rough estimate.
Threaded workers using queues
In this test the manager-worker communication was changed to simple mutex-protected queues.
Or, in the worst case, we can combine both options.