[ZBXNEXT-3051] Count of actions has a significant impact on event processing Created: 2015 Nov 23 Updated: 2016 Apr 01 Resolved: 2016 Jan 19 |
|
Status: | Closed |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | Server (S) |
Affects Version/s: | 2.2.10 |
Fix Version/s: | 3.0.0beta1 |
Type: | Change Request | Priority: | Major |
Reporter: | Marc | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 14 |
Labels: | actions, conditions, history, performance, synchronization | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
Description |
The number of actions, or rather of action conditions, may have a significant impact on the performance of history syncers. A simplified illustration of how events are processed in connection with actions:

for each event do
    for each action do
        for each action condition do
            select_trigger_condition()

This process flow involves quite a few database queries. On a Zabbix installation with:
I was able to bring the system out of service within a few minutes by adding just 4 events per second on top of the ambient noise. By contrast, with all actions disabled I could no longer affect the service, not even by generating many times more events per second. How about caching the most relevant information needed for action processing in the memory of Zabbix server? |
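The nested loop in the description can be modelled with a short Python sketch to show how the number of condition checks scales. This is an illustration only, not Zabbix code: the `Action` class and `count_condition_queries` function are hypothetical, the numbers are made up for the example, and the sketch assumes the worst case where every condition check costs one SQL query.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical stand-in for an action and its conditions."""
    name: str
    conditions: list = field(default_factory=list)


def count_condition_queries(events, actions):
    """Walk the nested loop from the description and count condition checks.

    Assumes the worst case in which every condition check issues one
    SQL query.
    """
    queries = 0
    for _event in events:
        for action in actions:
            for _condition in action.conditions:
                queries += 1  # one database round trip per check
    return queries


# Illustrative numbers only: 100 actions with 10 conditions each, 4 events.
actions = [Action(f"action-{i}", ["cond"] * 10) for i in range(100)]
print(count_condition_queries(range(4), actions))
```

Under these assumptions the count grows as events × actions × conditions per action, so a small increase in event rate multiplies into a large number of queries.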
Comments |
Comment by Oleksii Zagorskyi [ 2015 Nov 23 ] |
Just fyi |
Comment by Marc [ 2015 Nov 25 ] |
Some statistics from a 50-second system call trace of one history syncer affected by this issue:
Query count has been derived from the count of sendto(select ...) system calls. Edit:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 91.48    0.132029           8     17085           poll
  4.65    0.006706           0     17122           sendto
  2.19    0.003155           0     17086           recvfrom
  1.68    0.002430           0     91964           semop
  0.00    0.000000           0        38           write
  0.00    0.000000           0        38           open
  0.00    0.000000           0        38           close
  0.00    0.000000           0        42           stat
  0.00    0.000000           0        38           fstat
  0.00    0.000000           0        38           mmap
  0.00    0.000000           0        38           munmap
  0.00    0.000000           0         6           rt_sigaction
  0.00    0.000000           0        12           rt_sigprocmask
  0.00    0.000000           0         5           nanosleep
  0.00    0.000000           0        11           times
  0.00    0.000000           0         1           restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.144320                143562           total |
Comment by Raymond Kuiper [ 2015 Dec 02 ] |
Perhaps an 'actions cache' might be a solution? |
Comment by Marc [ 2015 Dec 02 ] |
To me this need not be a dedicated cache. From my point of view this is configuration data, and the existing configuration cache appears to be an appropriate place. I don't suspect the count of actions to be the major or only issue, but rather the count of conditions and the fact that most condition checks result in SQL queries. In the end, I think all related information (actions + conditions) that currently leads to SQL queries during history synchronization should be cached. |
Comment by Oleksii Zagorskyi [ 2015 Dec 02 ] |
Personally, I'm not sure that an internal cache for actions would help much here. I don't think the tables related to actions are that large, even with ~100 actions etc. |
Comment by Marc [ 2015 Dec 02 ] |
zalex_ua, the issue is not related to the payload, i.e. the size of the configuration data, which is indeed very low. Just do the math:
Now this query was executed ~16,000 times in 50 seconds, which sums up to ~32 seconds spent just on SQL queries. |
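The arithmetic behind these figures can be checked with a few lines of Python. The three input numbers are the approximate values quoted in the comment; the per-query latency and the share of the trace are derived from them here and are not stated in the thread.

```python
QUERIES = 16_000          # approximate query count from the comment
TOTAL_QUERY_SECONDS = 32  # approximate time attributed to those queries
TRACE_SECONDS = 50        # length of the system call trace

# Implied average cost of one query, in milliseconds.
per_query_ms = TOTAL_QUERY_SECONDS / QUERIES * 1000

# Fraction of the traced interval spent on these queries.
share_of_trace = TOTAL_QUERY_SECONDS / TRACE_SECONDS

print(f"{per_query_ms:.1f} ms per query")
print(f"{share_of_trace:.0%} of the trace")
```

In other words, the quoted totals imply roughly 2 ms per query, and about two thirds of the traced interval going to these queries.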
Comment by Marc [ 2015 Dec 19 ] |
Btw, regarding the proposal of "[...] caching most relevant information needed for action processing in memory [...]": in fact, the database queries made there are the most time-consuming part of the whole chain during history synchronization, i.e. action processing. Consider, for simplicity, a scenario with only 100 actions and 1000 conditions in total, i.e. 10 conditions per action on average; the count of SQL queries to issue may then be distributed like this over the event process chain:
It's definitely not my intention to say that caching only actions and conditions is not worthwhile! Personally, I'd order the condition types from most to least worth caching as follows:
|
Comment by Andris Zeila [ 2016 Jan 07 ] |
I created |
Comment by Andris Zeila [ 2016 Jan 11 ] |
Specifications at https://www.zabbix.org/wiki/Docs/specs/ZBXNEXT-3051 Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-3051 |
Comment by Sandis Neilands (Inactive) [ 2016 Jan 15 ] |
(1) In the configuration cache for actions we save only the actionid, eventsource, evaltype and formula columns from the actions table. Fetching the remaining columns is not necessary. sandis.neilands RESOLVED in r57682. wiper CLOSED |
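As a rough illustration of keeping only those four fields in memory, here is a minimal Python sketch. It is not the Zabbix implementation: `CachedAction`, `cache_actions` and the sample row are hypothetical, and the extra `name` and `status` keys only stand in for data the cache would not need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedAction:
    """Holds only the four fields named in the comment."""
    actionid: int
    eventsource: int
    evaltype: int
    formula: str


def cache_actions(rows):
    """Build an in-memory lookup keyed by actionid, dropping unused fields.

    `rows` is a hypothetical list of dicts standing in for records
    fetched from the actions table.
    """
    return {
        row["actionid"]: CachedAction(
            row["actionid"], row["eventsource"], row["evaltype"], row["formula"]
        )
        for row in rows
    }


# Made-up sample record; "name" and "status" are not copied into the cache.
rows = [
    {"actionid": 7, "eventsource": 0, "evaltype": 0, "formula": "",
     "name": "Notify admins", "status": 0},
]
cache = cache_actions(rows)
print(cache[7])
```

Keeping the cached record small limits memory use and the amount of data loaded on each cache refresh.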
Comment by Sandis Neilands (Inactive) [ 2016 Jan 15 ] |
(2) When documenting, don't forget to mention the effect of the CacheUpdateFrequency configuration parameter. wiper CLOSED. |
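The effect in question can be pictured with a toy Python cache that reloads at most once per interval. This is a sketch under assumptions, not Zabbix code: `PeriodicCache` is hypothetical, the 60-second interval is chosen for illustration, and it assumes cached action data is refreshed only when the configuration cache is updated.

```python
class PeriodicCache:
    """Toy cache that reloads from a loader at most once per interval."""

    def __init__(self, loader, interval_seconds):
        self._loader = loader
        self._interval = interval_seconds
        self._loaded_at = None
        self._data = None

    def get(self, now):
        """Return cached data, reloading only if the interval has elapsed."""
        if self._loaded_at is None or now - self._loaded_at >= self._interval:
            self._data = list(self._loader())  # snapshot of the source
            self._loaded_at = now
        return self._data


database = ["action-1"]                      # hypothetical stand-in for the DB
cache = PeriodicCache(lambda: database, 60)  # 60 s chosen for illustration

print(cache.get(now=0))    # initial load
database.append("action-2")
print(cache.get(now=30))   # still the old view, interval not yet elapsed
print(cache.get(now=60))   # refreshed, the new action is now visible
```

Under this model, a change to an action would not be seen by event processing until the next refresh.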
Comment by Sandis Neilands (Inactive) [ 2016 Jan 18 ] |
Successfully tested.
The performance is still limited by DB access elsewhere (see |
Comment by Andris Zeila [ 2016 Jan 19 ] |
Released in:
|
Comment by Andris Zeila [ 2016 Jan 19 ] |
(3) Documentation:
sasha CLOSED |
Comment by richlv [ 2016 Jan 19 ] |
Should the 'alpha' above be changed to 'beta' now? |
Comment by Oleksii Zagorskyi [ 2016 Apr 01 ] |
It caused a regression, see |