[ZBX-15572] history syncer 100% CPU Created: 2019 Feb 01  Updated: 2019 Feb 04  Resolved: 2019 Feb 04

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: None
Fix Version/s: None

Type: Incident report Priority: Critical
Reporter: Egon Burgener Assignee: Arturs Lontons
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Top.PNG     Text File debuglog.txt     PNG File item.config.png     PNG File item.config2.png     Text File strace_output.txt    
Issue Links:
Duplicate
duplicates ZBX-15428 PCRE regexp seems use much more CPU r... Closed

 Description   

We encounter a problem with the history syncer process: it reaches 100% CPU while processing log items. According to the process information, it always takes around 60 seconds to import a few hundred items. As a result, the free space in the history write cache keeps decreasing.

After extensive testing, I can rule out issues with the database, and no slow queries are reported. Checking the process with strace, I see that it constantly performs semop and brk calls in rapid succession (see attachments).

The zabbix-server debug log also constantly shows substitute functions being called.

We encounter the issue with Zabbix 3.2 and 3.4.

I think there is an issue with processing log data.



 Comments   
Comment by Glebs Ivanovskis [ 2019 Feb 01 ]

Check the triggers for these items. If they use, for example, regex(5m,...) and the item receives thousands of values per second, this will cause a massive load on the history syncer due to its internal logic.

Comment by Egon Burgener [ 2019 Feb 01 ]

Thanks for the hint. Yes, we use 14 triggers with 2 regex checks each on that log file. Therefore 28 regex to eval on each logline.

The goal is to monitor application state based on the application's log output. What is the recommended way to do this? Is there another way to achieve it with Zabbix?

Comment by Glebs Ivanovskis [ 2019 Feb 01 ]

Therefore 28 regex to eval on each logline.

If it is regex(<interval>, ...) rather than regex(#<count>, ...), it is even worse. For every new log line, Zabbix applies the regexp to every log line within <interval>, so the load is proportional to the square of the number of lines per second.
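As a rough illustration of that squared growth, here is a simplified cost model in Python. It is not Zabbix's code: it assumes a constant line rate, that every new value re-evaluates each trigger function, and the worst case where every line in the window is checked. The function names are hypothetical; the 28 regexes and the 5m window come from this thread.

```python
# Hypothetical cost model (not Zabbix source code): regex evaluations per
# second for one log item. Assumptions:
#   - log lines arrive at a constant rate (lines_per_sec),
#   - every new value re-evaluates each trigger function,
#   - worst case: every line inside the window is checked.

def evals_time_window(lines_per_sec: int, window_sec: int, regex_count: int) -> int:
    """regex(<interval>, ...): each new line rescans the whole window."""
    lines_in_window = lines_per_sec * window_sec
    return lines_per_sec * lines_in_window * regex_count


def evals_count_window(lines_per_sec: int, count: int, regex_count: int) -> int:
    """regex(#<count>, ...): each new line rescans at most `count` lines."""
    return lines_per_sec * count * regex_count


if __name__ == "__main__":
    REGEXES = 28  # 14 triggers x 2 regex checks, as reported above
    for rate in (10, 100, 1000):
        t = evals_time_window(rate, 300, REGEXES)  # 5m window
        c = evals_count_window(rate, 1, REGEXES)   # #1: newest line only
        print(f"{rate:>5} lines/s: 5m window = {t:,} evals/s, #1 = {c:,} evals/s")
```

Under these assumptions, going from 10 to 100 lines per second multiplies the 5m-window cost by 100 (840,000 to 84,000,000 evaluations per second), while the #1 variant only grows tenfold (280 to 2,800).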

Things to try:

  1. optimize the regular expressions themselves, or even replace them with multiple str() functions;
  2. consider using a shorter interval, or replace <interval> with #<count>;
  3. offload regexp matching to the agent and make it send only important lines;
  4. prevent "log storms" by limiting the number of lines the agent is allowed to send per second.
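Suggestions 3 and 4 can be modelled with a small Python simulation. In Zabbix they correspond to the agent-side log[] item key's regexp and maxlines parameters. This sketch neither uses nor reproduces the agent; it only shows the effect of filtering at the source and capping lines per second. The pattern `ERROR|FATAL`, the function `forward()`, and deferring (rather than dropping) lines above the limit are assumptions for illustration.

```python
import re
from collections import deque
from typing import Deque, Dict, List

# Hypothetical pattern for "important" lines; compiled once and reused.
IMPORTANT = re.compile(r"ERROR|FATAL")


def forward(lines_by_second: Dict[int, List[str]], max_lines_per_sec: int) -> Dict[int, List[str]]:
    """Simulate source-side filtering plus a per-second send limit.

    lines_by_second maps a second (0, 1, 2, ...) to the log lines written in it.
    Only lines matching IMPORTANT are queued; at most max_lines_per_sec queued
    lines are sent per second, and the rest wait for later seconds.
    """
    backlog: Deque[str] = deque()
    sent: Dict[int, List[str]] = {}
    last = max(lines_by_second, default=-1)
    second = 0
    while second <= last or backlog:
        # Filter at the source: non-matching lines never leave the host.
        backlog.extend(line for line in lines_by_second.get(second, []) if IMPORTANT.search(line))
        # Rate limit: send at most max_lines_per_sec lines this second.
        batch = [backlog.popleft() for _ in range(min(max_lines_per_sec, len(backlog)))]
        if batch:
            sent[second] = batch
        second += 1
    return sent
```

For a storm of 1,000 lines in one second of which 5 are errors, `forward(storm, 2)` sends 2, 2 and 1 lines in seconds 0, 1 and 2, so the server receives 5 values instead of 1,000 and its triggers never scan the INFO lines.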

By the way, there may be an improvement coming your way in the form of ZBX-15428.

Comment by Arturs Lontons [ 2019 Feb 04 ]

Hi,
I'll be closing this issue as a duplicate of ZBX-15428.
Please adjust your expressions as described by Glebs.

The performance of the regexp functions should also be improved in versions 4.0.4 and 4.2.0.

Generated at Thu Apr 25 10:41:14 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.