[ZBX-6731] Log files are slow processed by Agent on Unix Created: 2013 Jun 19  Updated: 2017 Dec 25  Resolved: 2014 Jan 27

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 2.0.7rc1, 2.1.0
Fix Version/s: 2.0.12rc1, 2.2.3rc1, 2.3.0

Type: Incident report Priority: Major
Reporter: Alexey Pustovalov Assignee: Unassigned
Resolution: Fixed Votes: 2
Labels: logrt, performance, slow
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Causes
causes ZBX-13253 Log file monitoring starts from byte ... Closed
Duplicate
is duplicated by ZBX-7564 in case of missing logfiles agent sen... Closed

 Description   

Sometimes Zabbix agent slow process log files because of some of used functions and logic (agent scans directory for each iteration and send only one line from log file) are not quite good.

Some of information which should help improve performance of log monitoring on Linux systems:

1. opendir sometimes can be slow. We can use getdents directly: http://man7.org/linux/man-pages/man2/getdents.2.html
2. Instead of checking folder every time we can use notify for Linux OSs: http://www.ibm.com/developerworks/linux/library/l-inotify/index.html?ca=drs- and http://en.wikipedia.org/wiki/Inotify

Most of interesting is second point. With it we can wait changes of specified directory instead of scan directory.



 Comments   
Comment by Marc [ 2013 Jun 20 ]

Is this related to the fact that a logrt[*] might subsequently open a file ~25k times for one run?
Yesterday I was informed about a Zabbix-Agent process that consumes one CPU core. A quick strace showed lots of open() calls.
I noticed the directory walks mentioned in the description as well but they seem not to consume that much in comparison to the open("logfile") calls.

I'm not in the habit of seeing zabbix_agentd consuming much cpu time

Comment by Marc [ 2013 Jul 04 ]

What about additionally allowing to specify a list of files or file names in log[*]?
We have some applications which switch between fixed (mostly two) files. In such situations a directory walk can be completely avoided.

Edit:
Another thing I wonder about is, why not rotating through log lines until either eof is reached or flood prevention takes place?

Edit:
Previous comment was meant in connection with logrt[*]

Comment by Andris Mednis [ 2014 Jan 17 ]

I'm working to make log file processing faster. "i-notify" is not yet planned to use as there are simpler, non platform-specific optimizations to be done first.

Comment by Andris Mednis [ 2014 Jan 27 ]

For Zabbix 2.0 available in development branch svn://svn.zabbix.com/branches/dev/ZBX-6731.

Changes implemented for "log[]" items:

  • a regular expression for matching log file record is compiled only if the last compilation cannot be reused,
  • new log file records are read into 256 kB buffer in one operation, then regular expression is applied to every record in the buffer and matching records are sent to Zabbix server. No needless repetitive file opening, seeking and data copying.

Changes implemented for "logrt[]" items:

  • regular expression for filename now compiled once per check,
  • all changes for "log[]" items benefit "logrt[]" items performance, too.

These changes benefit Zabbix agent on Microsoft Windows platform as well.

Comment by Andris Zeila [ 2014 Jan 30 ]

(1) file.c:194 The check

while (p < p_end)

Should account for the szbyte value used later in memcmp() functions

Similarly the check at file.c:204

if (p + szbyte < p_end && 0 == memcmp(p + szbyte, lf, szbyte))	/* CR+LF (Windows) */

andris RESOLVED in r42044.

wiper CLOSED

Comment by Andris Zeila [ 2014 Jan 30 ]

(2) Please review my changes in r41977, r41978

andris Thanks for improvement! Reviewed and accepted. I added r42016, r42020 to your r41977.

wiper CLOSED

Comment by Andris Zeila [ 2014 Jan 31 ]

(3) We should use _lseeki64 on Windows platform. I think it would be best to create zbx_lseek() function to avoid having defines in every place lseek has been used like:

#ifdef _WINDOWS
	if (-1L != _lseeki64(f, (__int64)l_size, SEEK_SET))
#else
	if ((off_t)-1 != lseek(f, (off_t)l_size, SEEK_SET))
#endif

Also we might create a zbx_offset_t typedef for offset handling (as off_t is not supported by Windows)

andris RESOLVED in r42118.
andris RESOLVED in r42123 (zbx_lseek() converted from function to macro).

wiper CLOSED

Comment by Andris Zeila [ 2014 Feb 03 ]

Successfully tested

Comment by Alexander Vladishev [ 2014 Feb 03 ]

ZBX-7740 should be closed after a merge into 2.2.

Comment by Alexander Vladishev [ 2014 Feb 03 ]

(4) Cannot compile zabbix_get under Windows.

file.o : error LNK2019: unresolved external symbol _regexp_match_ex referenced in function _zbx_read2 ..\..\..\bin\win32\zabbix_get.exe : fatal error LNK1120: 1 unresolved externals
NMAKE : fatal error U1077: '"C:\Program Files\Microsoft Visual Studio 9.0\VC\Bin\link.exe"' : return code '0x460'

andris RESOLVED in r42223.

wiper CLOSED

Comment by Andris Mednis [ 2014 Feb 11 ]

For Zabbix 2.2 available in development branch svn://svn.zabbix.com/branches/dev/ZBX-6731-22 .

Comment by Andris Zeila [ 2014 Feb 11 ]

(5) Possible memory leak in process_logrt() - regular expression is compiled but not freed is not freed if directory opening (_wfindfirst() in windows) failed.

andris RESOLVED in r42512 (for 2.0) and in r42510 (for v.2.2).

wiper CLOSED

Comment by Andris Mednis [ 2014 Feb 12 ]

(6) Log file truncation is not handled correctly for log[] and logrt[] items.

andris RESOLVED in r42738 (for 2.0) and in r42704 (for v.2.2).

wiper CLOSED

Comment by Andris Zeila [ 2014 Feb 27 ]

Successfully tested.

Comment by richlv [ 2014 Mar 01 ]

(7) definitely whatsnew worthy

andris
For 2.0 documented in https://www.zabbix.com/documentation/2.0/manual/introduction/whatsnew2012?&#daemon_improvements
For 2.2 documented in https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew223?&#daemon_improvements

sasha Reviewed by wiper. CLOSED

Comment by richlv [ 2014 Mar 01 ]

(8) thanks for a great commit message. among other things it says "This affects also server and proxy" - how exactly, which features are affected ?

andris Function zbx_regexp() now remembers the last compiled regexp and flags. If the same regexp is used in the next invocation of zbx_regexp(), then regexp compilation is skipped. It is used in trigger functions "str", "regexp", "iregexp", "logeventid" and in processing of SNMP traps. Avoiding of regexp recompilation turned out to be very efficient in processing of log files (7 times faster in tests), but it will likely be of minor effect in server/proxy.

<richlv> thanks for the info. according to the discussion, on the server side there would be one "cache" slot per process. as processes would most likely use different regexps (think one trigger, then another), there would be no noticeable performance change. as for snmp trapper, it would most likely also change between regexps (unless only one regexp would be used at all ), so no changes to perf either.

CLOSING

Comment by richlv [ 2014 Mar 01 ]

a copy of the commit message, as it was very informative :

The main goal of the change is to improve performance (less time, fewer system calls,
less CPU usage) of active checks for log[] and logrt[] items. Platform specific methods
like GNU/Linux "i-notify" are not used.

Please note that this change does not modify limits on maximum number of log file records
checked in one check and number of matching records sent to server in one check.

Before this change:
  - new log file records were read record-by-record, reading of each record involved opening,
    seeking, reading and closing of the log file.

  - zbx_regexp() compiled a regular expression every time,

  - for logrt[] items file mask regular expression was compiled for every file in a check.
    The log files were selected first by regular expression (file mask), then by last mtime.

After this change:
  - new log file records are read into 256 kB buffer in one operation, then a regular
    expression is applied to every record in the buffer and matching records are sent
    to Zabbix server. No repetitive file opening, seeking for every record.

  - zbx_regexp() compiles a regular expresion and rememebers it.
    This increases performance if the same regular expression is used several times in
    a row (e.g. for matching log file records). This affects also server and proxy.

  - for logrt[] items file mask regular expression is compiled only once in a check.
    The log files are selected first by the last mtime, then by regular expression (file mask)
    to improve performance.

  - Zabbix agent detects oversized log file records (longer than 256 kB). Only the first 256 kB
    are matched against the regular expression and the rest of the record is ignored.

  - a new Zabbix datatype "zbx_offset_t" and a new function "zbx_lseek()" is added
    to encapsulate "lseek()" differences between UNIX/GNU/Linux and Microsoft Windows,

  - Make files for "zabbix_get" and "zabbix_sender" on Microsoft Windows do not include
    src/libs/zbxcommon/file.c anymore.

andris Additionally for v.2.2 and up:

  • regexp_sub() used for substituting regexp captured groups with parts of output template now remembers the last compiled regular expression,
  • before the change if a log file was not accessible, a log[] item did not go into NOTSUPPORTED state due to a bug.
    Now the log[] item becomes NOTSUPPORTED if the log file is not accessible.

Additionally for v.2.0 and up:

  • processing of logrt[] items now includes a code to minimize data loss (i.e. skipping log file records without analyzing them) in case of setting the system clock back in time (setting the clock ahead of time is harmless in our case).
    However, this code is currently not effective because other parts of agent do not yet detect and adapt to setting the clock back in time.
Comment by Andris Mednis [ 2014 Mar 03 ]

Available for Zabbix 2.2 in svn://svn.zabbix.com/branches/dev/ZBX-6731-22 r43135.

wiper tested 2.2 version

Comment by Andris Mednis [ 2014 Mar 05 ]

Fixed in versions pre-2.0.12 r43102, pre-2.2.3 r43233, pre-2.3.0 r43239.

Comment by Andris Mednis [ 2014 Mar 27 ]

Improved documentation at https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew223?&#daemon_improvements about conditions when log[] and logrt[] items become NOTSUPPORTED.

Comment by Andris Mednis [ 2014 May 07 ]

Improved documentation at https://www.zabbix.com/documentation/2.0/manual/introduction/whatsnew2012?&#daemon_improvements about conditions when logrt[] items become NOTSUPPORTED.

wiper reviewed.

Comment by Andris Mednis [ 2014 May 22 ]

The fix caused a regression ZBX-8238.

Generated at Thu Apr 25 12:36:23 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.