[ZBX-6731] Log files are slow processed by Agent on Unix Created: 2013 Jun 19 Updated: 2017 Dec 25 Resolved: 2014 Jan 27 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 2.0.7rc1, 2.1.0 |
Fix Version/s: | 2.0.12rc1, 2.2.3rc1, 2.3.0 |
Type: | Incident report | Priority: | Major |
Reporter: | Alexey Pustovalov | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 2 |
Labels: | logrt, performance, slow | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Issue Links: |
|
Description |
Sometimes Zabbix agent slow process log files because of some of used functions and logic (agent scans directory for each iteration and send only one line from log file) are not quite good. Some of information which should help improve performance of log monitoring on Linux systems: 1. opendir sometimes can be slow. We can use getdents directly: http://man7.org/linux/man-pages/man2/getdents.2.html Most of interesting is second point. With it we can wait changes of specified directory instead of scan directory. |
Comments |
Comment by Marc [ 2013 Jun 20 ] |
Is this related to the fact that a logrt[*] might subsequently open a file ~25k times for one run? I'm not in the habit of seeing zabbix_agentd consuming much cpu time |
Comment by Marc [ 2013 Jul 04 ] |
What about additionally allowing to specify a list of files or file names in log[*]? Edit: Edit: |
Comment by Andris Mednis [ 2014 Jan 17 ] |
I'm working to make log file processing faster. "i-notify" is not yet planned to use as there are simpler, non platform-specific optimizations to be done first. |
Comment by Andris Mednis [ 2014 Jan 27 ] |
For Zabbix 2.0 available in development branch svn://svn.zabbix.com/branches/dev/ZBX-6731. Changes implemented for "log[]" items:
Changes implemented for "logrt[]" items:
These changes benefit Zabbix agent on Microsoft Windows platform as well. |
Comment by Andris Zeila [ 2014 Jan 30 ] |
(1) file.c:194 The check
while (p < p_end)
Should account for the szbyte value used later in memcmp() functions Similarly the check at file.c:204 if (p + szbyte < p_end && 0 == memcmp(p + szbyte, lf, szbyte)) /* CR+LF (Windows) */ andris RESOLVED in r42044. wiper CLOSED |
Comment by Andris Zeila [ 2014 Jan 30 ] |
(2) Please review my changes in r41977, r41978 andris Thanks for improvement! Reviewed and accepted. I added r42016, r42020 to your r41977. wiper CLOSED |
Comment by Andris Zeila [ 2014 Jan 31 ] |
(3) We should use _lseeki64 on Windows platform. I think it would be best to create zbx_lseek() function to avoid having defines in every place lseek has been used like: #ifdef _WINDOWS if (-1L != _lseeki64(f, (__int64)l_size, SEEK_SET)) #else if ((off_t)-1 != lseek(f, (off_t)l_size, SEEK_SET)) #endif Also we might create a zbx_offset_t typedef for offset handling (as off_t is not supported by Windows) andris RESOLVED in r42118. wiper CLOSED |
Comment by Andris Zeila [ 2014 Feb 03 ] |
Successfully tested |
Comment by Alexander Vladishev [ 2014 Feb 03 ] |
|
Comment by Alexander Vladishev [ 2014 Feb 03 ] |
(4) Cannot compile zabbix_get under Windows. file.o : error LNK2019: unresolved external symbol _regexp_match_ex referenced in function _zbx_read2 ..\..\..\bin\win32\zabbix_get.exe : fatal error LNK1120: 1 unresolved externals NMAKE : fatal error U1077: '"C:\Program Files\Microsoft Visual Studio 9.0\VC\Bin\link.exe"' : return code '0x460' andris RESOLVED in r42223. wiper CLOSED |
Comment by Andris Mednis [ 2014 Feb 11 ] |
For Zabbix 2.2 available in development branch svn://svn.zabbix.com/branches/dev/ZBX-6731-22 . |
Comment by Andris Zeila [ 2014 Feb 11 ] |
(5) Possible memory leak in process_logrt() - regular expression is compiled but not freed is not freed if directory opening (_wfindfirst() in windows) failed. andris RESOLVED in r42512 (for 2.0) and in r42510 (for v.2.2). wiper CLOSED |
Comment by Andris Mednis [ 2014 Feb 12 ] |
(6) Log file truncation is not handled correctly for log[] and logrt[] items. andris RESOLVED in r42738 (for 2.0) and in r42704 (for v.2.2). wiper CLOSED |
Comment by Andris Zeila [ 2014 Feb 27 ] |
Successfully tested. |
Comment by richlv [ 2014 Mar 01 ] |
(7) definitely whatsnew worthy andris |
Comment by richlv [ 2014 Mar 01 ] |
(8) thanks for a great commit message. among other things it says "This affects also server and proxy" - how exactly, which features are affected ? andris Function zbx_regexp() now remembers the last compiled regexp and flags. If the same regexp is used in the next invocation of zbx_regexp(), then regexp compilation is skipped. It is used in trigger functions "str", "regexp", "iregexp", "logeventid" and in processing of SNMP traps. Avoiding of regexp recompilation turned out to be very efficient in processing of log files (7 times faster in tests), but it will likely be of minor effect in server/proxy. <richlv> thanks for the info. according to the discussion, on the server side there would be one "cache" slot per process. as processes would most likely use different regexps (think one trigger, then another), there would be no noticeable performance change. as for snmp trapper, it would most likely also change between regexps (unless only one regexp would be used at all ), so no changes to perf either. CLOSING |
Comment by richlv [ 2014 Mar 01 ] |
a copy of the commit message, as it was very informative : The main goal of the change is to improve performance (less time, fewer system calls, less CPU usage) of active checks for log[] and logrt[] items. Platform specific methods like GNU/Linux "i-notify" are not used. Please note that this change does not modify limits on maximum number of log file records checked in one check and number of matching records sent to server in one check. Before this change: - new log file records were read record-by-record, reading of each record involved opening, seeking, reading and closing of the log file. - zbx_regexp() compiled a regular expression every time, - for logrt[] items file mask regular expression was compiled for every file in a check. The log files were selected first by regular expression (file mask), then by last mtime. After this change: - new log file records are read into 256 kB buffer in one operation, then a regular expression is applied to every record in the buffer and matching records are sent to Zabbix server. No repetitive file opening, seeking for every record. - zbx_regexp() compiles a regular expresion and rememebers it. This increases performance if the same regular expression is used several times in a row (e.g. for matching log file records). This affects also server and proxy. - for logrt[] items file mask regular expression is compiled only once in a check. The log files are selected first by the last mtime, then by regular expression (file mask) to improve performance. - Zabbix agent detects oversized log file records (longer than 256 kB). Only the first 256 kB are matched against the regular expression and the rest of the record is ignored. - a new Zabbix datatype "zbx_offset_t" and a new function "zbx_lseek()" is added to encapsulate "lseek()" differences between UNIX/GNU/Linux and Microsoft Windows, - Make files for "zabbix_get" and "zabbix_sender" on Microsoft Windows do not include src/libs/zbxcommon/file.c anymore. andris Additionally for v.2.2 and up:
Additionally for v.2.0 and up:
|
Comment by Andris Mednis [ 2014 Mar 03 ] |
Available for Zabbix 2.2 in svn://svn.zabbix.com/branches/dev/ZBX-6731-22 r43135. wiper tested 2.2 version |
Comment by Andris Mednis [ 2014 Mar 05 ] |
Fixed in versions pre-2.0.12 r43102, pre-2.2.3 r43233, pre-2.3.0 r43239. |
Comment by Andris Mednis [ 2014 Mar 27 ] |
Improved documentation at https://www.zabbix.com/documentation/2.2/manual/introduction/whatsnew223?&#daemon_improvements about conditions when log[] and logrt[] items become NOTSUPPORTED. |
Comment by Andris Mednis [ 2014 May 07 ] |
Improved documentation at https://www.zabbix.com/documentation/2.0/manual/introduction/whatsnew2012?&#daemon_improvements about conditions when logrt[] items become NOTSUPPORTED. wiper reviewed. |
Comment by Andris Mednis [ 2014 May 22 ] |
The fix caused a regression |