[ZBXNEXT-2430] Option to define log monitoring on agent level Created: 2014 Sep 01 Updated: 2016 Jul 27 Resolved: 2016 Jun 27 |
|
Status: | Closed |
Project: | ZABBIX FEATURE REQUESTS |
Component/s: | Agent (G) |
Affects Version/s: | 2.2.6 |
Fix Version/s: | 3.2.0alpha1 |
Type: | New Feature Request | Priority: | Major |
Reporter: | Marc | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 12 |
Labels: | log, logmonitoring, logrt, newitemkey, patch, performance | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: | ZbxNext2430-UseCaseRateLimit.png ZbxNext2430-ab-2.4.6-2.patch ZbxNext2430-ab-2.4.6.patch ZbxNext2430-b-2.4.6.patch | ||||||||||||||||
Issue Links: |
|
Description |
When monitoring a log file, one often wants to get an event when a certain condition is met. What I have in mind is the possibility of configuring a kind of active log item at the agent level. |
Comments |
Comment by richlv [ 2014 Sep 02 ] |
how would this differ from regexp parsing on the agent side ? |
Comment by Marc [ 2014 Sep 02 ] |
The application may log up to 10 thousand lines per second, so reducing the maximum lines per second is not an option if we want to keep up. Now, when an application like this has a serious problem, it might log ERROR messages at a similar rate. Being able to keep up with this kind of log was one of the key selling points for Zabbix over the legacy home-brew monitoring solution. However, when it comes to log monitoring, the legacy solution beats Zabbix in many ways, since it does the work locally and sends only one event when a certain condition is met. I'm not thinking about reinventing SEC-like functionality and putting it into the agent (though that would actually be really cool). Depending on how this gets implemented, it could allow one item per log file to cover a massive amount of independent log events in a readable way. In the end, it should be possible to cover most logging situations without having to send the data to the server, which would be a great improvement to log monitoring in Zabbix. |
Comment by Marc [ 2014 Sep 02 ] |
Having the possibility to define such agent side logic on server level in a convenient way would be even better. |
Comment by richlv [ 2014 Sep 07 ] |
sounds like the main problem is lots of ERRORs being logged at a high rate. |
Comment by Christian Reizlein [ 2015 May 26 ] |
That would be an awesome feature, both proposed a) and b). |
Comment by Marc [ 2015 Oct 05 ] |
Attached patch ZbxNext2430-b-2.4.6.patch is a quick'n'dirty hack to implement b) counting matches of pattern. The patch introduces an optional seventh parameter to influence the log processing type: log[file,<regexp>,<encoding>,<maxlines>,<mode>,<output>,<proctype>] |
Comment by Marc [ 2015 Oct 07 ] |
Attached patch ZbxNext2430-ab-2.4.6.patch implements a) rate limiting and b) counting matches of a pattern: log[file,<regexp>,<encoding>,<maxlines>,<mode>,<output>,<proctype>] If <maxlines> = 100, then up to 100 messages/lines per second are returned and up to 300 messages/lines per second are skipped each interval. In other words, 1/4 may be sent and 3/4 may be skipped. |
Comment by Marc [ 2015 Oct 07 ] |
Attached patch ZbxNext2430-ab-2.4.6-2.patch supports several rate limits. proctype - possible values:
When using ratelimit16 with <maxlines> = 10, no more than 10 messages/lines per second are sent, but up to 150 messages/lines per second are skipped each interval. This should allow using lower <maxlines> settings while parsing/skipping more log messages than usual. Edit: 144526:20151007:135747.819 Rate limit reached for key 'log[/tmp/zabbix_agentd.log,error,,,,,ratelimit16]'. Skipping data until parsing limit reached or EOF. This way one can identify that messages were likely not considered due to rate limiting. |
Comment by Oleksii Zagorskyi [ 2015 Oct 20 ] |
Regarding user-friendly naming and the possible values for the parameter: ratelimit4, ratelimit8 and ratelimit16 do not look very easy to understand. |
Comment by Andris Mednis [ 2015 Nov 05 ] |
Comments so far mention only enhancements to the log[] item. Should the same apply to logrt[] too? |
Comment by Marc [ 2015 Nov 05 ] |
Yes, of course logrt[] too. I'd even say any log-like item, actually. Personally, though, I could easily live without event log being considered yet. |
Comment by Marc [ 2015 Nov 06 ] |
Find below a calculation for the rate limit use case, made for a real-life scenario. |
Comment by Andris Mednis [ 2015 Nov 09 ] |
Proposals so far include:
If something happens and the log file starts growing very fast, the first lines may contain valuable information about what happened, the last lines may contain current information, and the middle lines are likely less informative (consequences of the root cause from the first lines). |
Comment by Andris Mednis [ 2015 Nov 09 ] |
Another idea: what if we add a parameter which specifies the maximum acceptable time delay of logfile parsing? For example, 30 seconds. The agent would then dynamically tune the number of lines to be skipped to keep up with a high-intensity log file, and the user would not need to worry about which <ratelimitX> to choose for a particular logfile. |
Comment by Marc [ 2015 Nov 13 ] |
I presume that to [...] spend half of the limit [...] on the first lines and <maxlines>/2 on the last lines [...] we would need to write matching lines to a ring buffer of size <maxlines>/2 after exceeding the initial <maxlines>/2 matching lines, since we do not know when, or whether at all, EOF (respectively <maxparse>) will be reached. Assuming that a rate limit will likely persist over multiple consecutive update intervals, I don't see a big benefit: in such cases, the first lines of the next update interval are in fact the last lines of the previous one. For the sake of completeness, the layman approach I had thought of up to now: having two additional parameters, <logic> and <maxparse>. The former allows many different log processing methods, here count and ratelimit. The latter can be useful for the rate limit function as well as for log monitoring in general, as it allows control over the currently hard-coded limitation. That's to say:
In addition to that, when having the logic set to <ratelimit>,
The only downside I see is the increased number of item parameters and their order. E.g. I think <maxlines> and <maxparse> should actually be next to each other. On the other hand, that's "just" cosmetics. So I generally see two subjects here:
Considering that a rate limit is expected to persist across subsequent update intervals, I'd argue it hardly matters whether we skip all lines after a limit, split the limit between the beginning and the end, or send every <n>th line. Ultimately, it cannot be foreseen where the more important information resides. What is important, rather, is having a mechanism to identify that rate limiting took place. It is then up to the user to evaluate whether something important was skipped. Your second approach does sound pretty interesting. Can you elaborate on how it could be implemented (from a logical point of view)? |
Comment by Marc [ 2015 Nov 13 ] |
For the record: I would really appreciate keeping the <logic> option, resp. seeing the count functionality considered as well. |
Comment by Andris Mednis [ 2015 Nov 19 ] |
Proposal for new log[], logrt[] parameters 1. Add two new parameters - <method> and <maxdelay> - to log[] and logrt[] items: log[/path/to/file/file_name,<regexp>,<encoding>,<maxlines>,<mode>,<output>,<method>,<maxdelay>] logrt[/path/to/file/regexp_describing_filename_pattern,<regexp>,<encoding>,<maxlines>,<mode>,<output>,<method>,<maxdelay>] <method> parameter. Type: string. Values:
<maxdelay> parameter is maximum acceptable delay in seconds. Type: float. Values:
2. Currently there are 2 limits on number of logfile lines: ---------------------------------------------------------------------- /* do not flood Zabbix server if file grows too fast */ s_count = rate * metric->refresh; <---- maxlines * refresh_interval <---- maxlines (MaxLinesPerSecond): min = 1, default = 20, max = 1000 /* do not flood local system if file grows too fast */ p_count = 4 * s_count; <---- number of processed lines is 4x send-to-server lines ---------------------------------------------------------------------- These limits are not changed. 3. Loop over logfile records as usual until one of "break" conditions is true:
4. If processing finished due to reaching one of limits (s_count or p_count), then calculate:
5. If 0 < t_del <= maxdelay, then the delay is acceptable; wait for the next check, maybe it can catch up next time. 6. If t_del > maxdelay, then ignore lines by "jumping" over them to position pos = size - v * maxdelay. This is a first draft. Waiting for comments. |
Comment by Andris Mednis [ 2015 Nov 24 ] |
New ideas: 1. Add a <maxdelay> parameter to log[] and logrt[]: log[/path/to/file/file_name,<regexp>,<encoding>,<maxlines>,<mode>,<output>,<maxdelay>] logrt[/path/to/file/regexp_describing_filename_pattern,<regexp>,<encoding>,<maxlines>,<mode>,<output>,<maxdelay>] (see description of <maxdelay> in the previous comment) 2. Implement "count" functionality as two new items: log.count[file,<regexp>,<encoding>,<maxproclines>,<mode>,<maxdelay>] logrt.count[/path/to/file/regexp_describing_filename_pattern,<regexp>,<encoding>,<maxproclines>,<mode>,<maxdelay>] where <maxproclines> is the max number of lines per second to be analyzed. |
Comment by Marc [ 2015 Nov 24 ] |
I was thinking of another option "de-duplication" for quite some time and believe it applies to what you've described for E.g. log messages like these: 921:20151124:174529.693 item "host1:user.param[inflexible]" became not supported: Unsupported item key. 921:20151124:174530.187 item "host1:user.param[inflexible,too]" became not supported: Unsupported item key. 921:20151124:174531.624 item "host2:user.param[inflexible]" became not supported: Unsupported item key. 921:20151124:174532.597 item "host3:user.param[flexible]" became not supported: Not supported by Zabbix Agent. processed in one interval could result in: Item became not supported because: Unsupported item key. Item became not supported because: Not supported by Zabbix Agent. by an item like: log.dedup[file, "^.{27}item \".*\" (became not supported): (.*)",,,, "Item \1 because: \2"] Otherwise we have an |
Comment by richlv [ 2015 Nov 25 ] |
okkuv9xh, thanks for that use case - i was just talking to andris yesterday, trying to think of other needs in this area. |
Comment by Marc [ 2015 Nov 25 ] |
Sorry! I was actually referring to c) mentioned in this comment. c) another use case - using the output feature, extracting a substring (like a number), comparing it on the agent level to a threshold and only sending to the server a rate-limited value of 1, or count of matches, or the raw value. |
Comment by Andris Mednis [ 2016 Apr 25 ] |
okkuv9xh, one more decision you might be interested in commenting on: what to do if log.count[] cannot send its result to the server. The current decision was added to the specs as follows: |
Comment by Aleksandrs Saveljevs [ 2016 May 02 ] |
Andris, we may wish to consider fixing ZBXNEXT-1842 together with this issue. andris ZBXNEXT-1842 seems to be about sending more metadata to server and displaying them in frontend. It was decided to not include ZBXNEXT-1842 into current solution. |
Comment by Andris Mednis [ 2016 Jun 02 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-2430-2 . (The frontend part is not yet implemented, but it is only one help screen modification away. The new parameter 'maxdelay' and the new items log.count[] and logrt.count[] are already usable in the frontend.) |
Comment by vitalijs.cemeris (Inactive) [ 2016 Jun 02 ] |
Added log.count and logrt.count keys to item helper |
Comment by vitalijs.cemeris (Inactive) [ 2016 Jun 02 ] |
(1) Translation strings added:
andris Thanks! Reviewed. gunarspujats CLOSED |
Comment by Andris Mednis [ 2016 Jun 03 ] |
Documented in
martins-v Also: sasha CLOSED |
Comment by Andris Mednis [ 2016 Jun 09 ] |
(2) In function adjust_position_after_jump():
/* Searching forward did not find a newline. Now search backwards until 'min_size'. */
lastlogsize_tmp = lastlogsize_aligned;
seek_pos = lastlogsize_aligned;
Value of 'lastlogsize_tmp' is never read after assignment. andris RESOLVED in r60590. sandis.neilands CLOSED. |
Comment by Sandis Neilands (Inactive) [ 2016 Jun 09 ] |
(3) Preferably we should avoid reusing function names even for static functions. This complicates full text search and code analysis. We have two instances of destroy_logfile_list(). andris RESOLVED in r60592. sandis.neilands CLOSED. |
Comment by Sandis Neilands (Inactive) [ 2016 Jun 09 ] |
(4) In process_log_check() when maxdelay validation fails we print "Invalid seventh parameter". For *.count items it is the sixth parameter. andris Thanks! RESOLVED in r60595. sandis.neilands CLOSED. |
Comment by Sandis Neilands (Inactive) [ 2016 Jun 10 ] |
(5) Setup Item key: log[/tmp/testlog.log,,,1,,,1] . Item update interval: 1 second. Scenario 1. Add a few log lines (ending with a newline) to /tmp/testlog.log. line 1 line 1 line 1 2. Add a log line (without a newline) to /tmp/testlog.log. line 2222222222222 3. Extend the previous line with additional '2's (still without adding a newline). line 222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222222 Expected Zabbix server gets either the whole 'line 2...' or no part of it. Actual Zabbix server gets the value '2222....' sandis.neilands Furthermore, consider this scenario. Add "line 1" (no newline). Add "234" to "line 1" (still no newline). The server will have the following records (2 is missing). line 1 24 sandis.neilands CLOSED. Turns out that text editors insert a newline implicitly. After retesting with a hex editor the problem was not reproduced. |
Comment by Sandis Neilands (Inactive) [ 2016 Jun 10 ] |
(6) Currently the occurrence of jumps is signalled out-of-band via the agent's log. This could be problematic, since the end user must then correlate the agent's log with the values stored in Zabbix. The user might not even have access to the agent's log. Perhaps we should include in-band signalling as well. For log and logrt, it could be an additional log entry (it should be unique enough for users to be able to set triggers on it). For *.count - if we want it to still be supported in this case - return the negative of the counted lines. That way, the number of lines read is still known, but it also indicates that some data was skipped. For triggers and calculated items, use abs(). sandis.neilands After discussing with andris and alexei it was decided to stay with the current design for now. CLOSED. |
Comment by Marc [ 2016 Jun 10 ] |
sandis.neilands, Edit: Edit: Edit: |
Comment by richlv [ 2016 Jun 10 ] |
putting related but different info in log values hopefully won't be done. on the other hand, one might allow creating a companion item: if the item exists, such info is put in it; if not, it is ignored. |
Comment by Gunars Pujats (Inactive) [ 2016 Jun 14 ] |
(7) [F] Frontend changes 1. Add new parameter - <maxdelay> - to log[] and logrt[] items. vitalijs.cemeris RESOLVED in r60489 gunarspujats CLOSED |
Comment by Gunars Pujats (Inactive) [ 2016 Jun 14 ] |
(8) [F] include/classes/items/CHelpItems.php:366 - missing function closing bracket, therefore creating new log.count[] item frontend produces validation error: Invalid key "log.count[file,<regexp>,<encoding>,<maxproclines>,<mode>,<maxdelay>" for item "log_count" on "rs": unexpected end of key. [items.php:644 → CFrontendApiWrapper->update() → CApiWrapper->__call() → CFrontendApiWrapper->callMethod() → CApiWrapper->callMethod() → CFrontendApiWrapper->callClientMethod() → CLocalApiClient->callMethod() → call_user_func_array() → CItem->update() → CItemGeneral->checkInput() → CApiService::exception() in include/classes/api/services/CItemGeneral.php:325] vitalijs.cemeris RESOLVED in r60629 gunarspujats CLOSED |
Comment by Gunars Pujats (Inactive) [ 2016 Jun 14 ] |
(9) [F] include/classes/items/CHelpItems.php:370 - item key logrt.count[] is missing first parameter file_regexp vitalijs.cemeris RESOLVED in r60629 gunarspujats CLOSED |
Comment by Gunars Pujats (Inactive) [ 2016 Jun 14 ] |
[F] Frontend tested. |
Comment by Andris Mednis [ 2016 Jun 14 ] |
Fixed in version:
|
Comment by Andris Mednis [ 2016 Jun 15 ] |
(10) Coverity detected a memory leak: ** CID 149415: Resource leaks (RESOURCE_LEAK) /src/zabbix_agent/active.c: 1255 in process_log_check() __________________________________________________________ *** CID 149415: Resource leaks (RESOURCE_LEAK) /src/zabbix_agent/active.c: 1255 in process_log_check() 1249 } 1250 out: 1251 zbx_free(encoding_uc); 1252 1253 free_request(&request); 1254 >>> CID 149415: Resource leaks (RESOURCE_LEAK) >>> Variable "logfiles_new" going out of scope leaks the storage it points to. 1255 return ret; 1256 } 1257 1258 static int process_eventlog_check(char *server, unsigned short port, ZBX_ACTIVE_METRIC *metric, 1259 zbx_uint64_t *lastlogsize_sent, char **error) 1260 { andris RESOLVED in new dev branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-2430 r60649 and r60653. sandis.neilands CLOSED. andris Fixed in version:
|
Comment by Sandis Neilands (Inactive) [ 2016 Jun 16 ] |
Successfully tested. |
Comment by Sandis Neilands (Inactive) [ 2016 Jun 17 ] |
(11) Tonight's Coverity scan didn't close the issue. There is a memory leak of logfiles_new in another branch. CID 149415 is still open. Scenario:
Output from Valgrind: ==6224== 16,965 (16,896 direct, 69 indirect) bytes in 3 blocks are definitely lost in loss record 232 of 233 ==6224== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6224== by 0x4C2CF1F: realloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==6224== by 0x44DB0F: zbx_realloc2 (misc.c:499) ==6224== by 0x41961E: add_logfile (logfiles.c:1059) ==6224== by 0x419F04: pick_logfile (logfiles.c:1196) ==6224== by 0x419D9C: pick_logfiles (logfiles.c:1292) ==6224== by 0x415914: make_logfile_list (logfiles.c:1391) ==6224== by 0x41480A: process_logrt (logfiles.c:2374) ==6224== by 0x40FA09: process_log_check (active.c:1177) ==6224== by 0x40EABC: process_active_checks (active.c:1648) ==6224== by 0x40D739: active_checks_thread (active.c:1762) ==6224== by 0x443010: zbx_thread_start (threads.c:128) BTW, the path that Coverity is showing in process_logrt() doesn't lead to the problem - ret must not be SUCCESS, e.g. process_log() --> zbx_read2() must fail. As to why these problems are almost impossible to find analytically: the path through the code to this problem contains more than 100 decisions spanning multiple levels of functions. Humans can't follow that many possibilities. andris RESOLVED in new dev branch svn://svn.zabbix.com/branches/dev/ZBXNEXT-2430 r60681. sandis.neilands CLOSED. andris Fixed in version:
|