[ZBX-15010] Regex in "log" item causes "cannot compile regular expression" error Created: 2018 Oct 15  Updated: 2024 Apr 10  Resolved: 2018 Nov 12

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 3.4.14, 4.0.0, 4.0.1rc2, 4.2.0alpha1
Fix Version/s: 4.0.2rc1, 4.2.0alpha1, 4.2 (plan)

Type: Problem report Priority: Major
Reporter: beatcracker Assignee: Vjaceslavs Bogdanovs
Resolution: Fixed Votes: 0
Labels: agent, items
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2018-10-15-15-21-12-099.png    
Issue Links:
Causes
caused by ZBX-13730 High CPU / Faliure to Match REGEX Closed
Team: Team C
Team: Team C
Sprint: Sprint 45, Sprint 46, Nov 2018
Story Points: 0.125

 Description   

Steps to reproduce:

  1. Create log item with this regex: 
    (\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n)\1{1}.+){0,}
  1. Wait for the item to become unsupported with "cannot compile regular expression" error

Result:

Item configuration:

Agent log:

11880:20181015:120836.677 In process_log() filename:'d:\fake-log\fake.log' lastlogsize:0 mtime:0
 11880:20181015:120836.677 convert_to_utf8() in_size:212 encoding:'UTF-16' codepage:1200
 11880:20181015:120836.678 End of process_log() filename:'d:\fake-log\fake.log' lastlogsize:0 mtime:0 ret:FAIL processed_bytes:0
 11880:20181015:120836.678 End of process_logrt():FAIL
 11880:20181015:120836.678 active check "log["d:\fake-log\fake.log","(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n)\1{1}.+){0,}","UTF-16"]" is not supported: cannot compile regular expression
 11880:20181015:120836.679 In process_value() key:'S001WDSD001:log["d:\fake-log\fake.log","(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n)\1{1}.+){0,}","UTF-16"]' value:'cannot compile regular expression'

Expected:

Log item should be able to use regex and parse log file.

Notes

This regex is interned to parse multi-line log messages and it's works in Zabbix regex tester (Administration -> General -> Regular Expressions) and pcre2test tool.



 Comments   
Comment by beatcracker [ 2018 Oct 15 ]

The agent version is: Zabbix 4.0.0 (revision 85308) (Windows)

Comment by Andris Mednis [ 2018 Oct 15 ]

Agent looks at one log line at a time. I don't think it can do multi-line.

Comment by beatcracker [ 2018 Oct 15 ]

ZBXNEXT-968 has some mentions of multi-line regex in web scenarios context. I don't know about agent though, could be wishful thinking.

Comment by beatcracker [ 2018 Oct 15 ]

Well, I've found ZBXNEXT-1428, so probably no multi-line for me. Should compile anyway, though...

Comment by Glebs Ivanovskis [ 2018 Oct 16 ]

Seriously, why does this pattern result in error? beatcracker, what is your PCRE library version?

Comment by beatcracker [ 2018 Oct 17 ]

@cyclone , I've no idea, this is Windows agent. I've tested v3.4.6 and the latest 4.0 from official download page.

Comment by Edgars Melveris [ 2018 Oct 22 ]

I can confirm, that the expression does not compile on linux agent also.

pcre version on my instance:

[root@localhost ~]# rpm -qa | grep pcre
pcre-8.32-17.el7.x86_64

Also, i have tried to test what causes the compilation problem and it's this

\1

so this doesn't compile:

(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n)\1{1}.+){0,}

and this does:

(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n){1}.+){0,}

Should this be fixed?

Comment by Glebs Ivanovskis [ 2018 Oct 22 ]

I believe this could be the issue with PCRE:

Version 8.38 23-November-2015
-----------------------------

1.  If a group that contained a recursive back reference also contained a
    forward reference subroutine call followed by a non-forward-reference
    subroutine call, for example /.((?2)(?R)\1)()/, pcre_compile() failed to
    compile correct code, leading to undefined behaviour or an internally
    detected error. This bug was discovered by the LLVM fuzzer.

2.  Quantification of certain items (e.g. atomic back references) could cause
    incorrect code to be compiled when recursive forward references were
    involved. For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/.
    This bug was discovered by the LLVM fuzzer.

Seems to work (I mean compile) on my Ubuntu 16.04:

$ apt list | grep libpcre3

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

libpcre3/xenial,now 2:8.38-3.1 amd64 [installed]
libpcre3-dbg/xenial 2:8.38-3.1 amd64
libpcre3-dev/xenial,now 2:8.38-3.1 amd64 [installed,automatic]
libpcre32-3/xenial,now 2:8.38-3.1 amd64 [installed,automatic]
$ zabbix_agentd -t 'vfs.file.regexp[/tmp/trunk/ChangeLog,"(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n)\1{1}.+){0,}"]'
vfs.file.regexp[/tmp/trunk/ChangeLog,"(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n)\1{1}.+){0,}"] [s|]

I think that pre-compiled Windows agents could use a newer version of PCRE library. Version 8.38 (which solves this particular problem) is almost three years old.

Comment by Viktors Tjarve [ 2018 Oct 26 ]

In version 3.4 significant changes where done to how regex's are processed. Changes where introduced in ZBX-13730 (r83296) and that is the moment when it is changed how regex's are compiled.

(\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n)\1{1}.+){0,}

Regex above contains number of parenthesized sub-expressions delimited by backslash. Earlier function regcomp() with flag REG_NOSUB would return 0 - successful completion. Now pcre_compile() with somewhat similar flag PCRE_NO_AUTO_CAPTURE returns NULL which represents failed compilation of the same regex.

Comment by Glebs Ivanovskis [ 2018 Oct 26 ]

Dear viktors.tjarve, I'm not sure you're pointing to the root cause. I don't think that flags are to blame, because according to pcreposix man page, REG_NOSUB will set PCRE_NO_AUTO_CAPTURE flag:

REG_NOSUB

The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed for compilation to the native function. In addition, when a pattern that is compiled with this flag is passed to regexec() for matching, the nmatch and pmatch arguments are ignored, and no captured strings are returned.

So the flags are basically the same before and after ZBX-13730. In addition, look at my test above, where the regexp seems to compile fine with trunk... Or isn't vfs.file.regexp reporting compilation errors?

Comment by Viktors Tjarve [ 2018 Oct 26 ]

Hi Gleb,

regex compilation using pcre_compile() results with error: reference to non-existent subpattern
I've added some additional log entries (line 2,3,4,5 in the first example) and I get this:

  8508:20181026:154555.503 In process_log() filename:'/tmp/trunk/ChangeLog2' lastlogsize:321928 mtime:0
  8508:20181026:154555.503 In regexp_sub(), string: 2018-10-26 15:44:57 Error: 10, pattern: (\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n)\1{1}.+){0,}
  8508:20181026:154555.503 CRE_NO_AUTO_CAPTURE is defined
  8508:20181026:154555.503 flags: 4098
  8508:20181026:154555.503 End of regexp_sub(), returning FAIL, error: reference to non-existent subpattern
  8508:20181026:154555.503 End of process_log() filename:'/tmp/trunk/ChangeLog2' lastlogsize:321928 mtime:0 ret:FAIL processed_bytes:0
  8508:20181026:154555.503 End of process_logrt():FAIL
  8508:20181026:154555.503 suppressing log(rt)(.count) processing error #1: cannot compile regular expression

Now I change the code just to not set the PCRE_NO_AUTO_CAPTURE flag. Everything else stays the same and I get:

  8914:20181026:154845.653 In process_log() filename:'/tmp/trunk/ChangeLog2' lastlogsize:321928 mtime:0
  8914:20181026:154845.653 In regexp_sub(), string: 2018-10-26 15:44:57 Error: 10, pattern: (\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n)\1{1}.+){0,}
  8914:20181026:154845.653 flags: 2
  8914:20181026:154845.653 End of regexp_sub(), SUCCEED final, out: (null)
  8914:20181026:154845.653 In regexp_sub(), string: 2018-10-26 15:47:57 Error: 11, pattern: (\d{4}-\d{2}-\d{2}\s+\d{2}:\d{2}:\d{2}\.\d{2}\s+\w+\s+)Error:.+((\r\n|\r|\n)\1{1}.+){0,}
  8914:20181026:154845.653 End of regexp_sub(), SUCCEED final, out: (null)
  8914:20181026:154845.653 End of process_log() filename:'/tmp/trunk/ChangeLog2' lastlogsize:321988 mtime:0 ret:SUCCEED processed_bytes:60
  8914:20181026:154845.653 End of process_logrt():SUCCEED

It seems to me that PCRE_NO_AUTO_CAPTURE for pcre_compile() does not act the same way as REG_NOSUB for regcomp().

The thing with zabbix_agentd -t... not failing or giving any errors is that during log processing first two errors are suppress (for some reason that I haven't understood yet).

Comment by Glebs Ivanovskis [ 2018 Oct 26 ]

Thank you for the explanation! The situation looks trickier than I have thought...

Comment by Vjaceslavs Bogdanovs [ 2018 Nov 08 ]

Available in:
4.0.2rc1 r86659
4.2.0alpha1 (trunk) r86660

Comment by Glebs Ivanovskis [ 2018 Nov 08 ]

Hm...

flags ^= PCRE_NO_AUTO_CAPTURE;

There is no difference between ^= and |= in this case thanks to

if (0 != (flags & PCRE_NO_AUTO_CAPTURE))

but |= seems to be a more conventional way to set a flag.

According to docs

a backslash followed by a digit greater than 0 (and possibly further digits) is a back reference

so <= in

'0' <= *offset

is incorrect.

Just curiosity... Have you tested if named back references work without PCRE_NO_AUTO_CAPTURE flag being set?

Comment by Vjaceslavs Bogdanovs [ 2018 Nov 08 ]

Thank you for your submission, I added changes of digit greater than zero as you suggested. As for the XOR operator, flag should be removed and not set.

As for the curiosity, yes, it was checked and worked like a charm even with PCRE_NO_AUTO_CAPTURE flag set.

Comment by Glebs Ivanovskis [ 2018 Nov 08 ]

Ah, right... Sorry about misleading suggestions. But this just emphasizes that it's a bit difficult to understand the intended meaning of XOR. Idiomatic way to unset the flag would be

flags &= ~PCRE_NO_AUTO_CAPTURE;

Oh, I've understood where my confusion comes from. The comment says:

then reset PCRE_NO_AUTO_CAPTURE bit

I was reading it as "re-set".

Comment by Andris Mednis [ 2018 Nov 09 ]

Maybe it is better to change comment to "...set PCRE_NO_AUTO_CAPTURE bit to 0." ?
("reset ... bit" could be more popular in electronics).

Generated at Sat Apr 20 11:20:31 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.