[ZBX-8512] web.page.regexp not parsing whole output of page Created: 2014 Jul 22  Updated: 2017 May 30  Resolved: 2014 Aug 01

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G)
Affects Version/s: 2.2.5
Fix Version/s: 2.2.6rc1, 2.3.3

Type: Incident report Priority: Critical
Reporter: Thorsten Kohlhepp Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: regexps, webcheck
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File web_page_regexp.patch    

 Description   

The web.page.regexp item is not checking the whole output against the regular expression. It only checks the first line. This started after updating to 2.2.5.
-%code%-
zabbix_get -s x10468.rz2012.adm.denic.de -k web.page.regexp[10.121.50.145,business-de/public/domain/denic.de,8811,".*"]
HTTP/1.1 200 OK
-%code%-

correct output would be
-%code%-
zabbix_get -s x10468.rz2012.adm.denic.de -k web.page.get[10.121.50.145,business-de/public/domain/denic.de,8811]
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
X-Request-UUID: AUdevdge-SmXh-info.de.rz.1
Content-Location: http://10.121.50.145/business-de/public/domain/denic.de
Content-Type: application/xml
Content-Length: 713
Date: Tue, 22 Jul 2014 15:43:25 GMT
Connection: close

<domain-registration domain="denic.de"><status>CONNECT</status><changed>2013-03-01T10:56:26+01:00</changed><detailed-contacts-in-role href="./denic.de/detailed-contacts-in-role"></detailed-contacts-in-role><tech-c><contact href="../contact/tech/DENIC-1000006-DBS"></contact></tech-c><zone-c><contact href="../contact/zone/DENIC-1000006-DBS"></contact></zone-c><nameservers><nameserver owner="denic.de." host="ns1.denic.de."><glue>2a02:568:121:6:2:0:0:2</glue><glue>81.91.170.1</glue></nameserver><nameserver owner="denic.de." host="ns2.denic.de."><glue>193.171.255.36</glue></nameserver><nameserver owner="denic.de." host="ns3.denic.de."><glue>87.233.175.19</glue></nameserver></nameservers></domain-registration>
-%code%-



 Comments   
Comment by richlv [ 2014 Jul 22 ]

zabbix returns only the matched line and does not match across line breaks - looks like working as intended to me.
did you observe different behaviour with other versions ?

Comment by Thorsten Kohlhepp [ 2014 Jul 22 ]

In 2.2.4 it behaved differently. The page output above has not changed. In 2.2.4 I used the regexp "status.*CONNECT.*status" and I got the matched string. In 2.2.5 I get nothing.

Comment by Thorsten Kohlhepp [ 2014 Jul 23 ]

Is there any way to match the regex across line breaks?

Comment by richlv [ 2014 Jul 28 ]

i tested with current trunk, 2.2.5 and 2.2.4 - behaviour was exactly the same.
are you sure you did not mix it up with web.page.get[] key ?

Comment by Andras Fabian [ 2014 Jul 29 ]

I see similar behavior with some of our web.page.regexp checks, where a previously working check now doesn't send back any results (even though, when manually checking the page content, everything is fine ... and nothing changed).

We have lately updated from 2.2.3 to 2.2.5 and exactly since the update, the checks are broken.

Now I have quickly compared the source code for WEB_PAGE_REGEXP and I see a substantial change in there! I didn't completely understand the new code, but it seems to go for a multi-line parsing approach.

In 2.2.3, there was:

 
	if (SYSINFO_RET_OK == get_http_page(hostname, path, port_number, buffer, ZBX_MAX_WEBPAGE_SIZE))
		ptr = zbx_regexp_sub(buffer, regexp, output);

Whereas in 2.2.5 we now have this:

 
	if (SYSINFO_RET_OK == get_http_page(hostname, path, port_number, buffer, ZBX_MAX_WEBPAGE_SIZE))
	{
		for (s = buffer, p = s; '\0' != *s; s++)
		{
			if ('\n' == *s)
			{
				if (s > p && '\r' == *(s - 1))
					*(s - 1) = '\0';
				else
					*s = '\0';

				if (NULL != (ptr = zbx_regexp_sub(p, regexp, output)))
					break;

				p = s + 1;
			}
		}
	}

I would dare to say, that quite likely this upgrade did break something in the behavior of web.page.regexp checks!
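
One way to explore the quoted 2.2.5 loop is to instrument it and count how many lines it would hand to the matcher. The sketch below copies the loop structure from the excerpt above; `count_lines_scanned` is a hypothetical name, and the counter stands in for the call to `zbx_regexp_sub`. It suggests that a line is only passed on once a `'\n'` is seen, so any text after the last `'\n'` in the buffer is never checked.

```c
#include <stddef.h>

/*
 * Hypothetical instrumentation of the loop quoted above (not Zabbix code).
 * Returns how many lines would be passed to the regexp matcher.
 * The buffer is modified in place, exactly as in the quoted loop.
 */
size_t count_lines_scanned(char *buffer)
{
	char	*s, *p;
	size_t	scanned = 0;

	for (s = buffer, p = s; '\0' != *s; s++)
	{
		if ('\n' == *s)
		{
			/* cut the line at CR LF or at a bare LF */
			if (s > p && '\r' == *(s - 1))
				*(s - 1) = '\0';
			else
				*s = '\0';

			scanned++;	/* the quoted code calls zbx_regexp_sub(p, ...) here */

			p = s + 1;
		}
	}

	/* text between p and the terminating '\0' is never counted */
	return scanned;
}
```

Calling it on a writable copy of a response whose body has no trailing line break should report one line fewer than the response actually contains.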

Comment by richlv [ 2014 Jul 29 ]

changes in ZBX-8248 might be related

Comment by Alexander Vladishev [ 2014 Jul 29 ]

We cannot reproduce the issue. 2.2.3, 2.2.4 and 2.2.5 behave identically.

For example:

$ sbin/zabbix_agentd -t agent.version
agent.version                                 [s|2.2.3]
$ sbin/zabbix_agentd -t web.page.regexp[www.zabbix.com,/,,"\<li\>.*product.*\<li\>"]
web.page.regexp[www.zabbix.com,/,,\<li\>.*product.*\<li\>] [s|li><a class="drop highlight" href="product.php">Product Overview</a></li]

$ sbin/zabbix_agentd -t agent.version
agent.version                                 [s|2.2.4]
$ sbin/zabbix_agentd -t web.page.regexp[www.zabbix.com,/,,"\<li\>.*product.*\<li\>"]
web.page.regexp[www.zabbix.com,/,,\<li\>.*product.*\<li\>] [s|li><a class="drop highlight" href="product.php">Product Overview</a></li]

$ sbin/zabbix_agentd -t agent.version
agent.version                                 [s|2.2.5]
$ sbin/zabbix_agentd -t web.page.regexp[www.zabbix.com,/,,"\<li\>.*product.*\<li\>"]
web.page.regexp[www.zabbix.com,/,,\<li\>.*product.*\<li\>] [s|li><a class="drop highlight" href="product.php">Product Overview</a></li]

Please attach outputs from your environment.

Thank you.

Comment by Andras Fabian [ 2014 Jul 29 ]

Well, I will try ... I did a wget at the site which I want to check, and which doesn't work.

So, the checking key is this (as you can see, we just look for "ALLES OK"):

web.page.regexp[{$HOSTNAME},/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,{$PORT},"ALLES OK"]

And the data I get with WGET is this (but just pasting it in, might destroy some "invisible" characters - but I can tell, that the entire string has no line breaks etc.):

<html><body><table border="1"><tr><th><font color="green">ALLES OK</font></th></tr><tr><td>&nbsp</td></tr><tr><td><font color=#0000FF><b>JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob</b></font></td></tr><tr><td><b>MESSAGE:</b>Service net.atrada.mops.frontends.jobs.ArtistTagJob wurde ausgefuehrt in 74 ms<br><b>START-DATUM:</b>2014-07-29 16:00:00.203<br><b>STOP-DATUM:</b>2014-07-29 16:00:00.277<br><b>MAX. LAUFEN BIS:</b>Fri Jul 26 16:00:00 CEST 2024<br><b>NÄCHSTE AUSFÜHRUNGS-UHRZEIT:</b>16:15<br><b>IST NÄCHSTE AUSFÜHRUNGS-UHRZEIT IN TIME-SPAN:</b>false<br><b>ENABLED_ON_INSTANCES:</b>nbg-webdemo03<br><b>LAST_EXEC_ON_INSTANCE:</b>nbg-webdemo03<br><b>LAST_CHANGE_SERVICESTATUS:</b>2014-07-29 16:00:00.327<br><b>NEXT-EXCECUTION-DATUM:</b>2014-07-29 16:15:00.0<br><b>IS SERVICE ACTIVE:</b><font color="green">JA</font><br><b>LETZTER JOB-STATUS:</b><font color="green">OK</font><br><font color="green"><b>GESAMTSTATUS:KEINE FEHLER<br></b></td></tr></table><!-- <Seitenende> --></body></html>

And even though we can all see that it contains "ALLES OK" ... the check doesn't match it.

Comment by richlv [ 2014 Jul 29 ]

and what's the item key you tested ?
did you try it with zabbix_get ?

Comment by Alexander Vladishev [ 2014 Jul 30 ]

With your example it works fine on our environment.

$ sbin/zabbix_agentd -t agent.version
agent.version                                 [s|2.2.5]

$ sbin/zabbix_agentd -t 'web.page.regexp[localhost,zbx-8512.html,,ALLES OK]'
web.page.regexp[localhost,zbx-8512.html,,ALLES OK] [s|ALLES OK]

$ sbin/zabbix_agentd -t 'web.page.get[localhost,zbx-8512.html]'
web.page.get[localhost,zbx-8512.html]  [t|HTTP/1.1 200 OK
Date: Wed, 30 Jul 2014 06:57:45 GMT
Server: Apache/2.2.22 (Ubuntu)
Last-Modified: Wed, 30 Jul 2014 06:49:23 GMT
ETag: "4c9b3f-3ed-4ff638e70a2b9"
Accept-Ranges: bytes
Content-Length: 1005
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

<html><body><table border="1"><tr><th><font color="green">ALLES OK</font></th></tr><tr><td>&nbsp</td></tr><tr><td><font color=#0000FF><b>JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob</b></font></td></tr><tr><td><b>MESSAGE:</b>Service net.atrada.mops.frontends.jobs.ArtistTagJob wurde ausgefuehrt in 74 ms<br><b>START-DATUM:</b>2014-07-29 16:00:00.203<br><b>STOP-DATUM:</b>2014-07-29 16:00:00.277<br><b>MAX. LAUFEN BIS:</b>Fri Jul 26 16:00:00 CEST 2024<br><b>NÄCHSTE AUSFÜHRUNGS-UHRZEIT:</b>16:15<br><b>IST NÄCHSTE AUSFÜHRUNGS-UHRZEIT IN TIME-SPAN:</b>false<br><b>ENABLED_ON_INSTANCES:</b>nbg-webdemo03<br><b>LAST_EXEC_ON_INSTANCE:</b>nbg-webdemo03<br><b>LAST_CHANGE_SERVICESTATUS:</b>2014-07-29 16:00:00.327<br><b>NEXT-EXCECUTION-DATUM:</b>2014-07-29 16:15:00.0<br><b>IS SERVICE ACTIVE:</b><font color="green">JA</font><br><b>LETZTER JOB-STATUS:</b><font color="green">OK</font><br><font color="green"><b>GESAMTSTATUS:KEINE FEHLER<br></b></td></tr></table><!-- <Seitenende> --></body></html>]

Please attach the output of these commands.

sbin/zabbix_agentd -t 'agent.version'

sbin/zabbix_agentd -t 'web.page.get[<hostname>,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,<port>]'

sbin/zabbix_agentd -t 'web.page.regexp[<hostname>,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,<port>,ALLES OK]'

<hostname> and <port> should be replaced with correct values

If you want to retrieve the whole line, you can try extending the regular expression.

For example:

"^.*ALLES OK.*$"
Comment by Andras Fabian [ 2014 Jul 30 ]

Hi guys,

Here I am back with some details you have requested:

  • /var/software/zabbix-agent/sbin/zabbix_agentd -t 'agent.version'
    agent.version                                 [s|2.2.5]
    
  • /var/software/zabbix-agent/sbin/zabbix_agentd -t 'web.page.get[*****************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80]'
    web.page.get[******************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80] [t|HTTP/1.1 200 OK
    Server: Apache-Coyote/1.1
    Expires: Thu, 01 Jan 1970 00:00:00 GMT
    Cache-Control: no-cache
    pragma: no-cache
    P3P: CP='NOI NAV'
    Set-Cookie: atrada.test.cookie=******************%23%24; Domain=******************; Path=/
    Content-Type: text/html;charset=UTF-8
    X-Cacheable: YES
    Content-Length: 126
    Accept-Ranges: bytes
    Date: Wed, 30 Jul 2014 07:04:08 GMT
    X-Varnish: 437894188 437894187
    Via: 1.1 varnish
    Connection: close
    X-Age: 71
    
    <html><body><table border="1"><tr><th><font color="green">ALLES OK</font></th></tr></table><!-- <Seitenende> --></body></html>]
    
  • /var/software/zabbix-agent/sbin/zabbix_agentd -t 'web.page.regexp[******************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80,ALLES OK]'
    web.page.regexp[******************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80,ALLES OK] [s|]
    

As you can see in the last example, the returned String is empty!

And no, I do not need to have the whole line returned (we really only need to see if a given string is present in a web page).
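
The difference between the bare pattern and the extended pattern suggested earlier can be sketched as follows. This assumes POSIX extended regex semantics and uses `regcomp`/`regexec` as a stand-in for the agent's own matcher; `extract_match` is a hypothetical helper, not part of Zabbix. It copies only the matched part of a line, which is why `ALLES OK` yields just that text while `^.*ALLES OK.*$` yields the whole line.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Zabbix code): copies the part of 'line' matched
 * by the POSIX extended regex 'pattern' into 'out'.
 * Returns 1 on a match, 0 on no match or an invalid pattern.
 */
int extract_match(const char *line, const char *pattern, char *out, size_t out_len)
{
	regex_t		re;
	regmatch_t	m;
	int		found = 0;

	if (0 == out_len || 0 != regcomp(&re, pattern, REG_EXTENDED))
		return 0;

	if (0 == regexec(&re, line, 1, &m, 0))
	{
		size_t	len = (size_t)(m.rm_eo - m.rm_so);

		/* truncate if the match does not fit into the output buffer */
		if (len >= out_len)
			len = out_len - 1;

		memcpy(out, line + m.rm_so, len);
		out[len] = '\0';
		found = 1;
	}

	regfree(&re);

	return found;
}
```

For a pure presence check such as this one, the bare pattern is enough, since any non-empty result already indicates that the string was found.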

Comment by Alexander Vladishev [ 2014 Jul 30 ]

Yes, I see.

This problem is reproducible only with 2.2.5?

Can you attach the output of these commands with agent version 2.2.4 or 2.2.3?

Comment by Andras Fabian [ 2014 Jul 30 ]

OK, here we go. I quickly compiled zabbix_agent 2.2.3 and 2.2.4

  • 2.2.3
    • zabbix_agent/zabbix_agentd -t 'agent.version'
      agent.version                                 [s|2.2.3]
      
    • zabbix_agent/zabbix_agentd -t 'web.page.regexp[*****************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80,ALLES OK]'
      web.page.regexp[*****************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80,ALLES OK] [s|ALLES OK]
      
  • 2.2.4
    • zabbix_agent/zabbix_agentd -t 'agent.version'
      agent.version                                 [s|2.2.4]
      
    • zabbix_agent/zabbix_agentd -t 'web.page.regexp[*****************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80,ALLES OK]'
      web.page.regexp[*****************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80,ALLES OK] [s|ALLES OK]
      

So, obviously it worked until 2.2.4, but stopped doing so in 2.2.5.

Comment by Alexander Vladishev [ 2014 Jul 30 ]

Thank you.

We will try to fix it.

What platform do you use on the Zabbix agent side?

uname -a
Comment by Andras Fabian [ 2014 Jul 30 ]

It's - at the moment - Ubuntu 12.04:

Linux nbg-webdemo03 3.2.0-65-generic #98-Ubuntu SMP Wed Jun 11 20:27:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Comment by richlv [ 2014 Jul 30 ]

haven't seen it suggested yet...
could you please enable debuglevel 4 (to make it less verbose, disable active checks and make sure nothing else connects to that agent), then request regexp item with zabbix_get

Comment by Andras Fabian [ 2014 Jul 30 ]

Hmm ... it's a bit complicated to make the agent only work for this test. Well, here is what I managed to get back in the log (with debug on):

 24459:20140730:134603.708 Requested [web.page.regexp[********************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80,ALLES OK]]
 24459:20140730:134603.709 Sending back []

Not much more visible there.

Comment by Arturs Galapovs (Inactive) [ 2014 Jul 31 ]

We were able to reproduce the problem on our side. A fix is ready.
@Andras Fabian, can you please apply the patch to trunk or 2.2.5 (tested on both) and check this case on your side? Look for the patch in the ticket attachments.

To apply the patch, run this command from the Zabbix project root directory:

patch -p0 -i <path to patch file>/web_page_regexp.patch
Comment by Andras Fabian [ 2014 Jul 31 ]

Yes, it seems to work!

For example this test delivers correct result:

  • /var/software/zabbix-agent/sbin/zabbix_agentd -t 'web.page.regexp[***********************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80,ALLES OK]'
    web.page.regexp[[***********************,/_,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80,ALLES OK] [s|ALLES OK]
    

I also added it to our live zabbix environment, and all the problematic checks now turned "green" (received the correct response from the tested pages).

So, this patch seems to be good.

Thank you very much!

Comment by richlv [ 2014 Aug 04 ]

could you please briefly mention what was wrong and in which cases would the problem manifest ?

<sasha> the agent didn't process the last line of the web page content if it wasn't terminated by an LF character

For example:

Date: Mon, 04 Aug 2014 07:09:37 GMT<CR><LF>             <--+
Server: Apache/2.2.22 (Ubuntu)<CR><LF>                     |
Last-Modified: Wed, 30 Jul 2014 07:44:04 GMT<CR><LF>       |
ETag: "4c9b81-5b0-4ff64520414f8"<CR><LF>                   |
Accept-Ranges: bytes<CR><LF>                               | These lines were processed correctly
Content-Length: 1456<CR><LF>                               |
Vary: Accept-Encoding<CR><LF>                              |
Connection: close<CR><LF>                                  |
Content-Type: text/html<CR><LF>                            |
<CR><LF>                                                <--+
<html><body>Hello world!</body></html>                  <--- This line was ignored

<richlv> great, that's really helpful - thanks
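
A minimal sketch of a line scan that also covers a final line without a trailing LF, as in the example above. This is not the attached web_page_regexp.patch, whose contents are not shown in this ticket. `match_first_line` is a hypothetical helper, and POSIX `regcomp`/`regexec` are used as a stand-in for `zbx_regexp_sub`.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Zabbix code): checks 'buffer' line by line and
 * copies the first regex match into 'out'. Lines end at LF or CR LF; the
 * last line is checked even if it has no terminating LF.
 * The buffer is modified in place. Returns 1 on a match, 0 otherwise.
 */
int match_first_line(char *buffer, const char *pattern, char *out, size_t out_len)
{
	regex_t		re;
	regmatch_t	m;
	char		*p = buffer, *eol, *next;
	int		found = 0;

	if (0 == out_len || 0 != regcomp(&re, pattern, REG_EXTENDED))
		return 0;

	while (0 == found && '\0' != *p)
	{
		if (NULL != (eol = strchr(p, '\n')))
		{
			next = eol + 1;

			if (eol > p && '\r' == *(eol - 1))
				*(eol - 1) = '\0';
			else
				*eol = '\0';
		}
		else
			next = p + strlen(p);	/* final line without LF */

		if (0 == regexec(&re, p, 1, &m, 0))
		{
			size_t	len = (size_t)(m.rm_eo - m.rm_so);

			if (len >= out_len)
				len = out_len - 1;

			memcpy(out, p + m.rm_so, len);
			out[len] = '\0';
			found = 1;
		}

		p = next;
	}

	regfree(&re);

	return found;
}
```

The only structural difference from a loop that reacts solely to `'\n'` is the `else` branch, which treats the remainder of the buffer as one more line.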

Comment by Arturs Galapovs (Inactive) [ 2014 Aug 05 ]

Fixed in versions pre-2.2.6 r47830 and pre-2.3.3 (trunk) r47831.

Generated at Thu Apr 25 07:17:16 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.