[ZBX-8512] web.page.regexp not parsing whole output of page Created: 2014 Jul 22 Updated: 2017 May 30 Resolved: 2014 Aug 01 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 2.2.5 |
Fix Version/s: | 2.2.6rc1, 2.3.3 |
Type: | Incident report | Priority: | Critical |
Reporter: | Thorsten Kohlhepp | Assignee: | Unassigned |
Resolution: | Fixed | Votes: | 0 |
Labels: | regexps, webcheck | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: | web_page_regexp.patch |
Description |
web.page.regexp does not check the whole page output against the regular expression. It only checks the first line. This started after updating to 2.2.5. The correct output would be:
<domain-registration domain="denic.de"><status>CONNECT</status><changed>2013-03-01T10:56:26+01:00</changed><detailed-contacts-in-role href="./denic.de/detailed-contacts-in-role"></detailed-contacts-in-role><tech-c><contact href="../contact/tech/DENIC-1000006-DBS"></contact></tech-c><zone-c><contact href="../contact/zone/DENIC-1000006-DBS"></contact></zone-c><nameservers><nameserver owner="denic.de." host="ns1.denic.de."><glue>2a02:568:121:6:2:0:0:2</glue><glue>81.91.170.1</glue></nameserver><nameserver owner="denic.de." host="ns2.denic.de."><glue>193.171.255.36</glue></nameserver><nameserver owner="denic.de." host="ns3.denic.de."><glue>87.233.175.19</glue></nameserver></nameservers></domain-registration> |
Comments |
Comment by richlv [ 2014 Jul 22 ] |
zabbix returns only the matched line and does not match across line breaks - looks like working as intended to me. |
Comment by Thorsten Kohlhepp [ 2014 Jul 22 ] |
In 2.2.4 it was different. The output above has never changed. In 2.2.4 I used the regexp "status.*CONNECT.*status" and I got the matched string. In 2.2.5 I get nothing. |
Comment by Thorsten Kohlhepp [ 2014 Jul 23 ] |
Is there any way to match the regex across line breaks? |
Comment by richlv [ 2014 Jul 28 ] |
i tested with current trunk, 2.2.5 and 2.2.4 - behaviour was exactly the same. |
Comment by Andras Fabian [ 2014 Jul 29 ] |
I see similar behavior with some of our web.page.regexp checks, where a previously working check no longer sends back any results (even though, when checking the page content manually, everything is fine ... and nothing changed). We recently updated from 2.2.3 to 2.2.5, and the checks have been broken exactly since the update. I have quickly compared the source code for WEB_PAGE_REGEXP and I see a substantial change in there! I didn't completely understand the new code, but it seems to go for a multi-line parsing approach. In 2.2.3, there was:
	if (SYSINFO_RET_OK == get_http_page(hostname, path, port_number, buffer, ZBX_MAX_WEBPAGE_SIZE))
		ptr = zbx_regexp_sub(buffer, regexp, output);
Whereas in 2.2.5 we now have this:
	if (SYSINFO_RET_OK == get_http_page(hostname, path, port_number, buffer, ZBX_MAX_WEBPAGE_SIZE))
	{
		for (s = buffer, p = s; '\0' != *s; s++)
		{
			if ('\n' == *s)
			{
				if (s > p && '\r' == *(s - 1))
					*(s - 1) = '\0';
				else
					*s = '\0';

				if (NULL != (ptr = zbx_regexp_sub(p, regexp, output)))
					break;

				p = s + 1;
			}
		}
	}
I would dare to say, that quite likely this upgrade did break something in the behavior of web.page.regexp checks! |
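For anyone who wants to experiment with the 2.2.5 loop quoted above outside the agent, here is a minimal standalone sketch. It is not Zabbix code: `scan_lines_225` and `match_line` are hypothetical names, and `strstr` is assumed as a stand-in for `zbx_regexp_sub`, so it matches a fixed substring rather than a regular expression. What it makes easy to check is that a line is only handed to the matcher once a '\n' has been reached.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for zbx_regexp_sub(): plain substring search. */
static const char *match_line(const char *line, const char *needle)
{
	return strstr(line, needle);
}

/* Same control flow as the quoted 2.2.5 loop: the buffer is cut at every
 * '\n' (or "\r\n") and each completed line is matched on its own.
 * The buffer is modified in place. Returns the match or NULL. */
const char *scan_lines_225(char *buffer, const char *needle)
{
	char *s, *p;
	const char *ptr = NULL;

	for (s = buffer, p = s; '\0' != *s; s++)
	{
		if ('\n' == *s)
		{
			if (s > p && '\r' == *(s - 1))
				*(s - 1) = '\0';
			else
				*s = '\0';

			if (NULL != (ptr = match_line(p, needle)))
				break;

			p = s + 1;
		}
	}

	return ptr;
}
```

Calling it on a writable copy of a page whose last line ends with a line feed and on one whose last line does not is a quick way to compare the two cases.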
Comment by richlv [ 2014 Jul 29 ] |
changes in |
Comment by Alexander Vladishev [ 2014 Jul 29 ] |
We cannot reproduce the issue. 2.2.3, 2.2.4 and 2.2.5 are working equally. For example:
$ sbin/zabbix_agentd -t agent.version
agent.version [s|2.2.3]
$ sbin/zabbix_agentd -t web.page.regexp[www.zabbix.com,/,,"\<li\>.*product.*\<li\>"]
web.page.regexp[www.zabbix.com,/,,\<li\>.*product.*\<li\>] [s|li><a class="drop highlight" href="product.php">Product Overview</a></li]

$ sbin/zabbix_agentd -t agent.version
agent.version [s|2.2.4]
$ sbin/zabbix_agentd -t web.page.regexp[www.zabbix.com,/,,"\<li\>.*product.*\<li\>"]
web.page.regexp[www.zabbix.com,/,,\<li\>.*product.*\<li\>] [s|li><a class="drop highlight" href="product.php">Product Overview</a></li]

$ sbin/zabbix_agentd -t agent.version
agent.version [s|2.2.5]
$ sbin/zabbix_agentd -t web.page.regexp[www.zabbix.com,/,,"\<li\>.*product.*\<li\>"]
web.page.regexp[www.zabbix.com,/,,\<li\>.*product.*\<li\>] [s|li><a class="drop highlight" href="product.php">Product Overview</a></li]

Please attach outputs from your environment. Thank you. |
Comment by Andras Fabian [ 2014 Jul 29 ] |
Well, I will try ... I did a wget on the site which I want to check, and which doesn't work. So, the check key is this (as you can see, we just look for "ALLES OK"):
web.page.regexp[{$HOSTNAME},/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,{$PORT},"ALLES OK"]
And the data I get with wget is this (but just pasting it in might destroy some "invisible" characters - but I can tell that the entire string has no line breaks etc.):
<html><body><table border="1"><tr><th><font color="green">ALLES OK</font></th></tr><tr><td> </td></tr><tr><td><font color=#0000FF><b>JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob</b></font></td></tr><tr><td><b>MESSAGE:</b>Service net.atrada.mops.frontends.jobs.ArtistTagJob wurde ausgefuehrt in 74 ms<br><b>START-DATUM:</b>2014-07-29 16:00:00.203<br><b>STOP-DATUM:</b>2014-07-29 16:00:00.277<br><b>MAX. LAUFEN BIS:</b>Fri Jul 26 16:00:00 CEST 2024<br><b>NÄCHSTE AUSFÜHRUNGS-UHRZEIT:</b>16:15<br><b>IST NÄCHSTE AUSFÜHRUNGS-UHRZEIT IN TIME-SPAN:</b>false<br><b>ENABLED_ON_INSTANCES:</b>nbg-webdemo03<br><b>LAST_EXEC_ON_INSTANCE:</b>nbg-webdemo03<br><b>LAST_CHANGE_SERVICESTATUS:</b>2014-07-29 16:00:00.327<br><b>NEXT-EXCECUTION-DATUM:</b>2014-07-29 16:15:00.0<br><b>IS SERVICE ACTIVE:</b><font color="green">JA</font><br><b>LETZTER JOB-STATUS:</b><font color="green">OK</font><br><font color="green"><b>GESAMTSTATUS:KEINE FEHLER<br></b></td></tr></table><!-- <Seitenende> --></body></html>
And even though we can all see that it contains "ALLES OK" ... the check doesn't match it. |
Comment by richlv [ 2014 Jul 29 ] |
and what's the item key you tested ? |
Comment by Alexander Vladishev [ 2014 Jul 30 ] |
With your example it works fine in our environment.
$ sbin/zabbix_agentd -t agent.version
agent.version [s|2.2.5]
$ sbin/zabbix_agentd -t 'web.page.regexp[localhost,zbx-8512.html,,ALLES OK]'
web.page.regexp[localhost,zbx-8512.html,,ALLES OK] [s|ALLES OK]
$ sbin/zabbix_agentd -t 'web.page.get[localhost,zbx-8512.html]'
web.page.get[localhost,zbx-8512.html] [t|HTTP/1.1 200 OK
Date: Wed, 30 Jul 2014 06:57:45 GMT
Server: Apache/2.2.22 (Ubuntu)
Last-Modified: Wed, 30 Jul 2014 06:49:23 GMT
ETag: "4c9b3f-3ed-4ff638e70a2b9"
Accept-Ranges: bytes
Content-Length: 1005
Vary: Accept-Encoding
Connection: close
Content-Type: text/html

<html><body><table border="1"><tr><th><font color="green">ALLES OK</font></th></tr><tr><td> </td></tr><tr><td><font color=#0000FF><b>JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob</b></font></td></tr><tr><td><b>MESSAGE:</b>Service net.atrada.mops.frontends.jobs.ArtistTagJob wurde ausgefuehrt in 74 ms<br><b>START-DATUM:</b>2014-07-29 16:00:00.203<br><b>STOP-DATUM:</b>2014-07-29 16:00:00.277<br><b>MAX. LAUFEN BIS:</b>Fri Jul 26 16:00:00 CEST 2024<br><b>NÄCHSTE AUSFÜHRUNGS-UHRZEIT:</b>16:15<br><b>IST NÄCHSTE AUSFÜHRUNGS-UHRZEIT IN TIME-SPAN:</b>false<br><b>ENABLED_ON_INSTANCES:</b>nbg-webdemo03<br><b>LAST_EXEC_ON_INSTANCE:</b>nbg-webdemo03<br><b>LAST_CHANGE_SERVICESTATUS:</b>2014-07-29 16:00:00.327<br><b>NEXT-EXCECUTION-DATUM:</b>2014-07-29 16:15:00.0<br><b>IS SERVICE ACTIVE:</b><font color="green">JA</font><br><b>LETZTER JOB-STATUS:</b><font color="green">OK</font><br><font color="green"><b>GESAMTSTATUS:KEINE FEHLER<br></b></td></tr></table><!-- <Seitenende> --></body></html>]

Please attach the output of these commands:
sbin/zabbix_agentd -t 'agent.version'
sbin/zabbix_agentd -t 'web.page.get[<hostname>,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,<port>]'
sbin/zabbix_agentd -t 'web.page.regexp[<hostname>,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,<port>,ALLES OK]'
<hostname> and <port> should be replaced with the correct values.
If you want to retrieve the whole line, you can try to extend the regular expression. For example: "^.*ALLES OK.*$" |
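To illustrate the difference between a bare "ALLES OK" and the extended "^.*ALLES OK.*$", here is a small sketch using the POSIX `regcomp`/`regexec` API. This is an assumption for illustration only: it is not taken from the agent source, `copy_match` is a hypothetical helper, and the agent's own matching function may behave differently.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text matched by `pattern` in `line`
 * into `out` (truncated to fit, always NUL-terminated).
 * Returns 0 on a match, -1 on no match or if the pattern does not compile. */
int copy_match(const char *line, const char *pattern, char *out, size_t out_len)
{
	regex_t re;
	regmatch_t m;
	size_t len;
	int rc;

	if (0 == out_len || 0 != regcomp(&re, pattern, REG_EXTENDED))
		return -1;

	/* one regmatch_t slot: the span of the whole match */
	rc = regexec(&re, line, 1, &m, 0);
	regfree(&re);

	if (0 != rc)
		return -1;

	len = (size_t)(m.rm_eo - m.rm_so);
	if (len >= out_len)
		len = out_len - 1;

	memcpy(out, line + m.rm_so, len);
	out[len] = '\0';

	return 0;
}
```

With the bare pattern the copied text is just the matched fragment, while the anchored pattern with leading and trailing `.*` covers the whole line it is applied to.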
Comment by Andras Fabian [ 2014 Jul 30 ] |
Hi guys, Here I am back with some details you have requested:
As you can see in the last example, the returned String is empty! And no, I do not need to have the whole line returned (we really only need to see if a given string is present in a web page). |
Comment by Alexander Vladishev [ 2014 Jul 30 ] |
Yes, I see. Is this problem reproducible only with 2.2.5? Can you attach the output of these commands with agent version 2.2.4 or 2.2.3? |
Comment by Andras Fabian [ 2014 Jul 30 ] |
OK, here we go. I quickly compiled zabbix_agent 2.2.3 and 2.2.4
So, obviously it worked until 2.2.4, but stopped doing so in 2.2.5. |
Comment by Alexander Vladishev [ 2014 Jul 30 ] |
Thank you. We will try to fix it. What platform do you use on the Zabbix agent side? uname -a |
Comment by Andras Fabian [ 2014 Jul 30 ] |
It's - at the moment - Ubuntu 12.04:
Linux nbg-webdemo03 3.2.0-65-generic #98-Ubuntu SMP Wed Jun 11 20:27:07 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
|
Comment by richlv [ 2014 Jul 30 ] |
haven't seen it suggested yet... |
Comment by Andras Fabian [ 2014 Jul 30 ] |
Hmm ... it's a bit complicated to make the agent only work for this test. Well, this is what I managed to get back in the log (with debug on):
24459:20140730:134603.708 Requested [web.page.regexp[********************,/_service/dmd_service.ml?category=JobManager.net.atrada.mops.frontends.jobs.ArtistTagJob&searchmode=strict&view=nagios,80,ALLES OK]]
24459:20140730:134603.709 Sending back []
Not much more visible there. |
Comment by Arturs Galapovs (Inactive) [ 2014 Jul 31 ] |
We were able to reproduce the problem on our side. A fix is ready. To apply the patch, run this command from the zabbix project root directory: patch -p0 -i <path to patch file>/web_page_regexp.patch |
Comment by Andras Fabian [ 2014 Jul 31 ] |
Yes, it seems to work! For example this test delivers the correct result:
I also added it to our live zabbix environment, and all the problematic checks now turned "green" (received the correct response from the tested pages). So, this patch seems to be good. Thank you very much! |
Comment by richlv [ 2014 Aug 04 ] |
could you please briefly mention what was wrong and in which cases would the problem manifest ?
sasha the agent didn't process the last line of the web page content if it wasn't terminated by an LF character. For example:
Date: Mon, 04 Aug 2014 07:09:37 GMT<CR><LF>           <--+
Server: Apache/2.2.22 (Ubuntu)<CR><LF>                   |
Last-Modified: Wed, 30 Jul 2014 07:44:04 GMT<CR><LF>     |
ETag: "4c9b81-5b0-4ff64520414f8"<CR><LF>                 |
Accept-Ranges: bytes<CR><LF>                             | These lines were processed correctly
Content-Length: 1456<CR><LF>                             |
Vary: Accept-Encoding<CR><LF>                            |
Connection: close<CR><LF>                                |
Content-Type: text/html<CR><LF>                          |
<CR><LF>                                              <--+
<html><body>Hello world!</body></html>                <--- This line was ignored
<richlv> great, that's really helpful - thanks |
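The explanation above can be sketched in a few lines of C. This is only one possible way to handle a last line without LF and is not the content of web_page_regexp.patch, which is not reproduced in this ticket. `scan_lines_with_tail` and `match_line` are hypothetical names, and `strstr` is assumed as a stand-in for the agent's regexp matcher.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the agent's regexp matcher: substring search. */
static const char *match_line(const char *line, const char *needle)
{
	return strstr(line, needle);
}

/* Splits the buffer at '\n' (or "\r\n") and matches line by line, then
 * also checks whatever follows the last line feed. Modifies the buffer
 * in place. Returns the match or NULL. */
const char *scan_lines_with_tail(char *buffer, const char *needle)
{
	char *s, *p;
	const char *ptr = NULL;

	for (s = buffer, p = s; '\0' != *s; s++)
	{
		if ('\n' != *s)
			continue;

		if (s > p && '\r' == *(s - 1))
			*(s - 1) = '\0';
		else
			*s = '\0';

		if (NULL != (ptr = match_line(p, needle)))
			return ptr;

		p = s + 1;
	}

	/* the trailing segment was never terminated by '\n'; check it too */
	if ('\0' != *p)
		ptr = match_line(p, needle);

	return ptr;
}
```

With a page shaped like the example, where the HTML body follows the empty `<CR><LF>` line and has no final line feed, the extra check after the loop is what lets "Hello world!" be found.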
Comment by Arturs Galapovs (Inactive) [ 2014 Aug 05 ] |
Fixed in versions pre-2.2.6 r47830 and pre-2.3.3 (trunk) r47831. |