[ZBX-12195] Script output is contaminated with log messages Created: 2017 May 18  Updated: 2017 Aug 04  Resolved: 2017 Aug 04

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G), Proxy (P), Server (S)
Affects Version/s: 3.2.5, 3.2.6
Fix Version/s: 3.0.11rc1, 3.2.8rc1, 3.4.0alpha2, 3.4 (plan)

Type: Problem report Priority: Major
Reporter: Vladimir Kushnir Assignee: Unassigned
Resolution: Fixed Votes: 1
Labels: alertscripts, logging, scripts, system.run, userparameters
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

docker image zabbix/zabbix-server-mysql:alpine-3.2-latest


Attachments: PNG File contamination remains.png     PNG File screenshot-test_item.png     PNG File screenshot-test_latest_data.png    
Team: Team C
Sprint: Sprint 9, Sprint 10, Sprint 11, Sprint 12
Story Points: 1

 Description   

Hello.

  • I have script:
    json_data.sh
    #!/bin/bash
    # Generate JSON data for zabbix
    # json_data.sh "field1 field2 ... fieldn" "data1 data2 ... datan"
    
    declare -i i
    fields=$1
    data=($2)
    json=""
    i=0
    while [ $i -lt ${#data[*]} ]; do
        row=""
        for f in $fields; do
            row+="\"{#$f}\":\"${data[$i]}\","
            i+=1
        done
        json+="{${row%,}},"
    done
    echo -e "{\"data\": [${json%,}]}"
    
  • Macros
    {$IPDESCR}="88.151.176.41 SKY 93.185.69.98 KTC1 93.185.69.102 KTC2"
  • Discovery rule:
    Type: External check
    Key: json_data.sh["IP DESCR", {$IPDESCR}]
    And i always have error Value should be a JSON object.

for testing i create TEXT item with key json_test.sh["IP DESCR", {$IPDESCR}]
ln -s json_data.sh json_test.sh
In history i have

2017-05-18 10:23:00 1153:20170518:072300.562 zbx_popen(): executing script
{"data": [{"{#IP}":"88.151.176.41","{#DESCR}":"SKY"},{"{#IP}":"93.185.69.98","{#DESCR}":"KTC1"},{"{#IP}":"93.185.69.102","{#DESCR}":"KTC2"}]}

So result is valid JSON object. But why i have this error?



 Comments   
Comment by Glebs Ivanovskis (Inactive) [ 2017 May 18 ]

Indeed, JSON looks good.

Is LLD rule executed on server itself or on a proxy? If yes, what is exact version of proxy? I'm asking because ZBX-11855 may lead to such consequences, 3.2.4 is affected.

Could you set DebugLevel=4 or increase it in runtime for pollers (preferable) with -R log_level_increase=poller to see what do they get from the script? Actually, you can create a regular item instead of LLD rule with the same key to see what Zabbix gets.

Comment by Vladimir Kushnir [ 2017 May 19 ]

I get this in logs:

  120:20170518:090354.711 In DCconfig_get_poller_items() poller_type:0
   120:20170518:090354.711 End of DCconfig_get_poller_items():1
   120:20170518:090354.711 In substitute_key_macros() data:'json_data.sh["IP DESCR", {$IPDESCR}]'
   120:20170518:090354.711 In substitute_simple_macros() data:'{$IPDESCR}'
   120:20170518:090354.711 In DCget_user_macro() macro:'{$IPDESCR}'
   120:20170518:090354.711 End of DCget_user_macro()
   120:20170518:090354.711 End substitute_simple_macros() data:'88.151.176.41 SKY 93.185.69.98 KTC1 93.185.69.102 KTC2'
   120:20170518:090354.711 End of substitute_key_macros():SUCCEED data:'json_data.sh["IP DESCR", 88.151.176.41 SKY 93.185.69.98 KTC1 93.
185.69.102 KTC2]'
   120:20170518:090354.711 In get_value() key:'json_data.sh["IP DESCR", {$IPDESCR}]'
   120:20170518:090354.711 In get_value_external() key:'json_data.sh["IP DESCR", {$IPDESCR}]'
   120:20170518:090354.711 In zbx_popen() command:'/usr/lib/zabbix/externalscripts/json_data.sh "IP DESCR" "88.151.176.41 SKY 93.185.69.
98 KTC1 93.185.69.102 KTC2"'
   120:20170518:090354.711 End of zbx_popen():7
   120:20170518:090354.714 In zbx_waitpid()
   120:20170518:090354.714 zbx_waitpid() exited, status:0
   120:20170518:090354.714 End of zbx_waitpid():223
   120:20170518:090354.714 End of get_value_external():SUCCEED
   120:20170518:090354.714 End of get_value():SUCCEED
   120:20170518:090354.714 In activate_host() hostid:10111 itemid:25434 type:10
   120:20170518:090354.714 End of activate_host()
   120:20170518:090354.714 In lld_process_discovery_rule() itemid:25434
   120:20170518:090354.714 query [txnlev:0] [select hostid,key_,state,evaltype,formula,error,lifetime from items where itemid=25434]
   120:20170518:090354.715 In substitute_simple_macros() data:'30'
   120:20170518:090354.715 query [txnlev:0] [select item_conditionid,macro,value from item_condition where itemid=25434]
   120:20170518:090354.716 In lld_rows_get()
   120:20170518:090354.716 End of lld_rows_get():FAIL
   120:20170518:090354.716 End of lld_process_discovery_rule()
   120:20170518:090354.716 End of get_values():1
Comment by Glebs Ivanovskis (Inactive) [ 2017 May 19 ]

Ah, unfortunately, poller does not log what it got.

Can you create an ordinary item with this key and see Latest Data?

Comment by Glebs Ivanovskis (Inactive) [ 2017 May 19 ]

Oh, this is interesting. Thank you!

Comment by Vladimir Kushnir [ 2017 May 19 ]

Even if put in script just

echo '{"data": [{"{#IP}":"88.151.176.41","{#DESCR}":"SKY"},{"{#IP}":"93.185.69.98","{#DESCR}":"KTC1"},{"{#IP}":"93.185.69.102","{#DESCR}":"KTC2"}]}'

I get same error.

Comment by Glebs Ivanovskis (Inactive) [ 2017 May 19 ]

Sorry, this you actually provided this very information from the very beginning in issue description:

In history i have

2017-05-18 10:23:00 1153:20170518:072300.562 zbx_popen(): executing script
{"data": [{"{#IP}":"88.151.176.41","{#DESCR}":"SKY"},{"{#IP}":"93.185.69.98","{#DESCR}":"KTC1"},{"{#IP}":"93.185.69.102","{#DESCR}":"KTC2"}]}

So the log message gets somehow mixed with the output of the script. Probably we do not flush write buffers before calling script or something like this. Which DebugLevel you normally use?

Comment by Glebs Ivanovskis (Inactive) [ 2017 May 19 ]

Dear Kvantum, I'd suggest to change LogType=console to something different or decrease DebugLevel to 3 as a temporary workaround.

Comment by Vladimir Kushnir [ 2017 May 19 ]

Thank you very much. It working fine right now.

Comment by Vladimir Kushnir [ 2017 May 19 ]

I also check versions from 3.2.0 to 3.2.6. It bug appear in all versions.

Comment by Glebs Ivanovskis (Inactive) [ 2017 May 19 ]

Fix for version 3.0 is available in development branch svn://svn.zabbix.com/branches/dev/ZBX-12195 revision 68344.

Bug affected agent, proxy and server starting from 3.0. Basically all places, where scripts are used: external scripts, media scripts (extra stuff in debug log), system.run[] (except system.run[...,nowait]), UserParameter‍s and even system.sw.packages. To reproduce you need to set LogType=console and DebugLevel>=4.

Comment by Vjaceslavs Bogdanovs [ 2017 Jun 09 ]

(1) [SPG] Current solution prevents contamination only for successful execution (no errors from execl). Result of the execl should be checked to prevent contamination:

And the other thing is exit(EXIT_SUCCESS); on execl error (exit code should be changed to EXIT_FAILURE).

glebs.ivanovskis RESOLVED in r69985, but I wouldn't say I'm happy about the solution. Failures of dup(), dup2() and close() are not handled and overall this looks like an overkill for an unlikely situation when something is wrong with /bin/sh.

vjaceslavs CLOSED

glebs.ivanovskis It was expected that Coverity won't be happy either, CID 165818 and CID 165817 were reported. But given how many similar "Error handling issues" we have... Fixing them all here is impossible, fixing a couple of them without figuring out a common approach would be inconsistent. I think we should leave them for code audit at some point. Reported ZBX-12406.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Jul 10 ]

(2) Not directly related, but there is a bug in how we handle waitpid() interrupts:

$ zabbix_get -s localhost -k 'system.run[sleep 1]'& zabbix_agentd -R log_level_decrease
[3] 1865
zabbix_agentd [1866]: command sent successfully
[2]   Done                    zabbix_get -s localhost -k 'system.run[sleep 1]'
ZBX_NOTSUPPORTED: Timeout while executing a shell script.

However, this should probably be a part of ZBX-10042, WON'T FIX.

Comment by Glebs Ivanovskis (Inactive) [ 2017 Jul 19 ]

Fixed in pre-3.0.11rc1 r70347, pre-3.2.8rc1 r70348, pre-3.4.0alpha2 (trunk) r70349.

Generated at Fri Mar 29 15:25:12 EET 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.