[ZBX-12594] Not possible to ignore exit codes of scripts which may result in unsupported state
Created: 2017 Aug 24  Updated: 2024 Apr 10  Resolved: 2017 Oct 11

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 3.4.0
Fix Version/s: 3.4.3rc1, 4.0.0alpha1, 4.0 (plan)

Type: Problem report
Priority: Major
Reporter: sles
Assignee: Vladislavs Sokurenko
Resolution: Fixed
Votes: 5
Labels: exitcodes, scripts, unsupported, userparameters
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: CentOS 7


Issue Links:
Duplicate
is duplicated by ZBX-12740 Issue with Zabbix agent 3.4 UserParam... Closed
Sub-task
part of ZBXNEXT-1380 Zabbix server should check the return... Closed
Team: Team A
Sprint: Sprint 17, Sprint 18
Story Points: 1

 Description   

Since 3.4 the exit code of scripts is checked, which means that items using some commands will now become unsupported. For example, from the grep manual:

"Normally the exit status is 0 if a line is selected, 1 if no lines were selected, and 2 if an error occurred."

So items should not become unsupported merely because a command returns a non-zero exit code that does not indicate an error.
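
To illustrate, here is a minimal shell sketch (the sample input is made up for illustration) showing that grep -c prints a perfectly valid count of 0 while exiting with status 1:

```shell
# grep -c prints the number of matching lines. With no match it prints 0
# and exits with status 1, which is not an error condition.
if count=$(printf 'alpha\nbeta\n' | grep -c 'gamma'); then
    status=0
else
    status=$?
fi
echo "count=$count status=$status"   # prints: count=0 status=1
```

The value an item needs is on stdout; the exit status only says whether any line matched.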

UserParameter examples are also broken.

The current workaround to make the default examples work is to change the following user parameters:

mailq

UserParameter=unix_mail.queue,mailq | grep -v "Mail queue is empty" | grep -c '^[0-9A-Z]'

To:

UserParameter=unix_mail.queue,mailq | grep -v "Mail queue is empty" | grep -c '^[0-9A-Z]' || true

MySQL

UserParameter=mysql.ping,mysqladmin -uroot ping | grep -c alive

To:

UserParameter=mysql.ping,mysqladmin -uroot ping | grep alive | wc -l
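
Both workarounds rely on the same shell rule: by default a pipeline's exit status is that of its last command. The sketch below assumes a plain POSIX shell without pipefail, and uses two hypothetical stand-in functions in place of mailq and mysqladmin so that it can run anywhere:

```shell
# Stand-ins for the real commands (hypothetical output; swap in mailq or
# mysqladmin ping on a real host).
empty_mailq() { printf 'Mail queue is empty\n'; }
dead_mysql()  { printf 'connect to server failed\n'; }

# Workaround 1: "|| true" keeps the count grep printed but forces status 0.
out1=$(empty_mailq | grep -v "Mail queue is empty" | grep -c '^[0-9A-Z]' || true)
st1=$?

# Workaround 2: the pipeline ends in wc -l, which exits 0 even when
# grep found nothing.
out2=$(dead_mysql | grep alive | wc -l)
st2=$?

echo "mailq: value=$out1 status=$st1"
echo "mysql: value=$out2 status=$st2"
```

In both cases the value is 0 and the status is 0, so the item would stay supported under the 3.4.0 exit code check. Note that "|| true" also hides genuine grep errors (status 2).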


 Comments   
Comment by Vladislavs Sokurenko [ 2017 Aug 24 ]

Thank you for the report. Could you please provide steps to reproduce?
Is it a UserParameter?

Comment by sles [ 2017 Aug 24 ]

Yes, it is UserParameter,
windows server 2016, zabbix agent 3.4.0

UserParameter=tcp.p8530,cmd.exe /C netstat -n -p TCP | find /c ":8530"

The command itself prints 0 when there are no connections:
C:\>netstat -n -p TCP | find /c ":8530"
0

But
C:\zabbix>zabbix_agentd.exe -c C:\zabbix\zabbix_agentd.conf -t tcp.p8530
tcp.p8530 zabbix_agentd.exe [4804]: Warning: 0

[m|ZBX_NOTSUPPORTED] [0
]

So the problem is on the agent, not the server side. Sorry for the wrong component; please change it.

What is strange here is that non-zero values are OK.

btw, just echo is OK:

UserParameter=zero,cmd.exe /C "echo 0"

C:\zabbix>zabbix_agentd.exe -c C:\zabbix\zabbix_agentd.conf -t zero
zero [t|0]

Thank you!

vso thanks, please also do

zabbix_agentd --version

Comment by sles [ 2017 Aug 24 ]

C:\zabbix>zabbix_agentd.exe --version
zabbix_agentd Win64 (service) (Zabbix) 3.4.0
Revision 71462 22 August 2017, compilation time: Aug 21 2017 11:54:32

Comment by Vladislavs Sokurenko [ 2017 Aug 24 ]

Please try printing exit code after you execute command manually

echo Exit Code is %errorlevel%

Comment by sles [ 2017 Aug 24 ]

Another port, because there are connections on 8530:

C:\zabbix>netstat -n | find /c ":8531" & echo %errorlevel%
0
1
vso
Command/script execution changes

Due to command/script exit code check introduction in Zabbix 3.4, alertscripts can be executed multiple times if their exit code is different from 0. Previously configured items with user parameters executed by Zabbix server, external check items, and system.run items which exit code is not 0 may become “Not supported” due to additional checks for exit code, behavior of the items with “nowait” flag is not changed though.

Comment by sles [ 2017 Aug 24 ]

btw, it works on another server using an older agent:

C:\zabbix>zabbix_agentd.exe --version
zabbix_agentd Win64 (service) (Zabbix) 3.0.0
Revision 58455 15 February 2016, compilation time: Feb 15 2016 14:20:25

result of command is the same

C:\zabbix>netstat -ano | find "190:81" | find /c "ESTAB" & echo %errorlevel%
0
1
vso yes, because it's a new feature so you can spot if your script has failed.

Comment by sles [ 2017 Aug 24 ]

Well, and how do I get the previous behavior back?
vso you can silence it by doing something like exit 0

Comment by Vladislavs Sokurenko [ 2017 Aug 24 ]

No indication of a bug. Closing as Won't Fix.

Comment by sles [ 2017 Aug 24 ]

The script has not failed!
That is just the exit code find returns when nothing is found; the same goes for grep...

Comment by sles [ 2017 Aug 24 ]

It IS a bug!

Comment by sles [ 2017 Aug 24 ]

btw, why are you editing my comments instead of replying?

Comment by Glebs Ivanovskis (Inactive) [ 2017 Aug 24 ]

Is there a newline in script's output?

Comment by sles [ 2017 Aug 24 ]

One can't rely on errorlevel to indicate script success or failure; this is wrong.

Comment by Vladislavs Sokurenko [ 2017 Aug 24 ]

Please check whether exit 0 helps and get back to us, thanks!

Comment by sles [ 2017 Aug 24 ]

Yes, exit 0 "solves" the issue, but this is a workaround, not a solution.
By introducing this "feature" you break many existing scripts, because many shell commands return an errorlevel of 1 or -1 while still returning a good result.

Comment by sles [ 2017 Aug 24 ]

I'd like to add: if you add new features, it is a good idea to provide compatibility.
In this case this could be a global parameter selecting the new or old behaviour, with an override of the global setting in each UserParameter.

Really, I don't see any reason to retry a script; anyone who needs this can write a script which does such retries internally.
But if you need this to improve Zabbix users' lives and make such scripts easier to write, why are you making those lives difficult by breaking old scripts?

Thank you!

Comment by Vjaceslavs Bogdanovs [ 2017 Aug 25 ]

Well, I understand your point, but it was decided to make things simpler and not to introduce additional item and configuration params.

Upgrade notes:

Due to command/script exit code check introduction in Zabbix 3.4, alertscripts can be executed multiple times if their exit code is different from 0. Previously configured items with user parameters executed by Zabbix server, external check items, and system.run items which exit code is not 0 may become “Not supported” due to additional checks for exit code, behavior of the items with “nowait” flag is not changed though.

So this is a feature and not a bug as it is working the way it was planned.

If you really think that a better solution would be to introduce additional params to the item and/or config, then you can create a feature request for that.

Comment by sles [ 2017 Aug 25 ]

A bug is not always a wrong implementation; sometimes a bug is a wrong idea or way to implement something.
As I said, you break compatibility.
You can close this as Won't Fix, though. I have nothing to add, but you'll have more complaints soon.
I'll stay with older agents for now.

Comment by Andrea Biscuola (Inactive) [ 2017 Sep 18 ]

REOPENED

This bug causes items to become "not supported" for exit statuses that are different from 0 BUT are NOT error conditions. For example:

EXIT STATUS
The grep utility exits with one of the following values:

0 One or more lines were selected.
1 No lines were selected.
>1 An error occurred.

This is clearly part of the POSIX standard, and Zabbix does not respect this (possible) behavior. Also, grep is not the only standard utility with this possibility.
I personally think that it must be corrected by:

  • Allowing the user to disable the additional exit value check or
  • Implementing a more fine-grained control for what exit codes must be considered errors.
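
Until such control exists, the second option can be approximated on the script side. The following is a minimal sketch only: ok_codes is a hypothetical helper, not part of Zabbix, and it would have to live in a wrapper script that the UserParameter calls.

```shell
# ok_codes: hypothetical helper, not part of Zabbix.
# Usage: ok_codes "0 1" command [args...]
# Runs the command and returns 0 if its exit status is in the allowed
# list; any other status is passed through unchanged.
ok_codes() {
    allowed=$1
    shift
    rc=0
    "$@" || rc=$?
    for code in $allowed; do
        [ "$rc" -eq "$code" ] && return 0
    done
    return "$rc"
}

# grep prints 0 and exits 1 (no match); the wrapper reports success.
printf 'alpha\nbeta\n' | ok_codes "0 1" grep -c 'gamma'
echo "status=$?"
```

Unlike "|| true", a real grep error (status greater than 1) still propagates, so a broken check is not silently reported as healthy.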
Comment by Dmitry Verkhoturov [ 2017 Sep 18 ]

The 3.4.1 source code has UserParameter examples which are broken in this way, because grep -c returns exit code 1 when no line is found:

➜ zabbix$ egrep "UserParameter.grep.\s-\S*c" . -r
./conf/zabbix_agentd/userparameter_mysql.conf:UserParameter=mysql.ping,HOME=/var/lib/zabbix mysqladmin ping | grep -c alive
./conf/zabbix_agentd/userparameter_examples.conf:UserParameter=unix_mail.queue,mailq | grep -v "Mail queue is empty" | grep -c '^[0-9A-Z]'

Comment by Frederic Perrouin [ 2017 Sep 18 ]

As a workaround for unix_mail.queue userparameter, I put this: UserParameter=unix_mail.queue,mailq | grep -v "Mail queue is empty" | grep -c '^[0-9A-Z]' || true.
So when the queue count is 0 and grep exits with 1, the command as a whole still exits successfully, and there is no more ZBX_NOTSUPPORTED.

Comment by richlv [ 2017 Sep 18 ]

i'm with abs on this one - it breaks a lot of existing implementations and adds an unnecessary burden on the user.
ability to specify "success exit codes" in the item key would be perfect, not sure how feasible.

Comment by sles [ 2017 Sep 18 ]

> it breaks a lot of existing implementations and adds an unnecessary burden on the user.

To avoid breaking existing setups, the previous (pre-3.4) behaviour has to be the default.
The new behaviour (with or without a list of exit codes) can be turned on by an (optional) parameter.

thank you!

Comment by Andrea Biscuola (Inactive) [ 2017 Sep 19 ]

vso "Not well behaved scripts" is incorrect: the POSIX and SUS specs do not give a final rule but a set of conventions regarding the exit status, and they also document the exit status of all the standard utilities. Please correct it to something like: "Not possible to ignore exit status different than 0 which result in unsupported state even if the script is correct."

Comment by Vitaly Zhuravlev [ 2017 Sep 25 ]

Another example, where the smartctl utility returns a non-zero exit code when a hard drive has failed:
https://github.com/v-zhuravlev/zbx-smartctl/issues/38

When smartctl -H detects a drive failure it returns an exit code greater than 0 (see the RETURN VALUES section of man smartctl).
The Zabbix agent treats this case as a command failure and returns ZBX_NOTSUPPORTED to the server.
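
For context, the smartctl man page describes the exit status as a bit mask: the three lowest bits report that smartctl itself could not do its job (bad command line, device open failed, a SMART command failed), while the higher bits report findings about the disk, such as bit 3 for a "DISK FAILING" status. A rough sketch of telling the two apart follows; smartctl_cmd_failed is a hypothetical helper name, and the status values in the loop are sample inputs rather than output from a real drive.

```shell
# Hypothetical helper: succeeds when a smartctl exit status has any of
# bits 0-2 set (mask 7), meaning the command itself failed. Bits 3-7
# only describe disk health, so the output is still usable.
smartctl_cmd_failed() {
    [ $(( $1 & 7 )) -ne 0 ]
}

for rc in 0 2 8 64; do
    if smartctl_cmd_failed "$rc"; then
        echo "rc=$rc: smartctl could not run properly"
    else
        echo "rc=$rc: output is usable"
    fi
done
```

Only rc=2 is reported as a command failure here; 8 and 64 carry health information that a monitoring item would want to see rather than discard.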

Comment by Vladislavs Sokurenko [ 2017 Sep 26 ]

The suggested solution is to introduce an EnableExitCodeChecks parameter in the configuration files of Zabbix server, proxy and agent. On the server/proxy it will control external checks, while on the agent it will control user parameter exit code checks. A new wexit flag should be added as a system.run mode; it will enable exit code checks, while the wait mode will not check the exit code.

Comment by Vladislavs Sokurenko [ 2017 Oct 04 ]

What if we revert exit code checks for external checks, system.run and user parameters?
Media and user scripts should, however, still check the exit code.

Comment by Alexander Vladishev [ 2017 Oct 04 ]

Yes, please revert these changes. External checks, system.run and user parameters should not support check of exit code.

Comment by Vjaceslavs Bogdanovs [ 2017 Oct 09 ]

Server side tested

Comment by Vladislavs Sokurenko [ 2017 Oct 09 ]

Fixed in:

  • pre-3.4.3rc1 r73309
  • pre-4.0.0alpha1 (trunk) r73310

Comment by Vladislavs Sokurenko [ 2017 Oct 09 ]

Fixed system.run, user parameter and external check items so that they do not become unsupported when the exit code is different from zero.
Exit code checks will only be performed for custom alert scripts, remote commands and user scripts executed on Zabbix server and proxy.

Comment by sles [ 2017 Oct 19 ]

3.4.3 is out, but only 3.4.0 precompiled agent for windows is available from downloads....

Thank you!

Comment by Daniel Daniel [ 2018 Aug 08 ]

Hi, pls, why aren't there newer versions of agents for non-Windows platforms available to download in https://www.zabbix.com/download_agents ?
