| [ZBX-12969] Agent user scripts merging stderr with stdout has bad consequences Created: 2017 Nov 01 Updated: 2017 Nov 02 Resolved: 2017 Nov 01 | |
| Status: | Closed | 
| Project: | ZABBIX BUGS AND ISSUES | 
| Component/s: | Agent (G) | 
| Affects Version/s: | 3.2.7 | 
| Fix Version/s: | None | 
| Type: | Incident report | Priority: | Major | 
| Reporter: | Telford Tendys | Assignee: | Unassigned | 
| Resolution: | Duplicate | Votes: | 0 | 
| Labels: | UserParameters, agent, mysql, ping, stderr | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | CentOS-7 using packaged Zabbix | ||
| Attachments: |  screenshot-1.png | ||||||||||||
| Issue Links: | 
 | ||||||||||||
| Story Points: | 5 | ||||||||||||
| Description | 
| The package includes a config file: /etc/zabbix/zabbix_agentd.d/userparameter_mysql.conf. It defines a UserParameter called mysql.ping, which makes a good example of why this is so broken.
If MySQL is working properly, the command "mysqladmin ping" generates the output "mysqld is alive" and pipes it to "grep -c alive", which in turn writes the numeric output "1" to stdout. This is the real output that the Zabbix server is looking for. That's lovely and it gives the sysadmin a nice warm feeling that everything is in good order. The common template also includes a trigger to report "MySQL server is down" when an output of "0" is detected.
Unfortunately, here's what happens when the MySQL server actually does go down: the command "mysqladmin ping" prints a bunch of errors to stderr, which is hardly surprising since there's a problem. These error messages are merged with the output and sent back to the Zabbix server. They don't look like any sort of number, so the server decides no data is available. It therefore never fires the "MySQL server is down" trigger, and nobody gets any notifications.
This is a fundamental flaw with every type of shell command that is defined as a UserParameter in any config. The entire purpose of a monitoring system is to cover the times when something has gone wrong, so any command at any time should be presumed to be sending random values to stderr. These random values might be understood by a human, but they cannot be sent to the Zabbix server if we want any confidence in our alert reporting. Thus, potentially all UserParameter scripts are broken (including the examples provided as part of the standard package) unless someone edits the script to redirect stderr to /dev/null or possibly to some logfile elsewhere.
Steps to reproduce: 
 Result: 
 NOTE: Correct behaviour should be for the Zabbix agent to send only stdout back to the Zabbix server; if you don't want to throw away stderr, then just log it locally, presumably to the local agent log file. If the agent is smart about buffering the two streams separately, it might be possible to guarantee that stdout gets sent FIRST to the Zabbix server and stderr perhaps AFTER, but this more complex handling of buffers is also more likely to fail in unexpected ways. At the very least, if you really must keep stderr, then make sure your standard example scripts and templates are tested properly. | 
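The difference between merging, discarding, and locally logging stderr can be sketched in plain shell. This is only an illustration under stated assumptions: `fake_ping_down` is a hypothetical stand-in for "mysqladmin ping" with the server down, and the log file is a temporary file created for the demo, not a path the agent uses.

```shell
#!/bin/sh
# Hypothetical stand-in for "mysqladmin ping" with the server down:
# it prints nothing on stdout, writes an error to stderr, and fails.
fake_ping_down() {
    echo "mysqladmin: connect to server at 'localhost' failed" >&2
    return 1
}

# grep -c exits with status 1 when the count is 0, hence the "|| true".

# Merged streams, as the reporter describes the agent doing.
merged=$( { fake_ping_down | grep -c alive; } 2>&1 ) || true

# stderr discarded: only grep's count is left.
clean=$( fake_ping_down 2>/dev/null | grep -c alive ) || true

# stderr kept, but appended to a local log file (illustrative temp file).
log=$(mktemp)
logged=$( fake_ping_down 2>>"$log" | grep -c alive ) || true

printf 'merged: %s\n' "$merged"
printf 'clean:  %s\n' "$clean"
printf 'logged: %s (stderr saved in %s)\n' "$logged" "$log"
```

Applied to the packaged line, the same idea would presumably read `UserParameter=mysql.ping,HOME=/var/lib/zabbix mysqladmin ping 2>/dev/null | grep -c alive`, so that only the count from grep reaches the server.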
| Comments | 
| Comment by Telford Tendys [ 2017 Nov 01 ] | 
| Links to old tickets.
Old issue regarding stderr in user parameter scripts:  
Discussion of documentation relating to this:  
Example of working server:
[root@z-server ~]# zabbix_get -s 192.0.2.20 -k mysql.ping
1
Example of server down and broken response:
[root@z-server ~]# zabbix_get -s 192.0.2.20 -k mysql.ping
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)'
Check that mysqld is running and that the socket: '/var/lib/mysql/mysql.sock' exists!
0 | 
| Comment by Vladislavs Sokurenko [ 2017 Nov 01 ] | 
| Can you please show your user parameter and trigger? I don’t see why you wouldn’t get an alert with the default example that greps for the "alive" keyword. | 
| Comment by Telford Tendys [ 2017 Nov 01 ] | 
| UserParameter setting (out of zabbix package):
UserParameter=mysql.ping,HOME=/var/lib/zabbix mysqladmin ping | grep -c alive
RPM Package details (from yum):
Installed Packages
Name : zabbix-agent
Arch : x86_64
Version : 3.2.7
Release : 1.el7
Size : 1.3 M
Repo : installed
From repo : zabbix
Summary : Zabbix Agent
URL : http://www.zabbix.com/
License : GPLv2+
Description : Zabbix agent to be installed on monitored systems.
Yum repo in use:
[zabbix]
name = zabbix
baseurl = http://repo.zabbix.com/zabbix/3.2/rhel/7/x86_64/
gpgcheck=0
Trigger expression viewed in browser (from standard template)
NOTE: this same template is used as export example in Zabbix, here is the same expression in XML:
    <triggers>
        <trigger>
            <expression>{Template App MySQL:mysql.ping.last(0)}=0</expression>
            <recovery_mode>0</recovery_mode>
            <recovery_expression/>
            <name>MySQL is down</name>
            <correlation_mode>0</correlation_mode>
            <correlation_tag/>
            <url/>
            <status>0</status>
            <priority>2</priority>
            <description/>
            <type>0</type>
            <manual_close>0</manual_close>
            <dependencies/>
            <tags/>
        </trigger>
    </triggers>
See full details in zabbix documentation here – https://www.zabbix.com/documentation/3.4/manual/xml_export_import/templates | 
| Comment by Vladislavs Sokurenko [ 2017 Nov 01 ] | 
| I am sorry, could you also please attach a latest data screenshot for this item? | 
| Comment by Vladislavs Sokurenko [ 2017 Nov 01 ] | 
| similar issue  | 
| Comment by Vladislavs Sokurenko [ 2017 Nov 01 ] | 
| Closing as duplicate of  | 
| Comment by Paul Williamson [ 2017 Nov 02 ] | 
| This is a serious issue. Zabbix expects a string of '0' or '1' to be returned from mysql.ping. When this does not happen, Zabbix gets a rubbish value and does not generate an alert, i.e. alerts are not generated when mysql goes down under the standard configuration. | 
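Why the merged value cannot match the trigger can be illustrated with a small numeric check. This is a sketch only: `is_numeric` is a hypothetical helper, not Zabbix's real value parser, and the two sample values are shaped like the zabbix_get results quoted earlier in this ticket.

```shell
#!/bin/sh
# Hypothetical check, not Zabbix's real parser: accept only a non-empty
# string made entirely of digits.
is_numeric() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Values shaped like the two zabbix_get results quoted in this ticket.
up_value='1'
down_value="mysqladmin: connect to server at 'localhost' failed
0"

for v in "$up_value" "$down_value"; do
    if is_numeric "$v"; then
        echo "numeric: trigger can compare it with 0"
    else
        echo "not numeric: trigger never sees a 0"
    fi
done
```

The trailing "0" is present in the second value, but because it arrives together with the error text the value as a whole is not a number.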
| Comment by Vladislavs Sokurenko [ 2017 Nov 02 ] | 
| Related task  | 
| Comment by Paul Williamson [ 2017 Nov 02 ] | 
| It may help debug problems, but it stops the alert from being triggered according to your default settings. The trigger for mysql down is:
{Template App MySQL:mysql.ping.last(0)}=0
i.e. if mysql.ping returns 0 you will get this trigger. When mysql is down this NEVER happens because the stderr is included in the value, so the 0 only comes at the end. Since the trigger is expecting a 0 and it gets everything, it does not work. Telford has also explained this above but you have failed to understand his issue. |