[ZBX-4675] server can send incomplete configuration data to a proxy
Created: 2012 Feb 18  Updated: 2017 May 30  Due: 2014 Apr 17  Resolved: 2014 Apr 09

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Server (S)
Affects Version/s: 1.8.9
Fix Version/s: 2.0.12rc1, 2.2.4rc1

Type: Incident report Priority: Minor
Reporter: richlv Assignee: Unassigned
Resolution: Fixed Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by ZBX-8033 Zabbix Server/Proxy can lost configur... Closed

 Description   

if a database query fails, server shouldn't send incomplete configuration data to the proxy.

for example, if a query fails like this :

5384:20120218:065603.077 [Z3005] query failed: [126] Incorrect key file for table '/var/tmp/mysql.CerISn/#sql_ee7_0.MYI'; try to repair it [select t.itemid,t.type,t.snmp_community,t.snmp_oid,t.snmp_port,t.hostid,t.key_,t.delay,t.status,t.value_type,t.trapper_hosts,t.units,t.multiplier,t.delta,t.snmpv3_securityname,t.snmpv3_securitylevel,t.snmpv3_authpassphrase,t.snmpv3_privpassphrase,t.formula,t.logtimefmt,t.templateid,t.valuemapid,t.delay_flex,t.params,t.ipmi_sensor,t.data_type,t.authtype,t.username,t.password,t.publickey,t.privatekey from items t,hosts r where t.hostid=r.hostid and r.proxy_hostid=5002 and r.status in (0,1) and t.status in (0,1,3) and t.type in (0,7,1,4,6,12,2,3,9,10,11,13,14) order by t.itemid]

proxy gets host information, but without any items.

if any of the queries for proxy configuration dataset fail, server should either retry them, or, if not successful, refuse to send config data to the proxy.
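The requested behaviour can be sketched as an all-or-nothing gather. This is a minimal illustration in Python, not the Zabbix server code: `QueryFailed`, `gather_proxy_config` and the `run_query` callback are hypothetical names, and the single retry is an assumption.

```python
from typing import Callable, Dict, List, Optional


class QueryFailed(Exception):
    """Raised by the (hypothetical) query runner when the database reports an error."""


def gather_proxy_config(
    tables: List[str],
    run_query: Callable[[str], List[dict]],
    retries: int = 1,
) -> Optional[Dict[str, List[dict]]]:
    """Return the complete config dataset, or None if any table could not be read."""
    config: Dict[str, List[dict]] = {}
    for table in tables:
        for attempt in range(retries + 1):
            try:
                config[table] = run_query(table)
                break  # this table succeeded, move on to the next one
            except QueryFailed:
                if attempt == retries:
                    # Give up on the whole dataset: never return a partial one.
                    return None
    return config
```

A caller would send the result to the proxy only when it is not `None`; otherwise the proxy keeps working with the configuration it already has.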

this has happened twice in a test setup during a few hours for different reasons.



 Comments   
Comment by Alexei Vladishev [ 2012 Feb 18 ]

We are talking about corrupted database here. In this case Zabbix must stop.

What's the point in retrying it millions of times or sending a wrong (empty) data set to the proxies?

Comment by richlv [ 2012 Feb 18 ]

hmm. the problem here was that there was a working proxy, collecting data, then server sent to it partial configuration data - hosts only, no item information. the query did work later, but for an hour the proxy was not collecting any information.

to clarify, if any of the queries to gather proxy config data fails, server should just refuse to send any data (not empty data !) to the proxy. partial data is worse than having the proxy work with whatever it had before.

Comment by Alexei Vladishev [ 2012 Feb 18 ]

That's what I am saying, if a query fails due to unrecoverable problem the server must stop.

Comment by richlv [ 2012 Feb 19 ]

how would we know for sure that it's unrecoverable ? in this case it was full disk, which had automatically resolved by the time proxy asked for the config again...

Comment by Oleksii Zagorskyi [ 2012 Feb 19 ]

At the page http://www.zabbix.com/documentation/1.8/manual/about/installation_and_upgrade
we can find two notes:

1.3.3.5 For version 1.8.8
In some cases hosts and proxies with identical name might have appeared in the Zabbix database. Starting with 1.8.8, Zabbix server will shut down if it detects such a situation.
1.3.3.6 For version 1.8.9
The shutdown upon detection of duplicate hosts, introduced in 1.8.8, has been removed.

With 1.8.8 we received a lot of negative feedback, so in 1.8.9 we decided not to stop the server.
As conclusion of that example I'm considering this case as similar and I vote to NOT stop zabbix server but somehow to fix this issue.

Comment by Alexei Vladishev [ 2012 Feb 19 ]

>how would we know for sure that it's unrecoverable ?

We describe all recoverable cases (DB is down, it's being shut down, etc) and assume that the rest is unrecoverable.
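The "list the recoverable cases, treat the rest as unrecoverable" approach could look roughly like the sketch below. The error numbers are standard MySQL client/server codes, but the selection is only an illustration and is an assumption, not the list Zabbix actually uses; `is_recoverable` is a hypothetical helper.

```python
# Illustrative allow-list of MySQL error numbers treated as recoverable.
# Assumption: this selection is for demonstration only.
RECOVERABLE_MYSQL_ERRORS = {
    2002,  # CR_CONNECTION_ERROR: cannot connect through the local socket
    2003,  # CR_CONN_HOST_ERROR: cannot connect to the server host
    2006,  # CR_SERVER_GONE_ERROR: server has gone away
    2013,  # CR_SERVER_LOST: lost connection during query
    1053,  # ER_SERVER_SHUTDOWN: server shutdown in progress
}


def is_recoverable(errno: int) -> bool:
    """Anything not explicitly listed is assumed to be unrecoverable."""
    return errno in RECOVERABLE_MYSQL_ERRORS
```

With such a list, the code 126 logged in the description would fall into the unrecoverable group simply because it is not enumerated.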

>As conclusion of that example I'm considering this case as similar and I vote to NOT stop zabbix server but somehow to fix this issue.

In this case we have a corrupted database, CORRUPTED! Do you really want some part of the functionality to continue working, while another part waits until the problem is resolved and works incorrectly?

No way, we must stop immediately.

Comment by richlv [ 2012 Feb 20 ]

1. enumerating goodness seems to be a sensible approach - it's up to mysql to decide which conditions it might recover from
2. do we do that now (listing recoverable cases) ? judging by the server continuing, i'd guess not (although there were some issues where we did specific things upon receiving specific codes from the db) ?
3. in any case, this must be documented, including all db codes we recognise as recoverable.
4. in this specific case, i'd agree that we should shut down. even though mysql was able to recover here, its error message did not imply that it could
5. but what if the item query would have got a recoverable error message - i suspect we would still have sent incorrect (partial) data to the proxy ? so, regardless of what we do when receiving this specific code, we absolutely must ensure that we do not send config data to proxy if any one of config gathering queries fails (and is not retried successfully) - whether it's shutting down or refusing to answer the proxy doesn't matter for this logic

depending on the answers, new issues might be needed to cover it all

Comment by Alexei Vladishev [ 2012 Feb 24 ]

I think this issue can be closed. None of 1-4 is a bug except 5, which is a generic problem not related specifically to proxy handling code.

Comment by richlv [ 2012 Feb 24 ]

hmm. the user visible problem is that incomplete configuration data is sent to the proxy. specific things that should be done :

a) server should stop when receiving unrecoverable error from the database;
b) all codes that we consider recoverable should be documented for all databases;
c) in case of receiving recoverable error from the db we must ensure that we do not send config data to that proxy.

while a might be out of scope, a new issue must be created if so. i believe b should be documented while implementing/fixing a.

but c is a problem with the proxy handling code, i suspect that we would still send invalid data to the proxy if the error would be recoverable...

Comment by richlv [ 2013 Jun 21 ]

some doc draft at https://www.zabbix.com/documentation/2.2/manual/appendix/database_error_handling - should be reviewed & completed

Comment by dimir [ 2014 Apr 08 ]

So, currently no database error can cause any Zabbix daemon to stop. We decided to keep it that way. We will change 2 places, updating the configuration cache from the database (on both server and proxy) and generating configuration for a proxy, so that they take database errors into account and keep the old configuration when an error occurs.
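The "keep the old configs on error" idea can be illustrated by building the new configuration separately and swapping it in only on success. This is a hedged Python sketch; `ConfigCache`, `sync` and the `loader` callback are hypothetical names, not Zabbix internals.

```python
from typing import Callable, Dict


class ConfigCache:
    """Holds the last configuration that was loaded completely."""

    def __init__(self) -> None:
        self.data: Dict[str, list] = {}

    def sync(self, loader: Callable[[], Dict[str, list]]) -> bool:
        """Reload from the database; keep the old data if loading fails."""
        try:
            fresh = loader()  # may raise on any database error
        except Exception:
            return False  # old configuration stays untouched
        self.data = fresh  # replace only after a complete, successful load
        return True
```

Because the cache is only replaced after `loader` returns, a query that fails halfway through cannot leave a half-updated configuration behind.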

Not touching the cache is easy. For the proxy config it's a bit trickier. Currently it seems we never send an error. When an active proxy requests configuration from the server and an error occurs on the server, we log the error on the server and close the connection. So the proxy doesn't get to know what happened. Should we keep it that way?

<richlv> if a proxy connects and sends a name we don't recognise, we return an error - for example, see "failed" responses at https://www.zabbix.org/wiki/Docs/protocols/zabbix_proxy/2.0

can't we just do that, possibly with a bit more detailed message ?

also, what about gathering config data for a passive proxy, and failing - do we log that properly ?

dimir In 2.0, when an active proxy requests configuration and an error occurs, we just close the connection without sending anything back. On the proxy side, if the configuration request gets no answer, we log:

Cannot obtain configuration data from server. Proxy host name might not be matching that on the server.

with the warning level.

In 2.2 we send an "error" response with error description:

Cannot obtain configuration data from server. info:"proxy "Zabbix proxy" not found"

Comment by dimir [ 2014 Apr 08 ]

Rich, that list is correct. These are so-called "recoverable" MySQL errors, in which case we keep re-executing the SQL query until we get no error (with a delay of 10 seconds between attempts). With other databases there is no list; we decide whether an error is "recoverable" on a case-by-case basis.
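The retry behaviour described above can be sketched as a loop that repeats the query while the error is recoverable and re-raises anything else. This is an illustrative Python sketch under stated assumptions: `DbError`, `execute_with_retry` and its parameters are hypothetical, and only the 10 second delay comes from the comment.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class DbError(Exception):
    """Hypothetical database error carrying the driver's error number."""

    def __init__(self, errno: int, message: str) -> None:
        super().__init__(message)
        self.errno = errno


def execute_with_retry(
    execute: Callable[[], T],
    is_recoverable: Callable[[int], bool],
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run the query until it succeeds, waiting between recoverable failures."""
    while True:
        try:
            return execute()
        except DbError as err:
            if not is_recoverable(err.errno):
                raise  # unrecoverable: let the caller decide what to do
            sleep(delay)  # recoverable: wait, then try the same query again
```

The `sleep` parameter exists only so the delay can be replaced in tests; callers would normally leave it at its default.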

"Unrecoverable" database errors are treated differently in different places of the code, but the transactions are rolled back explicitly.

<richlv> i don't like the "errors are treated differently in different places of the code" part - we should document all of that. if it can not be reasonably documented because it's too complicated, it's too complicated to be bugfree, too...

Comment by dimir [ 2014 Apr 09 ]

Fixed for 2.0 in development branch svn://svn.zabbix.com/branches/dev/ZBX-4675 .

Comment by Andris Zeila [ 2014 Apr 14 ]

Successfully tested, please review r44388

Comment by dimir [ 2014 Apr 15 ]

(1) [PS] A separate dev branch was created for fixing the issue in 2.2, as proxy configuration is handled differently there: if getting configuration from the database fails, we send the actual error to the active proxy. The error will just mention the database table that caused the error.
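As a rough illustration of such a reply, the sketch below builds a JSON error message naming the failing table. The `response`/`info` field names follow the "failed" responses mentioned earlier in this thread; the helper name and the exact wording of the `info` text are assumptions, not the message the server really sends.

```python
import json


def config_error_response(table: str) -> str:
    """Build a hypothetical error reply for an active proxy config request."""
    return json.dumps(
        {
            "response": "failed",
            # Assumed wording: only the failing table is named, no SQL details.
            "info": f'cannot read configuration data from table "{table}"',
        }
    )
```

Naming only the table keeps the query text and database error details in the server log rather than sending them to the proxy.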

Fixed in svn://svn.zabbix.com/branches/dev/ZBX-4675-2.2

wiper Successfully tested. Please review changes in r44464, r44475

dimir CLOSED

Comment by dimir [ 2014 Apr 15 ]

As to the doc changes, I guess it's only about whatsnew?

Comment by dimir [ 2014 Apr 16 ]

Doc changes: https://www.zabbix.com/documentation/2.0/manual/introduction/whatsnew2012#daemon_improvements

Comment by dimir [ 2014 Apr 16 ]

Fixed in pre-2.0.12 r44412, pre-2.2.4 r44495, pre-2.3.0 r44497.
