[ZBX-1916] there is no way to reset Host status Created: 2010 Feb 02  Updated: 2022 Oct 08  Resolved: 2015 Dec 21

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Proxy (P), Server (S)
Affects Version/s: 1.8, 1.8.1
Fix Version/s: 3.0.0alpha5

Type: Incident report Priority: Major
Reporter: Vilius Šumskas Assignee: Zabbix Development Team
Resolution: Fixed Votes: 26
Labels: availability, hosts
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Doesn't really matter.


Issue Links:
Duplicate
is duplicated by ZBX-10444 host status stays red after all items... Closed
is duplicated by ZBX-5012 wrong availability status in hosts Closed
is duplicated by ZBX-5932 can't remove agent interface Closed

 Description   

Currently there is no way to reset the host status shown in the Availability column. For example, if you configure an item of type "Zabbix agentd" for a host that does not actually have a Zabbix agent installed, then later delete that item and create a new one of type "SNMP v2", the Availability column shows a red Z icon next to a green SNMP icon. This is confusing: it looks as if something is wrong with the host when nothing is. There should be a way to reset the status.



 Comments   
Comment by richlv [ 2010 Feb 02 ]

not sure where to best handle this, but maybe a method could be provided in frontend.
until then, hackish solution by hostid (can be obtained from url when editing host) or hostname :

update hosts set available='0', ipmi_available='0', snmp_available='0' where hostid='<host id>';

or

update hosts set available='0', ipmi_available='0', snmp_available='0' where host='target_host';
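A minimal sketch of the same reset, run against an in-memory SQLite table rather than a real Zabbix database. The table holds only the columns named in the statements above, the host id 10084 and the function name are made up for illustration, and 0 is taken to mean "unknown". Note that the assignments in SET are separated by commas; joined with "and", they would be parsed as one boolean expression assigned to the first column, leaving the other two untouched.

```python
import sqlite3


def reset_availability(conn, hostid):
    """Set all three availability columns to 0 (unknown) for one host."""
    conn.execute(
        "UPDATE hosts SET available = 0, ipmi_available = 0, snmp_available = 0 "
        "WHERE hostid = ?",
        (hostid,),
    )
    conn.commit()


# Toy stand-in for the hosts table (hypothetical, heavily reduced schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (hostid INTEGER PRIMARY KEY, host TEXT, "
    "available INTEGER, ipmi_available INTEGER, snmp_available INTEGER)"
)
conn.execute("INSERT INTO hosts VALUES (10084, 'target_host', 2, 2, 1)")

reset_availability(conn, 10084)
row = conn.execute(
    "SELECT available, ipmi_available, snmp_available FROM hosts WHERE hostid = 10084"
).fetchone()
```

Binding the host id as a parameter avoids pasting it into the SQL text by hand.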

Comment by Vilius Šumskas [ 2010 Feb 02 ]

Or better yet, refresh status automatically when settings for the items are changed.

Comment by richlv [ 2010 Feb 02 ]

yes, that's a better possibility (assuming it does not decrease performance - for example, deleting any item might require checking whether any other items of that type are still left and only then updating availability status)

Comment by Oleksii Zagorskyi [ 2012 Jan 25 ]

Bingo! I have the same opinion as Rich in previous comment - the frontend could be improved for such cases.

Comment by Raymond Kuiper [ 2012 Aug 30 ]

I think an easy solution would be to include 'Reset status' in the dropdown menu at the bottom of the "configuration, hosts" pages.
Then selecting some or all hosts, selecting the option in the dropdown and clicking the button could execute the SQL code Richlv mentioned for those hosts.

Comment by richlv [ 2012 Aug 30 ]

sounds reasonable. other possibilities - have 'reset status' in host properties; have 'reset status' link when clicking on individual status icons (would have to be designed to work with tooltips as well). the benefit of icons would be the ability to reset individual status, not all of them

hmm, in host properties this could also be done on the status icons at the top of the form

Comment by Raymond Kuiper [ 2012 Aug 30 ]

I like the icon link idea
But I'd rather have a simple solution fairly quick instead of waiting a long time for a complicated solution.

Comment by richlv [ 2013 Mar 12 ]

theoretically, zabbix server also could periodically check whether there are any enabled items of a specific type, and reset the availability if not. this might be desirable, and would not require manual work or frontend changes

for now, a simple scheduled script could even be created =)
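The decision such a periodic check would make can be sketched roughly as follows. This is only an illustration of the idea, not how the server implements it; the function name, the type labels and the dictionary layout are assumptions, and 0 stands for "unknown".

```python
def types_to_reset(enabled_item_types, availability):
    """Return check types that still show a status but have no enabled items left.

    enabled_item_types: set of check types that have at least one enabled item
    availability: mapping of check type -> status (0 means unknown)
    """
    return sorted(
        check_type
        for check_type, status in availability.items()
        if status != 0 and check_type not in enabled_item_types
    )


# Host from the description: the agent item was deleted, only SNMP items remain.
availability = {"agent": 2, "snmp": 1, "ipmi": 0, "jmx": 0}
stale = types_to_reset({"snmp"}, availability)
```

Here only the agent status is stale, so only that one would be reset.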

Comment by Lukas S [ 2014 Dec 09 ]

Please, could you tell me if there is any new solution for this problem? I have only active checks on a running agent, and Zabbix shows me a red icon with an error (passive checks are disabled). How about simple scheduled script, could you give me some details?

Comment by viktorkho [ 2015 Jul 31 ]

> How about simple scheduled script, could you give me some details?

We use something like this to clear errors in Zabbix server v2.4:

mysql --user ${MYSQL_USER} ${DBNAME} -e \
"update hosts set error='', ipmi_error='', snmp_error='', jmx_error='' where status = 0;"

It is rather easy to use it in Scripts (in UI) with {HOST.HOST} passed to it as an additional SQL filter (e.g. "where host = '{HOST.HOST}'")

Of course - be careful, try it on test instance first.

PS:
We also use something like this:

... "update hosts set disable_until=0, available=0, errors_from=0, lastaccess=0, ipmi_disable_until=0, ipmi_available=0, snmp_disable_until=0, snmp_available=0, jmx_disable_until=0, jmx_available=0, jmx_errors_from=0 where status = 0;"

to clear all errors too
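If the host name is used as an extra filter, it is safer to bind it as a query parameter than to paste it into the SQL text. A rough sketch of the error-clearing statement with a per-host filter, using an in-memory SQLite table as a stand-in for the real database; the reduced schema, the host names and the function name are illustrative assumptions only.

```python
import sqlite3


def clear_errors(conn, host):
    """Blank the error texts of one enabled host; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE hosts SET error = '', ipmi_error = '', snmp_error = '', jmx_error = '' "
        "WHERE status = 0 AND host = ?",
        (host,),
    )
    conn.commit()
    return cur.rowcount


# Toy stand-in for the hosts table (hypothetical, heavily reduced schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hosts (host TEXT, status INTEGER, error TEXT, "
    "ipmi_error TEXT, snmp_error TEXT, jmx_error TEXT)"
)
conn.executemany(
    "INSERT INTO hosts VALUES (?, 0, 'timeout', '', '', '')",
    [("web01",), ("db'01",)],
)

# The quote in the host name is harmless because the value is bound, not interpolated.
changed = clear_errors(conn, "db'01")
```

As noted above, try anything like this on a test instance first.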

Comment by Andris Zeila [ 2015 Sep 07 ]

Specification at https://www.zabbix.org/wiki/Docs/specs/ZBX-1916

Comment by Andris Zeila [ 2015 Sep 28 ]

Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-1916

Comment by Sandis Neilands (Inactive) [ 2015 Oct 09 ]

(1) Documentation updates in "Unreachable/unavailable host settings" page:

  • describe the new conditions for changing host availability;
  • state explicitly that host availability is not tracked for active Zabbix agent checks (maybe mention this also in "Internal checks" page?).

Maybe we should also update "Internal checks page" to reflect the changes of this improvement.

martins-v Updated documentation:

Please review.

wiper Looks good to me.

martins-v Improved a bit more as suggested by sandis.neilands:

sandis.neilands Looks good. CLOSED.

Comment by Sandis Neilands (Inactive) [ 2015 Oct 13 ]

(2) Several minor fixes in r56094:56123. Please review.

wiper Thanks, I changed one function header comment slightly in r56135, please check.

sandis.neilands Looks good. CLOSED.

Comment by Sandis Neilands (Inactive) [ 2015 Oct 13 ]

(3) Check the zbx_vector_uint64_pair_...() API. Using pointers to 'pair' local variable of DCreset_hosts_availability() in DCupdate_hosts_availability() will lead to at least functional errors if not crashes.

CLOSED. I was looking at the implementation of zbx_vector_ ## __id ## _append() instead of zbx_vector_ ## __id ## _append_ptr(), where the pointer IS dereferenced.

Comment by Sandis Neilands (Inactive) [ 2015 Oct 13 ]

(4) DBend_multiple_update() is missing from DCupdate_hosts_availability(). This will cause problems with Oracle DBs.

wiper RESOLVED in r56136

sandis.neilands CLOSED.

Comment by Sandis Neilands (Inactive) [ 2015 Oct 15 ]

(5) We should decide and document what are the semantics of availability column in the Configuration - Hosts page.

#1. When disabling a host the segments will remain unchanged until server synchronizes configuration cache with the DB. Semantics: the availability column represents the server's view of the host's availability.

  • internal checks (availability) match values in availability column;
  • server actually continues polling items until sync;
  • when the user enables the host the server updates availability only after polling starts (e.g. after sync).

#2. Alternatively the front-end could always show grey availability for disabled hosts (regardless of how it is in the DB). Semantics: after configuration sync this host will not be available. This matches the semantics of the status column.

Document? The first option (how it is implemented now) honestly represents DB state but can be confusing. The best option would be to document how to interpret the status and availability columns after this improvement is finished:

  • disabled, but available - host was disabled, but configuration has not yet reached server cache so server continues working as before until sync;
  • disabled, not available - server has disabled the host;
  • enabled, not available - either server has not yet synchronized configuration or hasn't yet polled the relevant items;
  • enabled, available - server has commenced polling.
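The four combinations above can be written down as a small lookup, purely to illustrate how the two columns would be read together. The function name and the boolean inputs are assumptions made for the example; the front-end works with the actual status and availability values.

```python
def interpret(enabled, available):
    """Map the status column (enabled) and availability column (available) to a reading."""
    if not enabled and available:
        return "host was disabled, but the configuration has not yet reached the server cache"
    if not enabled and not available:
        return "server has disabled the host"
    if enabled and not available:
        return "server has not yet synchronized configuration or not yet polled the relevant items"
    return "server has commenced polling"


reading = interpret(enabled=False, available=True)
```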

Thoughts?

martins-v I added a short version of the above to https://www.zabbix.com/documentation/3.0/manual/web_interface/frontend_sections/configuration/hosts#reading_host_availability. You may also notice that part of (1) about setting unknown host status has been moved in here where, I think, it is more useful. Please review.

sandis.neilands Agree. CLOSED.

Comment by Sandis Neilands (Inactive) [ 2015 Oct 15 ]

(6) In DCupdate_hosts_availability(): if DBcommit() failed then there is no point in doing DBrollback(). In case of txn_error zbx_db_commit() will rollback itself. Otherwise txn_level remains unchanged only in case of ZBX_DB_DOWN - most likely the rollback will fail as well.

wiper RESOLVED in r56211

sandis.neilands CLOSED. Hopefully we understood correctly how this works with Oracle.

Comment by Sandis Neilands (Inactive) [ 2015 Oct 16 ]

(7) If the proxy is down when a host is changed to be monitored by that proxy, the availability remains the same until availability data is received from the proxy.

We have the same problem even if the proxy is up but has not received the new configuration yet (the default interval is an hour). In that case the server will no longer be monitoring the host, but neither will the proxy. The same will happen when moving a host from one proxy to another.

wiper Now the host availability status will also be reset during configuration sync if:

  • the host was assigned to a (different) proxy or removed from a proxy
  • no heartbeat was received from the host's proxy during the maximum heartbeat period

RESOLVED in r56474

sandis.neilands REOPENED. We should have some tolerance (1 minute?) for delayed heartbeats.

wiper RESOLVED in r56636

sandis.neilands CLOSED. Added a small comment in r56646.
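For illustration only, the heartbeat check with a tolerance could look roughly like this. The names, the 60-second default (taken from the "1 minute?" suggestion above) and the timestamps are assumptions; this is not the code committed in r56636.

```python
HEARTBEAT_TOLERANCE = 60  # seconds; assumed value, from the "1 minute?" suggestion


def proxy_heartbeat_missed(now, last_heartbeat, max_heartbeat_period,
                           tolerance=HEARTBEAT_TOLERANCE):
    """Return True if no heartbeat arrived within the maximum period plus tolerance.

    All arguments are in seconds; now and last_heartbeat are timestamps.
    """
    return now - last_heartbeat > max_heartbeat_period + tolerance


# A heartbeat that is 50 seconds late is still tolerated ...
late_but_ok = proxy_heartbeat_missed(now=4650, last_heartbeat=1000, max_heartbeat_period=3600)
# ... while one that is 400 seconds late counts as missed.
missed = proxy_heartbeat_missed(now=5000, last_heartbeat=1000, max_heartbeat_period=3600)
```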

Comment by Sandis Neilands (Inactive) [ 2015 Oct 22 ]

(8) Take a look at Coverity CID 118919 which is a false positive (FP).

process_host_availability()        // Junk JSON without availability data received from proxy.
--DChost_update_availability()     // 'availability' - NULL pointer dereference? No.

It is a FP because Coverity didn't understand that 'availability' is always updated together with 'availability_num'. DChost_update_availability() will not enter the loop if 'availability_num' is 0.

However it is possible to maliciously cause excessive cache locking on Zabbix server with garbage JSON from "proxy" (telnet or netcat).

Can we do this?

Index: src/libs/zbxdbhigh/proxy.c
===================================================================
--- src/libs/zbxdbhigh/proxy.c	(revision 56322)
+++ src/libs/zbxdbhigh/proxy.c	(working copy)
@@ -1717,7 +1717,8 @@
 
 	DBcommit();
 
-	DChost_update_availability(availability, availability_num);
+	if (0 < availability_num)
+		DChost_update_availability(availability, availability_num);
 out:
 	zbx_free(availability);
 	zbx_free(tmp);

wiper RESOLVED in r56470

sandis.neilands CLOSED.

Comment by Sandis Neilands (Inactive) [ 2015 Nov 09 ]

(9) For consistency: use SEC_PER_HOUR also in definition of CONFIG_PROXYCONFIG_FREQUENCY in server not only proxy.

RESOLVED in r56613.

wiper CLOSED

Comment by Sandis Neilands (Inactive) [ 2015 Nov 09 ]

(10) Possible race condition. If some slow check returns after the host has been switched to proxy then it could be that the availability for that interface is changed in the DB (e.g. it will not be unknown) until the next data is available from proxy.

wiper RESOLVED in r56635

sandis.neilands CLOSED. Added comments in r56648.

wiper thanks

Comment by Andris Zeila [ 2015 Nov 10 ]

(11) When proxy fails to send host availability data to server it does not revert internal host availability cache and this availability update will be lost.

This and a few other related bugs were reported in a separate issue, ZBX-10066. Fixing it will require redesigning how host availability updates are sent by the proxy and possibly also how they are stored in server/proxy.

CLOSED

Comment by Andris Zeila [ 2015 Nov 11 ]

Released in:

  • pre-3.0.0alpha5 56662

Comment by Andris Zeila [ 2015 Nov 11 ]

(12) Documentation:

martins-v will rework this a bit to make it more clear.

martins-v Somewhat reworded and expanded: https://www.zabbix.com/documentation/3.0/manual/introduction/whatsnew300#host_availability_improvements. Please review for accuracy.

sandis.neilands Very nice, thanks! CLOSED.

wiper Yes, thanks

Generated at Thu Apr 25 17:38:29 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.