[ZBX-3469] when snmp host is unavailable, all triggers change to unknown Created: 2011 Jan 27 Updated: 2019 Jun 04 Resolved: 2011 Jul 22 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G) |
Affects Version/s: | 1.8.4 |
Fix Version/s: | 1.8.6, 1.9.5 (alpha) |
Type: | Incident report | Priority: | Major |
Reporter: | matthias zeilinger | Assignee: | dimir |
Resolution: | Fixed | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
solaris, server 1.8.x |
Issue Links: |
|
Description |
since i use zabbix server 1.8.x: |
Comments |
Comment by richlv [ 2011 Mar 02 ] |
most likely only triggers not using time-based functions should change their state when the host becomes unreachable |
Comment by dimir [ 2011 Jun 02 ] |
This problem is reproducible; however, it is not specific to snmp but affects any item type. E.g. if you have snmp and Zabbix agent items for the host and you stop the Zabbix agent, the snmp item trigger will become UNKNOWN for some time too. This period of time is short (around a second), but it is still a bug. |
Comment by richlv [ 2011 Jun 02 ] |
the duration of unknowns probably depends on the number of items being monitored and their intervals. often the unknown state can be observed for 30 seconds or so, even for triggers with the nodata() function |
Comment by matthias zeilinger [ 2011 Jun 03 ] |
i saw that in zabbix 2.0 the "unknown" trigger state isn't used, so i think this problem is fixed there, but could you please test it? if it is, i will wait for the new version. |
Comment by dimir [ 2011 Jun 03 ] |
In latest 1.8 it's reproducible. The fix is awaiting review and testing. |
Comment by dimir [ 2011 Jun 03 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-3469 . |
Comment by richlv [ 2011 Jun 05 ] |
(2) does the fix also solve the following scenario ? agent.ping item is used along with some other items and a nodata() trigger is created against it. if the host becomes unavailable, this trigger goes into an unknown state for a brief period, which we do not expect to happen.
<dimir> Nope, it does not. Will add that fix shortly.
<dimir> RESOLVED in commit r20088
<richlv> what is the new logic ? if we have 3 passive agent items and multiple triggers with different functions (nodata(), last(), last()+nodata(), time()...), which triggers will become unknown ?
<richlv> thanks. to clarify...
<richlv> otherwise sounds reasonable... so far. see below.
<dimir> This is how we see it; it would be nice to know your views on the matter. Now a few more questions. There must be an easier way to handle all this.
<richlv> if a trigger references one item with nodata() and another with avg(60), will it only become unknown when the avg() item is missing data for 60 seconds ?
<dimir> Regarding nodata() + avg(60), I guess so. As far as I know, currently any trigger referencing an UNKNOWN item becomes UNKNOWN (which is handled in nextchecks, as I understood). As for the latter case, I think that is a good solution, yes. Should we handle all that in a different ZBX which will fix the handling of the unknown status of triggers, or are you comfortable doing it here?
<dimir> RESOLVED in r20173 . The logic is as follows: set UNKNOWN for triggers that reference an item of the same type as the failed one and that do not reference a time-based function.
<sasha> CLOSED |
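The rule stated for r20173 can be read in more than one way. What follows is a minimal sketch of one possible reading, not the actual server code. The class names, the `mark_unknown` helper and the contents of `TIME_BASED` are assumptions made for illustration; the thread itself only names nodata() and time() as time-based functions.

```python
from dataclasses import dataclass

# Assumed set of time-based trigger functions (hypothetical list;
# the thread only mentions nodata() and time() explicitly).
TIME_BASED = {"nodata", "time", "date", "dayofweek", "now"}


@dataclass(frozen=True)
class Function:
    name: str       # trigger function name, e.g. "last" or "nodata"
    item_type: str  # type of the item the function reads, e.g. "agent"


@dataclass
class Trigger:
    description: str
    functions: list
    value: str = "OK"


def mark_unknown(triggers, failed_item_type):
    """Set UNKNOWN on triggers that reference an item of the failed type
    and use no time-based function. Returns the affected descriptions."""
    changed = []
    for trigger in triggers:
        refs_failed_type = any(
            f.item_type == failed_item_type for f in trigger.functions
        )
        uses_time_based = any(f.name in TIME_BASED for f in trigger.functions)
        if refs_failed_type and not uses_time_based:
            trigger.value = "UNKNOWN"
            changed.append(trigger.description)
    return changed
```

Under this reading, when an agent item fails, a last() trigger on an agent item goes UNKNOWN, while a nodata() trigger on agent.ping and a last() trigger on an snmp item keep their state.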
Comment by Alexander Vladishev [ 2011 Jun 05 ] |
Successfully tested! |
Comment by dimir [ 2011 Jun 22 ] |
Thanks to sasha, the new logic is now defined. For a failed item, set all affected triggers to UNKNOWN. Do not set UNKNOWN if any of the following conditions are true:
An item is considered active if all of the following conditions are true:
|
Comment by richlv [ 2011 Jun 28 ] |
what's the current status of this issue ? where is it planned to be merged ? |
Comment by dimir [ 2011 Jun 29 ] |
Yep, the fix is ready, I just haven't tested it yet. Some customer issues got in the way. Will test/commit today. |
Comment by dimir [ 2011 Jul 08 ] |
Let's try to make the logic clearer: Set trigger status to UNKNOWN if all of the following are true:
An item is considered "active" if all of the following are true:
|
Comment by dimir [ 2011 Jul 08 ] |
Fixed in development branch svn://svn.zabbix.com/branches/dev/ZBX-3469 . |
Comment by dimir [ 2011 Jul 08 ] |
FYI: Useful SQL statement for testing, to see how trigger status changes (in this case the condition is triggerid>12999, see the end of the statement):
select t.description,
       case t.value when 0 then 'OK' when 1 then 'PROBLEM' else 'UNKNOWN' end as value,
       i.description as item,
       h.hostid,
       h.host,
       case h.available when 0 then 'UNKNOWN' when 1 then 'TRUE' else 'FALSE' end as avail,
       case h.snmp_available when 0 then 'UNKNOWN' when 1 then 'TRUE' else 'FALSE' end as snmp_avail,
       case ipmi_available when 0 then 'UNKNOWN' when 1 then 'TRUE' else 'FALSE' end as ipmi_avail
from items i, functions f, triggers t, hosts h
where i.itemid=f.itemid
  and f.triggerid=t.triggerid
  and i.hostid=h.hostid
  and i.status=0
  and not i.key_ like 'status'
  and i.type in (0)
  and t.status=0
  and h.status=0
  and t.triggerid>12999;
|
Comment by Alexander Vladishev [ 2011 Jul 19 ] |
Successfully tested! |
Comment by dimir [ 2011 Jul 22 ] |
Fixed in 1.8 r20732:20738, trunk r20751. |
Comment by dimir [ 2011 Oct 25 ] |
Let's try to make the logic even clearer. Let's say an item MYITEM returns an error. There is a trigger associated with it. We set that trigger status to UNKNOWN if ALL of the following are true:
|