[ZBX-15761] Triggers with nodata() do not enter a problem state when delayed data is received Created: 2019 Mar 05 Updated: 2019 Mar 22 Resolved: 2019 Mar 22 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Server (S) |
Affects Version/s: | 4.0.5 |
Fix Version/s: | None |
Type: | Incident report | Priority: | Minor |
Reporter: | James Cook | Assignee: | Arturs Lontons |
Resolution: | Won't fix | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Zabbix 4.0.5 |
Description |
To test this scenario:
If you have issues with Zabbix proxies / Zabbix agents / Zabbix Sender where the data is delayed, this will affect your nodata() triggers. The actual impact on us was that we received some old data (snmptrap) that should have made the trigger enter a problem state and resulted in an action; however, the trigger did not enter a problem state and we missed the event. It's almost like there needs to be a trigger function that returns the item's submission time in epoch time, i.e. the equivalent of item.now(); then you could do a trigger like:
The above would cause the trigger to initially fire regardless and then in the next evaluation recover. |
Comments |
Comment by Glebs Ivanovskis [ 2019 Mar 05 ] |
Maybe it should be the other way round?
|
Comment by Arturs Lontons [ 2019 Mar 05 ] |
Hi, I'll have to agree with what Glebs said - the trigger item.nodata(300)=1 will fire when the item hasn't received any data for 300 seconds. Please try fixing the trigger and recovery expressions as per our documentation. |
Comment by James Cook [ 2019 Mar 06 ] |
Hi, I have tried the triggers as described. When nodata() functions are evaluated on a timer (every 30 seconds), they work as required. When nodata() functions are evaluated on item submission, they do not work as required, because the evaluation does not factor in the actual item collection time. I tried explaining this to a colleague who also found it hard to understand until I showed him the explanation using mathematics, so here goes.
Let the following be defined:
The following mathematical equations should be used when evaluating nodata() on item submission:
Scenario 1 - Current data is received:
nodata(300) should evaluate as 0, as item data has been received based on item collection time.
Scenario 2 - Old data is received:
nodata(300) should evaluate as 1, as no data has been received based on item collection time.
What the above will allow is that when old data is submitted the triggers can enter a problem state, and upon the next time-based nodata() evaluation the triggers can enter an ok state; as a result no triggers will be missed for old data.
The scenario above came to light when we had an issue with our server which backed up the data on the proxies for over 60 minutes. The issue with the server was resolved with a server restart, allowing data to be shipped from the proxies to the server. The old data from up to 60 minutes prior was used when evaluating trigger expressions with nodata() functions. At this point the server evaluated it using the current time, i.e. there has been no data for 300 seconds as it was collected over 30 minutes ago, so let's stay ok (problem: nodata(300)=0, recovery: nodata(300)=1).
This caused the trigger never to fire, and we missed the snmptrap from 30 minutes ago and any other triggers with nodata() that fit the time frames. This is an issue as we have triggers based on nodata() that cause actions to create incident records in our service management system.
Regards James
|
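The outage described above can be sketched as a small simulation. This is only a simplified model of what the comment describes, not Zabbix's actual implementation: the `nodata` helper, the timestamps, and the 30-minute delay are all hypothetical and chosen for illustration. It shows why the same delayed value gives different results depending on which clock the 300-second window is measured from.

```python
def nodata(timestamps, period, eval_time):
    """Simplified model (an assumption, not Zabbix code): return 1 if no
    value has a timestamp in the `period` seconds up to `eval_time`,
    otherwise return 0."""
    for ts in timestamps:
        if eval_time - period < ts <= eval_time:
            return 0
    return 1


collected_at = 0         # hypothetical: trap collected by the proxy
delivered_at = 30 * 60   # hypothetical: value reaches the server 30 minutes later
history = [collected_at]

# Window measured from the server clock at delivery: the value looks stale.
at_server_clock = nodata(history, 300, delivered_at)

# Window measured from the item collection time: the value counts as data.
at_collection_time = nodata(history, 300, collected_at)

print(at_server_clock, at_collection_time)  # prints "1 0"
```

With problem expression nodata(300)=0, the first result keeps the trigger in an ok state, which matches the missed event reported above; the second result is what evaluating against the collection time would give.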
Comment by Glebs Ivanovskis [ 2019 Mar 06 ] |
Yes, as far as I recall, nodata() always uses actual time, regardless of item timestamps and what caused trigger recalculation. It makes sense in case of absence of data, but in case of delayed data it can be tricky to interpret results. But it is very unlikely that the behaviour of nodata() will be changed; it is a fundamental trigger function and too many people rely on the current behaviour. Perhaps you can use something like count()<>0; semantically it is the same as nodata()=0, but count() respects the item timestamp. However, you need to take into account that, as I speak, count() is not a time-based trigger function, so you need to throw something like ... or 0 * now() into the trigger expression to get it evaluated every 30 seconds without any data flowing in. |
Comment by James Cook [ 2019 Mar 07 ] |
Hi Glebs, Bingo - Thanks for your advice. I have tested and confirmed the following:
Expression = {AAA:test.integer.count(300)}<>0 (We want to fire every time data comes in, i.e. snmptraps)
Recovery = {AAA:test.integer.nodata(300)}=1 (We only want to come good after receiving no data for a period)
Testing this, it works for old/new data and when no data is received. So thinking about this: nodata(300)=1 will always work, as we're saying we have received no data; however, nodata(300)=0 will not work for old data, where count(300) will do the trick as it respects item submission time. I would think lots of people may not know about this, so I will post your solution on my Zabbix forum thread. This can be closed. Cheers James
|
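The workaround confirmed above can be walked through with a rough model. This is a sketch under stated assumptions, not Zabbix's evaluation logic: `count`, `nodata`, and `evaluate` are hypothetical helpers, count(300) is assumed to be measured from the evaluation time (the item timestamp when a value arrives, the server clock on a timer tick), nodata(300) is assumed to always use the server clock as described in the thread, and the recovery rule is simplified.

```python
def count(timestamps, period, eval_time):
    """Number of values with a timestamp in the `period` seconds up to `eval_time`."""
    return sum(1 for ts in timestamps if eval_time - period < ts <= eval_time)


def nodata(timestamps, period, server_now):
    """1 if no value falls in the window ending at the server clock, else 0."""
    return 0 if count(timestamps, period, server_now) else 1


def evaluate(state, history, eval_time, server_now):
    """Simplified trigger step: problem = count(300)<>0, recovery = nodata(300)=1."""
    problem = count(history, 300, eval_time) != 0
    recovery = nodata(history, 300, server_now) == 1
    if problem:
        return "PROBLEM"
    if state == "PROBLEM" and recovery:
        return "OK"
    return state


history = [0]  # hypothetical: trap collected at t=0, delivered at t=1800

# Evaluation caused by the delayed value: window measured from its own timestamp.
state_on_arrival = evaluate("OK", history, eval_time=0, server_now=1800)

# Next timer-based evaluation, 30 seconds later: window measured from the server clock.
state_on_timer = evaluate(state_on_arrival, history, eval_time=1830, server_now=1830)

print(state_on_arrival, state_on_timer)  # prints "PROBLEM OK"
```

Under these assumptions the delayed trap still raises a problem event and the trigger recovers on the next timer evaluation, which is the behaviour reported in the test above.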