[ZBXNEXT-1891] Implicit trigger dependency when monitored via proxy Created: 2013 Sep 07  Updated: 2024 Apr 10  Resolved: 2020 May 15

Status: Closed
Project: ZABBIX FEATURE REQUESTS
Component/s: Proxy (P), Server (S)
Affects Version/s: 2.0.8
Fix Version/s: 5.0.0beta2, 5.0 (plan)

Type: Change Request Priority: Trivial
Reporter: Jean Baptiste Favre Assignee: Michael Veksler
Resolution: Fixed Votes: 85
Labels: None
Σ Remaining Estimate: Not Specified Remaining Estimate: Not Specified
Σ Time Spent: Not Specified Time Spent: Not Specified
Σ Original Estimate: Not Specified Original Estimate: Not Specified

Attachments: PDF File SPEC-DEV-1891.pdf    
Issue Links:
Causes
causes ZBX-18418 active nodata() triggers assigned to ... Closed
Duplicate
Sub-task
depends on ZBX-17650 Wrong formula used for zabbix[proxy,,... Closed
Sub-Tasks:
Key
Summary
Type
Status
Assignee
ZBXNEXT-5790 Frontend changes for implicit trigger... Change Request (Sub-task) Closed Gregory Chalenko  
Team: Team C
Sprint: Sprint 62 (Mar 2020), Sprint 63 (Apr 2020), Sprint 64 (May 2020)
Story Points: 4

 Description   

When you monitor many servers through different proxies, you need to be able to set up a trigger dependency against the proxy.
But, as far as I know, there's no way to get that dependency set up automatically.

First point: since some servers are monitored by a proxy and some are not (typically Zabbix SQL servers), I just can't set up the dependency inside the template, or I'd have to have one template per service per proxy.

Second point: as far as I know, there is no macro available which represents the proxy monitoring the host, and even if there were, I don't know how to implement the following rule:
"For a given trigger, if the host is monitored by a proxy, then the trigger depends on proxy availability; if the host is not monitored by a proxy, then there is no dependency against the proxy."



 Comments   
Comment by Maxim Krušina [ 2013 Sep 13 ]

Exactly. My proxy just failed and my mailbox is flooded with messages...

Comment by Oleg Ivanivskyi [ 2013 Sep 13 ]

Possible workaround: use the item key "zabbix[host,agent,available]" and the trigger expression "{Template App Zabbix Agent:zabbix[host,agent,available].last(0)}#1" for a trigger "Zabbix agent on {HOST.NAME} is unreachable for 5 minutes". It is better than "agent.ping.nodata" because it is a more significant and precise metric (https://www.zabbix.com/documentation/2.0/manual/config/items/itemtypes/internal). But the agent must be of type "passive" (I don't know if this is possible in your environment).

Comment by Jean Baptiste Favre [ 2013 Sep 13 ]

@Oleg.Ivanivskyi

That is only a workaround. I already use zabbix[host,agent,available].
Moreover, and this is part of the problem, I heavily use trappers for performance reasons. In that case, the only way I know to ensure we get fresh data is to check 'nodata'.

The problem is, when you monitor a host through a proxy and that proxy goes down, you'll get an alert for every single service of every single host monitored by that proxy, not only the Zabbix agent one.

Comment by Jean Baptiste Favre [ 2013 Oct 01 ]

Could be considered as a specific case of ZBXNEXT-46

Comment by Filipe Paternot [ 2014 Feb 25 ]

This is crucial for distributed setups with complex network scenarios.

It would be most used in enterprise setups and, even though the development should be quite simple (creating a new macro), this should be considered a rather important issue.

I would greatly enjoy this feature.

Comment by Maxim Krušina [ 2014 Mar 12 ]

Yep. For example, our Zabbix server is located in a server housing facility, i.e. outside the office (because that is where the best connectivity and our web servers are, so we can monitor SLA without depending on our office connection). But in our office there is a proxy which monitors about 70% of our devices, so adding dependencies manually is a bit of a hell.

Comment by Marc [ 2014 Mar 29 ]

Might ZBXNEXT-46 help here too? It would suppress the whole host though.

Comment by Mehmet Ali Buyukkarakas [ 2014 Nov 30 ]

It is a highly important thing for MSP companies like ours. If a proxy fails, we receive an alert storm. Why not develop a solution for this issue?
Mehmet

Comment by Marc [ 2014 Nov 30 ]

mbuyukkarakas, there's always the option of (co-)sponsoring a feature to speed things up.

Comment by Nicola V [ 2014 Dec 10 ]

Chiming in. It would be a very welcome feature.
Did anyone come up with an API-based solution in the meantime? I'm not a dev, but I'd love to look into the API. It's not so easy to tackle this issue.
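For anyone exploring the API route: a script could fetch each host's proxy_hostid via host.get and then call trigger.adddependencies to make the host's triggers depend on a per-proxy "proxy unreachable" trigger. The sketch below only builds the request payloads (the method names host.get and trigger.adddependencies are real Zabbix API methods; the helper function, its parameters, and the data shapes are illustrative assumptions, not a finished tool):

```python
# Hypothetical helper that implements the rule from the description:
# a host monitored via a proxy gets a dependency on that proxy's
# availability trigger; a host monitored directly by the server
# (proxy_hostid == "0") gets none. Building payloads as pure data
# lets the selection logic be tested without a live Zabbix server.

def build_dependency_requests(hosts, host_triggers, proxy_triggers):
    """hosts: list of {"hostid", "proxy_hostid"} dicts (as host.get returns them)
    host_triggers: hostid -> list of triggerids defined on that host
    proxy_triggers: proxy_hostid -> triggerid of the proxy-down trigger
    Returns a list of parameter dicts for trigger.adddependencies."""
    requests = []
    for host in hosts:
        proxy = host["proxy_hostid"]
        if proxy == "0":
            continue  # monitored directly by the server: no dependency
        dep = proxy_triggers.get(proxy)
        if dep is None:
            continue  # no availability trigger defined for this proxy
        for triggerid in host_triggers.get(host["hostid"], []):
            requests.append({"triggerid": triggerid,
                             "dependsOnTriggerid": dep})
    return requests
```

Each returned dict would then be sent as the params of one trigger.adddependencies JSON-RPC call.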

Comment by YuriyS [ 2015 Nov 05 ]

Is this feature not implemented yet? It'll be very helpful.

Comment by Marc [ 2015 Nov 05 ]

foboss,

according to the roadmap it's unfortunately not planned yet.

Comment by Sheikh Rezwanur Rahman [ 2018 Jan 25 ]

It would be a real help for me. Currently I receive lots of mail when a proxy goes down.

Comment by Andreas Drbal [ 2018 Jan 25 ]

This feature would be monumental if provided out of the box. Please give it a higher priority.

Comment by Ronald Rood [ 2018 Jan 25 ]

Personally I find it hard to believe this is not available out of the box, since in the distributed nature of Zabbix this is a requirement. If something in the Zabbix infrastructure is not reachable, like a proxy, all alerts for targets behind that proxy become irrelevant.
So the priority of this request should be raised a little. I imagine that the implementation of this feature is not very challenging, but the result would be very welcome.

Comment by Ilya Kruchinin [ 2018 Jul 25 ]

There is a solution here: https://www.zabbix.com/forum/zabbix-troubleshooting-and-problems/48911-trigger-dependencies-zabbix-proxy-down

Using template nesting and macro inheritance.

Then you simply map your hosts to the template with the right macro for "PROXY_HOST", done.

Comment by Ilya Kruchinin [ 2018 Jul 25 ]

There might be another, simpler solution (but I haven't checked it): simply create a trigger using the supported {PROXY.NAME} macro, as documented in https://www.zabbix.com/documentation/4.0/manual/appendix/macros/supported_by_location?s[]=macros

Comment by Oleh [ 2019 Jun 25 ]

Six years of this bug.
 - Recommendation: use a proxy, we will do preliminary processing on it.
 - And if it goes down? What should we do with the nodata() HOST alerts?
 - Hmm... Suffer.

Seriously?
Maybe just add a "Proxy is UP" condition to the Action properties?

Comment by Alexei Vladishev [ 2020 Feb 07 ]

Does it affect only triggers having the nodata() function?

Comment by Ronald Rood [ 2020 Feb 07 ]

I guess so, but also the Queue... think about "x items missing data for more than y minutes"...

For me the most important are the nodata triggers.

Comment by Alexei Vladishev [ 2020 Feb 10 ]

I think it might be implemented the following way. By default, nodata(sec) will take into account not only proxy availability but also the proxy lastaccess time and for how long data is delayed on the proxy side. So, it will return 1 (no data) if there is no data for a period of "sec" + "proxy delay" + ("current server time" - "proxy lastaccess time"). In this case nodata() will be much less sensitive and will respect when the proxy was last seen and how much data is still unsent.

For those who want nodata() to work as before, I would introduce a second optional parameter "strict". In that case, for example, nodata(10m, strict) will report missing data regardless of proxy availability, immediately if there is no data for the last 10 minutes.

What do you think? Does it make sense? If yes, we will try to fit this solution into Zabbix 5.0.

Comment by Ronald Rood [ 2020 Feb 10 ]

It would be great if it also takes the queue into account so yes, this looks very promising.

Comment by Marco Hofmann [ 2020 Feb 10 ]

After your question we had some internal discussion on whether adjustments to the way nodata() works are sufficient. At first I thought there were several cases where they are not, but as it turns out, when you think about the fundamental case of "proxy down", the only thing that bugs us is the several false-positive Zabbix agent nodata triggers that disguise the real problem. Cases I thought of:

  • Zabbix Proxy down -> Several Windows Server / Linux Server Zabbix agents report nodata -> This would be solved by your suggestion, especially without any changes to our configuration if I get you correctly.
  • Several Windows Server / Linux Server down results in double Trigger ICMP check failed & agent down -> Not relevant to Zabbix proxy, a normal Trigger dependency is sufficient.
  • A net.tcp.service check from Zabbix Server to a HTTPS service, hosted in the location of the Zabbix Proxy.
    The Zabbix Proxy goes down -> Check still does work.
    The Internet connection is down, check does not work, which is true.
    This is no case relevant to Zabbix Proxy.
  • And then there is the infamous example of a switch or hypervisor going down, which results in several sub-systems going down, like several physical or virtual Windows Server / Linux Server machines. Whether the Zabbix Proxy is up or down, that is not this case here, but ZBXNEXT-46.

So after all, regarding this very ZBXNEXT, I can so far only think of nodata() in relation to a Zabbix Proxy outage.

AFAIK there was already a way to configure @Alexei's suggestion with something like this:

{Template Module Zabbix agent 4.4:zabbix[host,agent,available].max({$AGENT.TIMEOUT})}=0 and {Template App Zabbix Proxy 4.4:zabbix[proxy,{HOST.HOST},lastaccess].fuzzytime({$AGENT.TIMEOUT})}=1

But if I remember correctly, this had one HUGE drawback. As soon as the proxy came back online, the "lastaccess" timestamp would recover BEFORE the "nodata" trigger, and therefore every nodata trigger would fire immediately after the proxy sends a heartbeat and stay active until the agents report back online. This has to be considered if nodata() gets this new feature.

Comment by Alexei Vladishev [ 2020 Feb 10 ]

starko, please read carefully what I wrote above. nodata() will take the "proxy delay" into account; in other words, if there is unsent data in the proxy buffer for a certain period of time, that time period will be used in the calculation of nodata(). Therefore, when the proxy comes back, nodata triggers will not fire immediately.

Also, the older internal check zabbix[proxy,"proxy name",lastaccess] and the new zabbix[proxy,"proxy name",delay] (returning for how long we have had unsent history, in seconds) could be used to report proxy-side performance and availability issues.

Comment by Filipe Paternot [ 2020 Feb 10 ]

As per Alexei's and Marco's discussion above, I'd say it has got everything to work right.

When a proxy goes down, the only issue is the nodata triggers (the other triggers remain "paused" since there are no updates), and the change affects just that. If that is the new default behavior, it's even better. Having nodata(time, "lazy") as the new default and nodata(time, strict) as suggested makes a lot of sense to me.

The addition of zabbix[proxy,"proxy name",delay] is a nice outcome here too.

Comment by Alexei Vladishev [ 2020 Feb 12 ]

Thanks for the feedback! All right, I am including it in the Zabbix 5.0 roadmap; it should be implemented in one of the alpha releases.

Comment by Marco Hofmann [ 2020 Feb 20 ]

Will the specs be attached to this ZBXNEXT when they are ready?

Comment by Alexei Vladishev [ 2020 Feb 20 ]

starko, the high-level specification is attached.

Comment by dimir [ 2020 Feb 20 ]

Regarding the attached document, it wasn't clear (at least to me) what exactly the proxy_delay is, so I'll try to explain it here.

proxy_delay is a new value that is planned to be introduced; it will be calculated by the proxy and sent to the server during normal data exchange. The server will keep that value in its cache and update it every time a new value is received from the proxy. How is it calculated on the proxy side?

A proxy writes collected values to the proxy_history table, to be sent to the server later. However, we want to know how much time it takes for a written value to be picked up for sending to the server. That time is the proxy_delay, and in order to know it we need to introduce a new field in the proxy_history table (the field name is still being discussed, but I personally like write_clock). This will be stored for each value.

So before sending another batch of values, the proxy will look for the unsent ones in the proxy_history table, select the oldest write_clock, and send the difference between that and the current time as proxy_delay (a single value in the communication data).
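The calculation described above boils down to one line: the age of the oldest unsent value, or zero when the queue is empty. A minimal sketch (field and function names follow this comment, not the final implementation):

```python
# Hedged sketch of the proxy_delay calculation: each value written to
# proxy_history carries a write_clock timestamp; before an upload, the
# proxy takes the oldest unsent write_clock and reports "now - oldest"
# as proxy_delay. Timestamps are epoch seconds.

def compute_proxy_delay(unsent_write_clocks, now):
    """proxy_delay = age of the oldest unsent value; 0 if nothing is queued."""
    if not unsent_write_clocks:
        return 0
    return now - min(unsent_write_clocks)
```

A proxy that keeps up with its uploads thus reports a proxy_delay near zero, and the value grows while the link to the server is down.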

Comment by Michael Veksler [ 2020 Apr 24 ]

Available in:

Documentation updated:

Comment by Joel Lord [ 2020 Aug 18 ]

I was recently tasked with two things: upgrade our zabbix infrastructure to 5.0 (done last week) and figure out how to set up trigger dependencies so that when we lose contact with a proxy we don't get mailbombed with alerts.  Then I found this and thought I was going to have an easy day.  To test I stopped one of the proxies for 5 minutes (monitoring our development environments and conveniently at one end of an unreliable internet connection) and waited.

I received my usual heap of "Zabbix agent on <host> is unreachable for 5 minutes" alerts and associated emails, so it appears that the nodata trigger suppression did not work.

We're running master and proxy servers at 5.0.2 on FreeBSD.  Hosts are FreeBSD, Ubuntu, Windows, possibly some others I'm forgetting.

Does this depend on anything I might have missed in the upgrade? Is there a schema change to the database that I need to apply on the proxy or on the master for this to work? I've gone over the upgrade instructions again and haven't found anything that seems to apply here.

Generated at Fri Apr 26 03:09:17 EEST 2024 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.