[ZBX-15624] Allow "system.localtime" metric be collected in active mode Created: 2019 Jan 25 Updated: 2024 Apr 10 Resolved: 2022 Jul 19 |
|
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Agent (G), Server (S) |
Affects Version/s: | 4.0.3 |
Fix Version/s: | 6.4 (plan) |
Type: | Problem report | Priority: | Trivial |
Reporter: | Constantin Oshmyan | Assignee: | Aleksejs Sestakovs |
Resolution: | Won't fix | Votes: | 16 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified |
Attachments: |
||||||||||||||||
Issue Links: |
|
||||||||||||||||
Team: | |||||||||||||||||
Sprint: | Sprint 58 (Nov 2019), Sprint 59 (Dec 2019), Sprint 60 (Jan 2020), Sprint 61 (Feb 2020), Sprint 62 (Mar 2020), Sprint 63 (Apr 2020), Sprint 64 (May 2020), Sprint 65 (Jun 2020), Sprint 66 (Jul 2020), Sprint 67 (Aug 2020), Sprint 68 (Sep 2020), Sprint 69 (Oct 2020), Sprint 70 (Nov 2020), Sprint 71 (Dec 2020), Sprint 72 (Jan 2021), Sprint 73 (Feb 2021), Sprint 74 (Mar 2021), Sprint 75 (Apr 2021), Sprint 76 (May 2021), Sprint 77 (Jun 2021), Sprint 78 (Jul 2021), Sprint 79 (Aug 2021), Sprint 80 (Sep 2021), Sprint 81 (Oct 2021), Sprint 82 (Nov 2021), Sprint 83 (Dec 2021), Sprint 84 (Jan 2022), Sprint 85 (Feb 2022), Sprint 86 (Mar 2022), Sprint 87 (Apr 2022), Sprint 88 (May 2022), Sprint 89 (Jun 2022), Sprint 90 (Jul 2022) | ||||||||||||||||
Story Points: | 1 |
Description |
cyclone in this comment describes how Zabbix server currently processes the fuzzytime() function. The key point is that the current value of the item is compared with the timestamp of that value. As a result, trigger expressions like "host:system.localtime.fuzzytime(SomeValue)" are useless if this metric is collected by an agent in active mode, because the value and its timestamp are equal "by design" in this case. At the same time, the documentation has always had the following description for the trigger function fuzzytime():
The key point: it "differs from the Zabbix server time", not from anything else (including the value's timestamp)! I believe the behaviour of this trigger function should be changed to match the documentation. As a result, we could use this metric with active-mode agents, too. I understand that this could cause some problems (for example, false positives if some values have been buffered and delayed by active-mode agents and intermediate proxies), but these problems are much more predictable and understandable than the current behaviour. The possibility of monitoring time deviation using the standard active-mode agent is another very important reason. |
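To make the problem concrete, here is a minimal Python sketch of the comparison described above. The function name `fuzzytime` and its argument names are illustrative and are not Zabbix's actual C code; the only assumption is that the check passes when the value lies within `sec` seconds of the value's own timestamp.

```python
def fuzzytime(value: int, value_ts: int, sec: int) -> int:
    """Return 1 if value is within sec seconds of value_ts, else 0."""
    return 1 if abs(value - value_ts) <= sec else 0


server_now = 1_548_400_000      # hypothetical server clock (UNIX time)
host_now = server_now + 7200    # host clock is two hours ahead

# Passive check: the server stamps the value with its own clock.
passive = fuzzytime(value=host_now, value_ts=server_now, sec=30)

# Active check: the agent stamps the value with the same (wrong) host clock.
active = fuzzytime(value=host_now, value_ts=host_now, sec=30)

print(passive, active)  # 0 1
```

In the active case the two-hour deviation is invisible, because the value is compared with a timestamp taken from the same clock.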
Comments |
Comment by Constantin Oshmyan [ 2019 Jan 25 ] | ||||||||||||
Sorry, I mean this comment. <dimir> Fixed directly in description. | ||||||||||||
Comment by Glebs Ivanovskis [ 2019 Jan 26 ] | ||||||||||||
I was thinking about that issue a bit and came to conclusion that Zabbix can provide an internal item which would return an estimated difference between Zabbix server and the host. Under the hood there can be different mechanisms. For agent with passive checks this could be the good old system.localtime, but for active only agents time difference can be estimated using timestamp agent sends with collected data. | ||||||||||||
Comment by Constantin Oshmyan [ 2019 Jan 28 ] | ||||||||||||
cyclone, thank you for your comment. Unfortunately, I don't quite understand your vision at the moment. Could you describe your proposal in more detail, please? At the same time, I see that this ZBX has been moved onto different project (ZBXNEXT). | ||||||||||||
Comment by Arturs Lontons [ 2019 Jan 28 ] | ||||||||||||
Hi, You're correct - this is an erroneous behavior by the fuzzytime function. Moved the issue back to bugs. | ||||||||||||
Comment by Constantin Oshmyan [ 2019 Jan 28 ] | ||||||||||||
Thank you, guys! | ||||||||||||
Comment by Glebs Ivanovskis [ 2019 Jan 30 ] | ||||||||||||
What I have in mind is similar to zabbix[host,agent,available] which gives you availability of Zabbix agent on the host without the need to actually use agent.ping. It uses data obtained during other passive checks to give you the answer straight away. New item would be called zabbix[host,,localtime] or similar and it would be able to replace system.localtime in fuzzytime() triggers no matter what kind of items this host has (active or passive). Zabbix has all the information needed to give this information. | ||||||||||||
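A rough sketch of how such an estimate could be derived from data the server already receives. It assumes that an active agent's data packet carries top-level "clock" and "ns" fields with the host's time at the moment of sending; the function name, the host name and the sample values are hypothetical, and network delay is ignored.

```python
import json


def estimate_clock_offset(packet_json: str, server_recv_time: float) -> float:
    """Estimate (host clock - server clock) in seconds from one data packet.

    Assumes the packet carries top-level "clock" and "ns" fields holding the
    host's time at the moment of sending. Network delay is ignored, so the
    estimate is biased by however long the packet was in transit.
    """
    packet = json.loads(packet_json)
    host_send_time = packet["clock"] + packet.get("ns", 0) / 1e9
    return host_send_time - server_recv_time


# Hypothetical packet from an active agent whose clock is 90 s ahead.
packet = json.dumps({
    "request": "agent data",
    "data": [{"host": "web01", "key": "agent.ping", "value": "1",
              "clock": 1573467580, "ns": 0}],
    "clock": 1573467582,
    "ns": 500000000,
})
offset = estimate_clock_offset(packet, server_recv_time=1573467492.5)
print(round(offset))  # 90
```

Any item the host already sends would do as a carrier here, which is why no dedicated system.localtime check would be needed on active-only hosts.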
Comment by Constantin Oshmyan [ 2019 Jan 31 ] | ||||||||||||
Again moved to ZBXNEXT? | ||||||||||||
Comment by dimir [ 2019 Jan 31 ] | ||||||||||||
Let me try to summarize: Trigger function fuzzytime() when used with system.localtime as active check became useless after - Thus, it looks to me that this issue asks for fixing a regression introduced in
| ||||||||||||
Comment by dimir [ 2019 Jan 31 ] | ||||||||||||
dotneft if you agree, please move this issue back to "ZABBIX BUGS AND ISSUES (ZBX)". | ||||||||||||
Comment by Constantin Oshmyan [ 2019 Feb 08 ] | ||||||||||||
Sorry, what is the current state of this issue? | ||||||||||||
Comment by dimir [ 2019 Feb 08 ] | ||||||||||||
This issue needs more votes to get attention. | ||||||||||||
Comment by Ingus Vilnis [ 2019 Feb 11 ] | ||||||||||||
This is a regression from | ||||||||||||
Comment by dimir [ 2019 Feb 11 ] | ||||||||||||
Thank you, ingus.vilnis, for thinking that I decide here something at all. But nevertheless, trying to push this issue forward this is what I was told, and yes, I mentioned it is a regression. | ||||||||||||
Comment by Ingus Vilnis [ 2019 Feb 11 ] | ||||||||||||
Dimir, I know and that was not what I meant. And I really appreciate every input you give to the community both here and in forum. I am just disappointed seeing the way how such issues are treated here lately. Collecting votes for fixing a regression... Therefore if those are Zabbix rules now then could someone explain them - are devs fixing only top X voted issues or any above Y votes or else. But leaving this aside, how much effort does it take to fix the system.localtime active check problem? Is that a big rework now? | ||||||||||||
Comment by dimir [ 2019 Feb 11 ] | ||||||||||||
I've quickly checked the code and it looks to me that this is not a "one-liner" fix. All active agent values come with the value timestamp. In case of system.localtime the value timestamp becomes useless. So how to approach fixing that?
I see solution 3 as the only acceptable one: providing the new item and documenting how fuzzytime() should be used, along with instructions for users who suffer from this regression on how to redesign their trigger expressions. | ||||||||||||
Comment by Constantin Oshmyan [ 2019 Feb 11 ] | ||||||||||||
dimir, thank you for a status update and your comments.
1) this way will restore functionality of pair "system.localtime" – "fuzzytime()" for active-mode agents;
From my point of view, the best way is the combination of (2) (just now, ASAP) and (3) (in future, some time). | ||||||||||||
Comment by Glebs Ivanovskis [ 2019 Feb 14 ] | ||||||||||||
4. system.localtime executed by an active agent can connect to the server/proxy and query its local time; this would be symmetric to system.localtime as a passive check. But this would require the server/proxy to support a new kind of request, so the fix can be implemented only in a major release. Is system.localtime such an important item? How often do fuzzytime() triggers fire in real life? Maybe I'm spoiled, but I have never seen a significant clock deviation on a machine connected to the Internet with an NTP service running. Usually it's under one second and hence cannot be detected by system.localtime + fuzzytime(). | ||||||||||||
Comment by Glebs Ivanovskis [ 2019 Feb 14 ] | ||||||||||||
Speaking of fuzzytime() documentation and "differs from the Zabbix server time", same applies to date():
and time():
Date and time won't be "current" if evaluated for a value with old timestamp (e.g. sent via zabbix_sender -T). | ||||||||||||
Comment by richlv [ 2019 Feb 14 ] | ||||||||||||
system.localtime is a very good "safety net" check. If something did not alert about an NTP-related problem, checking for a significant time deviation will at least catch it later. Has happened/helped way too many times | ||||||||||||
Comment by Glebs Ivanovskis [ 2019 Feb 14 ] | ||||||||||||
What do you usually put into fuzzytime() parameter? | ||||||||||||
Comment by richlv [ 2019 Feb 15 ] | ||||||||||||
Usually checking for 10-30 seconds off - as mentioned, a safety net for something going pretty wrong on more sensitive checks. | ||||||||||||
Comment by Ingus Vilnis [ 2019 Feb 15 ] | ||||||||||||
Agree with rich. 30 seconds usually. I remember a case when local NTP server went nuts and suddenly was 40 minutes off as were all the servers relying on it. Luckily that Zabbix was using a different NTP server and could alert. | ||||||||||||
Comment by dimir [ 2019 Feb 15 ] | ||||||||||||
Question to cyclone. You say new item zabbix[host,,localtime] "would be able to replace system.localtime in fuzzytime() triggers no matter what kind of items this host has (active or passive). Zabbix has all the information needed to give this information". Isn't server missing the needed information (agent localtime) in case of passive checks only? Do you mean in this case server would connect to agent and request localtime? | ||||||||||||
Comment by Constantin Oshmyan [ 2019 Feb 15 ] | ||||||||||||
In my practice this trigger has really helped me several times. Typical use case: wrong timezone settings. This is sometimes imperceptible because the local time seems to be OK, but the UNIX timestamp differs by 2-3 hours. Some examples:
NTP handles small time differences very well (several seconds, even several minutes). However, if the time gap is big enough (the exact threshold depends on settings, but typically about 15-20 minutes), an NTP client decides that "the NTP server has a problem" and "it is unreliable"; as a result, time synchronization is turned off completely. Unfortunately, without monitoring, this state can go undiscovered for a long time.
10-15 minutes for system.localtime. As written above, a smaller time drift can be successfully corrected by NTP; but if the time difference is bigger, it requires attention. | ||||||||||||
Comment by Glebs Ivanovskis [ 2019 Feb 16 ] | ||||||||||||
Answer to dimir.
Yes, I mean that server can do system.localtime check if needed. Just like it reuses system.run for remote commands in action operations. | ||||||||||||
Comment by Glebs Ivanovskis [ 2019 Feb 16 ] | ||||||||||||
Thank you for your stories! They are supporting my belief that you don't actually need to know system time on a host, you just need to know if it is completely out of hand. system.localtime + fuzzytime() are used as a combo and should be replaced with a different combo, sort of: {host:is.system.clock.completely.out.of.sync[30s].last()}=1 In this case Zabbix does not have to measure system.localtime precisely, which is the main challenge of current system.localtime + fuzzytime() design. | ||||||||||||
Comment by Jonybat [ 2019 Sep 16 ] | ||||||||||||
Is something like this in the plan for 4.4? Another use case to have in mind, monitoring the clock of a zabbix proxy host that has no way of syncing with NTP source. In this case, not even passive checks work, because the timestamp of the system.localtime item is set by the proxy itself. If the proxy host's agent has no way to communicate with the zabbix server, it is not even possible to work around it with a duplicate host monitored by server. I vote for the zabbix[host,,localtime] that was suggested before, if the item would support both being monitored by server or proxy, just like the current internal check items. | ||||||||||||
Comment by Glebs Ivanovskis [ 2019 Sep 16 ] | ||||||||||||
Do I live in 21st century?! | ||||||||||||
Comment by Jonybat [ 2019 Sep 17 ] | ||||||||||||
Yes, in the 21st century there are still organizations with really strict security requirements or huge inertia to have some network changes done. So yea, that could be a work around, but unfortunately not one that works in any of my scenarios. | ||||||||||||
Comment by Glebs Ivanovskis [ 2019 Sep 24 ] | ||||||||||||
There was an independent discussion of the same issue in
| ||||||||||||
Comment by Vjacheslav Ryzhevskiy [ 2019 Sep 26 ] | ||||||||||||
There is another use case where this function is useful: when I want to check the time difference on a Zabbix proxy. The proxy must be monitored by itself, so any passive check has the proxy's timestamp. So fuzzytime will return 1 for system.localtime on this proxy, and the trigger will not fire if the proxy time differs from the server time. | ||||||||||||
Comment by Constantin Oshmyan [ 2019 Oct 14 ] | ||||||||||||
Sorry, what is the current status of this issue? Thanks in advance! | ||||||||||||
Comment by Andrejs Tumilovics [ 2019 Nov 11 ] | ||||||||||||
Mini specification

Summary

system.localtime key timestamp should not be altered for active checks. Such a solution is more like a hack. To calculate the time difference between server and agent, we may:
Periodic requests will produce extra load on the server. Moreover, we already have periodic active check synchronization requests, which could do the job.

Simple time difference calculation

{ "request": "active checks", "host": "<host name>" }

Active agent configuration data response (added "clock" and "ns"):

{ "response": "success", "data": [{ "key": "system.localtime[diff]", "delay": 5, "lastlogsize": 0, "mtime": 0 } ], "clock": 1573467492, "ns": 78211329 }

+ fuzzytime() trigger function is not needed any more.
- This solution does not take transmission delays into account.

To address transmission delays (which could be up to Timeout), we can apply the time correction trick below.

Corrected:

{ "request": "active checks", "host": "<host name>", "clock": 1573467492, "ns": 78211329 }

Active agent configuration data response (added "clock", "ns" and "timediff"):

{ "response": "success", "data": [{ "key": "system.localtime[diff]", "delay": 5, "lastlogsize": 0, "mtime": 0 } ], "clock": 1573467492, "ns": 78211329, "timediff": 12345 }

+ fuzzytime() trigger function is not needed any more.
- Transmission delay correction might be inaccurate.

Backward compatibility

Older versions of the agent ignore time fields in the active agent configuration data response, so existing functionality is not affected.

Acceptance criteria
What's affected
Documentation changes
Use casesCase 1
Observations:
Sign off:
| ||||||||||||
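The pictures that illustrated the correction in the specification above are not preserved here, so the exact trick is unknown. As an assumption, one common way to compensate for transmission delay is the NTP-style midpoint estimate, which treats the request and the response as spending equal time on the wire. The function names and sample numbers below are hypothetical.

```python
def to_seconds(clock: int, ns: int) -> float:
    """Combine "clock" (seconds) and "ns" (nanoseconds) into one float."""
    return clock + ns / 1e9


def estimate_offset(t_sent: float, t_received: float, server_clock: float) -> float:
    """Estimate (server clock - agent clock) in seconds.

    t_sent and t_received are agent-clock readings taken just before the
    request is sent and just after the response arrives. server_clock is the
    server's reading carried in the response. Assumes symmetric network delay.
    """
    midpoint = (t_sent + t_received) / 2.0
    return server_clock - midpoint


# Hypothetical exchange: the agent clock is 10 s behind, 0.4 s each way.
t_sent = 990.0                                 # agent clock at send
server_clock = to_seconds(1000, 400_000_000)   # server clock on arrival
t_received = 990.8                             # agent clock when reply arrives

offset = estimate_offset(t_sent, t_received, server_clock)
print(round(offset, 3))  # 10.0
```

If the delay is asymmetric, the error of this estimate is at most half the round-trip time, which matches the caveat that the correction might be inaccurate.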
Comment by Constantin Oshmyan [ 2019 Nov 11 ] | ||||||||||||
The idea seems good, but some points are unclear to me:
| ||||||||||||
Comment by Andrejs Tumilovics [ 2019 Nov 11 ] | ||||||||||||
how should it work for a passive mode (what is an algorithm?)
whether new metric system.localtime[diff] could be negative (or is it an absolute value only) what is Tagent_diff on the corrected.png picture how to manage time differencies in both directions for all scenarios (active/passive modes, time on agent is too fast or too slow) | ||||||||||||
Comment by Constantin Oshmyan [ 2019 Nov 11 ] | ||||||||||||
Yes, I understand that. I'm trying to point out the weak points of this proposal. At the moment I can see at least the following:
| ||||||||||||
Comment by Andrejs Tumilovics [ 2019 Nov 15 ] | ||||||||||||
This is a good point about passive - only configuration.
So, to make fuzzytime work, we need to modify the timestamp in active mode. | ||||||||||||
Comment by Constantin Oshmyan [ 2019 Nov 15 ] | ||||||||||||
There is a third way: modify the implementation of the fuzzytime() trigger function according to its description, as was proposed initially when this ticket was opened. However, if that is too troublesome for any reason, about the same effect could be achieved by timestamp correction for the values obtained via an active-mode agent. My only comments here are the following:
| ||||||||||||
Comment by Andrejs Tumilovics [ 2019 Nov 28 ] | ||||||||||||
There is no good solution for making system.localtime with fuzzytime() correctly reflect the time difference in active mode. So, we came up with a proposal to introduce a new item system.timediff[<NTP server address>]. Please share your vision regarding the new item proposal. | ||||||||||||
Comment by dimir [ 2019 Nov 28 ] | ||||||||||||
Probably with 2 parameters: [<NTP server address>, <NTP server port>]. | ||||||||||||
Comment by Jonybat [ 2019 Nov 28 ] | ||||||||||||
I'll take whatever you have to get some sort of time/clock monitoring in active mode. The timestamp correction in 3.x worked well enough for me, so if this is more reliable, even better. But see my concern in comment-368771 | ||||||||||||
Comment by dimir [ 2019 Nov 28 ] | ||||||||||||
Thanks for the feedback, jonybat. Just to summarize, the difference in setting up monitoring of agent time when in active mode:
New item will be available for both passive/active agents. | ||||||||||||
Comment by Andrejs Tumilovics [ 2019 Dec 10 ] | ||||||||||||
constantin.oshmyan could you please share your opinion on above proposal. | ||||||||||||
Comment by Constantin Oshmyan [ 2019 Dec 10 ] | ||||||||||||
atumilovics, thank you for your efforts in advancing this issue. My main opinion stays the same as expressed here and here.
In this case, as I see it, this bug has been "solved" in another way: it was just documented ("a documented bug is a feature"). That disappoints me a bit. At the same time, some discussion is still going on, which is great.
So, if it is possible to have another solution that is simpler to deploy, that would be better. | ||||||||||||
Comment by Andrejs Tumilovics [ 2020 Jan 23 ] | ||||||||||||
Fuzzytime algorithm
Data collection, Preprocessing, Trigger evaluation, Fuzzytime
recalculate_triggers()
DCconfig_get_triggers_by_itemids() // most recent item ts from trigger expression
evaluate_expressions()
substitute_functions()
zbx_populate_function_items() // func.timespec = trigger.timespec
zbx_evaluate_item_functions()
evaluate_function() // ts = func->timespec
evaluate_FUZZYTIME() // diff(metric.value - metric.ts) calculation
zbx_substitute_functions_results()
evaluate() // evaluate trigger result
raise PROBLEM or resolve

How we can calculate host time difference without using NTP
Note that the solutions below do not allow detecting network packet latency.

Solutions:
1) Hack system.localtime (in active mode) on the agent side so that it does not have a "clock" field in the "agent data" packet. The server/proxy will then put its own timestamp on this metric. As a result, system.localtime will always have the timestamp of the server/proxy.
Cons:
2) Override system.localtime time stamp on server/proxy, when active "agent data" is received.
3) Internal check Glebs' proposed solution
4) Similar to above solution, we may add new metric (system.timediff) which will store time difference between server/proxy and agent. It will not need fuzzytime() trigger function.
Cons:
Conclusion
At the moment there is no clean solution for this issue.
|
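Solution 2 above could look roughly like the following Python sketch. It is not taken from the Zabbix source: the function name `restamp_localtime`, the item dictionaries and the sample values are hypothetical, and it assumes each received value carries "key", "value", "clock" and "ns" fields.

```python
from typing import Any

LOCALTIME_KEY = "system.localtime"


def restamp_localtime(values: list[dict[str, Any]], recv_clock: int,
                      recv_ns: int = 0) -> list[dict[str, Any]]:
    """Return copies of values with system.localtime items stamped by the receiver.

    Items with any other key keep the timestamp supplied by the agent.
    """
    result = []
    for item in values:
        item = dict(item)  # do not modify the caller's data
        key = item["key"]
        if key == LOCALTIME_KEY or key.startswith(LOCALTIME_KEY + "["):
            item["clock"] = recv_clock
            item["ns"] = recv_ns
        result.append(item)
    return result


# Hypothetical "agent data" values from a host whose clock is 2 hours ahead.
values = [
    {"key": "system.localtime", "value": "1573474692", "clock": 1573474692, "ns": 0},
    {"key": "system.uptime", "value": "86400", "clock": 1573474692, "ns": 0},
]
fixed = restamp_localtime(values, recv_clock=1573467492)
drift = int(fixed[0]["value"]) - fixed[0]["clock"]
print(drift, fixed[1]["clock"])  # 7200 1573474692
```

With the receiver's timestamp in place, the existing fuzzytime() comparison sees the two-hour difference again, at the cost of counting any buffering or transmission delay as clock drift.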