[ZBX-15624] Allow "system.localtime" metric be collected in active mode Created: 2019 Jan 25  Updated: 2024 Apr 10  Resolved: 2022 Jul 19

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Agent (G), Server (S)
Affects Version/s: 4.0.3
Fix Version/s: 6.4 (plan)

Type: Problem report Priority: Trivial
Reporter: Constantin Oshmyan Assignee: Aleksejs Sestakovs
Resolution: Won't fix Votes: 16
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File corrected.png     PNG File simple.png     PNG File timestamp_correction.png     XML File zbx_export_hosts.xml    
Issue Links:
Duplicate
is duplicated by ZBX-16659 broken system.localtime in active mod... Closed
Sub-task
part of ZBX-4500 "fuzzytime" should compare with the t... Open
Team: Team B
Sprint: Sprint 58 (Nov 2019), Sprint 59 (Dec 2019), Sprint 60 (Jan 2020), Sprint 61 (Feb 2020), Sprint 62 (Mar 2020), Sprint 63 (Apr 2020), Sprint 64 (May 2020), Sprint 65 (Jun 2020), Sprint 66 (Jul 2020), Sprint 67 (Aug 2020), Sprint 68 (Sep 2020), Sprint 69 (Oct 2020), Sprint 70 (Nov 2020), Sprint 71 (Dec 2020), Sprint 72 (Jan 2021), Sprint 73 (Feb 2021), Sprint 74 (Mar 2021), Sprint 75 (Apr 2021), Sprint 76 (May 2021), Sprint 77 (Jun 2021), Sprint 78 (Jul 2021), Sprint 79 (Aug 2021), Sprint 80 (Sep 2021), Sprint 81 (Oct 2021), Sprint 82 (Nov 2021), Sprint 83 (Dec 2021), Sprint 84 (Jan 2022), Sprint 85 (Feb 2022), Sprint 86 (Mar 2022), Sprint 87 (Apr 2022), Sprint 88 (May 2022), Sprint 89 (Jun 2022), Sprint 90 (Jul 2022)
Story Points: 1

 Description   

cyclone describes in this comment how Zabbix server currently processes the fuzzytime() function. The key point: the current value of the item is compared with the timestamp of that same value. As a result, trigger expressions like "host:system.localtime.fuzzytime(SomeValue)" become useless if this metric is collected by an agent in active mode, because the value and its timestamp are equal "by design" in this case.

At the same time, the documentation has always given the following description of the trigger function fuzzytime():

Checking how much an item value (as timestamp) differs from the Zabbix server time.

The key point is "differs from the Zabbix server time", not from anything else (including the value's timestamp)!
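
To make the discrepancy concrete, here is a minimal C sketch of the two semantics. It is not Zabbix source code: item_value_t, fuzzytime_current() and fuzzytime_documented() are hypothetical names, times are plain Unix seconds, and an inclusive comparison (a difference less than or equal to the threshold counts as "in sync" and returns 1) is assumed.

```c
#include <assert.h>
#include <stdlib.h>   /* llabs() */

/* A collected system.localtime value (hypothetical model). */
typedef struct {
    long long value;   /* what the agent returned: its clock, Unix seconds */
    long long ts_sec;  /* timestamp Zabbix attached to the value */
} item_value_t;

/* Behaviour described in this issue: the value is compared with its own
 * timestamp. Returns 1 if |value - ts| <= threshold, 0 otherwise. */
static int fuzzytime_current(item_value_t v, long long threshold)
{
    assert(threshold >= 0);
    return llabs(v.value - v.ts_sec) <= threshold;
}

/* Documented behaviour: the value is compared with the server's clock. */
static int fuzzytime_documented(item_value_t v, long long server_now,
                                long long threshold)
{
    assert(threshold >= 0);
    return llabs(v.value - server_now) <= threshold;
}
```

With an active agent whose clock is 600 seconds behind the server, the value and its timestamp are both 1000, so fuzzytime_current() returns 1 for any threshold, while fuzzytime_documented() with a server time of 1600 and a 300-second threshold returns 0.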

I believe that the behaviour of this trigger function should be changed to match the documentation. As a result, we could use this metric with active-mode agents, too.

I understand that this could cause some problems (for example, false positives if some values have been buffered and delayed by active-mode agents and intermediate proxies), but these problems are much more predictable and understandable than the current behaviour. The ability to monitor time deviation using a standard active-mode agent is another very important reason.



 Comments   
Comment by Constantin Oshmyan [ 2019 Jan 25 ]

Sorry, I mean this comment.

<dimir> Fixed directly in description.

Comment by Glebs Ivanovskis [ 2019 Jan 26 ]

I was thinking about that issue a bit and came to the conclusion that Zabbix could provide an internal item returning an estimated time difference between the Zabbix server and the host. Under the hood there could be different mechanisms. For agents with passive checks this could be the good old system.localtime, but for active-only agents the time difference can be estimated using the timestamp the agent sends with the collected data.

Comment by Constantin Oshmyan [ 2019 Jan 28 ]

cyclone, thank you for your comment. Unfortunately, I don't quite understand your vision at the moment. Could you describe your proposal in more detail, please?

At the same time, I see that this ZBX has been moved to a different project (ZBXNEXT).
However, I believe that the difference between the documentation and the real behaviour is a bug that should be fixed.
Developing a new internal item could probably be useful. Maybe. In that case, it would be a genuinely new feature request (for ZBXNEXT).
In any case, the wrong behaviour (the discrepancy with the documentation) should be fixed, independently of any new functionality.

Comment by Arturs Lontons [ 2019 Jan 28 ]

Hi,
Thanks for reporting the issue.

You're correct - this is erroneous behavior of the fuzzytime function. Moved the issue back to bugs.
Meanwhile, the suggestion that Glebs mentioned could be implemented via a ZBXNEXT feature request.

Comment by Constantin Oshmyan [ 2019 Jan 28 ]

Thank you, guys!
If a feature request about this topic is created in ZBXNEXT, please link it to this issue as well.

Comment by Glebs Ivanovskis [ 2019 Jan 30 ]

What I have in mind is similar to zabbix[host,agent,available] which gives you availability of Zabbix agent on the host without the need to actually use agent.ping. It uses data obtained during other passive checks to give you the answer straight away. New item would be called zabbix[host,,localtime] or similar and it would be able to replace system.localtime in fuzzytime() triggers no matter what kind of items this host has (active or passive). Zabbix has all the information needed to give this information.

Comment by Constantin Oshmyan [ 2019 Jan 31 ]

At the same time, I see that this ZBX has been moved to a different project (ZBXNEXT).

You're correct - this is erroneous behavior of the fuzzytime function. Moved the issue back to bugs.

Again moved to ZBXNEXT?

Comment by dimir [ 2019 Jan 31 ]

Let me try to summarize:

Trigger function fuzzytime(), when used with system.localtime as an active check, became useless after ZBX-12957. The reason being that the difference it checks would always be 0.

Thus, it looks to me that this issue asks for fixing a regression introduced in ZBX-12957: bring back the comparison of agent time against server local time. New features, including the ones mentioned by cyclone, should be created as separate issues.

 

Comment by dimir [ 2019 Jan 31 ]

dotneft if you agree, please move this issue back to "ZABBIX BUGS AND ISSUES (ZBX)".

Comment by Constantin Oshmyan [ 2019 Feb 08 ]

Sorry, what is the current state of this issue?

Comment by dimir [ 2019 Feb 08 ]

This issue needs more votes to get attention.

Comment by Ingus Vilnis [ 2019 Feb 11 ]

This is a regression from ZBX-12957. Dimir, why do you suggest voting in this case at all? And if you do, please tell us how many votes are needed to get any attention.

Comment by dimir [ 2019 Feb 11 ]

Thank you, ingus.vilnis, for thinking that I decide anything here at all. Nevertheless, when I tried to push this issue forward, this is what I was told, and yes, I did mention that it is a regression.

Comment by Ingus Vilnis [ 2019 Feb 11 ]

Dimir, I know, and that was not what I meant. I really appreciate all the input you give to the community, both here and in the forum.

I am just disappointed to see how such issues have been treated here lately. Collecting votes for fixing a regression... So if those are the Zabbix rules now, could someone explain them: do devs fix only the top X voted issues, or any issue above Y votes, or something else?

But leaving this aside, how much effort would it take to fix the system.localtime active check problem? Is it a big rework now?

Comment by dimir [ 2019 Feb 11 ]

I've quickly checked the code, and it looks to me like this is not a "one-liner" fix. All active agent values come with a value timestamp. In the case of system.localtime, the value timestamp becomes useless for comparison. So how should we approach fixing that?

  1. Override the value timestamp on the server side? The easiest, but it's a hack.
  2. Modify the fuzzytime() function to use server time instead of the value timestamp when dealing with an active item? A bit more complicated, but still a hack, plus it won't give the real difference.
  3. Quoting cyclone above: "New item would be called zabbix[host,,localtime] or similar and it would be able to replace system.localtime in fuzzytime() triggers no matter what kind of items this host has (active or passive). Zabbix has all the information needed to give this information." This sounds good. But what about the regression? It would not be easy to provide an upgrade patch that fixes the issue for users.

I see solution 3 as the only acceptable one: provide the new item, document how fuzzytime() should be used, and give users who suffer from this regression instructions on how to redesign their trigger expressions.

Comment by Constantin Oshmyan [ 2019 Feb 11 ]

dimir, thank you for a status update and your comments.

Modify the fuzzytime() function to use server time instead of the value timestamp when dealing with an active item? A bit more complicated, but still a hack, plus it won't give the real difference.

1) this approach will restore the functionality of the "system.localtime" + "fuzzytime()" pair for active-mode agents;
2) it will simply match the currently documented behaviour (documented for a long time, by the way!).
To me, that is a real difference.

So how to approach fixing that?

From my point of view, the best way is a combination of (2) (right now, ASAP) and (3) (at some point in the future).
It would let Zabbix users keep working in their usual manner now and, at the same time, migrate to another methodology of time synchronization monitoring when: a) this methodology is ready; b) users are ready.
The key point: users should have a choice of when and how to modify their environment.

Comment by Glebs Ivanovskis [ 2019 Feb 14 ]

4. system.localtime executed by an active agent could connect to the server/proxy and query its local time; this would be symmetric to system.localtime as a passive check. But this would require the server/proxy to support a new kind of request, so the fix could only be implemented in a major release.

Is system.localtime such an important item? How often do fuzzytime() triggers fire in real life? Maybe I'm spoiled, but I have never seen a significant clock deviation on a machine connected to the Internet with an NTP service running. Usually it's under one second and hence cannot be detected by system.localtime + fuzzytime().

Comment by Glebs Ivanovskis [ 2019 Feb 14 ]

Speaking of fuzzytime() documentation and "differs from the Zabbix server time", the same applies to date():

Current date in YYYYMMDD format.

and time():

Current time in HHMMSS format.

Date and time won't be "current" if evaluated for a value with an old timestamp (e.g. sent via zabbix_sender -T).

Comment by richlv [ 2019 Feb 14 ]

system.localtime is a very good "safety net" check. If nothing alerted about an NTP-related problem, checking for a significant time deviation will at least catch it later. This has happened/helped way too many times.

Comment by Glebs Ivanovskis [ 2019 Feb 14 ]

What do you usually put into fuzzytime() parameter?

Comment by richlv [ 2019 Feb 15 ]

Usually checking for 10-30 seconds off - as mentioned, a safety net for something going pretty wrong on more sensitive checks.

Comment by Ingus Vilnis [ 2019 Feb 15 ]

Agree with rich. 30 seconds usually. I remember a case when the local NTP server went nuts and was suddenly 40 minutes off, as were all the servers relying on it. Luckily, Zabbix was using a different NTP server and could alert.

Comment by dimir [ 2019 Feb 15 ]

Question to cyclone. You say the new item zabbix[host,,localtime] "would be able to replace system.localtime in fuzzytime() triggers no matter what kind of items this host has (active or passive). Zabbix has all the information needed to give this information". Isn't the server missing the needed information (agent local time) when a host has only passive checks? Do you mean in this case server would connect to agent and request localtime?

Comment by Constantin Oshmyan [ 2019 Feb 15 ]

Is system.localtime such an important item? How often do fuzzytime() triggers fire in real life?

In my practice, this trigger really helped me several times. Typical use case: wrong timezone settings. It is sometimes imperceptible, as the local time seems to be OK, but the UNIX timestamp differs by 2-3 hours.

Some examples:

  • A workstation really had the wrong timezone. The local time looked correct, but some monitored parameters (like vfs.file.time[someFile], which returns a UNIX timestamp) in fact did not work as expected.
  • A server's hardware clock was set to "Local time", and the OS settings matched (HW clock is "local time"). It worked fine for several months, but after some maintenance (OS patching) the server rebooted and suddenly the OS time was one hour off. Cause: the DST state had changed, but the server's BIOS does not support DST changes. Monitoring makes it possible to reveal this problem ASAP. (Yes, I agree that this is bad practice: servers should use UTC for their HW clock to avoid this problem, but how can you ensure that if you are not informed?)
  • A virtual server was located on VMware infrastructure. In some cases (cloning, moving the VM to another hypervisor, updating the virtual hardware version), the VM's BIOS settings were reset, and the "hardware" clock of this VM was set to local time instead of UTC.

Maybe I'm spoiled, but I have never seen a significant clock deviation on a machine connected to the Internet with an NTP service running.

NTP handles small time differences excellently (several seconds, even several minutes). However, if the time gap is big enough (the exact threshold depends on the settings, but it is typically about 15-20 minutes), the NTP client decides that "the NTP server has a problem" and "is unreliable"; as a result, time synchronization is turned off completely. Unfortunately, without monitoring, this state can go undiscovered for a long time.

What do you usually put into fuzzytime() parameter?

10-15 minutes for system.localtime. As written above, smaller time drift can be successfully corrected by NTP, but a bigger time difference requires attention.
Therefore, for me, small delays (due to active-mode agent buffering or an intermediate proxy) are not critical.

Comment by Glebs Ivanovskis [ 2019 Feb 16 ]

Answer to dimir.

Do you mean in this case server would connect to agent and request localtime?

Yes, I mean that the server could perform a system.localtime check if needed, just like it reuses system.run for remote commands in action operations.

Comment by Glebs Ivanovskis [ 2019 Feb 16 ]

Thank you for your stories! They support my belief that you don't actually need to know the system time on a host; you just need to know whether it is completely out of hand. system.localtime + fuzzytime() are used as a combo and should be replaced with a different combo, something like:

{host:is.system.clock.completely.out.of.sync[30s].last()}=1

In this case Zabbix does not have to measure system.localtime precisely, which is the main challenge of current system.localtime + fuzzytime() design.

Comment by Jonybat [ 2019 Sep 16 ]

Is something like this in the plan for 4.4?

Another use case to keep in mind: monitoring the clock of a Zabbix proxy host that has no way of syncing with an NTP source. In this case, not even passive checks work, because the timestamp of the system.localtime item is set by the proxy itself. If the proxy host's agent has no way to communicate with the Zabbix server, it is not even possible to work around it with a duplicate host monitored by the server.

I vote for the zabbix[host,,localtime] item suggested before, provided that it supports being monitored by either the server or a proxy, just like the current internal check items.

Comment by Glebs Ivanovskis [ 2019 Sep 16 ]

Another use case to keep in mind: monitoring the clock of a Zabbix proxy host that has no way of syncing with an NTP source.

Do I live in the 21st century?! jonybat, can you run your own NTP server on the Zabbix server? You could also install Zabbix agent on the Zabbix server and monitor it using passive checks from the proxy. This will give you the same difference, but in the opposite direction.

Comment by Jonybat [ 2019 Sep 17 ]

Yes, in the 21st century there are still organizations with really strict security requirements or huge inertia when it comes to network changes. So yes, that could be a workaround, but unfortunately not one that works in any of my scenarios.

Comment by Glebs Ivanovskis [ 2019 Sep 24 ]

There was an independent discussion of the same issue in ZBX-16659, and here are a few points taken from there:

  • system.localtime in active mode is useless and even worse — it is silently useless!
  • It is never too late to make an upgrade patch converting all active system.localtime items to passive mode or disabling them completely.
  • API/frontend should prohibit active system.localtime items.
  • system.localtime should become notsupported if the agent is asked to execute it as an active check (although this won't be as effective, because the breaking changes were on the server/proxy side and the fix would only apply to new agents).

Comment by Vjacheslav Ryzhevskiy [ 2019 Sep 26 ]

There is another use case where this function is useful: checking the time difference on a Zabbix proxy. The proxy must be monitored by itself, so any passive check gets the proxy's timestamp. As a result, fuzzytime() will return 1 for system.localtime on this proxy, and the trigger will not fire if the proxy time differs from the server time.

Comment by Constantin Oshmyan [ 2019 Oct 14 ]

Sorry, what is the current status of this issue? Thanks in advance!

Comment by Andrejs Tumilovics [ 2019 Nov 11 ]

Mini specification

Summary

The system.localtime timestamp should not be altered for active checks; such a solution is more of a hack.
The idea is to extend the system.localtime <type> argument with a "diff" option: system.localtime[diff]. The new option will return the absolute time difference between server and agent in seconds, so the returned value can be used without the fuzzytime() trigger function.

To calculate the time difference between server and agent, we could:

  • periodically request time from server;
  • extend the active agent configuration data response with time fields ("clock" and "ns").

Periodic requests would put extra load on the server. Moreover, we already have periodic active check synchronization requests, which could do the job.

Simple time difference calculation

Active agent configuration data request:

{
    "request": "active checks",
    "host": "<host name>"
}

Active agent configuration data response (added "clock" and "ns"):

{
    "response": "success",
    "data": [{
            "key": "system.localtime[diff]",
            "delay": 5,
            "lastlogsize": 0,
            "mtime": 0
        }
    ],
    "clock": 1573467492,
    "ns": 78211329
}

+ fuzzytime() trigger function is not needed any more.

- This solution does not take transmission delays into account.

To address transmission delays (which could be up to Timeout), we can apply the time correction trick below.
However, this requires changing both the active agent configuration data request and the response, which is probably not a good option.

Corrected:

Active agent configuration data request (added "clock" and "ns"):

{
    "request": "active checks",
    "host": "<host name>",
    "clock": 1573467492,
    "ns": 78211329
}

Active agent configuration data response (added "clock", "ns" and "timediff"):

{
    "response": "success",
    "data": [{
            "key": "system.localtime[diff]",
            "delay": 5,
            "lastlogsize": 0,
            "mtime": 0
        }
    ],
    "clock": 1573467492,
    "ns": 78211329,
    "timediff": 12345
}

+ fuzzytime() trigger function is not needed any more.
+ Transmission delay correction applied.

- Transmission delay correction might be inaccurate.

Backward compatibility

Older agent versions ignore the time fields in the active agent configuration data response, so existing functionality is not affected.

Acceptance criteria

  • system.localtime[diff] check returns time difference between server and agent in both active and passive modes.
  • Documentation updated.

What's affected

  • Zabbix server - active agent configuration data response extended with "clock" and "ns" elements.
  • Zabbix agent - system.localtime[diff] key support added.
  • Zabbix agent2 - system.localtime[diff] key support added.

Documentation changes

Use cases

Case 1

  • Run Zabbix agent on a VM (Host-only Adapter: 192.168.56.102);
    • zabbix_agent.conf: Hostname=<host name>
    • zabbix_agent.conf: ServerActive=192.168.56.1
    • Set the hardware clock: hwclock --set --date="2019-11-11 09:11:15" --utc
  • Configure system.localtime active check and trigger {<host name>:system.localtime[diff].last()}<30
  • Configure system.localtime passive check and trigger {<host name>:system.localtime[diff].last()}<60

Observations:

  • Problem is raised for both passive and active items.

Sign off:

Comment by Constantin Oshmyan [ 2019 Nov 11 ]

The idea seems good, but some points are unclear to me:

  • how should it work in passive mode (what is the algorithm?);
  • whether the new metric system.localtime[diff] could be negative (or whether it is an absolute value only);
  • what is Tagent_diff in the corrected.png diagram;
  • how to manage time differences in both directions for all scenarios (active/passive modes, time on agent is too fast or too slow).

Comment by Andrejs Tumilovics [ 2019 Nov 11 ]

constantin.oshmyan

how should it work in passive mode (what is the algorithm?)
For system.localtime[diff], it makes no difference whether it's fetched in active or passive mode.
The algorithm is as follows:

  • The agent periodically asks the server for the active checks configuration.
  • Along with the active checks configuration, the server reports its timestamp.
  • The agent calculates the time difference between its local time and the one received from the server. The calculated difference is stored in the agent (in a variable) until the next active checks configuration update.
  • This calculated value is returned by system.localtime[diff].
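
The agent-side part of this algorithm can be sketched in C as follows. This is a minimal sketch, not the proposed patch: localtime_diff_t, on_active_checks_response() and get_localtime_diff() are hypothetical names, the "clock"/"ns" fields are assumed to be already parsed from the response, and the sign is kept (Tagent_diff = Tlocal - Tserver) instead of returning an absolute value. Because the server's timestamp is taken when the response is built, the stored difference also contains the response's transmission delay.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical agent-side state, kept until the next configuration update. */
typedef struct {
    int     valid;    /* 0 until a response carrying "clock"/"ns" arrives */
    int64_t diff_ns;  /* Tagent_diff = Tlocal - Tserver, in nanoseconds */
} localtime_diff_t;

/* Called when the "active checks" response is received: server_clock and
 * server_ns are its assumed "clock"/"ns" fields, local_clock and local_ns
 * the agent's own clock at the moment of receipt. */
static void on_active_checks_response(localtime_diff_t *st,
                                      int64_t server_clock, int32_t server_ns,
                                      int64_t local_clock, int32_t local_ns)
{
    assert(server_ns >= 0 && server_ns < 1000000000);
    assert(local_ns >= 0 && local_ns < 1000000000);
    st->diff_ns = (local_clock - server_clock) * 1000000000LL
                  + (int64_t)(local_ns - server_ns);
    st->valid = 1;
}

/* Value for system.localtime[diff] in whole seconds (truncated toward zero).
 * Returns -1 if no response has been seen yet (item not supported yet). */
static int get_localtime_diff(const localtime_diff_t *st, int64_t *seconds)
{
    if (!st->valid)
        return -1;
    *seconds = st->diff_ns / 1000000000LL;
    return 0;
}
```

For example, if the response carries "clock": 1573467492 and "ns": 78211329 while the agent's own clock reads 1573467487.078211329 on receipt, the item returns -5 (the agent is 5 seconds behind, give or take the network delay).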

whether the new metric system.localtime[diff] could be negative (or whether it is an absolute value only)
Good point! Maybe it is useful to know whether the difference is negative. Then we will need an abs() trigger function (ZBXNEXT-3486).

what is Tagent_diff in the corrected.png diagram
Tagent_diff = Tlocal - Tserver
Diagram fixed

how to manage time differences in both directions for all scenarios (active/passive modes, time on agent is too fast or too slow)
We do not change anything in that context. Zabbix does not serve as an NTP server. But what we do provide is a way for customers to detect out-of-sync hosts.

Comment by Constantin Oshmyan [ 2019 Nov 11 ]

Zabbix does not serve as an NTP server.

Yes, I understand that. I'm trying to point out the weak points of this proposal. At the moment I can see at least the following:

  • the proposed time-check algorithm is tied to the active mode of Zabbix agent, so I don't see how it can work when Zabbix agent runs in passive-only mode.
  • it is very important to handle the sign of the time difference carefully. Just an example: let's say the time on the agent is 5 seconds slow, and the network delays are very low (epsilon squared). According to the 2nd diagram, at 00:00:00 the agent sends the "get active checks" request to the server (accompanying the request with Tagent=00:00:00). As the real time is 5 seconds later, the server receiving this request (at 00:00:05) calculates: Tserver_diff = Tlocal - Tagent = 00:00:05 - 00:00:00 = 5 seconds. So it replies to the agent with additional settings: Tserver=00:00:05 and Tserver_diff=5 seconds. Upon receiving this reply, the agent calculates: Tagent_diff = Tlocal - Tserver = 00:00:00 - 00:00:05 = -5 seconds, then: Tdiff = abs(Tserver_diff - Tagent_diff) * 2 = abs( (5 seconds) - (-5 seconds) ) * 2 = abs (10 seconds) * 2 = 20 seconds. So the agent will return "20 (seconds)" as the value of the system.localtime[diff] metric, which is not true (it should be 5). So, I mean, the proposed algorithm is not very accurate.

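
For comparison, the standard NTP on-wire calculation (RFC 5905) uses the same two differences but halves their difference instead of doubling it: offset = (Tserver_diff - Tagent_diff) / 2 = ((t2 - t1) + (t3 - t4)) / 2, where t1 and t4 are the agent's send/receive times and t2 and t3 the server's receive/send times. Below is a minimal C sketch of that arithmetic; exchange_t, clock_offset_ns() and round_trip_ns() are hypothetical names, all times are integer nanoseconds, and this is only an illustration, not what was implemented in Zabbix.

```c
#include <assert.h>
#include <stdint.h>

/* One request/response exchange, all times in nanoseconds:
 * t1 agent sends request (agent clock), t2 server receives it (server clock),
 * t3 server sends response (server clock), t4 agent receives it (agent clock). */
typedef struct {
    int64_t t1, t2, t3, t4;
} exchange_t;

/* Estimated server-minus-agent clock offset. Exact when both network legs
 * take equally long; otherwise off by half the difference of the legs. */
static int64_t clock_offset_ns(exchange_t e)
{
    int64_t server_diff = e.t2 - e.t1;  /* Tserver_diff = offset + delay to server */
    int64_t agent_diff  = e.t4 - e.t3;  /* Tagent_diff = -offset + delay back */
    return (server_diff - agent_diff) / 2;
}

/* Round-trip network delay, excluding the server's processing time. */
static int64_t round_trip_ns(exchange_t e)
{
    assert(e.t4 >= e.t1 && e.t3 >= e.t2);
    return (e.t4 - e.t1) - (e.t3 - e.t2);
}
```

With the example above (t1 = 0 s, t2 = t3 = 5 s, t4 = 0 s), the offset comes out as 5 s and the round trip as 0 s. If the legs differ (say 1 s to the server and 3 s back, with the agent still 5 s slow), the estimate becomes 4 s: the error is half the asymmetry, which no single request/response exchange can remove.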
Comment by Andrejs Tumilovics [ 2019 Nov 15 ]

This is a good point about passive-only configurations.
It makes me change my mind about the implementation.
Basically, our main problem is that in active mode we have the host local time in both the system.localtime value and its timestamp, while in passive mode we have the server time in the timestamp:

| | system.localtime value | system.localtime timestamp |
| Passive mode | Tagent | Tserver |
| Active mode | Tagent | Tagent |

So, to make fuzzytime() work, we need to modify the timestamp in active mode.
This can be done in two main ways:

  1. Apply timestamp correction, specifically for system.localtime, on the agent side
    1. needs the time difference calculation described above
  2. Override the timestamp, specifically for system.localtime, on the server side
    1. from an implementation perspective, it is the easiest solution

Comment by Constantin Oshmyan [ 2019 Nov 15 ]

There is a third way: modify the implementation of the fuzzytime() trigger function to match its description, as originally proposed when this ticket was opened.

However, if that is too troublesome for any reason, about the same effect could be achieved by correcting the timestamps of values obtained from active-mode agents. My only comments here are the following:
1) for the first approach (on the agent side), the diagram should be corrected (again a problem with the sign): it should be either "Tdiff = Tserver - Tagent" or "timestamp = Tagent - Tdiff";
2) for the second approach (server-side timestamp correction, which also seems more attractive to me), it should be taken into account that in active-mode communication:

  • a timestamp is present for every collected value (the time it was collected) as well as for the entire transmitted JSON (the time it was transmitted);
  • these timestamps can differ (a collected value could be sent with some delay);
  • there could be a Zabbix proxy between Zabbix agent and Zabbix server (introducing an additional delay).

Comment by Andrejs Tumilovics [ 2019 Nov 28 ]

There is no good solution for making system.localtime with fuzzytime() correctly reflect the time difference in active mode.
Proper tracking of the time difference between server and agent requires a protocol change. We cannot accept that now, because it would break backward compatibility.

So, we came up with a proposal to introduce a new item system.timediff[<NTP server address>].
The new item will return the time difference between the agent and the specified NTP server.
The NTP request and the time difference calculation are performed on the agent side.
The item is available in both active and passive modes.
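
On the agent side, the first step of such an item would be reading the time out of an NTP reply. As a minimal C sketch (socket handling omitted): per the NTP packet format (RFC 5905), the 48-byte header carries the server's Transmit Timestamp at bytes 40-47 as 32-bit seconds since 1900 followed by a 32-bit binary fraction. ntp_transmit_time_ns() and timediff_seconds() are hypothetical names; the sketch ignores network delay and the 2036 era rollover, both of which a real implementation would have to handle (for example with the four-timestamp offset calculation from the same RFC).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NTP_PACKET_LEN      48
#define NTP_UNIX_EPOCH_DIFF 2208988800LL  /* seconds from 1900-01-01 to 1970-01-01 */

/* Reads a 32-bit big-endian (network order) integer. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Converts the Transmit Timestamp of an NTP reply to Unix time in ns.
 * Returns 0 on success, -1 if the packet is too short. */
static int ntp_transmit_time_ns(const unsigned char *pkt, size_t len, int64_t *unix_ns)
{
    assert(pkt != NULL && unix_ns != NULL);
    if (len < NTP_PACKET_LEN)
        return -1;

    uint32_t sec  = read_be32(pkt + 40);  /* seconds since 1900 */
    uint32_t frac = read_be32(pkt + 44);  /* units of 2^-32 s */

    *unix_ns = ((int64_t)sec - NTP_UNIX_EPOCH_DIFF) * 1000000000LL
               + (int64_t)(((uint64_t)frac * 1000000000ULL) >> 32);
    return 0;
}

/* Hypothetical system.timediff value: NTP server time minus agent time,
 * whole seconds (truncated toward zero), network delay ignored. */
static int64_t timediff_seconds(int64_t ntp_unix_ns, int64_t agent_unix_ns)
{
    return (ntp_unix_ns - agent_unix_ns) / 1000000000LL;
}
```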

Please share your vision regarding new item proposal.

Comment by dimir [ 2019 Nov 28 ]

Probably with 2 parameters: [<NTP server address>, <NTP server port>].

Comment by Jonybat [ 2019 Nov 28 ]

I'll take whatever you have to get some sort of time/clock monitoring in active mode. The timestamp correction in 3.x worked well enough for me, so if this is more reliable, even better.

But see my concern in comment-368771

Comment by dimir [ 2019 Nov 28 ]

Thanks for the feedback, jonybat. Just to summarize, the difference in setting up monitoring of agent time when in active mode:

| | item | trigger function | time difference between |
| Before | system.localtime | fuzzytime() | agent and server/proxy |
| After | system.timediff[<NTP server host>,<NTP server port>] | last() | agent and NTP server |

New item will be available for both passive/active agents.

Comment by Andrejs Tumilovics [ 2019 Dec 10 ]

constantin.oshmyan, could you please share your opinion on the above proposal?

Comment by Constantin Oshmyan [ 2019 Dec 10 ]

atumilovics, thank you for your efforts in advancement of this issue.

My main opinion stays the same as expressed here and here.
In summary:

  • once a bug is discovered, it should be fixed ASAP;
  • new functionality should be discussed in the ZBXNEXT project;
  • any new functionality should let users (admins) adopt it when they (and the technology) are ready, instead of forcing them to use it with no alternative.

In this case, as I see it, this bug has been "solved" in another way: it was just documented ("a documented bug is a feature"). That disappoints me a bit.
Instead of fixing the bug, we are discussing alternatives (what new functionality exactly could replace our current configuration). That also disappoints me a bit.

At the same time, the discussion is still going on, which is great.
The latest proposal seems attractive, but I'm concerned that it is new functionality that will require:

  • an NTP server accessible by all clients;
  • a new version of Zabbix agent deployed on all hosts;
  • configuration changes (and, for complex networks, deciding which clients should query which NTP servers).

So, if another solution that is simpler to deploy is possible, it would be better.
For example, something like zabbix[host,,localtime] (mentioned by Glebs), which does not require such complex infrastructure changes. Or some other metric that could show just the time difference between Zabbix server and Zabbix agent.

Comment by Andrejs Tumilovics [ 2020 Jan 23 ]

Fuzzytime algorithm

Data collection
Passive:
The server sends a metric request to the agent. The agent responds with the metric value only; no extra data (like a time stamp) is supplied. During response parsing, right before the new metric is pushed into the preprocessing queue, it is labeled with the current server/proxy time stamp in get_values() -> zbx_preprocess_item_value(ts).
Active:
Active item data is marked with the current time stamp on the agent side, and the JSON "agent data" is passed to the server/proxy. The server/proxy gets the data in process_trap(). Then, if the "clock" field is missing for some metric, the current time is used (we may use this as a trick).

Preprocessing
Metrics are processed by one of the preprocessing workers and pushed into the history queue.

Trigger evaluation
The Zabbix history syncer (on the server) periodically recalculates trigger expressions. There are two types of triggers: "timer triggers" (nodata, date, dayofmonth, dayofweek, time, now) and regular item triggers. Timer trigger expressions are evaluated with the timespec set to the evaluation time (see the recalculate_triggers() function). Regular triggers evaluate their expressions with the most recent item timespec (for complex trigger expressions, where multiple metrics are involved).

Fuzzytime
evaluate_FUZZYTIME() gets the metric value (the agent time stamp) and the metric collection time stamp (clock). In passive mode, the metric time stamp equals the time the metric was received by the server/proxy. In active mode, both the metric value and its time stamp are collected on the agent side, so they are equal. The server then checks that the difference between the metric value and its timestamp is within the specified bounds.

recalculate_triggers()
	DCconfig_get_triggers_by_itemids()			// most recent item ts from trigger expression
	evaluate_expressions()
		substitute_functions()
			zbx_populate_function_items()		// func.timespec = trigger.timespec
			zbx_evaluate_item_functions()
				evaluate_function()		// ts = func->timespec
					evaluate_FUZZYTIME()	// diff(metric.value - metric.ts) calculation
			zbx_substitute_functions_results()
		evaluate()					// evaluate trigger result
		raise PROBLEM or resolve

How we can calculate the host time difference without using NTP

Note that the solutions below cannot detect network packet latency.
In active mode, we may only rely on the "clock" time stamp in the active history data packet received from the agent/proxy. This doesn't require protocol changes (a similar time correction approach was reverted in ZBX-12957).
In passive mode, we should rely on the server/proxy time stamp (the existing system.localtime + fuzzytime() combination).
So, two different approaches should be combined.

Solutions:

1) Hack system.localtime (in active mode) on the agent side so that it has no "clock" field in the "agent data" packet. The server/proxy will then put its own timestamp on this metric. As a result, system.localtime will always have the server/proxy time stamp.
Pros:

  • easy to implement
  • system.localtime + fuzzytime() combination may be used in both active and passive modes

Cons:

  • this is a fix specific to system.localtime + fuzzytime(), so fuzzytime() will not work with other metrics (this should be reflected in the documentation)
  • agent upgrade required

2) Override system.localtime time stamp on server/proxy, when active "agent data" is received.
Pros:

  • system.localtime + fuzzytime() combination may be used in both active and passive modes
  • only server/proxy upgrade required

Cons:

  • this is a fix specific to system.localtime + fuzzytime(), so fuzzytime() will not work with other metrics (this should be reflected in the documentation)
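
Solutions 1 and 2 both come down to which timestamp is stored for a system.localtime row. Below is a minimal C sketch under these assumptions: agent_row_t, is_localtime_key() and row_timestamp() are hypothetical, a row is reduced to its key and optional "clock", and recv_time is the server/proxy clock when the "agent data" packet arrives (with a proxy in between, that is the proxy's clock). The real code path around process_trap() is not reproduced here.

```c
#include <assert.h>
#include <string.h>

/* Simplified row of an active "agent data" packet (hypothetical). */
typedef struct {
    const char *key;
    int         has_clock;  /* 0 if the row carried no "clock" field */
    long long   clock;      /* agent-side timestamp, Unix seconds */
} agent_row_t;

/* Matches "system.localtime" and "system.localtime[...]". */
static int is_localtime_key(const char *key)
{
    return strcmp(key, "system.localtime") == 0 ||
           strncmp(key, "system.localtime[", 17) == 0;
}

/* Timestamp stored for a row received at recv_time.
 * Solution 1: the agent omits "clock" for system.localtime, so the existing
 *             "missing clock -> current time" fallback applies.
 * Solution 2: the server ignores "clock" for system.localtime rows
 *             (enabled by override_localtime). */
static long long row_timestamp(const agent_row_t *row, long long recv_time,
                               int override_localtime)
{
    assert(row != NULL && row->key != NULL);

    if (!row->has_clock)
        return recv_time;

    if (override_localtime && is_localtime_key(row->key))
        return recv_time;

    return row->clock;
}
```

Either way the stored timestamp is the receive time, so fuzzytime() compares the agent clock with the server/proxy clock again; the cost is that buffering and proxy delays are added to the measured difference.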

3) Internal check (the solution proposed by Glebs)
We calculate host time diff for each agent and expose this value via internal check zabbix[host,localtime].
The problem is that we can calculate such a time difference only in active monitoring mode, by processing "agent data" packets (like it was done in ZBX-12957).
For passive mode, we would need to implicitly add a passive system.localtime metric for each host, or modify the protocol.
Otherwise, we may keep using system.localtime + fuzzytime() for passive mode and the new internal metric zabbix[host,localtime] (without fuzzytime()) in active mode.
Pros:
Cons:

  • different ways of getting the same result in active and passive modes (different metrics)

4) Similar to the solution above, we may add a new metric (system.timediff) that stores the time difference between the server/proxy and the agent. It will not need the fuzzytime() trigger function.
In passive mode, it will calculate its value when the server/proxy gets the response from the passive agent (get_value_agent()).
In active mode, the server/proxy will calculate the value when the "agent data" packet is received (parse_history_data_row_value()).
Pros:

  • same metric for both active and passive modes
  • no fuzzytime() function required

Cons:

  • special processing of the new metric on server/proxy
  • needs server/proxy and agent upgrade
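
A minimal C sketch of the value solution 4 would compute, assuming the server/proxy already has both timestamps at hand (the agent's reply or the row's "clock", and its own receive time). timediff_sample_t, timediff_value() and timediff_problem() are hypothetical names, and times are Unix seconds. With only one timestamp per direction, any network or buffering delay ends up inside the result, which is the latency limitation noted above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>   /* llabs() */

/* One observation for the hypothetical system.timediff item. */
typedef struct {
    int64_t agent_time;   /* passive: value in the agent's reply;
                             active: "clock" of the row in "agent data" */
    int64_t server_time;  /* server/proxy clock when the reply/packet arrived */
} timediff_sample_t;

/* Server-minus-agent difference. The formula is the same in both modes;
 * only the place where the two timestamps are captured differs. */
static int64_t timediff_value(timediff_sample_t s)
{
    return s.server_time - s.agent_time;
}

/* What a trigger on the new item might check: |diff| > threshold. */
static int timediff_problem(int64_t diff, int64_t threshold)
{
    assert(threshold >= 0);
    return llabs((long long)diff) > (long long)threshold;
}
```

For example, an agent that is 5 seconds slow and whose data arrives 1 second after being stamped yields 6; a trigger threshold well above the expected delay (like the 10-30 seconds or 10-15 minutes mentioned earlier in this issue) keeps that error harmless.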

Conclusion

At the moment there is no clean solution for this issue.

 

Generated at Fri May 09 05:37:12 EEST 2025 using Jira 9.12.4#9120004-sha1:625303b708afdb767e17cb2838290c41888e9ff0.