[ZBX-10822] After zabbix-server restarts (after significant uptime) all Windows VMs using "system.uptime.change(0)}<0" trigger will start alerting Created: 2016 May 19  Updated: 2024 Apr 10  Resolved: 2019 Apr 05

Status: Closed
Project: ZABBIX BUGS AND ISSUES
Component/s: Templates (T)
Affects Version/s: 3.0.2, 3.0.3
Fix Version/s: 4.4 (plan)

Type: Problem report Priority: Major
Reporter: Ilya Kruchinin Assignee: Michael Veksler
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Debian 8


Attachments: PNG File zabbix2.png     PNG File zabbix3.png    
Issue Links:
Duplicate
Team: Team A
Sprint: Sprint 50 (Mar 2019), Sprint 51 (Apr 2019)
Story Points: 1

 Description   

Since Zabbix server 3.0.0 (up to Zabbix server 3.0.2, there is no 3.0.3 yet in the official repos), if you reboot zabbix server, the following issue is observed:

Condition: zabbix-server has been running for at least a couple of days (i.e. if you reboot it after only a short uptime, the issue will not reproduce).

All Windows hosts with the following trigger will start alerting:
{Template OS Windows:system.uptime.change(0)}<0

However, all Linux hosts with the same trigger will NOT be alerting:
{Template OS Linux:system.uptime.change(0)}<0

The evaluation is somehow broken, e.g. zabbix-server POSSIBLY evaluates the trigger for the Windows template against its OWN uptime, although the item itself is properly configured to pull data from the remote Windows agent (item in the Windows template: system.uptime). I was able to reproduce the issue several times, i.e. each time I reboot zabbix-server, provided it has been running for some time (NOT after a fresh reboot).

The uptime for hosts in all templates will continue to ONLY increase (seen in the graphs).

More details below:
Latest values for uptime:
TIMESTAMP VALUE
2016-05-19 09:42:31 514802
2016-05-19 09:41:31 514742
2016-05-19 09:40:31 514682
2016-05-19 09:40:31 514619
2016-05-19 09:39:31 514560

Trigger history:
OK 2016-05-19 09:41:31 17m 24s 17m 24s No
PROBLEM 2016-05-19 09:40:31 18m 24s 1m No
OK 2016-05-13 10:45:31 5d 23h 13m 5d 22h 55m No



 Comments   
Comment by Ilya Kruchinin [ 2016 May 19 ]

These are the values recorded

Comment by Ilya Kruchinin [ 2016 May 19 ]

The triggers list.

Comment by Ilya Kruchinin [ 2016 May 19 ]

Once again, I was ONLY able to observe this behaviour on the default Template OS Windows (that ships with Zabbix). No issues observed with the default Template OS Linux.

If you look closely, somehow Zabbix-server is receiving TWO "system.uptime" items for the same time from Windows (see the screenshot or below), e.g.:
TIMESTAMP VALUE
2016-05-19 09:41:31 514742
2016-05-19 09:40:31 514682
2016-05-19 09:40:31 514619
2016-05-19 09:39:31 514560

According to zabbix-server logs (local time on Zabbix-server), it has finished restarting between 09:40:17 and 09:40:31
-- Logs begin at Thu 2016-05-19 09:40:17 AEST, end at Thu 2016-05-19 10:09:01 AEST. --
May 19 09:40:17 zabbix systemd-journal[509]: Runtime journal is using 8.0M (max allowed 644.6M, trying to leave 966.9M free of 6.2G available → current limit 644.6M).
May 19 09:40:17 zabbix systemd-journal[509]: Runtime journal is using 8.0M (max allowed 644.6M, trying to leave 966.9M free of 6.2G available → current limit 644.6M).
May 19 09:40:17 zabbix kernel: Initializing cgroup subsys cpuset
......
.......
May 19 09:40:20 zabbix /etc/mysql/debian-start[1372]: Triggering myisam-recover for all MyISAM tables
May 19 09:40:20 zabbix exim4[1114]: Starting MTA: exim4.
May 19 09:40:31 zabbix sshd[2049]: Accepted password for toor from 192.168.1.106 port 55502 ssh2
May 19 09:40:31 zabbix sshd[2049]: pam_unix(sshd:session): session opened for user toor by (uid=0)
May 19 09:40:45 zabbix sudo[2715]: toor : TTY=pts/0 ; PWD=/home/toor ; USER=root ; COMMAND=/bin/bash
May 19 09:40:45 zabbix sudo[2715]: pam_unix(sudo:session): session opened for user root by toor(uid=0)

Comment by Ilya Kruchinin [ 2016 May 19 ]

Looks like the issue headline/title has been trimmed after a comma, and I am unable to correct it.

Please change the title to "After zabbix-server restarts (after significant uptime) all VMs using "system.uptime.change(0)}<0" trigger will start alerting and two values are recorded by Zabbix from Zabbix-agent for the same time".

Comment by Glebs Ivanovskis (Inactive) [ 2016 May 19 ]

Triggers are evaluated based on data in the database, there is no distinction between manually created triggers, templated triggers, LLD triggers, triggers on Windows/Linux hosts, etc.

Apparently, root cause is a duplicate entry:

2016-05-19 09:40:31 514682
2016-05-19 09:40:31 514619

Are these passive or active checks? What is the version of agent?

Is there any chance that clock was adjusted on the server at 09:40?
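
To illustrate why a duplicate entry could matter, here is a minimal Python sketch using the four values quoted in this ticket. It is not Zabbix's evaluation code. It only assumes that a change()-style check compares the latest value with the previous one, and that the relative order of two samples sharing the same timestamp is not guaranteed (both points are assumptions for illustration). The helper names are hypothetical.

```python
from collections import defaultdict

# (timestamp, value) pairs copied from the ticket
samples = [
    ("2016-05-19 09:39:31", 514560),
    ("2016-05-19 09:40:31", 514619),
    ("2016-05-19 09:40:31", 514682),
    ("2016-05-19 09:41:31", 514742),
]


def duplicate_timestamps(samples):
    """Return {timestamp: [values]} for timestamps that occur more than once."""
    by_ts = defaultdict(list)
    for ts, value in samples:
        by_ts[ts].append(value)
    return {ts: vals for ts, vals in by_ts.items() if len(vals) > 1}


def negative_changes(ordered):
    """Return (previous, latest) value pairs where latest - previous < 0."""
    return [
        (prev[1], cur[1])
        for prev, cur in zip(ordered, ordered[1:])
        if cur[1] - prev[1] < 0
    ]


dups = duplicate_timestamps(samples)

# Assumption: the two tied samples may be read back in either order.
in_value_order = sorted(samples, key=lambda s: (s[0], s[1]))
tie_reversed = sorted(samples, key=lambda s: (s[0], -s[1]))

print("duplicates:", dups)
print("negative changes, tie in value order:", negative_changes(in_value_order))
print("negative changes, tie reversed:", negative_changes(tie_reversed))
```

With the tie read in value order, every change is positive. With the tie reversed, the pair (514682, 514619) yields a change of -63, which would satisfy a "<0" condition even though the host never rebooted.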

Comment by Ilya Kruchinin [ 2016 May 20 ]

Hi,
The checks are passive.
Zabbix-agent is v2.4.4 from official repos (zabbix.com)

Not sure about time adjustment. In our environment, Zabbix-server is a VM running on Hyper-V. Theoretically, there may be some time skew due to virtualised environment.
This, however, does not necessarily explain why the issue only occurs with Windows VMs (start alerting), and does not occur with Linux VMs.
It is important to note, though, that for Linux the item (uptime) is checked every 10 minutes, while for Windows - every 1 minute.
The idea you mentioned (time correction) might explain why significant uptime is required (for time skew to build up) before a reboot for the issue to reproduce.

As per your question about time update - I will configure a new item to monitor Zabbix time difference from the host it's running on (and consider implementing NTP).
Let me know if I can provide any other information that can be relevant.

Comment by Glebs Ivanovskis (Inactive) [ 2016 May 20 ]

On Linux system.uptime gets information from a simple sysinfo() call whereas on Windows it's from performance counters. Although performance counters are tricky, this time I think problem is on server side, because timestamps we see are times when passive checks were collected by server.

Let's see what you will find with time difference item (very good idea, actually). This issue may then simply become Template OS Windows flapping-proofing.

Comment by Ilya Kruchinin [ 2016 May 23 ]

I tried creating a calculated item to record the time difference using a trigger as described here:
https://www.zabbix.com/forum/showthread.php?t=21600

E.g.
type: calculated
key: system.localtime.fuzzytime
formula: fuzzytime(system.localtime,65)
Update Interval: 60

But it doesn't seem to work for me. So I couldn't find a way to actually record a graph of time difference.

Anyway, Zabbix-server VM was running for 1.5 days, and I can already see a time skew of almost 4 seconds between the Zabbix VM and the host it's running on.
root@zabbix:~# zabbix_get -s 10.0.0.186 -p 10050 -k 'system.localtime'; date +%s; uptime
1463963369
1463963373
10:29:33 up 1 day, 10:51, 1 user, load average: 1.42, 1.88, 2.01
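
As a rough cross-check of the numbers above, the following Python sketch computes the offset between the two epoch values and an average drift rate. It assumes the two clocks agreed when the VM booted and ignores network latency and the delay between the two commands, so the result is an estimate only. The function names are hypothetical.

```python
def clock_offset_seconds(remote_epoch, local_epoch):
    """Offset of the local clock relative to the remote clock, in seconds."""
    return local_epoch - remote_epoch


def drift_ppm(offset_seconds, elapsed_seconds):
    """Average drift in parts per million over the elapsed interval."""
    return offset_seconds / elapsed_seconds * 1_000_000


remote = 1463963369  # system.localtime returned by zabbix_get for 10.0.0.186
local = 1463963373   # `date +%s` on the Zabbix VM

# "up 1 day, 10:51" expressed in seconds
uptime = 1 * 86400 + 10 * 3600 + 51 * 60

offset = clock_offset_seconds(remote, local)
rate = drift_ppm(offset, uptime)

print(f"offset: {offset} s over {uptime} s of uptime")
print(f"average drift: {rate:.1f} ppm")
```

Under those assumptions the Zabbix VM clock is about 4 seconds ahead after roughly 35 hours, which corresponds to a drift on the order of 30 ppm.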

I guess that explains the issue.
And it looks like the proper solution for me would be either of the following:
1) Enable the Hyper-V Integration Services Linux kernel modules by following https://oitibs.com/install-hyper-v-lis-on-debian-8/
OR
2) Configure NTP on Zabbix-server to ensure time is not drifting/skewing

I suppose the ticket can be closed.
Not sure if "NTP/Virtual time sync" should be noted as "best practices" in official Zabbix documentation (it may be a good idea) if Zabbix is run in a virtualized environment.

Comment by Glebs Ivanovskis (Inactive) [ 2016 May 23 ]

I shall leave this issue open because, for some reason that is unclear to me, Template OS Windows and Template OS Linux have different update intervals for system.uptime, and there are no additional measures to prevent trigger flapping.
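
One possible way to make such a check insensitive to sample order is to test whether uptime is below a threshold instead of testing for a negative change. The Python sketch below compares the two predicates. It is an illustration under stated assumptions, not the change made for this ticket: the 600-second window and the "real reboot" values are made up, and the function names are hypothetical.

```python
RESTART_WINDOW = 600  # hypothetical threshold, in seconds


def fired_by_change(values):
    """True if any value is lower than the one before it (change < 0)."""
    return any(cur - prev < 0 for prev, cur in zip(values, values[1:]))


def fired_by_low_uptime(values, window=RESTART_WINDOW):
    """True if any reported uptime is below the restart window."""
    return any(v < window for v in values)


# Values from this ticket, with the two tied samples read in reverse order
misordered = [514560, 514682, 514619, 514742]

# Made-up series in which the host really rebooted
real_reboot = [514560, 514619, 45, 105]

for name, series in (("misordered", misordered), ("real_reboot", real_reboot)):
    print(name, fired_by_change(series), fired_by_low_uptime(series))
```

The threshold check ignores the misordered pair but still catches the real reboot. The trade-off is that the window has to be at least as long as the item's update interval, otherwise a reboot between two polls can go unnoticed, which is relevant given the 1-minute and 10-minute intervals mentioned above.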
