[ZBX-10822] After zabbix-server restarts (after significant uptime) all Windows VMs using "system.uptime.change(0)}<0" trigger will start alerting Created: 2016 May 19 Updated: 2024 Apr 10 Resolved: 2019 Apr 05 |
Status: | Closed |
Project: | ZABBIX BUGS AND ISSUES |
Component/s: | Templates (T) |
Affects Version/s: | 3.0.2, 3.0.3 |
Fix Version/s: | 4.4 (plan) |
Type: | Problem report | Priority: | Major |
Reporter: | Ilya Kruchinin | Assignee: | Michael Veksler |
Resolution: | Fixed | Votes: | 0 |
Labels: | None | ||
Remaining Estimate: | Not Specified | ||
Time Spent: | Not Specified | ||
Original Estimate: | Not Specified | ||
Environment: |
Debian 8 |
Sprint: | Sprint 50 (Mar 2019), Sprint 51 (Apr 2019) | ||||
Story Points: | 1 |
Description |
Since Zabbix server 3.0.0 (up to Zabbix server 3.0.2; there is no 3.0.3 yet in the official repos), the following issue is observed when you reboot the Zabbix server. Condition: zabbix-server has been running for at least a couple of days (if you reboot it with minimal downtime, the issue will not reproduce). All Windows hosts with the following trigger will start alerting: However, all Linux hosts with the same trigger will NOT be alerting: The evaluation is somehow broken (zabbix server POSSIBLY evaluates the trigger for the Windows template against its OWN uptime, although the item itself is properly configured to pull data from the remote Windows agent; the item in the Windows template is system.uptime). I was able to reproduce the issue several times (each time I reboot zabbix-server, provided it has been running for some time, i.e. NOT after a fresh reboot). The uptime for hosts in all templates continues to ONLY increase (as seen in the graphs). More details below: Trigger history: |
Comments |
Comment by Ilya Kruchinin [ 2016 May 19 ] |
These are the values recorded |
Comment by Ilya Kruchinin [ 2016 May 19 ] |
The triggers list. |
Comment by Ilya Kruchinin [ 2016 May 19 ] |
Once again, I was ONLY able to observe this behaviour with the default Template OS Windows (which ships with Zabbix). No issues were observed with the default Template OS Linux. If you look closely, Zabbix-server is somehow receiving TWO "system.uptime" values with the same timestamp from Windows (see the screenshot or below), e.g.: According to the zabbix-server logs (local time on Zabbix-server), it finished restarting between 09:40:17 and 09:40:31. |
Comment by Ilya Kruchinin [ 2016 May 19 ] |
Looks like the issue headline/title has been trimmed after a comma, and I am unable to correct it. Please change the title to "After zabbix-server restarts (after significant uptime) all VMs using "system.uptime.change(0)}<0" trigger will start alerting and two values are recorded by Zabbix from Zabbix-agent for the same time". |
Comment by Glebs Ivanovskis (Inactive) [ 2016 May 19 ] |
Triggers are evaluated based on data in the database; there is no distinction between manually created triggers, templated triggers, LLD triggers, triggers on Windows/Linux hosts, etc. Apparently, the root cause is a duplicate entry:
2016-05-19 09:40:31 514682
2016-05-19 09:40:31 514619
Are these passive or active checks? What is the version of the agent? Is there any chance that the clock was adjusted on the server at 09:40? |
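To illustrate why a duplicate entry is enough to fire the trigger, here is a minimal Python sketch of a change-style check over stored history. It is not Zabbix code: the `change` helper and the list layout are hypothetical, and the order of the two rows (514682 stored before 514619) is an assumption taken from the order in which they are listed above.

```python
def change(history):
    """Return newest value minus the previous one, or None if fewer than two values.

    `history` is a list of (timestamp, value) tuples in stored order, oldest first.
    This only mimics the idea of a "difference between last two values" check.
    """
    if len(history) < 2:
        return None
    return history[-1][1] - history[-2][1]


# The two values from the ticket; their relative order is an assumption.
history = [
    ("2016-05-19 09:40:31", 514682),
    ("2016-05-19 09:40:31", 514619),
]

delta = change(history)
fires = delta is not None and delta < 0
print(delta, fires)
```

Under that ordering assumption the difference is negative even though the host never rebooted, so a "change < 0" condition is satisfied purely because of the two same-second samples.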
Comment by Ilya Kruchinin [ 2016 May 20 ] |
Hi, I am not sure about time adjustment. In our environment, Zabbix-server is a VM running on Hyper-V. Theoretically, there may be some time skew due to the virtualised environment. Regarding your question about the time update: I will configure a new item to monitor the Zabbix time difference from the host it's running on (and consider implementing NTP). |
Comment by Glebs Ivanovskis (Inactive) [ 2016 May 20 ] |
On Linux, system.uptime gets its information from a simple sysinfo() call, whereas on Windows it comes from performance counters. Although performance counters are tricky, this time I think the problem is on the server side, because the timestamps we see are the times when the passive checks were collected by the server. Let's see what you find with the time difference item (a very good idea, actually). This issue may then simply become Template OS Windows flapping-proofing. |
Comment by Ilya Kruchinin [ 2016 May 23 ] |
I tried creating a calculated item to record the time difference using a trigger as described here: E.g. But it doesn't seem to work for me. So I couldn't find a way to actually record a graph of time difference. Anyway, Zabbix-server VM was running for 1.5 days, and I can already see a time skew of almost 4 seconds between the Zabbix VM and the host it's running on. I guess that explains the issue. I suppose the ticket can be closed. |
Comment by Glebs Ivanovskis (Inactive) [ 2016 May 23 ] |
I shall leave this issue open because, for some reason that is unclear to me, Template OS Windows and Template OS Linux have different update intervals for system.uptime, and there are no additional measures to prevent trigger flapping. |
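As one possible illustration of what "measures to prevent trigger flapping" could look like, the Python sketch below compares a change-based condition with a condition on the absolute uptime value. This is only a sketch: both function names are hypothetical, the 600-second threshold is an assumed value, the reboot sample of 45 seconds is invented for the example, and nothing here is taken from the fix that was eventually applied.

```python
def fires_on_change(previous, latest):
    """Change-based condition: fire whenever uptime appears to go down."""
    return latest - previous < 0


def fires_on_low_uptime(latest, threshold=600):
    """Value-based condition: fire only while uptime is below the threshold.

    The 600-second default is an assumption made for this example.
    """
    return latest < threshold


# Same-second samples from the ticket (order assumed): no real reboot happened.
duplicate_pair = (514682, 514619)
# Hypothetical genuine reboot: uptime drops to a small value.
reboot_pair = (514682, 45)

for label, (previous, latest) in (("duplicate", duplicate_pair), ("reboot", reboot_pair)):
    print(label, fires_on_change(previous, latest), fires_on_low_uptime(latest))
```

The change-based condition reacts to both cases, while the value-based one ignores the small backwards step and still catches the reboot, at the cost of missing restarts that go unsampled for longer than the threshold.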