-
Incident report
-
Resolution: Duplicate
-
Trivial
-
None
-
7.0.8
-
None
Steps to reproduce:
- Add a VMware ESXi host (or vCenter server) to Zabbix, and enable VMware Guest discovery so that VMs are automatically created as registered hosts. One VM exists for testing, called `mytestvm.domain.local`.
- Verify that the host is being polled in Zabbix. I had to wait 1 hour before statistics were collected.
- RDP to the VM in question, launch task manager and observe the system boot-time counter. 147 days+
- Initiate a separate "ping -t" command to the host by IP address. Confirmed stable reply of less than 1ms.
- Perform a VMware vMotion of the Host VM from ESXi host 1 to ESXi host 2. Note RDP session remains connected, and the system uptime remains counting (still several weeks).
- Check Zabbix: a new PROBLEM record has been created by the `VMware Guest / VM has been restarted` trigger. THIS IS A FALSE POSITIVE.
Expression reads:
(between(last(/mytestvm.domain.local/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1 or between(last(/mytestvm.domain.local/vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1) and last(/mytestvm.domain.local/vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}]) = 1
Root-Cause:
- During a vMotion operation, the VM's world ID is recreated on the destination host. This is the direct cause of the Zabbix template misinterpreting the vMotion event as a VM restart.
- The `vmware.vm.uptime` item (which tracks the VM's uptime from a single hypervisor's perspective) is designed to live only as long as the VM is alive on that host. This metric is therefore expected to reset to zero during the vMotion process (evidenced by my screenshot below), even though the guest OS uptime (also evidenced) keeps counting.
- Zabbix's trigger logic detects VM restarts based on the wrong metric: `vmware.vm.uptime` resets during vMotion, which triggers a false positive.
- The expression in the template trigger reads:
`
(between(last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1 or between(last(/VMware Guest/vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1) and last(/VMware Guest/vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}]) = 1
`
This should be revised to just include:
`
between(last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1
and last(/VMware Guest/vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}]) = 1
`
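The difference between the two expressions can be sketched in Python. This is only a model of the trigger logic, not Zabbix code: the function names and the sample values are hypothetical, `between()` is assumed to be inclusive at both ends, and `10m` is taken as 600 seconds.

```python
TEN_MINUTES = 600  # the '10m' suffix from the expression, in seconds


def between(value, low, high):
    """Model of the expression's between(): 1 if low <= value <= high, else 0."""
    return 1 if low <= value <= high else 0


def current_trigger(os_uptime, vm_uptime, powerstate):
    """Template expression as shipped: either uptime item in the 1..10m window."""
    return bool(
        (between(os_uptime, 1, TEN_MINUTES) == 1
         or between(vm_uptime, 1, TEN_MINUTES) == 1)
        and powerstate == 1
    )


def proposed_trigger(os_uptime, vm_uptime, powerstate):
    """Proposed expression: only the guest OS uptime is considered."""
    return bool(between(os_uptime, 1, TEN_MINUTES) == 1 and powerstate == 1)


# Hypothetical sample values (seconds), chosen to mirror the report.
# Shortly after a vMotion: guest OS uptime is untouched (147 days),
# but the hypervisor-side uptime has restarted from zero.
after_vmotion = {"os_uptime": 147 * 86400, "vm_uptime": 120, "powerstate": 1}

# Shortly after a real restart: both counters are small.
after_restart = {"os_uptime": 90, "vm_uptime": 90, "powerstate": 1}

for name, sample in (("vMotion", after_vmotion), ("restart", after_restart)):
    print(name, current_trigger(**sample), proposed_trigger(**sample))
```

Under these assumptions the shipped expression fires for the vMotion sample while the proposed one does not, and both still fire for a genuine restart.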
Evidence:
- duplicates ZBX-25313 Zabbix sends false alarms about VM restarting after it has been migrated to another host (READY TO DEVELOP)