ZABBIX BUGS AND ISSUES / ZBX-26054

FALSE POSITIVE: VM has been restarted


    • Type: Incident report
    • Resolution: Duplicate
    • Priority: Trivial
    • Fix Version/s: None
    • Affects Version/s: 7.0.8
    • Component/s: Templates (T)
    • Labels: None

      Steps to reproduce:

      1. Add a VMware ESXi host (or vCenter server) to Zabbix, and enable VMware Guest so that the VMs are automatically created as registered hosts. One VM, `mytestvm.domain.local`, exists for testing.
      2. Verify that the host is being polled in Zabbix. I had to wait 1 hour before statistics were collected.
      3. RDP to the VM in question, launch Task Manager and observe the system uptime counter: 147+ days.
      4. Start a separate "ping -t" to the host by IP address. Confirmed stable replies of less than 1 ms.
      5. Perform a VMware vMotion of the VM from ESXi host 1 to ESXi host 2. Note that the RDP session remains connected and the system uptime keeps counting (still several weeks).
      6. Check Zabbix: a new PROBLEM record has been created by the 'VMware Guest / VM has been restarted' trigger. THIS IS A FALSE POSITIVE.
        The expression reads:
        (between(last(/mytestvm.domain.local/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1 or between(last(/mytestvm.domain.local/vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1) and last(/mytestvm.domain.local/vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}]) = 1

       

      Root cause:

      • During a vMotion operation, the VM's world ID is recreated on the destination host. This is the direct cause of the Zabbix template misinterpreting the vMotion event as a VM restart.
      • The `vmware.vm.uptime` item (which tracks the VM's uptime from the perspective of a single hypervisor) is designed to live only as long as the VM is running on that host. This metric is therefore expected to reset to zero during the vMotion process (as evidenced by my screenshot below), even though the guest OS uptime (also evidenced) keeps counting.
      • The trigger is meant to detect VM restarts, but it relies on the wrong metric: the reset of `vmware.vm.uptime` during a vMotion produces a false positive.
      • The expression in the template trigger reads:
        `
        (between(last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1 or between(last(/VMware Guest/vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1) and last(/VMware Guest/vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}]) = 1
        `
        This should be revised to include only:
        `
        between(last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1
        and last(/VMware Guest/vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}]) = 1
        `
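
      To illustrate the difference between the current and the proposed trigger logic, here is a minimal Python sketch that mimics both expressions. It is not Zabbix code: the `Sample` class, the function names, and the sample values (120 seconds of hypervisor uptime after a vMotion, 90 seconds after a power cycle) are assumptions made for illustration only. It also assumes `between()` is inclusive on both bounds and that a powerstate of 1 means powered on.

```python
from dataclasses import dataclass

TEN_MINUTES = 600  # the "10m" suffix from the trigger, in seconds
DAY = 86400


def between(value: float, low: float, high: float) -> int:
    """Mimic the trigger function between(): 1 if low <= value <= high, else 0."""
    return 1 if low <= value <= high else 0


@dataclass
class Sample:
    """Hypothetical container for the last() values of the three items."""
    guest_os_uptime: int  # vmware.vm.guest.osuptime, seconds
    vm_uptime: int        # vmware.vm.uptime, seconds
    powerstate: int       # vmware.vm.powerstate, 1 = powered on (assumed)


def current_trigger(s: Sample) -> bool:
    """Current template logic: either uptime counter in 1..10m, and VM powered on."""
    recently_started = (
        between(s.guest_os_uptime, 1, TEN_MINUTES) == 1
        or between(s.vm_uptime, 1, TEN_MINUTES) == 1
    )
    return recently_started and s.powerstate == 1


def proposed_trigger(s: Sample) -> bool:
    """Proposed logic: only the guest OS uptime counter is considered."""
    return between(s.guest_os_uptime, 1, TEN_MINUTES) == 1 and s.powerstate == 1


# Illustrative values: guest has been up 147 days, hypervisor-side uptime reset by vMotion.
after_vmotion = Sample(guest_os_uptime=147 * DAY, vm_uptime=120, powerstate=1)
# Illustrative values: a genuine power cycle resets both counters.
after_reboot = Sample(guest_os_uptime=90, vm_uptime=90, powerstate=1)

for name, sample in (("vMotion", after_vmotion), ("reboot", after_reboot)):
    print(name, "current:", current_trigger(sample), "proposed:", proposed_trigger(sample))
```

      Under these assumptions, the current expression fires for both the vMotion and the reboot samples, while the proposed expression fires only for the reboot sample.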

      Evidence:

            Assignee: zabbix.dev Zabbix Development Team
            Reporter: sjackson0109 Simon Jackson
