ZABBIX BUGS AND ISSUES / ZBX-21703

Zabbix Agent2 is no longer retrieving Windows perfmon counters after a period of time

    • Type: Problem report
    • Resolution: Unresolved
    • Priority: Minor
    • Affects Version/s: 6.2.3
    • Component/s: Agent (G)
    • None
    • Environment: Windows Server 2019 Standard with Version 10.0.17663.
    • Sprint: Sprint 103 (Aug 2023), Sprint 104 (Sep 2023), Sprint 105 (Oct 2023), Sprint 106 (Nov 2023), Sprint 107 (Dec 2023), S2401, S24-W6/7, S24-W8/9, S24-W10/11
    • 1

      Description

      On a significant number of Windows servers, Zabbix agent 2 stops retrieving performance counters after a period of time. We see this behaviour with several Zabbix agent 2 versions (v6.0.4, v6.2.1 and v6.2.3). In the Zabbix UI, items based on performance counters become “not supported”. With agent v6.0.4 and v6.2.1 the agent crashes when this happens; with agent v6.2.3 the agent keeps running, but those items stay in the unsupported state and never recover.

      On all machines that have this issue, the OS edition is (thus far) exactly “Windows Server 2019 Standard” with Version ’10.0.17663’. No other versions are in our problem scope at this point. If this changes, we will update this ticket accordingly.

       

      On the impacted assets we can see this error message in the UI: “The system cannot find message text for message number 0x%1 in the message file for %2.”

      On the impacted assets we can see this in the event log:


      Restarting the Zabbix agent resolves the issue, but it recurs after a period of time. As we have over 1000 assets, we need a permanent fix for this behavior.
      Zabbix agent2 availability for one of the impacted assets:

      The issue also results in data gaps for items that do not use perf_counter keys:

      Steps to reproduce:
      We can't reproduce the issue. Troubleshooting so far:

      Via Zabbix proxy towards an impacted asset – Not working:

      root@proxy004:~# zabbix_get -s 10.10.10.10 -p 20050 -k 'perf_counter_en["\PhysicalDisk(0 C:)\Avg. Disk Write Queue Length",60]' --tls-connect psk --tls-psk-file /tmp/server-test01.psk --tls-psk-identity server-test01-agent
      ZBX_NOTSUPPORTED: The system cannot find message text for message number 0x%1 in the message file for %2.
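
      For anyone scripting this check across many hosts: the failing reply above can be recognised by its ZBX_NOTSUPPORTED: prefix. Below is a minimal Python sketch, assuming the zabbix_get output has already been captured as text. The names AgentReply and classify_reply are hypothetical helpers for illustration, not part of Zabbix.

```python
from typing import NamedTuple


class AgentReply(NamedTuple):
    supported: bool
    detail: str  # item value when supported, error text otherwise


# Prefix that zabbix_get prints when the agent reports an unsupported item.
NOT_SUPPORTED_PREFIX = "ZBX_NOTSUPPORTED:"


def classify_reply(output: str) -> AgentReply:
    """Split captured zabbix_get output into a supported flag and detail text."""
    text = output.strip()
    if text.startswith(NOT_SUPPORTED_PREFIX):
        # Keep only the error message that follows the prefix.
        return AgentReply(False, text[len(NOT_SUPPORTED_PREFIX):].strip())
    return AgentReply(True, text)
```

      Fed the reply shown above, classify_reply returns supported=False with the "cannot find message text" error as detail; a plain numeric reply is returned as supported.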

      Via Zabbix proxy towards an impacted asset, using system.run[] to fetch a PhysicalDisk performance counter through PowerShell instead of Zabbix's built-in perf_counter_en key – this works:

      root@proxy004:~# zabbix_get -s 10.10.10.10 -p 20050 -k 'system.run[powershell.exe "Get-Counter -Counter \"\PhysicalDisk(0 C:)\Avg. Disk Read Queue Length\""]' --tls-connect psk --tls-psk-file /tmp/server-test01.psk --tls-psk-identity server-test01-agent
      

      Via powershell fetching counters locally on an impacted asset – this works:

      PS C:\Program Files\Zabbix Agent 2> Get-Counter -Counter "\PhysicalDisk(0 C:)\Avg. Disk Read Queue Length"
       
      Timestamp                 CounterSamples
      ---------                 --------------
      28/09/2022 11:23:06       \\server-test01\physicaldisk(0 c:)\avg. disk read queue length :
                                0
       
      

       

      Via Windows cmd fetching counters locally on impacted asset – this works:

       

      C:\Program Files\Zabbix Agent 2>typeperf "\PhysicalDisk(0 C:)\Avg. Disk Read Queue Length"
       
      "(PDH-CSV 4.0)","\\SERVER-TEST01\PhysicalDisk(0 C:)\Avg. Disk Read Queue Length"
      "09/28/2022 11:33:18.382","0.000000"
      "09/28/2022 11:33:19.385","0.000000"
      "09/28/2022 11:33:20.386","0.000000"
      "09/28/2022 11:33:21.392","0.000000"
      "09/28/2022 11:33:22.395","0.000000"
      "09/28/2022 11:33:23.398","0.000000"
       
      The command completed successfully.
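
      To compare locally collected values with what the agent returns, the typeperf output above can be parsed. This is a rough Python sketch, assuming a single counter per run and the MM/DD/YYYY timestamp layout seen in this output (it may differ with other regional settings). The function name parse_typeperf_csv is made up for this example.

```python
import csv
from datetime import datetime
from typing import List, Tuple


def parse_typeperf_csv(text: str) -> Tuple[str, List[Tuple[datetime, float]]]:
    """Return the counter path and (timestamp, value) samples from typeperf output."""
    # Keep only CSV rows; typeperf also prints blank lines and a status message.
    rows = [line.strip() for line in text.splitlines() if line.strip().startswith('"')]
    reader = csv.reader(rows)
    header = next(reader)  # first column is "(PDH-CSV 4.0)", second is the counter path
    counter_path = header[1]
    samples = []
    for stamp, value in reader:
        # Assumed timestamp layout, e.g. 09/28/2022 11:33:18.382
        samples.append((datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S.%f"), float(value)))
    return counter_path, samples
```

      The sketch does not handle multi-counter runs or empty sample values; those would need extra handling.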
      

       

       

      This issue affects all perf_counter and perf_counter_en items; we only use the out-of-the-box performance counter items from the Zabbix templates:

      • perf_counter_en["\PhysicalDisk(0 C:)\Avg. Disk Write Queue Length",60]
      • perf_counter_en["\PhysicalDisk(0 C:)\Current Disk Queue Length",60]
      • perf_counter_en["\PhysicalDisk(0 C:)\Disk Reads/sec",60]
      • perf_counter_en["\PhysicalDisk(0 C:)\Avg. Disk sec/Read",60]
      • perf_counter_en["\PhysicalDisk(0 C:)\% Disk Time",60]
      • perf_counter_en["\PhysicalDisk(0 C:)\Disk Writes/sec",60]
      • perf_counter_en["\PhysicalDisk(0 C:)\Avg. Disk sec/Write",60]
      • perf_counter_en["\PhysicalDisk(1 D:)\Avg. Disk Read Queue Length",60]
      • perf_counter_en["\PhysicalDisk(1 D:)\Avg. Disk Write Queue Length",60]
      • perf_counter_en["\PhysicalDisk(1 D:)\Current Disk Queue Length",60]
      • perf_counter_en["\PhysicalDisk(1 D:)\Disk Reads/sec",60]
      • perf_counter_en["\PhysicalDisk(1 D:)\Avg. Disk sec/Read",60]
      • perf_counter_en["\PhysicalDisk(1 D:)\% Disk Time",60]
      • perf_counter_en["\PhysicalDisk(1 D:)\Disk Writes/sec",60]
      • perf_counter_en["\PhysicalDisk(1 D:)\Avg. Disk sec/Write",60]
      • perf_counter_en["\Memory\Cache Bytes"]
      • perf_counter_en["\System\Context Switches/sec"]
      • perf_counter_en["\Processor Information(_total)\% DPC Time"]
      • perf_counter_en["\Processor Information(_total)\% Interrupt Time"]
      • perf_counter_en["\Processor Information(_total)\% Privileged Time"]
      • perf_counter_en["\System\Processor Queue Length"]
      • perf_counter_en["\Processor Information(_total)\% User Time"]
      • perf_counter_en["\Memory\Free System Page Table Entries"]
      • perf_counter_en["\Memory\Page Faults/sec"]
      • perf_counter_en["\Memory\Pages/sec"]
      • perf_counter_en["\Memory\Pool Nonpaged Bytes"]
      • perf_counter_en["\System\Threads"]
      • perf_counter_en["\Paging file(_Total)\% Usage"]

      All performance counter queries are failing.
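
      To test every key in the list above from a proxy, the zabbix_get command line can be generated per key. A minimal Python sketch follows, reusing only the zabbix_get options already shown in this ticket (-s, -p, -k, --tls-connect, --tls-psk-file, --tls-psk-identity). The helper names build_zabbix_get_argv and render_commands are hypothetical, and SAMPLE_KEYS holds just a few of the keys listed above.

```python
import shlex
from typing import Iterable, List


def build_zabbix_get_argv(host: str, port: int, key: str,
                          psk_file: str, psk_identity: str) -> List[str]:
    """Build the zabbix_get argument list for one item key over a PSK connection."""
    return [
        "zabbix_get", "-s", host, "-p", str(port), "-k", key,
        "--tls-connect", "psk",
        "--tls-psk-file", psk_file,
        "--tls-psk-identity", psk_identity,
    ]


def render_commands(host: str, port: int, keys: Iterable[str],
                    psk_file: str, psk_identity: str) -> List[str]:
    """Render one shell-quoted command line per item key."""
    return [
        shlex.join(build_zabbix_get_argv(host, port, key, psk_file, psk_identity))
        for key in keys
    ]


# A few of the keys from the list above; raw strings keep the backslashes intact.
SAMPLE_KEYS = [
    r'perf_counter_en["\PhysicalDisk(0 C:)\Avg. Disk Write Queue Length",60]',
    r'perf_counter_en["\Memory\Cache Bytes"]',
    r'perf_counter_en["\System\Threads"]',
]
```

      For example, render_commands("10.10.10.10", 20050, SAMPLE_KEYS, "/tmp/server-test01.psk", "server-test01-agent") produces command lines equivalent to the one used in the troubleshooting step, with each key single-quoted for the shell.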

      In attachment:

      • log with debug level 5 of one of the impacted assets

        1. 20231002_Screenshot-ZBX21703.png (11 kB)
        2. agent2-generate-WindowsPerfMon-stats.xml (5 kB)
        3. image-2022-09-28-14-54-32-500.png (138 kB)
        4. image-2022-09-28-14-55-09-386.png (33 kB)
        5. image-2022-09-28-14-55-26-079.png (36 kB)
        6. image-2022-09-28-14-55-35-156.png (36 kB)
        7. image-2022-09-28-14-55-48-108.png (60 kB)
        8. image-2022-09-28-14-56-28-376.png (60 kB)
        9. mta069-zabbix_agent2.log (10.77 MB)
        10. new.log (11 kB)
        11. Screenshot 2023-09-13 at 16.04.47.png (77 kB)
        12. web012-ctr.zabbix_agent2.log (10.21 MB)
        13. zabbix_agent2.log (8.49 MB)
        14. zabbix_agent2-6.0.13rc1-windows-amd64-openssl-static.zip (9.28 MB)
        15. zabbix_agent2-x64-v64-dbg1-reopen-query.7z (11.55 MB)
        16. zabbix_agent2-x64-v64-dbg2-reopen-query_timeout-impr.7z (11.56 MB)
        17. zabbix_agent2-x64-v64-dbg3-mutex-split.7z (11.62 MB)
        18. zabbix_agent2-x64-v64-dbg4-global_mutex_remove.7z (11.96 MB)
        19. zabbix_agent2-x64-v64-dbg5-Errorlogs.7z (11.95 MB)
        20. zabbix_agent2-x64-v64-dbg6-removePdhPath.7z (11.97 MB)
        21. zabbix_server.log.5apr2023.gz (10.00 MB)
        22. zabbix-agent2.log (51 kB)
        23. zbx21703_all_hosts_problems.png (180 kB)
        24. ZBX-21703_app140_agent2_status_detail.png (279 kB)
        25. ZBX-21703_app140_agent2_status_global.png (184 kB)
        26. ZBX-21703_app140_problems.png (39 kB)
        27. ZBX-21703_app140_unsupported.png (56 kB)
        28. zbx21703_db021_perflib_eventviewer.png (7 kB)
        29. zbx21703_db021_problems.png (39 kB)
        30. zbx21703_db021_troubleshooting_items.png (19 kB)
        31. zbx21703_db021_unsupported_items.png (42 kB)
        32. zbx21703_hosts_error_sept.png (89 kB)
        33. zbx21703_mta069_unsupported.png (42 kB)
        34. ZBX-21703_web131_agent2_status_detail.png (211 kB)
        35. ZBX-21703_web131_agent2_status_globa.png (207 kB)
        36. ZBX-21703_web131_problems.png (41 kB)
        37. ZBX-21703_web367_agent2_status_detail.png (43 kB)
        38. ZBX-21703_web367_agent2_status_global.png (235 kB)
        39. ZBX-21703_web367_problems.png (38 kB)
        40. ZBX-21703_web367_unsupported.png (43 kB)

            Assignee: Andrejs Kozlovs (ak)
            Reporter: Stijn De Doncker (stijndd)
            Team: Team C
            Votes: 44
            Watchers: 57