Windows agent 2 leaks one process handle on every system.run execution on HostMetadataItem


    • Support backlog

      I found a bug in Zabbix agent 2 for Windows: a process handle is leaked on every system.run execution in HostMetadataItem. It possibly affects all system.run executions, not only HostMetadataItem.

      Specifically, we have a fleet of approximately 3000 Windows servers that all share a common configuration like the one below:

      HostMetadataItem=system.run["powershell.exe -NoProfile -NonInteractive -ExecutionPolicy Bypass -Command \"$CF=Join-Path $env:TEMP 'zabbixmeta.cache'; $MaxAge=43200; if(Test-Path $CF){$Age=(Get-Date)-(Get-Item $CF).LastWriteTime; if($Age.TotalSeconds -lt $MaxAge){Get-Content $CF; exit}}; $P=Get-Process -ErrorAction SilentlyContinue | Select-Object -ExpandProperty Name; $S=Get-Service -ErrorAction SilentlyContinue | Where-Object {$_.Status -eq 'Running'} | Select-Object -ExpandProperty Name; $J=Get-CimInstance Win32_Process -Filter 'Name=''java.exe''' -ErrorAction SilentlyContinue; $R='OS=windows'; if($P -match '^(postgres|postmaster)'){$R+=';DB=postgres'}; if($P -match '^(mysqld|mariadbd)'){$R+=';DB=mysql'}; if($P -match '^(mongod|mongos)'){$R+=';DB=mongodb'}; if(($P -match '^sqlservr') -or ($S -match '^MSSQL')){$R+=';DB=mssql'}; if(($S -match '^(W3SVC|WAS)') -or ($P -match '^(w3wp|iisexpress|inetinfo)')){$R+=';APP=iis'}; if($P -match '^nginx'){$R+=';APP=nginx'}; if($P -match '^(httpd|apache2)'){$R+=';APP=apache'}; if($J){if($J | Where-Object {$_.CommandLine -match 'catalina'}){$R+=';APP=tomcat'}; if($J | Where-Object {$_.CommandLine -match '(jboss-modules\.jar|org\.jboss\.as\.standalone|standalone\.bat|domain\.bat)'}){$R+=';APP=jboss'}; if($J | Where-Object {$_.CommandLine -match '(weblogic\.Server|startWebLogic\.cmd)'}){$R+=';APP=weblogic'}; $R+=';APP=java'}; if($P -match '^dockerd'){$R+=';APP=docker'}; $R | Set-Content -NoNewline $CF; $R\""]
      Timeout=15
      RefreshActiveChecks=60

      This runs a PowerShell command that checks for known services and prints a string for use in autoregistration. The script caches its result, and its run time has been measured across different machines: it takes a few hundred milliseconds, possibly close to a second on a busy machine.

      The customer noticed very high memory usage (more than 98%) on idle machines (e.g. high-availability failover machines that do almost nothing), which returned to normal after rebooting the server. Graphs showed a typical memory leak pattern, but Windows Process Monitor did not show any process with such high memory usage, so I dug into further metrics to see what was actually happening under the hood.

      I observed \Process(zabbix_agent2)\Handle Count and \Memory\Pool Nonpaged Bytes rising monotonically and in lockstep (~one handle per system.run call), while \Process(zabbix_agent2)\Private Bytes stayed flat. Physical memory in use climbs gradually and only resets on reboot. In Process Explorer the agent accumulates Process-type handles referencing PIDs of already-exited cmd.exe processes (the handle keeps each terminated child's process object resident in non-paged pool).

      Root cause: In src/go/pkg/zbxcmd/zbxcmd_windows.go, function (*ZBXExec).execute():

      79:  procHandle, err := windows.OpenProcess(
      80:      windows.PROCESS_SET_QUOTA|windows.PROCESS_TERMINATE, false, uint32(cmd.Process.Pid))
      ...
      94:  err = windows.AssignProcessToJobObject(job, procHandle)

      procHandle is never released. The function's only windows.CloseHandle (line 150, in timeoutListener) closes the job handle. So procHandle leaks on every successful execution, and also on the assignment-error path (lines 95–102).

      (Secondary: on early error returns that occur before timeoutListener is started — cmd.Start/OpenProcess failure, lines 74–92 — the job handle created at line 55 is likewise not released.)

      Suggested fix: Close procHandle after the process is assigned to the job. A job object keeps its own reference to a member process until that process exits, so the handle is no longer needed once AssignProcessToJobObject succeeds. Adding, immediately after the OpenProcess error check (after line 92):

      defer windows.CloseHandle(procHandle)

      covers both the success path and the assignment-error path. For completeness, the job handle should also be released on the early error-return paths.

        1. zabbix_agent2-win64-v70-dev1.7z
          13.89 MB
          Michael Veksler
        2. image-2026-06-19-10-51-49-809.png
          154 kB
          Christos Diamantis
        3. image-2026-06-19-10-51-49-793.png
          221 kB
          Christos Diamantis
        4. image-2026-06-19-10-51-49-772.png
          137 kB
          Christos Diamantis
        5. image-2026-06-19-10-51-49-748.png
          204 kB
          Christos Diamantis

            Assignee:
            Nikita Gogolevs
            Reporter:
            Bartosz Nems
            Team INT
            Votes:
            3
            Watchers:
            11


                Estimated:
                Not Specified
                Remaining:
                Not Specified
                Logged:
                20h