High CPU overhead in zbx_proc_get_processes on host with 54k+ PIDs (Agent C & Agent 2)

      I am reporting persistent high CPU usage (100%) on a host with a very large number of processes (54,000+), many of them in a <defunct> (zombie) state. The issue occurs with both Zabbix Agent 7.0.8 (C version) and Zabbix Agent 2, even though the Docker monitoring plugin is NOT in use.

      Environment:

      • Zabbix Version: 7.0.8
      • Host Context: High-density container environment (Docker) that produces a large process table. Zabbix is configured to monitor only specific system processes via low-level discovery (LLD).
      • Status: Over 54,000 PIDs active in the OS.

      Technical Findings: The agent is configured to discover and monitor only a few specific processes. However, the logs indicate that the internal collector still spends excessive resources scanning the entire /proc directory to update CPU statistics and process counts.

      In both agents, the collector appears to get "stuck" in zbx_proc_get_processes(), or at least takes a long time to complete it. In the excerpt below, consecutive completions are logged roughly 2.3 to 2.4 seconds apart:

      716967:20260128:113824.772 End of zbx_proc_get_processes(): SUCCEED, processes:54717
      716967:20260128:113827.204 End of zbx_proc_get_processes(): SUCCEED, processes:54717
      716967:20260128:113829.545 End of zbx_proc_get_processes(): SUCCEED, processes:54720
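
      The spacing between those completions can be checked with a small script. This is only a sketch: it assumes the log prefix has the form PID:YYYYMMDD:HHMMSS.mmm, as in the three lines above, and the helper names (parse_timestamp, completion_intervals) are made up for this illustration.

```python
from datetime import datetime

# The three debug lines quoted in this report.
LOG_LINES = [
    "716967:20260128:113824.772 End of zbx_proc_get_processes(): SUCCEED, processes:54717",
    "716967:20260128:113827.204 End of zbx_proc_get_processes(): SUCCEED, processes:54717",
    "716967:20260128:113829.545 End of zbx_proc_get_processes(): SUCCEED, processes:54720",
]


def parse_timestamp(line):
    """Parse the assumed 'PID:YYYYMMDD:HHMMSS.mmm' prefix of a log line."""
    prefix = line.split()[0]
    _pid, date_part, time_part = prefix.split(":")
    return datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S.%f")


def completion_intervals(lines):
    """Seconds between consecutive 'End of zbx_proc_get_processes()' entries."""
    stamps = [
        parse_timestamp(line)
        for line in lines
        if "End of zbx_proc_get_processes()" in line
    ]
    return [round((b - a).total_seconds(), 3) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(completion_intervals(LOG_LINES))  # [2.432, 2.341]
```

      Note that only the "End of" lines are available here, so these figures are the time between completions, not the duration of a single scan.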

      Key Points for Support:

      1. Parity of Issue: The problem is independent of the agent implementation (it happens in both the C and Go agents).
      2. Collection Overhead: Even when monitoring only a few specific processes, the agent's core seems to iterate through the entire process table (54k+ entries), causing the CPU to hit 100% due to the overhead of reading /proc at scale.
      3. Impact of Zombies: There is a high count of <defunct> processes. We suspect the agent might be struggling to handle or skip these entries during the scan, leading to format_metric_results():FAIL errors.
      4. No Docker Plugin: To clarify, we are not using the Docker template/plugin. We are using standard process monitoring (proc.num, proc.get, or LLD for processes).
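
      To put a rough number on the cost of a full /proc walk on the affected host, independent of Zabbix, a standalone script along the following lines could be used. It is a diagnostic sketch only and does NOT reproduce what zbx_proc_get_processes() does internally. The function name scan_proc and its proc_root parameter are hypothetical. It reads /proc/<pid>/stat for every numeric entry, tallies the state letter ('Z' marks a zombie) and reports the elapsed time.

```python
import os
import time
from collections import Counter


def scan_proc(proc_root="/proc"):
    """Walk every numeric entry under proc_root and tally process states.

    Returns (process_count, Counter of state letters, elapsed_seconds).
    """
    started = time.perf_counter()
    states = Counter()
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-PID entries such as 'meminfo' or 'self'
        try:
            with open(os.path.join(proc_root, name, "stat"), "rb") as fh:
                raw = fh.read().decode("ascii", "replace")
        except OSError:
            continue  # process exited between listdir() and open()
        # The comm field is wrapped in parentheses and may itself contain
        # spaces or ')', so the state is the first field after the LAST ')'.
        parts = raw.rsplit(")", 1)
        if len(parts) != 2:
            continue
        fields = parts[1].split()
        if fields:
            states[fields[0]] += 1
    elapsed = time.perf_counter() - started
    return sum(states.values()), states, elapsed


if __name__ == "__main__":
    count, states, elapsed = scan_proc()
    print(f"{count} processes, {states.get('Z', 0)} zombies, "
          f"scanned in {elapsed:.3f}s")
```

      If one pass of this script already takes a noticeable fraction of a second with 54k+ PIDs, that would support point 2. A high 'Z' count with an otherwise fast scan would point more towards point 3.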

            Assignee:
            Zabbix Development Team
            Reporter:
            Lucas Frade
            Team B